# How to Scrape an Entire Product Catalogue with Python

 by [Mayada Shaaban](https://scrapfly.io/blog/author/mayada-shaaban-90143e67) Jul 24, 2026 17 min read [\#data-parsing](https://scrapfly.io/blog/tag/data-parsing) [\#python](https://scrapfly.io/blog/tag/python) [\#scrapeguide](https://scrapfly.io/blog/tag/scrapeguide) 



For over a decade, people have asked: can you extract an entire product catalog from any e-commerce site? Most tutorials answer a smaller question: how to scrape a single page. The hard part isn't parsing one product; it's reaching all 50,000 without missing one.

In this guide, we'll build a Python workflow that finds every product, walks pagination and infinite scroll, extracts clean data, and scales reliably. Runnable code targets a public sandbox, with guides linked for protected retailers.

[How to Observe E-Commerce Trends using Web Scraping](https://scrapfly.io/blog/posts/observing-ecommerce-market-trends-with-web-scraping): In this example web scraping project we'll be taking a look at monitoring E-Commerce trends using Python, web scraping and data visualization tools.


## Key Takeaways

- **Coverage, not parsing, is the hard part** of full-catalog scraping.
- **Discover by reliability:** sitemaps and feeds, then categories, then search.
- **Paginate to exhaustion and dedup by product ID**, never a hardcoded page count.
- **Prefer hidden JSON or JSON-LD over CSS selectors**, which break on every layout change.
- **JavaScript storefronts need a rendering browser**, not a plain `requests` call.
- **At scale, catalogs block scrapers**: use managed anti-bot and pace your requests.
- **Keep catalogs fresh with incremental re-crawls**, not full re-scrapes every run.
- **Scrapfly handles rendering, coverage, anti-bot, and parsing** through its scraping APIs.



## Why Scrape an Entire Product Catalog?

You scrape a full catalog when partial data is misleading. Price intelligence, assortment analysis, availability tracking, dataset building, and search systems all need complete coverage. A sample of page one won't do.

The same method applies to a marketplace, a brand store, or a niche retailer. That's why this guide stays site-agnostic. Discovery and traversal work the same whether the store sells sneakers, groceries, or industrial parts.

Site-specific quirks live in the dedicated guides, like [scraping Walmart](https://scrapfly.io/blog/posts/how-to-scrape-walmartcom) or [scraping Amazon](https://scrapfly.io/blog/posts/how-to-scrape-amazon).

One responsible-scraping note before any code. Collect public product data only, such as names, prices, and availability. Respect each site's robots rules, terms of service, and rate limits, and avoid personal or account-gated data.

For the wider business context, see Scrapfly's [e-commerce use-case page](https://scrapfly.io/use-case/ecommerce-web-scraping). With the why settled, the next step is understanding what makes full coverage hard.


## What Makes Scraping a Full Catalog Hard

Scraping one page is easy. Scraping a whole catalog is hard because five problems compound the moment you try to reach every product instead of one. Naming them up front turns the rest of this guide into the solution.

The five obstacles you'll face on almost any store are:

- **Coverage and discovery.** You can scrape a page, but how do you find every product? Catalogs hide items behind categories, filters, and search. One common method is to enumerate all categories and recurse into each.
- **Pagination and infinite scroll.** Listings span hundreds of pages. Some use `?page=`, others use infinite scroll or a load-more button. Naive crawlers grab page one and stop.
- **JavaScript rendering.** Many storefronts render the product grid client-side. Raw `requests` returns `<div id="app"></div>` with no products inside.
- **Anti-bot and scale.** Real catalogs use Cloudflare, DataDome, or Akamai and rate-limit aggressively. At thousands of requests, you get blocked.
- **Selector fragility.** CSS class names change and A/B tests break hardcoded selectors. You need defensive parsing or structure-agnostic extraction.

The sections below tackle these in turn, starting with project setup and then discovery and coverage, where most catalog scrapes go wrong.


## Project Setup

You need Python 3.8 or newer (3.10+ recommended) and a small set of packages. We'll use `httpx` for requests, `parsel` for HTML and JSON parsing, and the `scrapfly-sdk` for rendering, anti-bot, and managed crawling. Add `pandas` if you want a quick CSV export.

Install everything with one command:

```bash
pip install scrapfly-sdk parsel httpx pandas
```


The runnable examples target [web-scraping.dev/products](https://web-scraping.dev/products). The sandbox has category links, pagination, and product pages that embed JSON-LD. For protected retailers, the worked examples link to guides like [scraping eBay](https://scrapfly.io/blog/posts/how-to-scrape-ebay).

Initialize the Scrapfly client once and reuse it:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
result = client.scrape(ScrapeConfig(url="https://web-scraping.dev/products"))
print(result.status_code)  # 200
```


This confirms your key works and the target responds. You can grab a free key from the [Scrapfly dashboard](https://scrapfly.io/dashboard). With the tools in place, the real work begins with discovery.


## How to Discover Every Product in a Catalog

The difference between scraping a page and scraping a catalog is discovery. You need a reliable way to enumerate every product URL before you extract anything. There are three manual methods, in order of reliability, plus a managed option for scale.

### Method 1: Sitemaps, the most reliable source

Most stores publish an XML sitemap that lists every indexable URL, product pages included. The site maintains it for you, so it is the fastest path to a near-complete URL set. Fetch the sitemap, drop the namespace, and keep the product links.

```python
import httpx
from parsel import Selector

def discover_from_sitemap(sitemap_url):
    xml = httpx.get(sitemap_url).text
    sel = Selector(xml)
    sel.remove_namespaces()
    urls = sel.css("url > loc::text").getall()
    return [u for u in urls if "/product/" in u]

product_urls = discover_from_sitemap("https://web-scraping.dev/sitemap.xml")
print(len(product_urls), product_urls[0])
# 28 https://web-scraping.dev/product/1
```


Large catalogs split their sitemap behind a sitemap index, so follow each child `<sitemap>` entry and merge the results. Sitemaps can also lag behind new products, so treat them as the backbone and verify the count once you have crawled.
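Following a sitemap index can be sketched with the standard library alone. This is a minimal sketch, not the article's own implementation: `parse_sitemap`, `collect_sitemap_urls`, the injectable `fetch` callable, and the `shop.example` sample data are all hypothetical names introduced for illustration. In a real run, `fetch` could be something like `lambda u: httpx.get(u).text`.

```python
import xml.etree.ElementTree as ET

def _local(tag):
    # "{namespace}loc" -> "loc"
    return tag.rsplit("}", 1)[-1]

def parse_sitemap(xml_text):
    """Return (child_sitemap_urls, page_urls) for one sitemap document."""
    root = ET.fromstring(xml_text.strip())
    children, pages = [], []
    for entry in root:
        loc = next(
            (c.text.strip() for c in entry if _local(c.tag) == "loc" and c.text),
            None,
        )
        if loc is None:
            continue
        if _local(entry.tag) == "sitemap":    # entry of a sitemap index
            children.append(loc)
        elif _local(entry.tag) == "url":      # entry of a regular sitemap
            pages.append(loc)
    return children, pages

def collect_sitemap_urls(sitemap_url, fetch, seen=None):
    """Recursively merge page URLs from a sitemap and all of its children."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:                   # guard against index loops
        return []
    seen.add(sitemap_url)
    children, pages = parse_sitemap(fetch(sitemap_url))
    for child in children:
        pages.extend(collect_sitemap_urls(child, fetch, seen))
    return pages

# Made-up in-memory sitemaps so the sketch runs without network access.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SITEMAPS = {
    "https://shop.example/sitemap.xml": (
        f'<sitemapindex xmlns="{NS}">'
        "<sitemap><loc>https://shop.example/sitemap-1.xml</loc></sitemap>"
        "<sitemap><loc>https://shop.example/sitemap-2.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://shop.example/sitemap-1.xml": (
        f'<urlset xmlns="{NS}">'
        "<url><loc>https://shop.example/product/1</loc></url>"
        "<url><loc>https://shop.example/about</loc></url>"
        "</urlset>"
    ),
    "https://shop.example/sitemap-2.xml": (
        f'<urlset xmlns="{NS}">'
        "<url><loc>https://shop.example/product/2</loc></url>"
        "</urlset>"
    ),
}

urls = collect_sitemap_urls("https://shop.example/sitemap.xml", SITEMAPS.__getitem__)
print([u for u in urls if "/product/" in u])
# ['https://shop.example/product/1', 'https://shop.example/product/2']
```

Passing `fetch` in as an argument keeps the parsing logic testable offline and lets you swap in a rendered or proxied fetch later.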

### Method 2: Category enumeration, when no sitemap exists

When a sitemap is missing or thin, walk the category tree instead. Read every category from the navigation, then collect the product links inside each one. This is the recursive method readers have described for years: open the index, list the categories, and recurse into each.

```python
def discover_from_categories(base, categories):
    urls = set()
    for category in categories:
        sel = Selector(httpx.get(f"{base}/products?category={category}").text)
        for href in sel.css("a[href*='/product/']::attr(href)").getall():
            urls.add(href.split("?")[0])
    return urls

urls = discover_from_categories(
    "https://web-scraping.dev",
    ["apparel", "consumables", "household"],
)
```


Discover the category list from the site's own navigation rather than hardcoding it. Categories overlap and rarely cover every product on their own, so dedup as you collect and lean on the count check below.
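One way to read categories from the navigation instead of hardcoding them is sketched below. It assumes the store exposes categories as a `category` query parameter on listing links, as the `/products?category=...` URLs above do; other stores may use path segments instead. `CategoryLinkParser`, `discover_categories`, and the sample markup are hypothetical, and the sketch uses the standard library so it runs on its own.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

class CategoryLinkParser(HTMLParser):
    """Collect the `category` query value from every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        for value in parse_qs(urlparse(href).query).get("category", []):
            if value not in self.categories:   # dedup, keep first-seen order
                self.categories.append(value)

def discover_categories(html):
    parser = CategoryLinkParser()
    parser.feed(html)
    return parser.categories

# Made-up navigation markup for illustration.
sample_nav = """
<nav>
  <a href="/products?category=apparel">Apparel</a>
  <a href="/products?category=consumables">Consumables</a>
  <a href="/products?category=apparel&amp;page=2">More apparel</a>
  <a href="/about">About</a>
</nav>
"""
print(discover_categories(sample_nav))
# ['apparel', 'consumables']
```

The resulting list can be passed straight to `discover_from_categories` in place of the hardcoded one.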

### Method 3: On-site search, the fallback

When categories are shallow or bury products behind filters, the search box is the last resort. Query broad terms or iterate the alphabet, then gather the result links. It is the least complete method because search caps how many results it returns, so use it to fill gaps rather than as your primary source.
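The gap-filling idea can be sketched as a loop over search terms. Everything here is an assumption for illustration: `search(term, page)` stands in for whatever request and parsing your target's search page needs, `max_pages` is only a safety cap, and the tiny `CATALOG` is made up so the sketch runs offline.

```python
import string

def discover_from_search(search, terms=string.ascii_lowercase, max_pages=50):
    """Union the product URLs returned for every search term.

    `search(term, page)` is caller-supplied and returns a list of URLs.
    """
    found = set()
    for term in terms:
        for page in range(1, max_pages + 1):
            urls = search(term, page)
            if not urls:          # this term has no more result pages
                break
            found.update(urls)
    return found

# Made-up catalog and search function standing in for a real site search.
CATALOG = {
    "/product/1": "apple pie",
    "/product/2": "banana bread",
    "/product/3": "cherry jam",
}

def fake_search(term, page, page_size=2):
    hits = sorted(url for url, name in CATALOG.items() if term in name)
    start = (page - 1) * page_size
    return hits[start:start + page_size]

known = {"/product/1"}            # e.g. already found via categories
gaps = discover_from_search(fake_search) - known
print(sorted(gaps))
# ['/product/2', '/product/3']
```

Subtracting the URLs you already hold makes the search pass report only what the other methods missed.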

### The managed option: Crawler API

Maintaining sitemap parsing, category recursion, and search fallbacks for every target becomes its own project. Scrapfly's [Crawler API](https://scrapfly.io/crawler-api) takes a seed URL, follows on-page links, and can pull sitemaps with `use_sitemaps=True`, so discovery and traversal run as one managed job. We return to it in the scaling section.


## How to Crawl Listings with Pagination and Infinite Scroll

Once you know where products live, you walk every listing page to exhaustion and then confirm you have the whole catalog. The mistake that wrecks coverage is stopping early. The loop must run until the listing genuinely ends, not until an arbitrary page number.

### Paginate to exhaustion

Increment the page parameter until a page returns no new products, and dedup by product ID so overlapping or repeated links never inflate the count.

```python
import re
import httpx
from parsel import Selector

def crawl_listings(base):
    seen, page = {}, 1
    while True:
        sel = Selector(httpx.get(f"{base}/products?page={page}").text)
        links = sel.css("a[href*='/product/']::attr(href)").getall()
        page_ids = {}
        for href in links:
            match = re.search(r"/product/(\d+)", href)
            if match:               # skip links without a numeric product ID
                page_ids[match.group(1)] = href.split("?")[0]
        new_ids = page_ids.keys() - seen.keys()
        if not new_ids:         # nothing new (or an empty page): listing ended
            break
        seen.update(page_ids)   # dedup by product ID
        page += 1
    return seen

products = crawl_listings("https://web-scraping.dev")
print(len(products))   # 25
```


The loop stops on its own when a page adds nothing new, with no hardcoded limit. Notice it returns 25 products, not the 28 the sitemap listed. The next subsection shows how to catch that gap.

### Always verify the count

A run that looks finished can still miss products. Cross-check the crawled set against the site's own total from the sitemap.

```python
sitemap = set(discover_from_sitemap("https://web-scraping.dev/sitemap.xml"))
crawled = set(products.values())
missing = sitemap - crawled
print(len(crawled), "crawled,", len(sitemap), "in sitemap")
# 25 crawled, 28 in sitemap
print(sorted(int(u.rsplit("/", 1)[-1]) for u in missing))
# [26, 27, 28]
```


The listing pages surface 25 products, but the sitemap lists 28. Three products are reachable only through the sitemap or a direct link, a gap you would never catch without the cross-check. That is exactly why discovery should never rely on a single method.

### Infinite scroll and load-more

Many storefronts swap numbered pages for infinite scroll or a load-more button. Two approaches handle it. The cleaner one is to open your browser's network tab, find the JSON pagination endpoint the page calls as you scroll, and request it directly with a rising offset or cursor. When there is no clean endpoint to call, render the page and let it auto-scroll. Scrapfly does the second with `render_js=True` plus `auto_scroll=True`:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/products",
    render_js=True,
    auto_scroll=True,
))
links = result.selector.css("a[href*='/product/']::attr(href)").getall()
```


Rendering returns the fully loaded DOM, so the same link extraction works whether the listing paginates or scrolls.
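The first approach, calling the JSON endpoint directly, can be sketched as an offset loop. This is a sketch under stated assumptions: `fetch_page(offset, limit)` is a hypothetical caller-supplied function, and a real one might wrap something like `httpx.get(endpoint, params={"offset": offset, "limit": limit}).json()`, where the endpoint, parameter names, and the `id` field all depend on what you find in the network tab.

```python
def crawl_json_endpoint(fetch_page, page_size=24):
    """Walk an offset-paginated JSON endpoint until it returns nothing new.

    `fetch_page(offset, limit)` returns a list of dicts that each carry an "id".
    """
    products, offset = {}, 0
    while True:
        items = fetch_page(offset, page_size)
        new = [item for item in items if item["id"] not in products]
        if not new:                  # empty or repeated page: we are done
            break
        for item in new:
            products[item["id"]] = item
        offset += len(items)         # advance by what the server actually sent
    return products

# Made-up data standing in for the endpoint's responses.
DATA = [{"id": i, "name": f"Product {i}"} for i in range(1, 51)]
products_by_id = crawl_json_endpoint(lambda offset, limit: DATA[offset:offset + limit])
print(len(products_by_id))
# 50
```

Stopping on "no new IDs" rather than on an empty response also protects against endpoints that keep repeating their last page.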


## How to Extract Product Data Reliably

Durable extraction starts with the data source, not the selector. Prefer hidden JSON or a JSON API over CSS selectors, render JavaScript only when needed, and stay structure-agnostic across stores. That order keeps your parser alive through layout changes.

### Prefer structured JSON over CSS selectors

Most product pages embed their data as JSON-LD or in a JavaScript state blob such as `__NEXT_DATA__`. That structured data changes far less often than the visual markup, so parse it first. On web-scraping.dev, every product page carries a JSON-LD `Product` object.

```python
import json
import httpx
from parsel import Selector

def extract_product(url):
    sel = Selector(httpx.get(url).text)
    raw = sel.css('script[type="application/ld+json"]::text').get()
    data = json.loads(raw)
    offers = data.get("offers", {})
    return {
        "name": data.get("name"),
        "price_low": offers.get("lowPrice") or offers.get("price"),
        "price_high": offers.get("highPrice"),
        "availability": offers.get("availability", "").rsplit("/", 1)[-1],
        "url": offers.get("url"),
    }

print(extract_product("https://web-scraping.dev/product/1"))
# {'name': 'Box of Chocolate Candy', 'price_low': '9.99', 'price_high': '19.99',
#  'availability': 'InStock', 'url': 'https://web-scraping.dev/product/1'}
```


JSON-LD gives you typed fields with no brittle class names. When a page embeds no JSON, fall back to CSS selectors, but expect them to break on the next redesign and guard every lookup against missing nodes.
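Real pages are often messier than the sandbox: they may carry several JSON-LD blocks, wrap objects in a list, or nest them under `@graph`. The sketch below shows one defensive way to locate the `Product` object. `find_product_jsonld` and the sample markup are hypothetical, and it uses a regular expression only to stay dependency-free; with `parsel` you would feed it the strings from `.getall()` on the same `script` selector instead.

```python
import json
import re

LD_JSON = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def find_product_jsonld(html):
    """Return the first JSON-LD object whose @type includes Product, else None."""
    for raw in LD_JSON.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue                              # skip malformed blocks
        nodes = data if isinstance(data, list) else [data]
        expanded = []
        for node in nodes:
            if isinstance(node, dict):
                graph = node.get("@graph")        # some sites nest under @graph
                expanded.extend(graph if isinstance(graph, list) else [node])
        for node in expanded:
            if not isinstance(node, dict):
                continue
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Product" in types:
                return node
    return None

# Made-up markup with two JSON-LD blocks.
sample = """
<script type="application/ld+json">{"@type": "BreadcrumbList"}</script>
<script type="application/ld+json">
[{"@type": "Organization", "name": "Shop"},
 {"@type": "Product", "name": "Box of Chocolate Candy"}]
</script>
"""
product = find_product_jsonld(sample)
print(product["name"])
# Box of Chocolate Candy
```

Returning `None` instead of raising lets the caller decide when to fall back to CSS selectors.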

### Stay structure-agnostic across many stores

Per-site selectors stop scaling the moment you scrape dozens of stores with different markup. This is where prompt and model based extraction earns its place. Scrapfly's [Extraction API](https://scrapfly.io/docs/extraction-api/getting-started) reads a page and returns the same normalized fields from any layout, using a prebuilt model or your own prompt.

```python
from scrapfly import ScrapflyClient, ScrapeConfig, ExtractionConfig

client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
html = client.scrape(ScrapeConfig(url="https://web-scraping.dev/product/1")).content
result = client.extract(ExtractionConfig(
    body=html,
    content_type="text/html",
    extraction_model="product",   # or product_listing for a listing page
))
print(result.data)   # normalized name, price, currency, and availability
```


The `product` model returns the same fields from heterogeneous pages, so one call replaces a hand-written parser per store. With extraction handled, the last problem is doing all of this at catalog scale without getting blocked.


## How to Scrape a Catalog at Scale Without Bans

At catalog scale you will hit anti-bot systems and rate limits, so reliability becomes about managing IP reputation, fingerprints, and pacing. A scraper that works for ten products often fails at ten thousand, and the failures follow predictable patterns.

The common symptoms and their fixes are:

- **Empty product grid.** JavaScript didn't render, or the wait fired too early. Render the page and wait for a product selector.
- **403 or sudden IP blocks.** Your proxy or fingerprint got flagged. Rotate residential IPs and turn on anti-bot bypass.
- **Pagination dies after a few pages.** Session cookies expired. Keep a consistent session across requests.

The levers that keep large runs alive are residential proxy rotation, anti-bot bypass, request pacing with concurrency limits, and retries. Scrapfly's Anti Scraping Protection (`asp=True`) handles the first two automatically:

```python
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/product/1",
    asp=True,
    proxy_pool="public_residential_pool",
    country="us",
))
print(result.status_code)  # 200
```


With `asp=True`, Scrapfly auto-detects the protection vendor and applies the right bypass, while `proxy_pool` and `country` control IP reputation and geo-targeting.

Keep concurrency modest and add a small delay between requests to avoid tripping rate limits. For the detection theory behind these systems, see the [anti-bot bypass guide](https://scrapfly.io/blog/posts/how-to-bypass-anti-bot-protection-when-web-scraping).
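Pacing and retries can be sketched without any scraping library. The names here (`fetch_with_retries`, `crawl_politely`, the `RETRYABLE` set, and the delay values) are illustrative assumptions, not recommended settings, and `fetch(url)` stands in for whatever client call returns a status code and body.

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 504}   # assumed set of "try again" statuses

def fetch_with_retries(fetch, url, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), backing off on retryable statuses."""
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status not in RETRYABLE:
            return status, body
        if attempt < retries:
            # exponential backoff with jitter: roughly 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    return status, body                  # give up, return the last response

def crawl_politely(fetch, urls, delay=0.5, sleep=time.sleep):
    """Fetch URLs one at a time with a fixed pause between requests."""
    results = {}
    for url in urls:
        results[url] = fetch_with_retries(fetch, url, sleep=sleep)
        sleep(delay)
    return results

# A fake fetch that is rate-limited twice, then succeeds.
calls = []
def flaky_fetch(url):
    calls.append(url)
    return (429, "") if len(calls) < 3 else (200, "<html>ok</html>")

pauses = []   # record the waits instead of actually sleeping
status, body = fetch_with_retries(
    flaky_fetch, "https://web-scraping.dev/product/1", sleep=pauses.append
)
print(status, len(calls), len(pauses))
# 200 3 2
```

A 403 is deliberately left out of the retryable set in this sketch, since retrying a flagged fingerprint with the same identity rarely helps.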

[How to Rotate Proxies in Web Scraping](https://scrapfly.io/blog/posts/how-to-rotate-proxies-in-web-scraping): In this article we explore proxy rotation. How does it affect web scraping success and blocking rates and how can we smartly distribute our traffic through a pool of proxies for the best results.

Staying unblocked gets you one complete run. Keeping the catalog accurate over time is a separate job.


## How to Keep a Catalog Fresh with Incremental Re-Crawls

Re-scraping a whole catalog every day is wasteful and slow. Instead, track what changed and re-fetch only the products that moved. The trick is to store a small fingerprint per product and diff it against the previous run.

A fingerprint is a hash of the fields you care about, usually price and availability. On each re-crawl, compute the new fingerprint and compare it to the stored one. Only changed products need an update.

```python
import hashlib

def fingerprint(product):
    blob = f"{product['price']}|{product['availability']}"
    return hashlib.sha1(blob.encode()).hexdigest()

def diff_catalog(previous, current):
    changed = []
    for url, product in current.items():
        if url not in previous or previous[url] != fingerprint(product):
            changed.append(url)
    return changed

prev = {"https://web-scraping.dev/product/1": fingerprint({"price": "9.99", "availability": "InStock"})}
now = {"https://web-scraping.dev/product/1": {"price": "8.99", "availability": "InStock"}}
print(diff_catalog(prev, now))  # ['https://web-scraping.dev/product/1']
```


The diff flags the one product whose price moved and skips the rest. You can cut work further by reading sitemap `lastmod` timestamps to skip products that haven't changed since your last run.
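The `lastmod` shortcut can be sketched as a filter over sitemap entries. This assumes the sitemap actually publishes `<lastmod>` values and keeps them accurate, which not every store does. `changed_since` and the `shop.example` sample are hypothetical, and entries without a timestamp are re-fetched to stay on the safe side.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def changed_since(sitemap_xml, last_run):
    """Return URLs modified after last_run, plus URLs with no <lastmod>."""
    root = ET.fromstring(sitemap_xml.strip())
    stale = []
    for entry in root:
        # map local tag name -> text, ignoring the XML namespace
        fields = {c.tag.rsplit("}", 1)[-1]: (c.text or "").strip() for c in entry}
        loc, lastmod = fields.get("loc"), fields.get("lastmod")
        if not loc:
            continue
        if not lastmod:
            stale.append(loc)            # no timestamp: re-fetch to be safe
            continue
        modified = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        if modified.tzinfo is None:      # date-only values: assume UTC
            modified = modified.replace(tzinfo=timezone.utc)
        if modified > last_run:
            stale.append(loc)
    return stale

# Made-up sitemap for illustration.
sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/product/1</loc><lastmod>2026-07-01</lastmod></url>
  <url><loc>https://shop.example/product/2</loc><lastmod>2026-07-20T08:30:00Z</lastmod></url>
  <url><loc>https://shop.example/product/3</loc></url>
</urlset>"""

last_run = datetime(2026, 7, 10, tzinfo=timezone.utc)
print(changed_since(sitemap_xml, last_run))
# ['https://shop.example/product/2', 'https://shop.example/product/3']
```

Only the URLs this returns need to go through extraction and the fingerprint diff on the next run.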

Schedule the job with cron or a queue, or pair it with the Crawler API for managed recurring coverage. With freshness handled, a few legal and practical questions remain.


## Is It Legal to Scrape Product Catalogs?

This section answers the legal and how-to questions that come up most often when scraping catalogs.

### Is it legal to scrape product data?

Public product data like names, prices, and availability is generally treated differently from personal or account-gated data. Respect each site's terms and robots rules, avoid personal data, and consult a lawyer for commercial use.

### Can I scrape an entire catalog with only requests and BeautifulSoup?

For small, static stores, yes. For JavaScript-rendered or protected catalogs at scale, you need rendering plus managed anti-bot, then you parse the JSON.

### How do I find every product on a site?

Start with sitemaps and product feeds, then category enumeration, then on-site search. Dedup by product ID and verify your count against the site's stated total.

### How do I scrape infinite-scroll catalogs?

Call the underlying JSON pagination endpoint directly when you can find it in DevTools, since it's cleaner than rendering. Otherwise render the page and auto-scroll.

### How do I avoid getting blocked when scraping thousands of pages?

Rotate residential proxies, turn on anti-bot bypass, pace your requests, and re-crawl incrementally instead of hammering every page each run.

### How do I export the catalog?

Normalize fields such as price and currency, then write to CSV, JSON, or a database with a `captured_at` timestamp for each record.
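A minimal export sketch with the standard `csv` module follows. The column set in `FIELDS`, the `normalize` helper, and the sample record are assumptions for illustration; the sketch also assumes prices arrive as plain numeric strings, so values with currency symbols would need cleaning first.

```python
import csv
import io
from datetime import datetime, timezone

FIELDS = ["url", "name", "price", "currency", "availability", "captured_at"]

def normalize(record, captured_at):
    """Coerce one scraped record into the export schema."""
    price = record.get("price")
    return {
        "url": record["url"],
        "name": (record.get("name") or "").strip(),
        "price": f"{float(price):.2f}" if price not in (None, "") else "",
        "currency": (record.get("currency") or "").upper(),
        "availability": record.get("availability") or "",
        "captured_at": captured_at,
    }

def write_catalog(records, handle):
    """Write records as CSV, stamping every row with the same capture time."""
    captured_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(normalize(record, captured_at))

# In-memory buffer for the demo; use open("catalog.csv", "w", newline="") for a file.
buffer = io.StringIO()
write_catalog(
    [{
        "url": "https://web-scraping.dev/product/1",
        "name": " Box of Chocolate Candy ",
        "price": "9.9",
        "currency": "usd",
        "availability": "InStock",
    }],
    buffer,
)
print(buffer.getvalue())
```

Stamping every row from one run with the same `captured_at` value makes it easy to group and compare snapshots later.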


## Scraping Product Catalogs with Scrapfly

You can build every piece above yourself. At catalog scale the work shifts from parsing to keeping discovery, rendering, anti-bot, and extraction reliable across thousands of pages. Scrapfly collapses those concerns into a few managed APIs so you maintain data, not infrastructure.


Scrapfly's [Crawler API](https://scrapfly.io/crawler-api) recursively crawls entire domains, discovers links automatically, and collects structured data from thousands of pages without writing spider logic.

- [Recursive crawling](https://scrapfly.io/docs/crawler-api/getting-started) - configure depth, page limits, and concurrency to walk entire domains or scoped sections.
- [Pattern based filtering](https://scrapfly.io/docs/crawler-api/url-sources) - include and exclude URLs with wildcards to keep crawls on target.
- [Anti-bot bypass and JavaScript rendering](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) - every request benefits from the same stealth stack as the Web Scraping API, including React, Vue, and Angular SPAs.
- [Multiple output formats](https://scrapfly.io/docs/crawler-api/results) - get results as HTML, Markdown, text, JSON, WARC archives, or HAR files, in one crawl.
- [LLM powered extraction](https://scrapfly.io/docs/extraction-api/llm-prompt) - convert raw HTML into structured JSON with prompts or templates, no selectors required.
- [Polling or real time webhooks](https://scrapfly.io/docs/crawler-api/webhook) - pull results on completion or stream events as each page finishes.
- [Dashboard monitoring](https://scrapfly.io/docs/monitoring) - watch live progress, inspect per page state, and replay results from the UI.
- [Python](https://scrapfly.io/docs/sdk/python) and [TypeScript](https://scrapfly.io/docs/sdk/typescript) SDKs with built in WARC and HAR parsers.




## FAQ

### What's the difference between scraping a page and scraping a catalog?

Scraping a page is pure parsing of one URL. Scraping a catalog is a coverage problem: you must first discover and reach every product URL before any parsing matters.

### Why does my crawler only return the first page of products?

It's hardcoded to stop early or it never detects the next page. Loop until a page returns no new products and dedup by product ID instead of using a fixed page count.

### Should I use CSS selectors or JSON to extract products?

Prefer JSON-LD, `__NEXT_DATA__`, or an internal JSON API when available, since they survive layout changes. Use CSS selectors only as a fallback for static pages with no embedded data.

### How do I scrape catalogs from many different stores at once?

Per-site selectors don't scale across stores, so use prompt-based extraction that reads any markup. Scrapfly's Extraction API returns the same fields from heterogeneous pages without custom parsers.


## Summary

Scraping an entire product catalog is a coverage problem before it's a parsing problem. The method is consistent across stores.

Discover every product URL through sitemaps, categories, or search, then paginate to exhaustion and dedup. From there, extract from JSON-LD or a hidden JSON API wherever you can.

The parts that fail at scale are rendering, anti-bot, and freshness. Render JavaScript pages and manage IP reputation with residential proxies and anti-bot bypass. Pace your requests, and re-crawl incrementally so you only touch products that changed.

Always verify your collected count against the site's stated total. A run that looks finished can still miss products, as our pagination example did.

This is the cross-vertical foundation. For one retailer, use the site-specific guides; for many sources at once, build on this single-catalog method.

Scrapfly's Web Scraping API handles rendering and anti-bot, and the Crawler API handles whole-catalog discovery. The Extraction API handles structure-agnostic parsing, so together they cover the parts that make full-catalog scraping fail.


**Legal Disclaimer and Precautions**

This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect:

- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens protected by GDPR.
- Do not repurpose *entire* public datasets which can be illegal in some countries.

Scrapfly does not offer legal advice but these are good general rules to follow. For more you should consult a lawyer.

 