# How to Build a Price Tracker Using Python

by [Mazen Ramadan](https://scrapfly.io/blog/author/mazen) May 05, 2026 24 min read [#project](https://scrapfly.io/blog/tag/project) [#python](https://scrapfly.io/blog/tag/python)


 

 

   

Prices on the web move quietly. A product sits at one price for weeks, drops 30 percent for six hours during a flash sale, and snaps back before anyone refreshing the page by hand notices the discount. The only reliable way to catch that window is to point a script at the page, log every price it sees, and have it wake you up the moment a number moves in the right direction.

This guide walks through every layer of a production-shaped price tracker. The article covers async scraping with [httpx](https://www.python-httpx.org/) and [parsel](https://parsel.readthedocs.io/), durable price history in [SQLite](https://docs.python.org/3/library/sqlite3.html), change detection with percentage thresholds, scheduled execution with [APScheduler](https://apscheduler.readthedocs.io/) and cron, and anti-bot handling for real e-commerce targets.

## Key Takeaways

A Python price tracker is a small pipeline that scrapes prices on a schedule, stores each snapshot, and detects drops against history. The five components below are the entire system.

- Scrape product listings with `httpx.AsyncClient` and `asyncio.as_completed()` for fast concurrent fetching across paginated product pages.
- Use parsel `Selector` with XPath or CSS selectors to pull product ID, name, price, and URL out of HTML.
- Store every scrape in SQLite with two tables (`products` and `prices`) so price history survives restarts and can be queried directly.
- Detect price drops by comparing the two most recent prices per product, calculating the percentage change, and filtering on a configurable threshold.
- Schedule the tracker with APScheduler for self-contained Python execution or with system cron for OS-level reliability that survives reboots.








## What Does a Python Price Tracker Actually Do?

A Python price tracker is a script that repeatedly scrapes product prices from a website, stores each price snapshot, and detects when a product drops below a target price. The job is a loop, and each turn through the loop adds one row of history to a database the script can compare against the next time around.

The full pipeline is a five-step cycle:

1. Fetch the product listing pages.
2. Parse each product card into structured data.
3. Store the snapshot in the database.
4. Compare the latest price against the previous one per product.
5. Wait for the next scheduled run and repeat.

This guide builds each step incrementally. The scraping layer uses async httpx for speed, the storage layer uses SQLite for durable history, the change-detection layer compares the two latest snapshots per product and logs any drops past the threshold, and the scheduling layer uses APScheduler or system cron to run the loop without manual triggers.
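
The loop is compact enough to sketch up front. The outline below is a minimal sketch of one turn through the cycle, assuming the `scraper.py` and `db.py` modules that the following sections build; the scheduling layer later replaces the sleep.

```python
import asyncio
import time

from scraper import scrape_products                   # steps 1-2: fetch listing pages, parse product cards
from db import init_db, save_prices, detect_changes   # steps 3-4: store snapshots, compare against history

def run_once() -> None:
    """One turn through the cycle: scrape, store, compare, report."""
    products = asyncio.run(scrape_products("https://web-scraping.dev/products"))
    save_prices(products)
    for drop in detect_changes(threshold_pct=-5.0):
        print(f"{drop['name']} dropped to ${drop['new_price']:.2f} ({drop['change_pct']}%)")

if __name__ == "__main__":
    init_db()
    while True:
        run_once()
        time.sleep(6 * 60 * 60)  # step 5: wait, then repeat (replaced by APScheduler or cron later)
```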

A few related topics fall outside the scope of this guide. Comparing the same product across Amazon, Walmart, and BestBuy is multi-vendor work that has its own architecture, covered in the dedicated guide below.

[How to Track Competitor Prices Using Web Scraping](https://scrapfly.io/blog/posts/how-to-track-competitor-pricing-using-web-scraping): In this web scraping guide, we'll explain how to create a tool for tracking competitor prices using Python. It will scrape specific products from different providers, compare their prices and generate insights.



## How Do You Set Up the Project?

The price tracker uses httpx for async HTTP requests, parsel for HTML parsing, APScheduler for automated runs, and Python's standard library (`sqlite3`) for durable storage. Three pip packages cover everything that does not ship with Python.

Make sure Python 3.10 or newer is installed before starting. Async features and `asyncio.as_completed()` behave consistently from 3.10 onward, and the type hints in the snippets below assume the same baseline.
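
A quick interpreter check confirms that baseline before installing anything:

```python
import sys

# Fail fast if the interpreter is older than the 3.10 baseline this guide assumes
assert sys.version_info >= (3, 10), f"Python 3.10+ required, found {sys.version.split()[0]}"
print(f"Python {sys.version.split()[0]} OK")
```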

Install the three external dependencies with a single pip command:

```shell
pip install httpx parsel apscheduler
```



The pip command above installs the only third-party libraries the tracker needs. Storage uses `sqlite3` from the standard library, and the same httpx client handles every outbound HTTP request the tracker makes.

A clean project structure makes the moving parts easy to navigate. The three files below are the complete tracker:

- `scraper.py` — async product scraper that returns a list of price snapshots.
- `db.py` — SQLite setup, snapshot writes, and change detection queries.
- `scheduler.py` — entry point that wires the pipeline together and runs it on a schedule.

The scraping target throughout the guide is `https://web-scraping.dev/products`, a sandbox built for scraping demonstrations with stable selectors and predictable HTML. Repointing the tracker at a real e-commerce site only requires changing the URL and selectors, and the [anti-bot section](#how-do-you-handle-anti-bot-protection-on-e-commerce-sites) later in the article covers what changes when the target fights back.
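
One way to keep that retargeting to a single edit is to isolate the site-specific values as constants at the top of `scraper.py`. The selectors below are the web-scraping.dev ones used throughout this guide; a different target would swap in its own URL and expressions.

```python
# Site-specific configuration: the only values that change when the tracker is repointed
BASE_URL = "https://web-scraping.dev/products"
PRODUCT_CARD_XPATH = "//div[@class='row product']"
PRODUCT_LINK_XPATH = ".//div[contains(@class,'description')]/h3/a/@href"
PRODUCT_NAME_XPATH = ".//div[contains(@class,'description')]/h3/a/text()"
PRODUCT_PRICE_XPATH = ".//div[@class='price']/text()"
```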



## How Do You Scrape Product Prices with Python?

Scraping prices takes two steps: fetch the product listing HTML with an HTTP client, then parse the price element from each product card on the page. The combination used here is `httpx.AsyncClient` for concurrent fetching and parsel `Selector` with XPath for precise extraction.

The web-scraping.dev product listing renders a paginated grid of product cards, with each card carrying a product name, price, link, and ID embedded in the URL. The scraper extracts those four fields per product and adds a scrape timestamp to anchor the snapshot in time.

### How Do You Parse Price Data from Product Pages?

Parsing turns raw HTML into a list of clean Python dicts that downstream code can store and compare. The parser below iterates over each `.row.product` card and pulls the four fields the rest of the pipeline needs.

```python
from parsel import Selector
from datetime import datetime, timezone
from typing import List, Dict

def parse_products(html: str) -> List[Dict]:
    """Parse product cards from a web-scraping.dev listing page."""
    selector = Selector(html)
    data = []
    for product in selector.xpath("//div[@class='row product']"):
        link = product.xpath(".//div[contains(@class,'description')]/h3/a/@href").get()
        name = product.xpath(".//div[contains(@class,'description')]/h3/a/text()").get()
        price_text = product.xpath(".//div[@class='price']/text()").get() or "0"
        product_id = int(link.rsplit("/product/", 1)[-1])
        data.append({
            "product_id": product_id,
            "name": name.strip() if name else "",
            "url": link,
            "price": float(price_text.replace("$", "").strip()),
            "scraped_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        })
    return data
```



The parser above accepts an HTML string, builds a parsel `Selector`, and iterates over every product card on the page. Each card yields a `product_id`, `name`, `url`, `price`, and ISO-8601 `scraped_at` timestamp, which is the minimum data shape the storage and change-detection layers need. The `replace("$", "")` step strips the currency symbol so the price is a clean float.

[Parsing HTML with Xpath](https://scrapfly.io/blog/posts/parsing-html-with-xpath): Introduction to xpath in the context of web-scraping. How to extract data from HTML documents using xpath, best practices and available tools.

For teams more comfortable with CSS selectors, the same fields can be extracted with parsel's `.css()` method using `div.row.product`, `div.description h3 a::text`, and similar selectors covered in the [CSS selector parsing guide](https://scrapfly.io/blog/posts/parsing-html-with-css).
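
For reference, a CSS-based version of the same parser could look like the sketch below. The card, link, and name selectors come from the paragraph above; `div.price::text` and `::attr(href)` are assumptions that mirror the XPath version.

```python
from datetime import datetime, timezone
from typing import Dict, List

from parsel import Selector

def parse_products_css(html: str) -> List[Dict]:
    """Same extraction as parse_products(), expressed with CSS selectors."""
    selector = Selector(html)
    data = []
    for product in selector.css("div.row.product"):
        link = product.css("div.description h3 a::attr(href)").get()
        name = product.css("div.description h3 a::text").get()
        price_text = product.css("div.price::text").get() or "0"
        if not link:
            continue  # skip malformed cards without a product URL
        data.append({
            "product_id": int(link.rsplit("/product/", 1)[-1]),
            "name": name.strip() if name else "",
            "url": link,
            "price": float(price_text.replace("$", "").strip()),
            "scraped_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        })
    return data
```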

### How Do You Scrape Multiple Products Concurrently?

A price tracker that walks pages one at a time wastes most of its runtime on network latency. Async scraping with `httpx.AsyncClient` and `asyncio.as_completed()` fetches every paginated listing in parallel, so the total runtime is bounded by the slowest single request rather than the sum of all requests.

The scraper below fetches the first listing page, then dispatches the remaining pages concurrently:

**Python**

```python
import asyncio
from httpx import AsyncClient
from typing import List, Dict

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/124.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

async def scrape_products(url: str, max_pages: int = 5) -> List[Dict]:
    """Fetch every listing page concurrently and return a flat list of products."""
    async with AsyncClient(headers=HEADERS, timeout=20) as client:
        first = await client.get(url)
        first.raise_for_status()
        products = parse_products(first.text)

        other_pages = [
            client.get(f"{url}?page={page}")
            for page in range(2, max_pages + 1)
        ]
        for coro in asyncio.as_completed(other_pages):
            response = await coro
            response.raise_for_status()
            products.extend(parse_products(response.text))

    print(f"Scraped {len(products)} products from {url}")
    return products
```





**Scrapfly**

```python
import asyncio
from scrapfly import ScrapeConfig, ScrapflyClient
from typing import List, Dict

scrapfly = ScrapflyClient(key="Your Scrapfly API key")

async def scrape_products(url: str, max_pages: int = 5) -> List[Dict]:
    """Fetch every listing page through Scrapfly with anti-bot bypass enabled."""
    first = await scrapfly.async_scrape(ScrapeConfig(url, asp=True, country="US"))
    products = parse_products(first.content)

    configs = [
        ScrapeConfig(f"{url}?page={page}", asp=True, country="US")
        for page in range(2, max_pages + 1)
    ]
    async for response in scrapfly.concurrent_scrape(configs):
        products.extend(parse_products(response.content))

    print(f"Scraped {len(products)} products from {url}")
    return products
```







The Python version launches a single `AsyncClient`, fetches the first page before the rest to discover the pagination shape, then schedules the remaining pages with `asyncio.as_completed()` so each response is handled the moment it arrives. The Scrapfly version swaps the HTTP client for `ScrapflyClient` and uses `concurrent_scrape()`, which routes every request through the [Scrapfly ASP](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) anti-bot layer without changing the parser.

A run against web-scraping.dev returns a list of dicts shaped like:

```json
[
  {
    "product_id": 1,
    "name": "Box of Chocolate Candy",
    "url": "https://web-scraping.dev/product/1",
    "price": 24.99,
    "scraped_at": "2026-04-30T09:01:00+00:00"
  },
  {
    "product_id": 2,
    "name": "Dark Red Energy Potion",
    "url": "https://web-scraping.dev/product/2",
    "price": 4.99,
    "scraped_at": "2026-04-30T09:01:00+00:00"
  }
]
```



[Web Scraping Speed: Processes, Threads and Async](https://scrapfly.io/blog/posts/web-scraping-speed): Scaling web scrapers can be difficult - in this article we'll go over the core principles like subprocesses, threads and asyncio and how all of that can be used to speed up web scrapers dozens to hundreds of times.

With the scraper returning structured snapshots, the next layer turns that flat list into queryable history.



## How Do You Detect and Track Price Changes Over Time?

A price tracker without history is a scraper. To detect actual price changes, the tracker has to store every snapshot, then compare the latest snapshot against the previous one for each product. SQLite handles this job with zero infrastructure since the database is a single file, the driver ships with Python, and queries return clean rows that change-detection code can iterate over.

### How Do You Store Price History?

The schema is two tables. The `products` table holds metadata that does not change between scrapes (name, URL), and the `prices` table holds one row per scrape per product (price plus a timestamp). Splitting the two avoids re-storing the name on every run and keeps the price history table narrow and append-only.

```python
import sqlite3
from typing import List, Dict

DB_PATH = "tracker.db"

def init_db() -> None:
    """Create the products and prices tables if missing."""
    with sqlite3.connect(DB_PATH) as conn:
        conn.executescript("""
        CREATE TABLE IF NOT EXISTS products (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            url        TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS prices (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id  INTEGER NOT NULL REFERENCES products(product_id),
            price       REAL NOT NULL,
            scraped_at  TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_prices_product
            ON prices(product_id, scraped_at DESC);
        """)

def save_prices(products: List[Dict]) -> None:
    """Upsert product metadata and append a fresh price row per product."""
    with sqlite3.connect(DB_PATH) as conn:
        for p in products:
            conn.execute(
                "INSERT INTO products(product_id, name, url) VALUES(?, ?, ?) "
                "ON CONFLICT(product_id) DO UPDATE SET name=excluded.name, url=excluded.url",
                (p["product_id"], p["name"], p["url"]),
            )
            conn.execute(
                "INSERT INTO prices(product_id, price, scraped_at) VALUES(?, ?, ?)",
                (p["product_id"], p["price"], p["scraped_at"]),
            )
```



The setup script above creates both tables on the first run and is safe to call on every subsequent run thanks to `IF NOT EXISTS`. The `idx_prices_product` index makes the "latest two prices per product" query in the next section cheap even after thousands of scrapes.

A quick smoke test against a sample snapshot confirms the schema and write path work before plugging in the scraper:

```python
if __name__ == "__main__":
    init_db()
    sample = [
        {
            "product_id": 1,
            "name": "Box of Chocolate Candy",
            "url": "https://web-scraping.dev/product/1",
            "price": 24.99,
            "scraped_at": "2026-04-30T09:00:00+00:00",
        }
    ]
    save_prices(sample)
    print(f"Saved {len(sample)} snapshot(s) to {DB_PATH}")
```



The script above creates `tracker.db` next to the script, inserts one product row, and appends one price row. Re-running the script appends a second price row for the same product without duplicating the metadata, which is the same shape every real scrape will follow.
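
A quick ad-hoc query confirms that shape: after two runs of the smoke test, the database should hold one metadata row and two price rows. The snippet below assumes `tracker.db` already exists next to the script.

```python
import sqlite3

# Count metadata rows vs. history rows to confirm the append-only shape
with sqlite3.connect("tracker.db") as conn:
    n_products = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
    n_prices = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
    history = conn.execute(
        "SELECT price, scraped_at FROM prices WHERE product_id = 1 ORDER BY scraped_at"
    ).fetchall()

print(f"{n_products} product(s), {n_prices} price snapshot(s)")
for price, scraped_at in history:
    print(f"  {scraped_at}  ${price:.2f}")
```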

SQLite is the right default because it is queryable, transactional, and resilient under repeated scheduled writes. CSV and JSON files work for one-off scrapes but degrade quickly under concurrent reads or partial writes. The schema also ports cleanly to PostgreSQL or MySQL for teams that outgrow a single-file database, with no change to the query code.

### How Do You Calculate Price Changes?

Detecting a price drop is a two-row comparison: pull the two most recent prices for each product, compute the percentage change between them, and emit any product whose price fell below a configurable threshold.

```python
import sqlite3
from typing import List, Dict

def detect_changes(threshold_pct: float = -5.0) -> List[Dict]:
    """Return products whose latest price dropped by at least `threshold_pct` percent."""
    drops = []
    with sqlite3.connect(DB_PATH) as conn:
        conn.row_factory = sqlite3.Row
        products = conn.execute("SELECT product_id, name, url FROM products").fetchall()
        for product in products:
            rows = conn.execute(
                "SELECT price FROM prices WHERE product_id = ? "
                "ORDER BY scraped_at DESC LIMIT 2",
                (product["product_id"],),
            ).fetchall()
            if len(rows) < 2:
                continue
            new_price, old_price = rows[0]["price"], rows[1]["price"]
            change_pct = (new_price - old_price) / old_price * 100
            if change_pct <= threshold_pct:
                drops.append({
                    "product_id": product["product_id"],
                    "name": product["name"],
                    "url": product["url"],
                    "old_price": old_price,
                    "new_price": new_price,
                    "change_pct": round(change_pct, 2),
                })
    return drops
```



`detect_changes()` compares the two most recent prices per product and flags any drop past the threshold (`-5.0` by default, `-10.0` for deeper discounts only). Products with fewer than two snapshots are skipped.
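
As a sanity check that the `idx_prices_product` index from the schema backs this per-product lookup, SQLite's `EXPLAIN QUERY PLAN` can be run on the same statement. The exact plan wording varies by SQLite version, but it should reference the index rather than a full table scan.

```python
import sqlite3

with sqlite3.connect("tracker.db") as conn:
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT price FROM prices WHERE product_id = ? ORDER BY scraped_at DESC LIMIT 2",
        (1,),
    ).fetchall()

# Each row's last column is a human-readable step,
# e.g. "SEARCH prices USING INDEX idx_prices_product (product_id=?)"
for row in plan:
    print(row[-1])
```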

Wiring the storage and detection layers together with the scraper from earlier produces the full read-write loop the tracker depends on. The script below runs the loop once end-to-end and prints any drops it finds:

```python
import asyncio

from db import init_db, save_prices, detect_changes
from scraper import scrape_products

if __name__ == "__main__":
    init_db()
    products = asyncio.run(scrape_products("https://web-scraping.dev/products"))
    save_prices(products)
    for drop in detect_changes(threshold_pct=-5.0):
        print(
            f"{drop['name']}: ${drop['old_price']:.2f} -> "
            f"${drop['new_price']:.2f} ({drop['change_pct']}%)"
        )
```



The script above initializes the database on first boot, scrapes the current product listing, appends a fresh price row per product, and then prints every product whose latest price dropped by 5 percent or more. Running the script twice in a row with simulated price edits between runs prints lines like:

```text
Scraped 25 products from https://web-scraping.dev/products
Box of Chocolate Candy: $24.99 -> $19.99 (-20.0%)
Teal Energy Potion: $4.99 -> $3.99 (-20.0%)
Blue Energy Potion: $4.99 -> $4.49 (-10.0%)
```



The same run against a database with mixed price changes returns a list shaped like the table below:

| Product | Previous Price | Current Price | Change % |
|---|---|---|---|
| Box of Chocolate Candy | $24.99 | $19.99 | -20.0% |
| Dark Red Energy Potion | $4.99 | $4.99 | 0.0% |
| Teal Energy Potion | $4.99 | $3.99 | -20.0% |
| Red Energy Potion | $4.99 | $5.49 | +10.0% |
| Blue Energy Potion | $4.99 | $4.49 | -10.0% |

With change detection emitting a clean list of drops, the next layer keeps the pipeline running on its own without anyone executing the script by hand.



## How Do You Schedule the Price Tracker to Run Automatically?

A price tracker that only runs when you remember to execute the script is not really a tracker. APScheduler gives the script a self-contained Python scheduler that runs the pipeline at fixed intervals, while system cron runs the script independently of any long-lived Python process and survives reboots cleanly.

### How Do You Use APScheduler for Interval-Based Tracking?

APScheduler's `BlockingScheduler` plus `IntervalTrigger` runs the tracker every N hours from inside one Python process. The full pipeline (scrape → store → detect → log) lives in a single `run_tracker()` function that the scheduler calls on its trigger.

```python
import asyncio
import logging
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.interval import IntervalTrigger

from scraper import scrape_products
from db import init_db, save_prices, detect_changes

URL = "https://web-scraping.dev/products"
log = logging.getLogger("price_tracker")

def run_tracker() -> None:
    """One full pass: scrape, store, log any detected drops."""
    products = asyncio.run(scrape_products(URL))
    save_prices(products)
    for drop in detect_changes(threshold_pct=-5.0):
        log.info(
            "Drop: %s %.2f -> %.2f (%.2f%%) %s",
            drop["name"], drop["old_price"], drop["new_price"],
            drop["change_pct"], drop["url"],
        )

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    init_db()
    run_tracker()  # run once on boot, then schedule
    scheduler = BlockingScheduler(timezone="UTC")
    scheduler.add_job(run_tracker, IntervalTrigger(hours=6), id="price_tracker")
    log.info("Scheduler started — running every 6 hours")
    scheduler.start()
```



The script above initializes the database, runs the tracker once immediately so the first set of snapshots lands without waiting six hours, then starts a `BlockingScheduler` that re-runs `run_tracker()` every six hours. Switching to a different cadence is a one-argument change on `IntervalTrigger`, and `CronTrigger(hour="0,6,12,18")` would pin the job to specific hours of the day instead of relative intervals.
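
For reference, the pinned-hours variant mentioned above only changes the trigger import and the `add_job()` call, reusing `run_tracker()` from the script above:

```python
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger

scheduler = BlockingScheduler(timezone="UTC")
# Fire at 00:00, 06:00, 12:00, and 18:00 UTC instead of "every six hours since start"
scheduler.add_job(run_tracker, CronTrigger(hour="0,6,12,18", minute=0), id="price_tracker")
scheduler.start()
```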

### How Do You Use Cron Jobs for Server-Based Scheduling?

System cron runs a one-shot script on a schedule, which means the Python process exits between runs and the OS handles restart. Cron is the right pick for server deployments where process supervision and reboot survival matter more than keeping state in memory between runs.

The `if __name__` block stays identical to the APScheduler script, minus the scheduler itself:

```python
import asyncio
import logging
from scraper import scrape_products
from db import init_db, save_prices, detect_changes

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    init_db()
    products = asyncio.run(scrape_products("https://web-scraping.dev/products"))
    save_prices(products)
    for drop in detect_changes(threshold_pct=-5.0):
        logging.info(
            "Drop: %s %.2f -> %.2f (%.2f%%) %s",
            drop["name"], drop["old_price"], drop["new_price"],
            drop["change_pct"], drop["url"],
        )
```



The script above runs one full cycle and exits, which is exactly the shape cron expects. Add the cron entry below to a user crontab with `crontab -e` to run the tracker every six hours:

```cron
0 */6 * * * /usr/bin/python3 /path/to/tracker.py >> /path/to/tracker.log 2>&1
```



The cron line above triggers the script at minute 0 of every sixth hour and appends both stdout and stderr to a log file for later inspection. Windows users can wire the same one-shot script into Task Scheduler, and GitHub Actions can run it on a `schedule:` cron without provisioning any infrastructure, though the GitHub Actions path adds 1–2 minutes of cold start per run.


The tradeoff is straightforward. APScheduler keeps the entire system in one Python process, which is simpler to debug and easier to deploy to a single VM. Cron pushes scheduling to the OS, which means the script restarts cleanly after crashes and reboots without any supervisor.



## How Do You Handle Anti-Bot Protection on E-Commerce Sites?

Real e-commerce sites use anti-bot systems like [Cloudflare](https://scrapfly.io/blog/posts/how-to-bypass-cloudflare-anti-scraping), [Akamai](https://scrapfly.io/blog/posts/how-to-bypass-akamai-anti-scraping), [PerimeterX](https://scrapfly.io/blog/posts/how-to-bypass-perimeterx-human-anti-scraping), and [DataDome](https://scrapfly.io/blog/posts/how-to-bypass-datadome-anti-scraping) that block plain HTTP requests within seconds. The tracker built so far works against `web-scraping.dev` because the site is engineered for scraping, but pointing the same code at Amazon, Walmart, or BestBuy will trigger JavaScript challenges, CAPTCHAs, and TLS fingerprint checks that httpx cannot answer on its own.

The defenses fall into a few categories, and each defense raises the engineering cost of a self-managed scraper:

- **Header and TLS fingerprinting** — anti-bot systems profile the JA3 and JA4 fingerprints of the TLS handshake and reject clients whose fingerprint does not match a real browser.
- **JavaScript challenges and CAPTCHAs** — Cloudflare, DataDome, and PerimeterX serve interstitial pages that require running JavaScript and sometimes solving a CAPTCHA before granting access.
- **Rate limiting and IP reputation** — datacenter IPs ship with low reputation by default, and aggressive request rates trigger per-IP throttling or outright bans.
- **Behavioral signals** — mouse movement, scroll velocity, and request timing all feed into bot scores that determine whether a session is allowed through.

The tradeoff between self-managed and managed approaches looks like this:

| Approach | Pros | Cons | Best for |
|---|---|---|---|
| Headers + User-Agent rotation | Free, no extra deps | Blocked by anything beyond basic checks | Low-protection sites, prototyping |
| Self-managed proxies + headless browser | Full control over every request | High ops overhead, per-site tuning, fragile to anti-bot updates | Teams with DevOps capacity |
| Managed scraping API (Scrapfly ASP) | Adapts to any protection automatically | Per-request cost | Production tracking at scale |

A baseline self-managed scraper with retries and User-Agent rotation buys headroom on lighter sites:

```python
import asyncio
import random
from httpx import AsyncClient, HTTPStatusError, RequestError
from typing import List, Dict

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
]

async def fetch_with_retry(client: AsyncClient, url: str, attempts: int = 3) -> str:
    """Fetch a URL with rotating User-Agent and exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            response = await client.get(url, headers={"User-Agent": random.choice(USER_AGENTS)})
            if response.status_code in (403, 429):
                raise HTTPStatusError("blocked", request=response.request, response=response)
            response.raise_for_status()
            return response.text
        except (HTTPStatusError, RequestError) as exc:
            if attempt == attempts:
                raise
            wait = 2 ** attempt + random.random()
            print(f"Attempt {attempt} failed for {url}: {exc}. Retrying in {wait:.1f}s")
            await asyncio.sleep(wait)
```



The fetcher above rotates User-Agent strings on every request, retries 403 and 429 responses with exponential backoff plus jitter, and raises after the configured number of attempts. That is enough for low-protection sites and short scraping windows, but it does not address TLS fingerprinting, JavaScript challenges, or behavioral signals.

Wiring `fetch_with_retry()` into a real call is straightforward with `asyncio.run()` and one `AsyncClient` for the duration of the run:

```python
async def main():
    async with AsyncClient(timeout=20) as client:
        html = await fetch_with_retry(client, "https://web-scraping.dev/products")
        print(f"Fetched {len(html)} bytes from web-scraping.dev")

if __name__ == "__main__":
    asyncio.run(main())
```



The runner above opens a single `AsyncClient`, calls `fetch_with_retry()` once, and prints the response size on success. Swapping the URL for a 403-prone target exercises the retry path.

For a few products on a low-protection site, self-managed approaches are fine. At scale or against aggressive anti-bot systems, managed APIs save days of per-site reverse engineering and cut the maintenance burden when target sites push anti-bot updates.



## How Does Scrapfly Simplify Price Tracking at Scale?

Scrapfly's [Anti Scraping Protection (ASP)](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) handles anti-bot bypass automatically, and the [concurrent\_scrape()](https://scrapfly.io/docs/sdk/python#concurrency) API lets the tracker fetch hundreds of product pages in parallel.

ScrapFly's [Web Scraping API](https://scrapfly.io/web-scraping-api) is a single HTTP endpoint for collecting web data at scale, with a **99.99% success rate** across **130M+ proxies in 120+ countries**.

- [Anti-Scraping Protection bypass](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) - automatically defeats Cloudflare, DataDome, PerimeterX, Akamai, and 90+ other bot systems.
- [Smart proxy rotation](https://scrapfly.io/docs/scrape-api/proxy) - residential and datacenter pools with country and ASN level geo-targeting.
- [JavaScript rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering) - render SPAs and dynamic pages through real cloud browsers.
- [Browser automation scenarios](https://scrapfly.io/docs/scrape-api/javascript-scenario) - scroll, click, fill forms, and wait for elements without managing a browser fleet.
- [Format conversion](https://scrapfly.io/docs/scrape-api/getting-started#api_param_format) - return pages as HTML, JSON, clean text, or LLM ready Markdown.
- [Session management](https://scrapfly.io/docs/scrape-api/session) - keep cookies, headers, and IPs consistent across multi step flows.
- [Smart caching](https://scrapfly.io/docs/scrape-api/getting-started#api_param_cache) - cache successful responses to cut cost on repeat scraping jobs.
- [Python](https://scrapfly.io/docs/sdk/python), [TypeScript](https://scrapfly.io/docs/sdk/typescript), [Scrapy](https://scrapfly.io/docs/sdk/scrapy), and [no-code integrations](https://scrapfly.io/docs/integration/getting-started) including Make, n8n, Zapier, LangChain, and LlamaIndex.

The Scrapfly version of the tracker swaps `httpx.AsyncClient` for `ScrapflyClient` and keeps every other layer (storage, change detection, scheduling) untouched:

```python
import asyncio
from scrapfly import ScrapeConfig, ScrapflyClient
from typing import List, Dict

scrapfly = ScrapflyClient(key="Your Scrapfly API key")

async def scrape_products(url: str, max_pages: int = 5) -> List[Dict]:
    """Track prices through Scrapfly with anti-bot bypass and concurrent fetching."""
    first = await scrapfly.async_scrape(ScrapeConfig(url, asp=True, country="US"))
    products = parse_products(first.content)

    configs = [
        ScrapeConfig(f"{url}?page={page}", asp=True, country="US", render_js=True)
        for page in range(2, max_pages + 1)
    ]
    async for response in scrapfly.concurrent_scrape(configs):
        products.extend(parse_products(response.content))

    return products
```



The Scrapfly version above enables `asp=True` so anti-bot challenges are handled transparently, sets `country="US"` to pin the proxy geography, and turns on `render_js=True` for sites that load product data via client-side JavaScript. The `parse_products()` function from earlier in the article continues to work unchanged because Scrapfly returns parsel-compatible HTML.

Running the Scrapfly scraper end-to-end is the same one-liner as the httpx version, since both expose the same `scrape_products()` signature:

```python
if __name__ == "__main__":
    products = asyncio.run(scrape_products("https://web-scraping.dev/products"))
    print(f"First product: {products[0]['name']} at ${products[0]['price']}")
```



The runner above invokes the async Scrapfly scraper, prints the first product's name and price to confirm parsing succeeded, and returns immediately. Plugging the same `products` list into `save_prices()` and `detect_changes()` from the storage section turns this into the full Scrapfly-backed tracker without any other changes.

For a few products on low-protection sites, plain httpx works fine. Scrapfly becomes valuable when the watchlist grows past a handful of products on aggressive anti-bot targets, when keeping a self-managed proxy pool stops being worth the maintenance cost, or when running the tracker on serverless schedulers that cannot keep long-lived browser sessions warm.

[Competitor Price Monitoring with Crawler API](https://scrapfly.io/blog/posts/competitor-price-monitoring-with-crawler-api): Build an automated competitor price monitoring system using Scrapfly Crawler API. Track thousands of products, handle anti-bot protection, and integrate with repricing tools.



## FAQ

### Can You Track Prices on Amazon with This Python Tracker?

Yes, but Amazon uses aggressive anti-bot detection that blocks plain httpx requests within seconds. Real Amazon tracking requires residential proxies, browser automation, or a managed scraping API like Scrapfly with ASP enabled. The tracker logic (storage, change detection, scheduling) is identical.







### Is It Legal to Scrape Prices from E-Commerce Websites?

Scraping publicly visible product pages is often permissible, but legality depends on jurisdiction, the target site's terms of service, the access method, and the intended use of the data. Avoid scraping behind a login, do not collect personal data, and consult a lawyer for commercial-scale projects in regulated jurisdictions.







### How Often Should a Price Tracker Check for Changes?

A six-to-twelve-hour cadence covers most e-commerce products, because retail prices rarely change more than a few times per day. Flash sales and limited-time promotions justify hourly checks for narrow watchlists, but more frequent polling raises detection risk and increases the tracker's operational cost.







### What Is the Best Python Library for Scraping Prices?

For most price tracking projects, httpx (HTTP) plus parsel (HTML parsing) is a fast, modern combination that handles async natively and stays out of the way. BeautifulSoup with Requests is the most popular alternative and works well for synchronous projects.
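
As an illustration of that alternative, a synchronous sketch of the same listing fetch with Requests and BeautifulSoup might look like this; both libraries are extra installs (`requests`, `beautifulsoup4`) not in this project's dependency list, and the CSS selectors mirror the ones used earlier in the guide.

```python
import requests
from bs4 import BeautifulSoup

# Synchronous equivalent of the httpx + parsel scraper for a single listing page
response = requests.get("https://web-scraping.dev/products", timeout=20)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

for card in soup.select("div.row.product"):
    link = card.select_one("div.description h3 a")
    price = card.select_one("div.price")
    if link and price:
        print(link.get_text(strip=True), price.get_text(strip=True), link["href"])
```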









## Conclusion

In this guide, we built a Python price tracker from scratch. We started by defining what a tracker actually does beyond a one-shot scraper, then went through a step-by-step tutorial on building one using Python by:

- Scraping product listing pages with [httpx](https://www.python-httpx.org/) and [parsel](https://github.com/scrapy/parsel/) for fast async fetching and HTML parsing.
- Persisting price history in SQLite with two tables to support change detection over time.
- Detecting meaningful drops with a percentage threshold against the previous snapshot.
- Scheduling the tracker to run unattended with APScheduler or a system cron job.
- Hardening the fetch layer against Cloudflare, Akamai, PerimeterX, and DataDome for real e-commerce targets.

Each layer is independent enough to swap out — replace SQLite with PostgreSQL, bolt a Discord or email notifier onto the drop detection, move from cron to GitHub Actions — without touching the rest of the pipeline. For production targets behind anti-bot, [Scrapfly's web scraping API](https://scrapfly.io) handles the fetch layer in one API call while the tracking logic stays the same.
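
As an example of the notifier swap, a minimal Discord webhook alert bolted onto the drop list might look like the sketch below; the webhook URL is a placeholder you create in a channel's integration settings, and the payload follows Discord's standard webhook format.

```python
import httpx

# Placeholder webhook URL: create one in a Discord channel's integration settings
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"

def notify_drops(drops: list[dict]) -> None:
    """Post one message per detected price drop to a Discord channel."""
    for drop in drops:
        message = (
            f"{drop['name']}: ${drop['old_price']:.2f} -> ${drop['new_price']:.2f} "
            f"({drop['change_pct']}%) {drop['url']}"
        )
        httpx.post(WEBHOOK_URL, json={"content": message}, timeout=10)
```

Calling `notify_drops(detect_changes(threshold_pct=-5.0))` at the end of `run_tracker()` wires the alert into the scheduled loop without changing anything else.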



**Legal Disclaimer and Precautions**

This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect:

- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens protected by GDPR.
- Do not repurpose *entire* public datasets which can be illegal in some countries.

Scrapfly does not offer legal advice but these are good general rules to follow. For more you should consult a lawyer.

 
