How to Rate Limit Async Requests in Python

article feature image

When web scraping, we're often limited by the website's technical capabilities. We can't scrape too fast without being blocked or overwhelming smaller websites.

Asynchronous HTTP clients like httpx allow us to easily make hundreds of requests per second but don't provide ways to fine-tune our scraping speed. So, in this quick tutorial, we'll be taking a look at how to rate-limit asynchronous HTTP connections to slow down our scrapers.

Python httpx

HTTPX is the most popular asynchronous HTTP client in Python which can be installed using pip install terminal command:

$ pip install httpx

HTTPX supports both synchronous and asynchronous HTTP clients. It's unlikely that we need to throttle synchronous connections as they are very slow to begin so let's take a look at the throttling options we have for the asynchronous client.

To limit httpx client we can use the httpx.Limit object:

import httpx
session = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=5  # we can change max connection count here
    )
)

However limiting scraping by connection count is very inaccurate - on slower websites 5 connections might only manage a few requests per minute while on fast ones it could reach hundreds of requests per minute.

To limit httpx-powered scrapers we need an additional layer that tracks requests themselves rather than connections. Let's take a look at the most popular throttling library - aiometer.

Python aiometer

The most popular way to limit all asynchronous tasks in Python is aiometer which can be installed using the pip install terminal command:

$ pip install aiometer

Then we can schedule all of our requests to run through aiometer limiter:

import asyncio
from time import time

import aiometer
import httpx

session = httpx.AsyncClient()


async def scrape(url):
    response = await session.get(url)
    return response


async def run():
    _start = time()
    urls = ["http://httpbin.org/html" for i in range(10)]
    results = await aiometer.run_on_each(
        scrape, 
        urls,
        max_per_second=1,  # here we can set max rate per second
    )
    print(f"finished {len(urls)} requests in {time() - _start:.2f} seconds")
    return results


if __name__ == "__main__":
    asyncio.run(run())

# will print:
# finished 10 requests in 9.54 seconds

In our small example scraper we used aiometer.run_on_each function to limit 10 scrape requests at 1 request per second. With this one command, we can throttle our scraper to exact requests per second speed!

Scraping Faster Without Rate Limiting?

Some websites impose extremely low-speed limits for web scrapers making data collection impossible. To handle this web scraping APIs like ScrapFly can be used to scrape faster and avoid blocking.

ScrapFly offers dozens of different features that can help with scraper scaling, like:

To use ScrapFly in Python install ScrapFly-sdk package using the pip install scrapfly-sdk terminal command. Then targets can be scraped without the blocking:

from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient(
    key="YOUR SCRAPFLY KEY",
    max_concurrency=10,  # we can limit concurrent requests if needed
)
result = client.scrape(ScrapeConfig(
    url="http://httpbin.org/ip",
    # optional features like:
    # - we can select specific proxy country
    country="GB",
    # - and enable anti scraping protection bypass:
    asp=True,
    # see https://scrapfly.io/docs/scrape-api/getting-started for more
))

Related Posts

Playwright Examples for Web Scraping and Automation

Learn Playwright with Python and JavaScript examples for automating browsers like Chromium, WebKit, and Firefox.

How to use wget in Python

Learn how to use wget in Python through subprocess calls and what are other options.

Ultimate Guide to JSON Parsing in Python

Learn JSON parsing in Python with this ultimate guide. Explore basic and advanced techniques using json, and tools like ijson and nested-lookup