What is Asynchronous Web Scraping?

Asynchronous web scraping is a programming technique that allows multiple scrape tasks to run concurrently, effectively in parallel.

Asynchronous programming is especially important in web scraping because scraping programs spend most of their time waiting: every time a web scraper requests a web page, it has to wait for the server's response. This waiting time adds up quickly when scraping large numbers of pages.

For example, let's take a look at this synchronous scraping example in Python:

import httpx
from time import time

_start = time()
pages = [
    "https://httpbin.dev/delay/2",
    "https://httpbin.dev/delay/2",
    "https://httpbin.dev/delay/2",
    "https://httpbin.dev/delay/2",
    "https://httpbin.dev/delay/2",
]
# scrape each page sequentially; every .get() call blocks
# until the full response arrives before the next request starts
for page in pages:
    httpx.get(page)
print(f"finished scraping {len(pages)} pages in {time() - _start:.2f} seconds")
"finished scraping 5 pages in 15.46 seconds"

Here we have a list of 5 web pages that each take 2 seconds to respond. If we run this code, it completes in roughly 15 seconds: the 2-second delay of every page, plus network overhead, is paid one page at a time.

This is because our code waits for each page to finish scraping before moving on to the next one, even though the program does nothing in the meantime but wait for the server to respond.

In contrast, asynchronous web scraping allows multiple scrape tasks to run effectively in parallel:

import httpx
import asyncio
from time import time

async def run():
    _start = time()
    async with httpx.AsyncClient() as client:
        pages = [
            "https://httpbin.dev/delay/2",
            "https://httpbin.dev/delay/2",
            "https://httpbin.dev/delay/2",
            "https://httpbin.dev/delay/2",
            "https://httpbin.dev/delay/2",
        ]
        # run all requests concurrently using asyncio.gather
        await asyncio.gather(*[client.get(page) for page in pages])
    print(f"finished scraping {len(pages)} pages in {time() - _start:.2f} seconds")

asyncio.run(run())
"finished scraping 5 pages in 2.93 seconds"

This Python example uses httpx.AsyncClient and asyncio.gather to overlap the waiting time by running all requests concurrently. As a result, the code completes in 2-3 seconds every time.
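
In real scrapers, firing every request at once can overwhelm the target server or trigger blocking, so it's common to cap how many requests run at the same time. Below is a minimal sketch of this pattern using asyncio.Semaphore; the limit of 3 and the scrape_page helper are illustrative choices, not part of httpx itself:

import httpx
import asyncio

async def run():
    # illustrative limit: at most 3 requests in flight at any moment
    semaphore = asyncio.Semaphore(3)
    pages = ["https://httpbin.dev/delay/2"] * 5

    async def scrape_page(client, url):
        # wait for a free semaphore slot, then request the page
        async with semaphore:
            return await client.get(url)

    async with httpx.AsyncClient() as client:
        responses = await asyncio.gather(*[scrape_page(client, page) for page in pages])
    print(f"scraped {len(responses)} pages")

asyncio.run(run())

With a limit of 3, the 5 requests run in roughly two batches (around 4-5 seconds total) rather than all at once, trading a little speed for politeness toward the target server.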


Asynchronous programming is an ideal fit for web scraping and one of the easiest ways to speed it up. For more, see:

Web Scraping Speed: Processes, Threads and Async

An introduction to scaling up web scrapers using asyncio, multiprocessing, and multithreading, and what each of these technologies means in web scraping.

Question tagged: HTTP

Related Posts

Sending HTTP Requests With Curlie: A better cURL

In this guide, we'll explore Curlie, an improved version of cURL. We'll start by defining what Curlie is and how it compares to cURL. We'll also go over a step-by-step guide on using and configuring Curlie to send HTTP requests.

How to Use cURL For Web Scraping

In this article, we'll go over a step-by-step guide on sending and configuring HTTP requests with cURL. We'll also explore advanced usages of cURL for web scraping, such as scraping dynamic pages and avoiding getting blocked.

Use Curl Impersonate to scrape as Chrome or Firefox

Learn how to prevent TLS fingerprinting by impersonating normal web browser configurations. We'll start by explaining what Curl Impersonate is, how it works, and how to install and use it. Finally, we'll explore using it with Python to avoid blocking while web scraping.