What are scrapy middlewares and how to use them?

Scrapy middlewares are extensions that hook into Scrapy's request/response processing, letting you modify outgoing requests and incoming responses. They are a convenient way to add connection logic to Scrapy spiders.

For example, scrapy middlewares are often used to:

  • Retry and filter requests and responses based on their content.
  • Modify outgoing requests with different headers or proxies.
  • Collect and track connection performance metrics.
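As an example of the proxy use case, here's a minimal sketch of a proxy-rotating middleware. The proxy URLs are placeholders, and a stand-in request object is used so the sketch runs without Scrapy installed; in a real project the `request` argument is Scrapy's own `Request`:

```python
import itertools

# Placeholder proxy endpoints, not real servers
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
]

class RotatingProxyMiddleware:
    def __init__(self):
        self._proxies = itertools.cycle(PROXIES)

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware reads request.meta["proxy"]
        request.meta["proxy"] = next(self._proxies)
        return None  # continue normal processing of the request

# Stand-in for scrapy.Request so the sketch is self-contained
class FakeRequest:
    def __init__(self):
        self.meta = {}

mw = RotatingProxyMiddleware()
r1, r2 = FakeRequest(), FakeRequest()
mw.process_request(r1, None)
mw.process_request(r2, None)
print(r1.meta["proxy"])  # → http://proxy1.example.com:8000
print(r2.meta["proxy"])  # → http://proxy2.example.com:8000
```

Each call to `process_request` assigns the next proxy in the cycle, spreading requests across the pool.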

Scrapy comes with several default middlewares that perform common tasks such as:

  • retrying common exceptions
  • handling redirects
  • tracking cookies
  • decompressing compressed responses
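These built-ins are enabled through Scrapy's `DOWNLOADER_MIDDLEWARES_BASE` setting, and any of them can be switched off by mapping its class path to `None` in your project settings. For example, to disable the built-in retry middleware:

```python
# settings.py — disable a default middleware by mapping it to None
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,
}
```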

Being able to define custom middlewares is the real power of scrapy middlewares. For example, here's a middleware that adds a header to each request:

# middlewares.py
class CustomHeaderMiddleware:
    def process_request(self, request, spider):
        # Called for every outgoing request before it is sent
        request.headers['x-token'] = "123456"

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'your_project_name.middlewares.CustomHeaderMiddleware': 500,
}

In this example, we're adding an x-token header to each outgoing request. The process_request method is called for every outgoing request and can modify the request object in place. The number 500 in DOWNLOADER_MIDDLEWARES sets the middleware's order relative to Scrapy's built-in middlewares: lower numbers run earlier in process_request.
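Middlewares can also inspect incoming responses through process_response, which can either return the response (passing it on to the spider) or return a request (re-scheduling it). Here's a minimal sketch of a retry-on-block middleware; the `"captcha"` marker is an illustrative block signal, and stand-in request/response classes are used so the sketch runs without Scrapy:

```python
# Stand-ins for scrapy.Request / scrapy.http.Response, so this runs standalone
class FakeRequest:
    def __init__(self):
        self.dont_filter = False
    def copy(self):
        return FakeRequest()

class FakeResponse:
    def __init__(self, body):
        self.body = body

class BlockDetectionMiddleware:
    def process_response(self, request, response, spider):
        # If the page looks blocked, return a request to re-schedule it;
        # dont_filter=True lets it past Scrapy's duplicate filter
        if b"captcha" in response.body.lower():
            retry = request.copy()
            retry.dont_filter = True
            return retry
        # Returning the response passes it on to the spider
        return response

mw = BlockDetectionMiddleware()
ok = mw.process_response(FakeRequest(), FakeResponse(b"<html>data</html>"), None)
blocked = mw.process_response(FakeRequest(), FakeResponse(b"Solve this CAPTCHA"), None)
print(type(ok).__name__)       # → FakeResponse (passed through)
print(type(blocked).__name__)  # → FakeRequest (scheduled for retry)
```

Returning a request instead of a response is how Scrapy's own RetryMiddleware re-queues failed pages.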

Provided by Scrapfly

This knowledgebase is provided by Scrapfly — a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences. Check us out 👇