Scrapy middlewares are spider extensions that hook into Scrapy's request/response cycle, letting you modify outgoing requests and incoming responses. They are a convenient way to add connection logic to Scrapy spiders.
For example, scrapy middlewares are often used to:
- Retry and filter requests and responses based on their content.
- Modify outgoing requests with different headers or proxies.
- Collect and track connection performance metrics.
Scrapy comes with several default middlewares that perform common tasks such as:
- retry common exceptions
- handle redirects
- track cookies
- decompress compressed responses
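These built-in middlewares are enabled by default and are configured through project settings rather than code. As a sketch, a `settings.py` might tune them like this (the values shown are illustrative, not recommendations):

```python
# settings.py - tuning a few of Scrapy's built-in middlewares
RETRY_TIMES = 3                           # retry failed requests up to 3 times
RETRY_HTTP_CODES = [500, 502, 503, 429]   # status codes that trigger a retry
REDIRECT_MAX_TIMES = 5                    # stop following redirects after 5 hops
COOKIES_ENABLED = True                    # let CookiesMiddleware track session cookies
COMPRESSION_ENABLED = True                # auto-decompress compressed responses
```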
Being able to define custom middlewares is the real power of scrapy middlewares. For example, here's a middleware that adds a header to each request:
```python
# middlewares.py
class CustomHeaderMiddleware:
    def process_request(self, request, spider):
        request.headers['x-token'] = "123456"
```

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    'your_project_name.middlewares.CustomHeaderMiddleware': 500,
}
```
In this example, we're adding an `x-token` header to each outgoing request. The `process_request` method is called for every outgoing request and can modify the request object in place; returning `None` (the default) lets the request continue through the remaining middlewares.
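The counterpart hook, `process_response`, runs on every incoming response and can return a `Request` to reschedule it, which is how the retry-on-content use case mentioned earlier can be implemented. Here's a minimal sketch: the `ContentRetryMiddleware` name, the `MAX_RETRIES` cap, and the `"Access Denied"` block-page marker are all hypothetical examples, not part of Scrapy itself:

```python
# middlewares.py - sketch of a middleware that retries "soft blocked" responses
class ContentRetryMiddleware:
    MAX_RETRIES = 2  # hypothetical cap on content-based retries

    def process_response(self, request, response, spider):
        # Retry responses that look like a block page (hypothetical marker text).
        if b"Access Denied" in response.body:
            retries = request.meta.get("content_retries", 0)
            if retries < self.MAX_RETRIES:
                # Returning a Request tells Scrapy to reschedule it;
                # dont_filter=True bypasses the duplicate-request filter.
                new_request = request.replace(dont_filter=True)
                new_request.meta["content_retries"] = retries + 1
                return new_request
        # Anything else passes through unchanged.
        return response
```

It would be registered in `DOWNLOADER_MIDDLEWARES` just like the header middleware above.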