Web Scraping With Scrapy Intro Through Examples
Tutorial on web scraping with scrapy and Python through a real world example project. Best practices, extension highlights and common challenges.
Scrapy middlewares are spider extensions that hook into Scrapy's request and response processing. They are a convenient way to add connection logic to Scrapy spiders.
For example, Scrapy middlewares are often used to:

- Attach custom headers or cookies to outgoing requests
- Rotate proxies or user-agent strings
- Retry, filter, or drop requests and responses
- Log or collect statistics about crawl traffic

Scrapy comes with several default middlewares that perform common tasks such as:

- Following HTTP redirects (RedirectMiddleware)
- Retrying failed requests (RetryMiddleware)
- Managing cookies (CookiesMiddleware)
- Handling HTTP compression (HttpCompressionMiddleware)
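The default middlewares are configured through the same setting as custom ones, so they can be reordered or switched off per project. As a sketch, assuming you want to disable a built-in downloader middleware, mapping its import path to None in settings.py turns it off:

```python
# settings.py
# Built-in middlewares can be disabled by mapping their import path to
# None, or reordered by changing their priority number.
DOWNLOADER_MIDDLEWARES = {
    # Disable the built-in user-agent middleware (assumption: the spider
    # sets its own User-Agent header elsewhere).
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```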
The real power, though, comes from defining your own middlewares. For example, here's a middleware that adds a header to every outgoing request:
# middlewares.py
class CustomHeaderMiddleware:
    def process_request(self, request, spider):
        request.headers['x-token'] = "123456"

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'your_project_name.middlewares.CustomHeaderMiddleware': 500,
}
In this example, we're adding an x-token header to each outgoing request. The process_request method is called for every outgoing request and can modify the request object in place before it reaches the downloader.
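To see the mechanics without running a full crawl, the hook can be exercised in isolation. This is a minimal sketch: the Request class below is a hypothetical stand-in for scrapy.Request so the snippet is self-contained, and the middleware body mirrors the one above.

```python
# Hypothetical stand-in for scrapy.Request, just enough to
# demonstrate the process_request hook in isolation.
class Request:
    def __init__(self, url):
        self.url = url
        self.headers = {}

class CustomHeaderMiddleware:
    def process_request(self, request, spider):
        # Mutate the request in place; returning None tells Scrapy to
        # continue passing the request through the remaining middlewares
        # and on to the downloader.
        request.headers['x-token'] = "123456"
        return None

middleware = CustomHeaderMiddleware()
request = Request("https://example.com")
middleware.process_request(request, spider=None)
print(request.headers['x-token'])  # → 123456
```

In a real project Scrapy instantiates the middleware and calls process_request for you, in the priority order given by the DOWNLOADER_MIDDLEWARES setting.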