Web Scraping With Scrapy Intro Through Examples
Tutorial on web scraping with Scrapy and Python through a real-world example project: best practices, extension highlights, and common challenges.
Scrapy middlewares are extensions that hook into Scrapy's request/response cycle to modify outgoing requests and incoming responses.
Scrapy middlewares are often used to:
- rotate proxies or user-agent strings to avoid blocking
- add or modify request headers
- retry, drop, or reroute failed requests
- log, filter, or post-process responses
Scrapy comes with several default middlewares that retry common exceptions, handle redirects, track cookies, and decompress compressed responses.
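These built-in middlewares are tuned through project settings rather than code. Here's a minimal sketch of settings.py values that adjust the default retry, redirect, cookie, and compression middlewares; the specific values shown are illustrative, not recommendations:

# settings.py
# Tune Scrapy's built-in downloader middlewares (values are illustrative).
RETRY_TIMES = 2                      # RetryMiddleware: extra attempts per failed request
RETRY_HTTP_CODES = [500, 502, 503]   # status codes that trigger a retry
REDIRECT_MAX_TIMES = 5               # RedirectMiddleware: cap on redirect chains
COOKIES_ENABLED = True               # CookiesMiddleware: track cookies across requests
COMPRESSION_ENABLED = True           # HttpCompressionMiddleware: decompress responses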
For example, here's a middleware that adds a header to each request:
# middlewares.py
class CustomHeaderMiddleware:
    def process_request(self, request, spider):
        request.headers['x-token'] = "123456"

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'your_project_name.middlewares.CustomHeaderMiddleware': 500,
}
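Middlewares can also act on the incoming side via the process_response hook, which receives each response before it reaches the spider and must return it (or a replacement) to pass it along. Here's a minimal sketch of a middleware that tallies response status codes; the class name and the counting behavior are illustrative, not part of Scrapy itself:

# middlewares.py
class ResponseStatsMiddleware:
    """Count response status codes as they flow through the middleware chain."""

    def __init__(self):
        self.status_counts = {}

    def process_response(self, request, response, spider):
        # Tally the status code, then return the response unchanged
        # so it continues on to the spider.
        self.status_counts[response.status] = self.status_counts.get(response.status, 0) + 1
        return response

It is registered in DOWNLOADER_MIDDLEWARES the same way as the header example above, with the priority number controlling its position in the chain.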