Scrapy Knowledgebase

Scrapy is by far the most popular web scraping framework across all programming languages. It is a powerful and flexible framework that allows you to scrape websites, extract data, and store it in various formats.

Under the hood, Scrapy builds on many existing Python tools (most notably the Twisted asynchronous networking library) and packages everything into a single framework where each scraper is called a "spider".

Despite Scrapy being quite old, it still holds up and is a great choice for web scraping projects of all sizes. It has a large community, extensive documentation, and many plugins and extensions that make it easy to customize and extend, though its framework structure comes at some cost of flexibility.

Web Scraping With Scrapy: The Complete Guide in 2025

A tutorial on web scraping with Scrapy and Python through a real-world example project, covering best practices, extension highlights, and common challenges.

See below for more on Scrapy in the context of web scraping and data programming 👇

What are scrapy middlewares and how to use them?

Scrapy downloader middlewares can be used to intercept and modify outgoing requests and incoming responses. Here's how to use them; a minimal sketch follows below.

#scrapy
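For illustration, here is a rough downloader middleware sketch (the class name, header value, and project path are hypothetical): it tags every outgoing request with a custom header and logs non-200 responses.

```python
import logging

logger = logging.getLogger(__name__)


class TaggingDownloaderMiddleware:
    """Adds a custom header to every outgoing request and logs non-200 responses."""

    def process_request(self, request, spider):
        # modify the outgoing request; returning None lets it continue through the chain
        request.headers["X-Crawler"] = "my-scrapy-bot"
        return None

    def process_response(self, request, response, spider):
        if response.status != 200:
            logger.warning("Got %s for %s", response.status, request.url)
        return response


# Enabled in settings.py (the project path is hypothetical):
# DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.TaggingDownloaderMiddleware": 543}
```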

What are scrapy pipelines and how to use them?

Scrapy pipelines can be used to extend scraped results with new fields or to validate the scraped dataset. Here's how; see the sketch below.

#scrapy
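As a rough sketch (the class name, field names, and project path are hypothetical, and dict-like items are assumed), a pipeline that both enriches and validates items could look like this:

```python
from datetime import datetime, timezone

from scrapy.exceptions import DropItem


class EnrichAndValidatePipeline:
    """Adds a scrape timestamp to every item and drops items missing a title."""

    def process_item(self, item, spider):
        # assuming dict-like items with a "title" field
        if not item.get("title"):
            raise DropItem("missing title field")
        item["scraped_at"] = datetime.now(timezone.utc).isoformat()
        return item


# Enabled in settings.py (the project path is hypothetical):
# ITEM_PIPELINES = {"myproject.pipelines.EnrichAndValidatePipeline": 300}
```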

How to add headers to all or only some scrapy requests?

To add headers to Scrapy requests, the `DEFAULT_REQUEST_HEADERS` setting or a custom downloader middleware can be used. Here's how; an example is sketched below.

#scrapy
#http
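A hedged sketch of both approaches, with illustrative header values and a hypothetical spider name: project-wide headers go into the settings, while one-off headers can be set directly on a request.

```python
import scrapy

# Project-wide headers go in settings.py (values here are illustrative):
# DEFAULT_REQUEST_HEADERS = {
#     "Accept": "text/html,application/xhtml+xml",
#     "Accept-Language": "en-US,en;q=0.9",
# }


class HeadersSpider(scrapy.Spider):
    name = "headers_demo"  # hypothetical spider name

    def start_requests(self):
        # headers can also be set per request when only some requests need them
        yield scrapy.Request(
            "https://httpbin.org/headers",
            headers={"X-Requested-With": "XMLHttpRequest"},
            callback=self.parse,
        )

    def parse(self, response):
        yield {"echoed_headers": response.text}
```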

How to pass custom parameters to scrapy spiders?

To pass custom parameters to a Scrapy spider, the `-a` CLI argument can be used. Here's how and why it is such a useful feature; a short example follows below.

#scrapy
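A minimal sketch of a spider that accepts `-a` arguments (the spider name, argument names, and URL are illustrative):

```python
import scrapy


class CategorySpider(scrapy.Spider):
    name = "category"  # hypothetical spider name

    def __init__(self, category=None, max_pages="10", *args, **kwargs):
        super().__init__(*args, **kwargs)
        # -a values arrive as strings, so cast them as needed
        self.category = category
        self.max_pages = int(max_pages)

    def start_requests(self):
        # the URL pattern is illustrative
        yield scrapy.Request(f"https://example.com/{self.category}", callback=self.parse)

    def parse(self, response):
        yield {"url": response.url, "category": self.category}


# Run with: scrapy crawl category -a category=books -a max_pages=5
```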

How to rotate proxies in scrapy spiders?

To rotate proxies in Scrapy spiders, a downloader middleware can be used to randomly or intelligently select the most viable proxy for each request. Here's how; a minimal sketch follows below.

#scrapy
#proxies
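A rough sketch of such a middleware (the proxy URLs, class name, and project path are hypothetical): each outgoing request gets a random proxy from a pool, which Scrapy's built-in proxy handling then applies.

```python
import random

# A hypothetical proxy pool; replace with real proxy endpoints
PROXY_POOL = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
]


class RandomProxyMiddleware:
    """Assigns a random proxy to each outgoing request."""

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware picks up request.meta["proxy"]
        request.meta["proxy"] = random.choice(PROXY_POOL)


# Enabled in settings.py (the project path is hypothetical; the priority keeps it
# running before the built-in HttpProxyMiddleware at 750):
# DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RandomProxyMiddleware": 620}
```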

How to use headless browsers with scrapy?

To use a headless browser with Scrapy, a plugin like scrapy-playwright can be used. Here's how to use it and what some of the alternatives are; see the sketch below.

#scrapy
#headless-browser
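A minimal sketch assuming scrapy-playwright is installed and enabled (the spider name and URL are illustrative): requests flagged with `playwright` in their meta are rendered in a real browser before being handed to the callback.

```python
import scrapy

# settings.py for scrapy-playwright (assuming the plugin is installed):
# DOWNLOAD_HANDLERS = {
#     "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
#     "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
# }
# TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


class JsRenderedSpider(scrapy.Spider):
    name = "js_demo"  # hypothetical spider name

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",
            meta={"playwright": True},  # render this page in a headless browser
        )

    def parse(self, response):
        # the response now contains browser-rendered HTML
        yield {"title": response.css("title::text").get()}
```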

How to pass data between callbacks in Scrapy?

To pass data between Scrapy callbacks when scraping multiple pages, a partially populated item can be carried through `Request.cb_kwargs` (or `Request.meta`). Here's how; a minimal sketch follows below.

#scrapy
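A minimal sketch using `cb_kwargs` to carry a partially built item from a listing page to a product page (the spider name, URL, and CSS selectors are illustrative):

```python
import scrapy


class ProductSpider(scrapy.Spider):
    name = "products"  # hypothetical spider name
    start_urls = ["https://example.com/listing"]  # illustrative URL

    def parse(self, response):
        for link in response.css("a.product::attr(href)").getall():
            # start the item on the listing page and carry it to the next callback
            item = {"listing_url": response.url}
            yield response.follow(
                link, callback=self.parse_product, cb_kwargs={"item": item}
            )

    def parse_product(self, response, item):
        # finish the item with data from the detail page
        item["name"] = response.css("h1::text").get()
        yield item
```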

How to pass data from start_requests to parse callbacks in scrapy?

To pass data between Scrapy callbacks like start_requests and parse, the Request.meta attribute can be used. Here's how; a short example follows below.

#scrapy
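A minimal sketch attaching data via `Request.meta` in `start_requests` and reading it back in the callback (the spider name, categories, and URL are illustrative):

```python
import scrapy


class CategoryMetaSpider(scrapy.Spider):
    name = "meta_demo"  # hypothetical spider name

    def start_requests(self):
        for category in ["books", "music"]:  # illustrative categories
            yield scrapy.Request(
                f"https://example.com/{category}",
                meta={"category": category},  # attach data to the request
                callback=self.parse,
            )

    def parse(self, response):
        # the attached data is available on the response side as response.meta
        yield {"category": response.meta["category"], "url": response.url}
```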