Web Scraping With Scrapy Intro Through Examples
Tutorial on web scraping with scrapy and Python through a real world example project. Best practices, extension highlights and common challenges.
To configure Scrapy spiders with custom execution parameters, Scrapy's -a CLI option can be used. When the crawl command is called, Scrapy sets each -a parameter as an attribute on the spider instance (e.g. -a country -> self.country).
For example, here we are passing country and proxy parameters to our scraper:
scrapy crawl myspider -a country=US -a "proxy=http://222.22.33.44:9000"
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    # any URL works here; parse() only runs once a response comes back
    start_urls = ["https://example.com"]

    def parse(self, response):
        # -a arguments become instance attributes
        print(self.country)  # "US"
        print(self.proxy)    # "http://222.22.33.44:9000"
This is an easy and useful feature when specific customization is needed for each scrapy crawl run.
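As a loose sketch of how these attributes might be used in practice (the getattr defaults and the start_requests override below are illustrative assumptions, not part of the original example), the passed proxy value can be attached to outgoing requests through the standard proxy meta key handled by Scrapy's built-in HttpProxyMiddleware:

import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["https://example.com"]

    def start_requests(self):
        # fall back to defaults if the -a arguments were not provided
        country = getattr(self, "country", "US")
        proxy = getattr(self, "proxy", None)
        for url in self.start_urls:
            meta = {"proxy": proxy} if proxy else {}
            yield scrapy.Request(url, meta=meta, cb_kwargs={"country": country})

    def parse(self, response, country):
        # country arrives via cb_kwargs rather than an instance attribute
        print(country, response.status)

Using getattr with a default keeps the spider runnable even when no -a arguments are supplied on the command line.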
Additionally, the -s CLI parameter can be used to set or override any Scrapy setting for a single run.
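For example (the particular settings below are just common built-in ones chosen for illustration), concurrency and robots.txt handling could be overridden like this:

scrapy crawl myspider -s CONCURRENT_REQUESTS=2 -s ROBOTSTXT_OBEY=False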