To configure scrapy spiders with custom execution parameters, use the -a option of scrapy's CLI.
Each -a CLI parameter is set as a scrapy spider instance attribute (e.g. -a country=US -> self.country) when the crawl command is called.
For example, here we are passing country and proxy parameters to our scraper:
scrapy crawl myspider -a country=US -a "proxy=http://126.96.36.199:9000"
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"

    def parse(self, response):
        print(self.country)
        print(self.proxy)
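This is an easy way to customize each scrapy crawl run without changing the spider code. Note that values passed with -a always arrive as strings, so convert them in the spider when a number or boolean is needed.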
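To make the -a mechanism more concrete, here is a minimal plain-Python sketch of the idea: NAME=VALUE pairs are split into a dictionary and then attached to the spider as attributes. This is a simplified model and not scrapy's actual source code. The names parse_spider_args, SketchSpider and describe, as well as the proxy.example host, are hypothetical and exist only for illustration.

```python
# Simplified model of how `scrapy crawl -a NAME=VALUE` arguments reach a spider.
# NOT scrapy's source code: parse_spider_args and SketchSpider are hypothetical.


def parse_spider_args(pairs):
    """Turn ["country=US", "proxy=http://host:9000"] into a dict.

    Splits on the first "=" only, so values may themselves contain "=".
    """
    args = {}
    for pair in pairs:
        if "=" not in pair:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        key, value = pair.split("=", 1)
        args[key] = value  # values are kept as plain strings
    return args


class SketchSpider:
    """Stand-in for a spider class: keyword arguments become attributes."""

    name = "myspider"

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def describe(self):
        # getattr with a default avoids AttributeError when -a was omitted
        country = getattr(self, "country", "US")
        proxy = getattr(self, "proxy", None)
        return country, proxy


cli_args = ["country=DE", "proxy=http://proxy.example:9000", "pages=3"]
spider = SketchSpider(**parse_spider_args(cli_args))
print(spider.describe())
print(type(spider.pages).__name__)  # "str": convert with int() before doing math
```

In a real spider the same defensive pattern applies: reading an argument with getattr(self, "country", "US") keeps the spider usable when the -a option is left out, whereas self.country would raise an AttributeError.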
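Similarly, the -s CLI parameter can be used to set or override any scrapy setting for a single run; for example, -s DOWNLOAD_DELAY=2 overrides the project's download delay for that crawl.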