Web Scraping With Scrapy: Intro Through Examples
Tutorial on web scraping with Scrapy and Python through a real-world example project. Best practices, extension highlights, and common challenges.
Since Scrapy uses callbacks for scraping, transferring data between request steps can appear complicated. So, how do we fill a single item using multiple Scrapy requests?
For example, if we need to scrape 3 pages - product data, reviews and shipping options - we need 3 callbacks and must continuously transfer data between them:
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        # first page: extract product details, then request the reviews page
        item = {"price": "123"}
        yield scrapy.Request(".../reviews", meta={"item": item}, callback=self.parse_reviews)

    def parse_reviews(self, response):
        # second page: pick up the item from meta and add review data
        item = response.meta['item']
        item['reviews'] = ['awesome']
        yield scrapy.Request(".../shipping", meta={"item": item}, callback=self.parse_shipping)

    def parse_shipping(self, response):
        # final page: add the shipping price and yield the completed item
        item = response.meta['item']
        item['shipping'] = "14.22 USD"
        yield item
In this example, we're using Request.meta to preserve our scraped item through all three requests: in the first callback we extract the product details, in the second the review data, and in the last the shipping price, before yielding the final dataset.
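As a side note, here is a minimal sketch of an alternative approach: since version 1.7, Scrapy also supports passing data between requests through Request.cb_kwargs, which delivers values as named callback arguments instead of going through response.meta. The placeholder URLs are the same as above:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        item = {"price": "123"}
        # cb_kwargs entries arrive as keyword arguments in the callback
        yield scrapy.Request(".../reviews", callback=self.parse_reviews, cb_kwargs={"item": item})

    def parse_reviews(self, response, item):
        item['reviews'] = ['awesome']
        yield scrapy.Request(".../shipping", callback=self.parse_shipping, cb_kwargs={"item": item})

    def parse_shipping(self, response, item):
        item['shipping'] = "14.22 USD"
        yield item

Compared to meta, cb_kwargs keeps the item separate from Scrapy's internal request metadata and makes each callback's signature explicit about the data it expects.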