Since Scrapy uses callbacks for scraping, transferring data between request steps can appear complicated. So, how do we fill a single item using multiple Scrapy requests?
For example, if we need to scrape 3 pages - product data, reviews and shipping options - we need 3 callbacks and to continuously transfer data between them:
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        # step 1: extract product details and pass the partial item along
        item = {"price": "123"}
        yield scrapy.Request(".../reviews", callback=self.parse_reviews, meta={"item": item})

    def parse_reviews(self, response):
        # step 2: retrieve the partial item and add review data
        item = response.meta['item']
        item['reviews'] = ['awesome']
        yield scrapy.Request(".../shipping", callback=self.parse_shipping, meta={"item": item})

    def parse_shipping(self, response):
        # step 3: add the shipping price and return the complete item
        item = response.meta['item']
        item['shipping'] = "14.22 USD"
        yield item
In this example, we're using Request.meta to preserve our scraped item through all 3 requests: the first callback extracts product details, the second adds review data, and the last one adds the shipping price and returns the final dataset.
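As a side note, since Scrapy 1.7 the Request.cb_kwargs attribute offers a cleaner alternative to Request.meta for this pattern: its entries are delivered straight to the callback as keyword arguments. Here's a minimal sketch of the same spider rewritten with cb_kwargs (the placeholder URLs are kept from the example above):

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        item = {"price": "123"}
        # cb_kwargs entries arrive as keyword arguments of the callback
        yield scrapy.Request(".../reviews", callback=self.parse_reviews, cb_kwargs={"item": item})

    def parse_reviews(self, response, item):
        # "item" is received directly instead of via response.meta
        item['reviews'] = ['awesome']
        yield scrapy.Request(".../shipping", callback=self.parse_shipping, cb_kwargs={"item": item})

    def parse_shipping(self, response, item):
        item['shipping'] = "14.22 USD"
        yield item

This keeps the callback signatures explicit about what data they expect, while Request.meta remains useful for values consumed by Scrapy middlewares (like proxy or retry settings).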