How to pass data between callbacks in Scrapy?

Since Scrapy uses callbacks for scraping, transferring data between request steps can appear complicated. So, how do we fill a single item using multiple Scrapy requests?

For example, if we need to scrape 3 pages - product data, reviews and shipping options - we need 3 callbacks and have to continuously transfer data between them:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        # step 1: extract product details
        item = {"price": "123"}
        # carry the partial item to the next callback through Request.meta
        yield scrapy.Request(".../reviews", meta={"item": item}, callback=self.parse_reviews)

    def parse_reviews(self, response):
        # step 2: retrieve the item and attach review data
        item = response.meta['item']
        item['reviews'] = ['awesome']
        yield scrapy.Request(".../shipping", meta={"item": item}, callback=self.parse_shipping)

    def parse_shipping(self, response):
        # step 3: attach the shipping price and return the complete item
        item = response.meta['item']
        item['shipping'] = "14.22 USD"
        yield item

In this example, we're using Request.meta to carry our scraped item through all 3 requests: the first extracts the product details, the second the review data, and the last the shipping price, after which the completed item is returned as the final dataset.
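Alternatively, Scrapy 1.7 and later provide Request.cb_kwargs, which delivers the passed values directly as keyword arguments of the callback instead of going through response.meta. Here's a minimal sketch of the same spider rewritten with cb_kwargs (the placeholder URLs mirror the example above):

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        item = {"price": "123"}
        # cb_kwargs entries arrive as keyword arguments in the callback
        yield scrapy.Request(".../reviews", callback=self.parse_reviews, cb_kwargs={"item": item})

    def parse_reviews(self, response, item):
        item['reviews'] = ['awesome']
        yield scrapy.Request(".../shipping", callback=self.parse_shipping, cb_kwargs={"item": item})

    def parse_shipping(self, response, item):
        item['shipping'] = "14.22 USD"
        yield item

This keeps the item out of the catch-all meta dictionary (which Scrapy also uses for internal flags like proxy settings) and makes each callback's inputs explicit in its signature.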
