How to pass data between callbacks in Scrapy?

Since Scrapy uses callbacks for scraping, transferring data between request steps can appear complicated. So, how do we fill a single item using multiple Scrapy requests?

For example, if we need to scrape 3 pages - product data, reviews and shipping options - we need 3 callbacks and have to pass our data from one to the next:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        # collect product data and hand it to the next callback via meta
        item = {"price": "123"}
        yield scrapy.Request(".../reviews", meta={"item": item}, callback=self.parse_reviews)

    def parse_reviews(self, response):
        # retrieve the item, add review data and continue to the shipping page
        item = response.meta['item']
        item['reviews'] = ['awesome']
        yield scrapy.Request(".../shipping", meta={"item": item}, callback=self.parse_shipping)

    def parse_shipping(self, response):
        # add the last field and return the completed item
        item = response.meta['item']
        item['shipping'] = "14.22 USD"
        yield item

In this example, we're using Request.meta to preserve our scraped item through all 3 requests: the first callback extracts the product details, the second adds the review data, and the last one adds the shipping price and returns the final dataset.

Question tagged: scrapy
