
How to pass data from start_requests to parse callbacks in Scrapy?

by Bernardas Alisauskas Jul 12, 2023 1 min read

Scrapy is a callback-driven web scraping framework, which can make it difficult to pass data from the initial start_requests() method to the parse() callback and to any callbacks that follow.

To pass data from the initial start_requests() method to the parse() callback, we can use the Request.meta attribute:

python
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

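    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            # attach the URL's position to the request so the callback can read it
            yield scrapy.Request(url, meta={'index': index})

    def parse(self, response):
        print(response.url)
        # data set in Request.meta is available as response.meta
        print(response.meta['index'])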

In the example above, we use the meta parameter of Request to pass along the index of each URL scheduled for scraping. The callback then reads it back from response.meta.

We can keep chaining Request.meta this way, passing data from callback to callback until we reach the final callback, where we yield the finished item:

python
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            yield scrapy.Request(url, meta={'item':{"index": index}})

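    def parse(self, response):
        # pick up the item started in start_requests() and add to it
        item = response.meta['item']
        item['price'] = 100
        # forward the partial item to the next callback
        yield scrapy.Request(".../reviews", meta={"item": item}, callback=self.parse_reviews)

    def parse_reviews(self, response):
        item = response.meta['item']
        item['reviews'] = ['awesome']
        # last callback in the chain: yield the completed item
        yield item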

In the example above, we've extended the chain to build a single item from two requests.

Note that when chaining callbacks to build a single item, we should be diligent about handling failures with the errback parameter, because the item can be lost if any request in the chain fails.

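Additionally, it's best to pass immutable or small, self-contained data rather than objects that are shared between requests or hold references to large structures. This helps avoid unexpected behavior and potential memory leaks.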
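
To illustrate the unexpected behavior, consider one callback that yields several follow-up requests that all carry the same dict. The sketch below uses plain dicts as stand-ins for Request.meta so that it runs without Scrapy. The build_meta helper is hypothetical, and deep-copying is only one way to isolate the branches:

```python
import copy


def build_meta(item, shared=False):
    """Return the meta dict for one follow-up request.

    With shared=True every request carries the same item object;
    otherwise each request gets its own deep copy.
    """
    return {"item": item if shared else copy.deepcopy(item)}


base_item = {"index": 0, "reviews": []}

# Shared reference: a change made in one branch leaks into the other.
meta_a = build_meta(base_item, shared=True)
meta_b = build_meta(base_item, shared=True)
meta_a["item"]["reviews"].append("awesome")
shared_leaks = meta_b["item"]["reviews"] == ["awesome"]

# Independent copies: each branch only sees its own changes.
base_item = {"index": 0, "reviews": []}
copy_a = build_meta(base_item)
copy_b = build_meta(base_item)
copy_a["item"]["reviews"].append("awesome")
copies_isolated = copy_b["item"]["reviews"] == []

print(shared_leaks, copies_isolated)  # True True
```

In a linear chain like the one above, where a single item moves through one request at a time, sharing the dict is harmless. Copying starts to matter once a callback fans out into multiple requests.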
