How to pass data from start_requests to parse callbacks in Scrapy?

by scrapecrow Apr 20, 2023

Scrapy is a callback-driven web scraping framework, which can make it difficult to pass data from the initial start_requests() method to the parse() callback and any callbacks that follow.

To start, data can be transferred from the initial start_requests() method to the parse() callback through the Request.meta attribute:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            # attach the URL's index to the request through Request.meta
            yield scrapy.Request(url, meta={'index': index})

    def parse(self, response):
        print(response.url)
        # the meta dict is available on the response in the callback
        print(response.meta['index'])

In the example above, we use the Request.meta parameter to pass along the index of each URL as it is scheduled for scraping.

We can continue this Request.meta pipeline and pass data between callbacks indefinitely until we reach the final callback, where we can return the final item:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            # start the item dict in Request.meta so each callback can extend it
            yield scrapy.Request(url, meta={'item': {'index': index}})

    def parse(self, response):
        item = response.meta['item']
        item['price'] = 100
        # forward the partially built item to the next callback
        yield scrapy.Request(".../reviews", meta={'item': item}, callback=self.parse_reviews)

    def parse_reviews(self, response):
        item = response.meta['item']
        item['reviews'] = ['awesome']
        yield item

In the example above we've extended our chain to generate a single item from two requests.

Note that when chaining callbacks toward a single result item, we should be diligent about handling failures with the errback parameter, because the item could be lost at any step along the way.
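For example, here's a minimal sketch of recovering the partial item when a follow-up request fails; the handle_error name is our own illustration, not part of Scrapy's API, though failure.request is how Scrapy exposes the failed request to errbacks:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        item = response.meta['item']
        item['price'] = 100
        yield scrapy.Request(
            ".../reviews",
            meta={'item': item},
            callback=self.parse_reviews,
            # route request failures to our error handler so the item isn't lost
            errback=self.handle_error,
        )

    def parse_reviews(self, response):
        item = response.meta['item']
        item['reviews'] = ['awesome']
        yield item

    def handle_error(self, failure):
        # the failed Request (and its meta) is attached to the Failure
        item = failure.request.meta['item']
        # yield the partial item instead of silently dropping it
        yield item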

Additionally, it's best to pass immutable or low-reference data through meta to avoid unexpected behavior and potential memory leaks.
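For instance, when one page fans out into several follow-up requests, sharing a single mutable dict between them lets callbacks overwrite each other's data; a minimal sketch that passes a shallow copy per request instead (the a.review selector is a hypothetical placeholder):

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def parse(self, response):
        item = response.meta['item']
        # fan out into several review requests (hypothetical selector)
        for url in response.css('a.review::attr(href)').getall():
            # pass a shallow copy so each callback gets its own dict
            # rather than all of them mutating one shared object
            yield response.follow(url, meta={'item': dict(item)}, callback=self.parse_reviews)

    def parse_reviews(self, response):
        item = response.meta['item']
        item['reviews'] = response.css('.review::text').getall()
        yield item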
