What are Scrapy Item and ItemLoader objects and how to use them?

by scrapecrow Apr 20, 2023

Scrapy's Item and ItemLoader classes are a convenient way to store and manage scraped data.

The Item class is a data container similar to Python's @dataclass or pydantic.BaseModel, where the data fields are defined:

import scrapy 

class Person(scrapy.Item):
    name = scrapy.Field()
    last_name = scrapy.Field()
    bio = scrapy.Field()
    age = scrapy.Field()
    weight = scrapy.Field()
    height = scrapy.Field()
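
Item instances then behave much like Python dictionaries, with the caveat that only declared fields can be set. A quick illustration:

person = Person(name='John')
person['age'] = 33
person['nickname'] = 'JJ'  # raises KeyError: Person does not support field: nickname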

ItemLoader objects, in turn, are used to populate items with data:

import scrapy
from scrapy.loader import ItemLoader
from itemloaders.processors import Compose, Join, TakeFirst

class PersonLoader(ItemLoader):
    default_item_class = Person
    # <fieldname>_out defines the output processor for each field
    name_out = TakeFirst()  # take the first extracted value
    last_name_out = TakeFirst()
    bio_out = Compose(Join(''), str.strip)  # join all values, then strip whitespace
    age_out = Compose(TakeFirst(), int)  # take the first value and cast to int
    weight_out = Compose(TakeFirst(), int)
    height_out = Compose(TakeFirst(), int)

class MySpider(scrapy.Spider):
    ...
    def parse(self, response):
        # create the loader and pass the response to it:
        loader = PersonLoader(response=response)
        # add extraction rules as XPath expressions:
        loader.add_xpath('name', "//div[contains(@class,'name')]/text()")
        loader.add_xpath('bio', "//div[contains(@class,'bio')]/text()")
        loader.add_xpath('age', "//div[@class='age']/text()")
        loader.add_xpath('weight', "//div[@class='weight']/text()")
        loader.add_xpath('height', "//div[@class='height']/text()")
        # call load_item() to apply the processors and return the item:
        yield loader.load_item()

Here we defined parsing rules in the PersonLoader definition, like:

  • taking the first found value for the name.
  • converting numeric values to integers.
  • joining all values for the bio field.

Then, calling loader.load_item() applies these rules to the collected values, forming our final item.
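
To see these rules in action outside of a full spider run, here's a minimal sketch that feeds the PersonLoader defined above a made-up HTML snippet (the markup and values are hypothetical, for illustration only) through a standalone Selector:

from scrapy import Selector

# hypothetical markup matching the XPaths used above
html = '''
<div class="name">John</div>
<div class="bio">Likes </div>
<div class="bio">scraping.</div>
<div class="age">30</div>
'''
loader = PersonLoader(selector=Selector(text=html))
loader.add_xpath('name', "//div[@class='name']/text()")
loader.add_xpath('bio', "//div[@class='bio']/text()")
loader.add_xpath('age', "//div[@class='age']/text()")
item = loader.load_item()
# -> Person with name='John', bio='Likes scraping.', age=30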

Using Item and ItemLoader classes is the standard way to structure scraped data in Scrapy
and a convenient way to keep the data-processing flow tidy and understandable.

Related Articles

  • Web Scraping Dynamic Websites With Scrapy Playwright
    Learn about Scrapy Playwright, a Scrapy integration for scraping dynamic web pages. We explain Scrapy Playwright through an example project and cover common scraping use cases, such as clicking elements, scrolling and waiting for elements.

  • Web Scraping Dynamic Web Pages With Scrapy Selenium
    Learn how to scrape dynamic web pages with Scrapy Selenium, including common scraping use cases such as waiting for elements, clicking buttons and scrolling.

  • Scrapy Splash Guide: Scrape Dynamic Websites With Scrapy
    Learn about web scraping with Scrapy Splash, which lets Scrapy scrape dynamic web pages. We define Splash, cover installation and navigation, and provide a step-by-step guide for using Scrapy Splash.

  • Web Scraping With Scrapy: The Complete Guide in 2025
    Tutorial on web scraping with Scrapy and Python through a real-world example project, covering best practices, extension highlights and common challenges.

  • How to Scrape YouTube in 2025
    Learn how to scrape YouTube channel, video, and comment data as JSON using Python.

  • Advanced Proxy Connection Optimization Techniques
    Master advanced proxy optimization with TCP connection pooling, TLS fingerprinting, DNS caching, and HTTP/2 multiplexing for maximum performance.