Ebay is one of the world's biggest peer-to-peer e-commerce marketplaces, making it an attractive target for public data collection!
In this guide, we'll explain how to scrape Ebay search and listing pages for various details, including pricing, variant information, features, and descriptions.
We'll use Python, a few community packages, and some clever parsing techniques. Let's get started!
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect; here's a good summary of what not to do:
Do not scrape at rates that could damage the website (see the throttling sketch right after this list).
Do not scrape data that's not available publicly.
Do not store PII of EU citizens who are protected by GDPR.
Do not repurpose entire public datasets, which can be illegal in some countries.
Scrapfly does not offer legal advice, but these are good general rules to follow in web scraping; for specifics, you should consult a lawyer.
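On the first point, one simple way to keep request rates respectful is to pause between requests. Here's a minimal sketch using time.sleep (the 2-second delay is an arbitrary example, not an official guideline):
import time
import httpx

urls = [
    "https://www.ebay.com/itm/332562282948",
    "https://www.ebay.com/itm/393531906094",
]
with httpx.Client() as client:
    for url in urls:
        response = client.get(url)
        print(response.status_code, url)
        time.sleep(2)  # pause between requests to keep the rate gentle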
Why Scrape Ebay?
Ebay is one of the world's biggest product marketplaces, especially for more niche and rare items. This makes Ebay a great target for e-commerce data analytics.
Scraping Ebay data empowers various use cases, including:
Competitor analysis by gathering data on competitors' sales and reviews.
Market research by tracking product prices for hot deals or trends.
Empowered navigation through automated search patterns and custom alerts.
Setup
In this tutorial, we'll be using Python with a few community packages:
httpx: To send HTTP requests and retrieve Ebay's pages as HTML.
parsel: To parse the HTML documents using CSS selectors and XPath.
nested-lookup: To find nested keys in the Ebay JSON datasets.
The above packages can be installed using the below pip command:
$ pip install httpx[http2] parsel nested_lookup
Note that httpx can be replaced with other HTTP clients, such as requests. As for Parsel, another great alternative is BeautifulSoup.
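For illustration, here's a rough equivalent of a fetch-and-parse step using those alternatives (note that requests lacks HTTP2 support, and both packages would need to be installed separately):
import requests
from bs4 import BeautifulSoup

response = requests.get(
    "https://www.ebay.com/itm/332562282948",
    # browser-like headers still apply with requests
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35"},
)
soup = BeautifulSoup(response.text, "html.parser")
canonical = soup.select_one('link[rel="canonical"]')
print(canonical["href"] if canonical else "no canonical link found")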
Scraping Ebay Listings
Let's get started by scraping individual Ebay listing pages. Ebay listings come in two types:
Multiple variant listings with different selections, like tech devices.
Single variant listings with fixed selections.
We'll start with single-variant listings since they are more straightforward to extract. Let's take this product as an example; we'll be extracting data from the below fields:
In the image above, we marked our target fields. To build CSS selectors for these fields, we can use the Browser Developer Tools (the F12 key, or right click -> Inspect).
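If you're new to parsel, here's a quick toy example of how a CSS selector picks data out of HTML (the markup below is made up for illustration):
from parsel import Selector

html = '<h1 class="x-item-title"><span>Apple iPhone 14 Pro Max</span></h1>'
sel = Selector(text=html)
# the ::text pseudo-element extracts the text node of the matched element
print(sel.css("h1 span::text").get())  # Apple iPhone 14 Pro Max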
Before we start with the parsing logic, let's configure our HTTP connection to reduce the chances of Ebay blocking our scraper:
Below, we create a session using the httpx.Client class, defining the following client parameters:
Basic browser-like headers: User-Agent and the Accept-* header family.
HTTP2 protocol support via the http2 parameter.
Automatic redirect handling via the follow_redirects parameter.
This configuration significantly reduces the chances of our Ebay scraper getting blocked by mimicking normal browser behavior. With that in place, let's put it together and scrape an Ebay listing:
import json
import httpx
from parsel import Selector
# establish our HTTP2 client with browser-like headers
session = httpx.Client(
headers={
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
},
http2=True,
follow_redirects=True
)
def parse_product(response: httpx.Response) -> dict:
"""Parse Ebay's product listing page for core product data"""
sel = Selector(response.text)
# define helper functions that chain the extraction process
css_join = lambda css: "".join(sel.css(css).getall()).strip() # join all selected elements
css = lambda css: sel.css(css).get("").strip() # take first selected element and strip of leading/trailing spaces
item = {}
item["url"] = css('link[rel="canonical"]::attr(href)')
item["id"] = item["url"].split("/itm/")[1].split("?")[0] # we can take ID from the URL
item["price_original"] = css(".x-price-primary>span::text")
item["price_converted"] = css(".x-price-approx__price ::text") # ebay automatically converts price for some regions
item["name"] = css_join("h1 span::text")
item["seller_name"] = sel.xpath("//div[contains(@class,'info__about-seller')]/a/span/text()").get()
item["seller_url"] = sel.xpath("//div[contains(@class,'info__about-seller')]/a/@href").get().split("?")[0]
item["photos"] = sel.css('.ux-image-filmstrip-carousel-item.image img::attr("src")').getall() # carousel images
item["photos"].extend(sel.css('.ux-image-carousel-item.image img::attr("src")').getall()) # main image
    # the description is an iframe (an independent page); we can keep it as a URL or scrape it later
item["description_url"] = css("iframe#desc_ifr::attr(src)")
# feature details from the description table:
features = {}
feature_table = sel.css("div.ux-layout-section--features")
for feature in feature_table.css("dl.ux-labels-values"):
# iterate through each label of the table and select first sibling for value:
label = "".join(feature.css(".ux-labels-values__labels-content > div > span::text").getall()).strip(":\n ")
value = "".join(feature.css(".ux-labels-values__values-content > div > span *::text").getall()).strip(":\n ")
features[label] = value
item["features"] = features
return item
response = session.get("https://www.ebay.com/itm/332562282948")
product_data = parse_product(response)
# print the results in JSON format
print(json.dumps(product_data, indent=2))
Here, we use our httpx client to request the listing page and retrieve the HTML document. Then, we pass the response to the parse_product function, which extracts the product fields using CSS selectors and XPath.
Here's what the collected data from the above Ebay scraping snippet should look like:
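Since the description lives in a separate iframe document, we can fetch description_url with the same session and pull out its text. Here's a minimal sketch reusing the session and parse_product defined above (the iframe's HTML structure varies per listing, so we simply collect all text nodes):
def scrape_description(description_url: str) -> str:
    """fetch the description iframe document and return its visible text"""
    response = session.get(description_url)
    sel = Selector(response.text)
    # layout varies per listing, so grab every text node under <body>
    return "".join(sel.xpath("//body//text()").getall()).strip()

item = parse_product(session.get("https://www.ebay.com/itm/332562282948"))
if item["description_url"]:
    item["description"] = scrape_description(item["description_url"])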
Next, for products with variants we'll have to go a bit further and extract the page's hidden web data. It might seem like a complex process, though we'll cover it step-by-step!
Scraping Ebay Listing Variant Data
Ebay's listings can contain multiple products through a feature called variants. For example, let's take this iPhone listing:
We can see several variant options: model, storage capacity, and color. Each time we select an option, the page uses JavaScript to update the displayed price. That means the variant data exists in a JavaScript variable embedded in the page, and extracting it is commonly known as hidden web data scraping.
We'll briefly mention the hidden web data extraction in this guide. For the full details, refer to our dedicated tutorial.
To extract this hidden data, we'll use a small utility (the find_json_objects function in the snippet below) that finds all JSON objects in a text string:
import json
import httpx
from collections import defaultdict
from nested_lookup import nested_lookup
from parsel import Selector
session = httpx.Client(
headers={
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
},
http2=True,
follow_redirects=True,
)
def find_json_objects(text: str, decoder=json.JSONDecoder()):
"""Find JSON objects in text, and generate decoded JSON data"""
pos = 0
while True:
match = text.find("{", pos)
if match == -1:
break
try:
result, index = decoder.raw_decode(text[match:])
yield result
pos = match + index
except ValueError:
pos = match + 1
def parse_variants(response: httpx.Response) -> list:
"""
Parse variant data from Ebay's listing page of a product with variants.
This data is located in a js variable MSKU hidden in a <script> element.
"""
selector = Selector(response.text)
script = selector.xpath('//script[contains(., "MSKU")]/text()').get()
if not script:
return {}
all_data = list(find_json_objects(script))
data = nested_lookup("MSKU", all_data)[0]
# First retrieve names for all selection options (e.g. Model, Color)
selection_names = {}
for menu in data["selectMenus"]:
for id_ in menu["menuItemValueIds"]:
selection_names[id_] = menu["displayLabel"]
# Then, find all selection combinations:
selections = []
for v in data["menuItemMap"].values():
selections.append(
{
"name": v["valueName"],
"variants": v["matchingVariationIds"],
"label": selection_names[v["valueId"]],
}
)
results = []
variant_data = nested_lookup("variationsMap", data)[0]
for id_, variant in variant_data.items():
result = defaultdict(list)
result["id"] = id_
for selection in selections:
if int(id_) in selection["variants"]:
result[selection["label"]] = selection["name"]
result["price_original"] = variant["binModel"]["price"]["value"]["convertedFromValue"]
result["price_original_currency"] = variant["binModel"]["price"]["value"]["convertedFromCurrency"]
result["price_converted"] = variant["binModel"]["price"]["value"]["value"]
result["price_converted_currency"] = variant["binModel"]["price"]["value"]["currency"]
result["out_of_stock"] = variant["quantity"]["outOfStock"]
results.append(dict(result))
return results
response = session.get("https://www.ebay.com/itm/393531906094")
item = parse_product(response) # previous parse_product function
item['variants'] = parse_variants(response)
print(json.dumps(item, indent=2))
In the above Ebay scraper, we extract the variant listing data using the below steps:
Selecting the script tag containing the MSKU variable.
Extracting the JSON datasets using the find_json_objects utility.
Iterating over the various options and selecting the useful fields.
Here's what the retrieved Ebay scraping results should look like:
Next, let's see how to scrape Ebay search.
Scraping Ebay Search
To start scraping Ebay search, let's reverse engineer it. When we submit a search keyword, Ebay redirects us to a search results page. For example, searching for the term iphone takes us to a URL similar to ebay.com/sch/i.html?_nkw=iphone&_sacat=0.
The search page URL uses several parameters to define the query:
_nkw for the search keyword.
_sacat for the category restriction.
_sop for the sorting type.
_pgn for the page number.
_ipg for the number of listings per page (default is 60).
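For example, here's how these parameters compose into a search URL for the second page of iphone results, sorted by newly listed (sorting code 10, as used in the SORTING_MAP below):
from urllib.parse import urlencode

params = {
    "_nkw": "iphone",  # search keyword
    "_sacat": 0,       # category (0 means no restriction)
    "_sop": 10,        # sorting type (10 is newly listed)
    "_pgn": 2,         # page number
    "_ipg": 240,       # listings per page
}
print("https://www.ebay.com/sch/i.html?" + urlencode(params))
# https://www.ebay.com/sch/i.html?_nkw=iphone&_sacat=0&_sop=10&_pgn=2&_ipg=240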
We can find more parameters by clicking around and exploring the search filters. To keep our Ebay web scraper short, let's stick with these five parameters:
import json
import math
import httpx
import asyncio
from typing import Dict, List, Literal
from urllib.parse import urlencode
from parsel import Selector
SORTING_MAP = {
"best_match": 12,
"ending_soonest": 1,
"newly_listed": 10,
}
session = httpx.AsyncClient(
headers={
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
},
http2=True,
follow_redirects=True
)
def parse_search(response: httpx.Response) -> List[Dict]:
"""parse ebay's search page for listing preview details"""
previews = []
    # each listing has its own HTML box where all of the data is contained
sel = Selector(response.text)
listing_boxes = sel.css(".srp-results li.s-item")
for box in listing_boxes:
# quick helpers to extract first element and all elements
css = lambda css: box.css(css).get("").strip()
css_all = lambda css: box.css(css).getall()
previews.append(
{
"url": css("a.s-item__link::attr(href)").split("?")[0],
"title": css(".s-item__title>span::text"),
"price": css(".s-item__price::text"),
"shipping": css(".s-item__shipping::text"),
"list_date": css(".s-item__listingDate span::text"),
"subtitles": css_all(".s-item__subtitle::text"),
"condition": css(".s-item__subtitle .SECONDARY_INFO::text"),
"photo": css(".s-item__image img::attr(src)"),
"rating": css(".s-item__reviews .clipped::text"),
"rating_count": css(".s-item__reviews-count span::text"),
}
)
return previews
async def scrape_search(
query,
max_pages=1,
category=0,
items_per_page=240,
sort: Literal["best_match", "ending_soonest", "newly_listed"] = "newly_listed",
) -> List[Dict]:
"""Scrape Ebay's search results page for product preview data for given"""
def make_request(page):
return "https://www.ebay.com/sch/i.html?" + urlencode(
{
"_nkw": query,
"_sacat": category,
"_ipg": items_per_page,
"_sop": SORTING_MAP[sort],
"_pgn": page,
}
)
first_page = await session.get(make_request(page=1))
results = parse_search(first_page)
if max_pages == 1:
return results
# find total amount of results for concurrent pagination
    total_results = Selector(first_page.text).css(".srp-controls__count-heading>span::text").get()
total_results = int(total_results.replace(",", ""))
total_pages = math.ceil(total_results / items_per_page)
if total_pages > max_pages:
total_pages = max_pages
other_pages = [session.get(make_request(page=i)) for i in range(2, total_pages + 1)]
for response in asyncio.as_completed(other_pages):
response = await response
try:
results.extend(parse_search(response))
except Exception as e:
print(f"failed to scrape search page {response.url}")
return results
data = asyncio.run(scrape_search("iphone 14 pro max"))
print(json.dumps(data, indent=2))
Here's what the extracted Ebay data looks like:
Avoiding Ebay Scraping Blocking
Creating an Ebay scraper seems straightforward. However, scaling it up is the tricky part! Ebay can identify our requests as automated, challenging them with CAPTCHAs or even blocking the scraping process entirely!
This is where ScrapFly's web scraping API can help. To take advantage of it in our Ebay scraper, all we have to do is replace the httpx client with the scrapfly-sdk client:
import httpx
response = httpx.get("some ebay.com url")
# in ScrapFly SDK becomes 👇
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient(key="YOUR SCRAPFLY KEY")
result = client.scrape(ScrapeConfig(
# some ebay URL
"https://www.ebay.com/itm/393531906094",
# we can select specific proxy country
country="US",
# and enable anti scraping protection bypass:
asp=True,
# enable JavaScript rendering if required
render_js=True
))
For more on how to scrape Ebay.com using ScrapFly, see the Full Scraper Code section.
FAQ
To wrap this guide up, let's take a look at some frequently asked questions regarding how to scrape data from Ebay:
Is it legal to scrape ebay.com?
Yes. Ebay's data is publicly available; scraping Ebay at slow, respectful rates falls under the ethical scraping definition.
That being said, be aware of GDPR compliance in the EU when storing personal data, such as sellers' names or locations. For more, see our Is Web Scraping Legal? article.
How to crawl Ebay.com?
To crawl Ebay, we can adapt the scraping techniques covered in this article. Every Ebay listing contains links to related products, which we can extract and feed back into our scraping loop, turning our scraper into a crawler capable of discovering new listings on its own, as sketched below.
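Here's a rough, hedged sketch of that idea, reusing the parse_product function from the listing section; the link selector is a generic guess and should be confirmed against the live page:
import httpx
from urllib.parse import urljoin
from parsel import Selector

client = httpx.Client(follow_redirects=True)  # browser-like headers omitted for brevity

def crawl(start_url: str, max_items: int = 20) -> list:
    """breadth-first crawl of Ebay listings through related product links"""
    seen, queue, results = set(), [start_url], []
    while queue and len(results) < max_items:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        response = client.get(url)
        results.append(parse_product(response))  # parse_product from the listing section
        # NOTE: generic selector for listing links; verify it against the live page
        for href in Selector(response.text).css('a[href*="/itm/"]::attr(href)').getall():
            queue.append(urljoin(url, href).split("?")[0])
    return results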
Is there an Ebay API?
No. While Ebay does have a private catalog API, it contains only metadata fields like product IDs. For the full product details, the only way is to scrape Ebay as described in this guide.
Summary
In this guide, we wrote an Ebay scraper for product listing data using nothing but Python and a few community packages: httpx for retrieving the page content and parsel for parsing it.
We've scraped data from three parts of the Ebay domain:
Single variant products - using basic CSS selector parsing logic.
Multiple variant products - using hidden web data extraction.
Search pages - using search parameters and basic crawling rules.
Finally, to avoid getting our Ebay scraper blocked, we used ScrapFly's API, which configures the HTTP connection automatically. For more about ScrapFly, see our documentation and try it out for FREE!