How to Scrape Ebay in 2025
Learn how to scrape Ebay search, listing, and variant data using Python, with results directly in JSON.
Ebay is one of the world's biggest peer-to-peer e-commerce marketplaces, making it an attractive target for public data collection!
In this guide, we'll explain how to scrape Ebay search and listing pages for various details, including pricing, variant information, features, and descriptions.
We'll use Python, a few community packages, and some clever parsing techniques. Let's get started!
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect; here's a good summary of what not to do:
Scrapfly does not offer legal advice, but these are good general rules to follow in web scraping. For more, you should consult a lawyer.
Ebay is one of the world's biggest product marketplaces, especially for more niche and rare items. This makes Ebay a great target for e-commerce data analytics.
Scraping Ebay data empowers various use cases, including:
For further details, refer to our introduction on web scraping use cases.
In this tutorial, we'll be using Python with a few important community packages:
The above packages can be installed using the below pip command:
$ pip install httpx[http2] parsel nested_lookup
Note that httpx can be replaced with other HTTP clients, such as requests. As for Parsel, another great alternative is BeautifulSoup.
Let's get started by scraping single Ebay listing pages. Ebay listings consist of two types: single-variant and multi-variant.
We'll start with single-variant listings since they are more straightforward to extract. Take this product for example; we'll be extracting data from the below fields:
In the image above, we marked our fields. To build CSS selectors for these fields, we can use the Browser Developer Tools (the F12 key or the right click -> inspect option).
Before we start with the parsing logic, let's configure our HTTP requests' connection to prevent Ebay scraping blocking:
import httpx

session = httpx.Client(
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
    http2=True,
    follow_redirects=True,
)
Above, we create a session using the httpx.Client class. Then, we define the below client parameters:
- A browser-like User-Agent header.
- The Accept-* headers family, mimicking a regular browser.
- HTTP2 protocol support through the http2 parameter.
- The follow_redirects parameter to follow redirects automatically.
The above configuration can significantly reduce the chances of getting our Ebay scraper blocked by mimicking normal user behavior. Now that our configuration is complete, let's scrape Ebay's listings:
import json
import httpx
from parsel import Selector

# establish our HTTP2 client with browser-like headers
session = httpx.Client(
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
    http2=True,
    follow_redirects=True,
)

def parse_product(response: httpx.Response) -> dict:
    """Parse Ebay's product listing page for core product data"""
    sel = Selector(response.text)
    # define helper functions that chain the extraction process
    css_join = lambda css: "".join(sel.css(css).getall()).strip()  # join all selected elements
    css = lambda css: sel.css(css).get("").strip()  # take first selected element and strip leading/trailing spaces
    item = {}
    item["url"] = css('link[rel="canonical"]::attr(href)')
    item["id"] = item["url"].split("/itm/")[1].split("?")[0]  # we can take the ID from the URL
    item["price_original"] = css(".x-price-primary>span::text")
    item["price_converted"] = css(".x-price-approx__price ::text")  # ebay automatically converts price for some regions
    item["name"] = css_join("h1 span::text")
    item["seller_name"] = sel.xpath("//div[contains(@class,'info__about-seller')]/a/span/text()").get()
    item["seller_url"] = sel.xpath("//div[contains(@class,'info__about-seller')]/a/@href").get().split("?")[0]
    item["photos"] = sel.css('.ux-image-filmstrip-carousel-item.image img::attr("src")').getall()  # carousel images
    item["photos"].extend(sel.css('.ux-image-carousel-item.image img::attr("src")').getall())  # main image
    # description is an iframe (independent page). We can keep it as a URL or scrape it later.
    item["description_url"] = css("iframe#desc_ifr::attr(src)")
    # feature details from the description table:
    features = {}
    feature_table = sel.css("div.ux-layout-section--features")
    for feature in feature_table.css("dl.ux-labels-values"):
        # iterate through each label of the table and select the first sibling for the value:
        label = "".join(feature.css(".ux-labels-values__labels-content > div > span::text").getall()).strip(":\n ")
        value = "".join(feature.css(".ux-labels-values__values-content > div > span *::text").getall()).strip(":\n ")
        features[label] = value
    item["features"] = features
    return item

response = session.get("https://www.ebay.com/itm/332562282948")
product_data = parse_product(response)
# print the results in JSON format
print(json.dumps(product_data, indent=2))
Here, we use our httpx client to request the target web page and retrieve the HTML document. Next, we use the parse_product function to parse the raw HTML with XPath and CSS selectors.
Here's what the collected data from the above Ebay scraping snippet should look like:
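Since the output varies per listing, here's an illustrative result showing the parser's field names with placeholder values (not real scraped data):

```json
{
  "url": "https://www.ebay.com/itm/332562282948",
  "id": "332562282948",
  "price_original": "US $9.99",
  "price_converted": "",
  "name": "<listing title>",
  "seller_name": "<seller name>",
  "seller_url": "<seller store url>",
  "photos": ["<image url>", "<image url>"],
  "description_url": "<iframe description url>",
  "features": {
    "Brand": "<brand>",
    "Type": "<type>"
  }
}
```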
Next, for products with variants we'll have to go a bit further and extract the page's hidden web data. It might seem like a complex process, though we'll cover it step-by-step!
Ebay's listings can contain multiple products through a feature called variants. For example, let's take this iPhone listing:
We can see several variant options: model, storage capacity, and color. These options are updated using JavaScript each time we select one.
Ebay uses JavaScript to update the page with a different price every time we choose a different option. That means the variant data exists in a JavaScript variable. Extracting this embedded data is commonly known as hidden web data scraping.
We'll only briefly cover hidden web data extraction in this guide. For a full introduction to scraping JavaScript variables, see our dedicated hidden web data scraping tutorial.
To extract this data from HTML pages, we'll be using the below utility to find all JSON objects within any text string:
import json
import httpx
from collections import defaultdict
from nested_lookup import nested_lookup
from parsel import Selector

session = httpx.Client(
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
    http2=True,
    follow_redirects=True,
)

def find_json_objects(text: str, decoder=json.JSONDecoder()):
    """Find JSON objects in text, and generate decoded JSON data"""
    pos = 0
    while True:
        match = text.find("{", pos)
        if match == -1:
            break
        try:
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            pos = match + 1

def parse_variants(response: httpx.Response) -> list:
    """
    Parse variant data from Ebay's listing page of a product with variants.
    This data is located in a js variable MSKU hidden in a <script> element.
    """
    selector = Selector(response.text)
    script = selector.xpath('//script[contains(., "MSKU")]/text()').get()
    if not script:
        return []
    all_data = list(find_json_objects(script))
    data = nested_lookup("MSKU", all_data)[0]
    # First, retrieve names for all selection options (e.g. Model, Color)
    selection_names = {}
    for menu in data["selectMenus"]:
        for id_ in menu["menuItemValueIds"]:
            selection_names[id_] = menu["displayLabel"]
    # Then, find all selection combinations:
    selections = []
    for v in data["menuItemMap"].values():
        selections.append(
            {
                "name": v["valueName"],
                "variants": v["matchingVariationIds"],
                "label": selection_names[v["valueId"]],
            }
        )
    results = []
    variant_data = nested_lookup("variationsMap", data)[0]
    for id_, variant in variant_data.items():
        result = defaultdict(list)
        result["id"] = id_
        for selection in selections:
            if int(id_) in selection["variants"]:
                result[selection["label"]] = selection["name"]
        result["price_original"] = variant["binModel"]["price"]["value"]["convertedFromValue"]
        result["price_original_currency"] = variant["binModel"]["price"]["value"]["convertedFromCurrency"]
        result["price_converted"] = variant["binModel"]["price"]["value"]["value"]
        result["price_converted_currency"] = variant["binModel"]["price"]["value"]["currency"]
        result["out_of_stock"] = variant["quantity"]["outOfStock"]
        results.append(dict(result))
    return results

response = session.get("https://www.ebay.com/itm/393531906094")
item = parse_product(response)  # previous parse_product function
item['variants'] = parse_variants(response)
print(json.dumps(item, indent=2))
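To make the extraction flow clearer, here's a standalone demo of the find_json_objects helper (the function is repeated so the snippet is self-contained, and the script string is made up):

```python
import json

def find_json_objects(text: str, decoder=json.JSONDecoder()):
    """Find JSON objects in text, and generate decoded JSON data"""
    pos = 0
    while True:
        match = text.find("{", pos)
        if match == -1:
            break
        try:
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            pos = match + 1

# a made-up <script> body resembling embedded hidden web data
script = 'var MSKU = {"variationsMap": {"1": {"outOfStock": false}}}; init();'
found = list(find_json_objects(script))
print(found)  # [{'variationsMap': {'1': {'outOfStock': False}}}]
```

The helper scans for every `{` character and asks the JSON decoder to parse from that point, skipping ahead on failure, so it recovers complete JSON objects from any surrounding JavaScript code.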
In the above Ebay scraper, we extract the variant listing data using the below steps:
- Select the script tag containing the MSKU variable.
- Extract all embedded JSON objects with the find_json_objects utility.
- Look up the variant details within the JSON data using nested_lookup.
Here's what the retrieved Ebay scraping results should look like:
Next, let's see how to scrape Ebay search.
To start scraping Ebay search results, let's reverse engineer them. When a search query is submitted, Ebay redirects the request to a search result document. For instance, search for the keyword iphone, and you will get redirected to a URL similar to ebay.com/sch/i.html?_nkw=iphone&_sacat=0.
The page of the above URL uses several URL parameters to define the search query:
- _nkw - the search keyword.
- _sacat - the category restriction.
- _sop - the sorting type.
- _pgn - the page number.
- _ipg - listings per page (default is 60).
We can find more arguments by clicking around and exploring the search pages. To keep our Ebay web scraper short, let's stick with these five parameters:
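These parameters can be assembled into a search URL with urllib's urlencode; for example, page 2 of an iphone search sorted by newly listed (parameter values are illustrative):

```python
from urllib.parse import urlencode

# search parameters for the ebay.com/sch/i.html endpoint
params = {
    "_nkw": "iphone",  # search keyword
    "_sacat": 0,       # category (0 = no restriction)
    "_ipg": 240,       # listings per page
    "_sop": 10,        # sorting (10 = newly listed)
    "_pgn": 2,         # page number
}
url = "https://www.ebay.com/sch/i.html?" + urlencode(params)
print(url)
# https://www.ebay.com/sch/i.html?_nkw=iphone&_sacat=0&_ipg=240&_sop=10&_pgn=2
```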
import json
import math
import asyncio
from typing import Dict, List, Literal
from urllib.parse import urlencode

import httpx
import dateutil.parser
from parsel import Selector

SORTING_MAP = {
    "best_match": 12,
    "ending_soonest": 1,
    "newly_listed": 10,
}

session = httpx.AsyncClient(
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
    http2=True,
    follow_redirects=True,
)

def parse_search(response: httpx.Response) -> List[Dict]:
    """parse ebay's search page for listing preview details"""
    previews = []
    # each listing has its own HTML box where all of the data is contained
    sel = Selector(response.text)
    best_selling_boxes = sel.xpath("//*[*[h2[contains(text(),'Best selling products')]]]//li[contains(@class, 's-item')]")
    best_selling_html_set = set([b.get() for b in best_selling_boxes])
    for box in sel.css(".srp-results li.s-item"):
        if box.get() in best_selling_html_set:
            continue  # skip boxes inside the best selling container
        css = lambda css: box.css(css).get("").strip() or None  # get first CSS match
        css_all = lambda css: box.css(css).getall()  # get all CSS matches
        css_re = lambda css, pattern: box.css(css).re_first(pattern, default="").strip()  # get first css regex match
        css_int = lambda css: int(box.css(css).re_first(r"(\d+)", default="0")) if box.css(css) else None
        css_float = lambda css: float(box.css(css).re_first(r"(\d+\.*\d*)", default="0.0")) if box.css(css) else None
        auction_end = css_re(".s-item__time-end::text", r"\((.+?)\)") or None
        if auction_end:
            auction_end = dateutil.parser.parse(auction_end.replace("Today", ""))
        item = {
            "url": css("a.s-item__link::attr(href)").split("?")[0],
            "title": css(".s-item__title span::text"),
            "price": css(".s-item__price::text"),
            "shipping": css_float(".s-item__shipping::text"),
            "auction_end": auction_end,
            "bids": css_int(".s-item__bidCount::text"),
            "location": css(".s-item__itemLocation::text"),
            "subtitles": css_all(".s-item__subtitle::text"),
            "condition": css(".SECONDARY_INFO::text"),
            "photo": css("img::attr(data-src)") or css("img::attr(src)"),
            "rating": css_float(".s-item__reviews .clipped::text"),
            "rating_count": css_int(".s-item__reviews-count span::text"),
        }
        previews.append(item)
    return previews

async def scrape_search(
    query,
    max_pages=1,
    category=0,
    items_per_page=240,
    sort: Literal["best_match", "ending_soonest", "newly_listed"] = "newly_listed",
) -> List[Dict]:
    """Scrape Ebay's search results page for product preview data for a given query"""
    def make_request(page):
        return "https://www.ebay.com/sch/i.html?" + urlencode(
            {
                "_nkw": query,
                "_sacat": category,
                "_ipg": items_per_page,
                "_sop": SORTING_MAP[sort],
                "_pgn": page,
            }
        )
    first_page = await session.get(make_request(page=1))
    results = parse_search(first_page)
    if max_pages == 1:
        return results
    # find the total amount of results for concurrent pagination
    total_results = Selector(first_page.text).css(".srp-controls__count-heading>span::text").get()
    total_results = int(total_results.replace(",", ""))
    total_pages = math.ceil(total_results / items_per_page)
    if total_pages > max_pages:
        total_pages = max_pages
    other_pages = [session.get(make_request(page=i)) for i in range(2, total_pages + 1)]
    for response in asyncio.as_completed(other_pages):
        response = await response
        try:
            results.extend(parse_search(response))
        except Exception:
            print(f"failed to scrape search page {response.url}")
    return results

data = asyncio.run(scrape_search("iphone 14 pro max"))
# datetime objects aren't JSON serializable, so stringify them with default=str
print(json.dumps(data, indent=2, default=str))
Here's what the extracted Ebay data looks like:
Creating an Ebay scraper seems straightforward. However, scaling it up is the tricky part! Ebay can identify our requests as automated, responding with CAPTCHA challenges or even blocking the scraping process entirely.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
To avoid Ebay web scraping blocking, we'll be using scrapfly-sdk with the anti-scraping protection bypass feature. Start by installing it using pip:
$ pip install scrapfly-sdk
To take advantage of ScrapFly's API in our Ebay scraper, all we have to do is replace the httpx client with the scrapfly-sdk client:
import httpx

response = httpx.get("some ebay.com url")
# in ScrapFly SDK becomes 👇
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient("YOUR SCRAPFLY KEY")
result = client.scrape(ScrapeConfig(
    # some ebay URL
    "https://www.ebay.com/itm/393531906094",
    # we can select a specific proxy country
    country="US",
    # and enable anti scraping protection bypass:
    asp=True,
    # enable JavaScript rendering if required
    render_js=True,
))
For more on how to scrape Ebay.com using ScrapFly, see the Full Scraper Code section.
To wrap this guide up, let's take a look at some frequently asked questions regarding how to scrape data from Ebay:
Yes. Ebay's data is publicly available - scraping Ebay at slow, respectful rates falls under the ethical scraping definition.
That being said, be aware of GDPR compliance in the EU when storing personal data such as sellers' names or locations. For more, see our Is Web Scraping Legal? article.
To crawl Ebay, we can adapt the scraping techniques covered in this article. Every Ebay listing contains related products, which we can extract and feed back into our scraping loop, turning our scraper into a crawler capable of discovering new listings to scrape.
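As a minimal sketch of that idea, the helper below pulls listing URLs out of a page's HTML so they can be queued for scraping. It uses a simple regular expression for brevity; the /itm/ URL pattern matches the listing URLs seen earlier in this guide, though a production crawler would parse links properly with parsel:

```python
import re

def extract_listing_urls(html: str) -> list:
    """Find unique Ebay listing URLs (/itm/<id>) in a page's HTML."""
    urls = re.findall(r"https://www\.ebay\.com/itm/\d+", html)
    # deduplicate while preserving the order of first appearance
    return list(dict.fromkeys(urls))

# made-up HTML resembling a "related products" section
sample = (
    '<a href="https://www.ebay.com/itm/393531906094?var=0">related</a>'
    '<a href="https://www.ebay.com/itm/393531906094">duplicate</a>'
    '<a href="https://www.ebay.com/itm/332562282948">another</a>'
)
print(extract_listing_urls(sample))
# ['https://www.ebay.com/itm/393531906094', 'https://www.ebay.com/itm/332562282948']
```

Each extracted URL can then be passed to the parse_product function from earlier, and any newly discovered URLs fed back into the crawl queue.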
No. While Ebay does have a private catalog API, it contains only metadata fields like product IDs. For the full product details, the only way is to scrape Ebay as described in this guide.
In this guide, we wrote a Python Ebay scraper for product listing data using nothing but Python and a few community packages: httpx for retrieving the content and parsel for parsing it.
We've scraped data from three parts of the Ebay domain:
Finally, to avoid Ebay scraping blocking, we used ScrapFly's API to automatically configure the HTTP connection. For more about ScrapFly, see our documentation and try it out for FREE!