Allegro.pl is Poland's largest e-commerce marketplace with millions of product listings across every category. If you are doing price monitoring, market research, or competitive analysis in the Polish market, scraping Allegro is one of the most valuable data sources available.
The catch is that Allegro does not make it easy. The site runs DataDome anti-bot protection, and basic HTTP requests will get blocked fast on category and search pages. In this guide, we will walk through how to scrape both product listings and individual product pages using Python with requests and BeautifulSoup4, and how to handle the anti-bot challenges along the way.
Why Is Allegro.pl Hard to Scrape?
Allegro uses DataDome, one of the most common commercial anti-bot systems on the web today. If you send a plain requests.get() to an Allegro category page, you will almost certainly get a 403 response or a CAPTCHA challenge instead of the actual page content.
There are a few reasons scraping Allegro is harder than most e-commerce sites:
- DataDome analyzes your request fingerprint. It checks your TLS signature, HTTP headers, and behavioral patterns. A simple Python request looks nothing like a real browser, and DataDome catches that immediately.
- Allegro is a Polish marketplace, so requests coming from non-Polish IP addresses raise suspicion. If you are scraping from outside Poland without a Polish residential proxy, your success rate drops significantly.
- Listing and search pages are more aggressively protected than product detail pages. You might get lucky scraping a single product URL, but category pages with pagination are where most scrapers fail.
For a deeper look at how DataDome works and how to get around it, check out our dedicated guide.
What Do You Need to Scrape Allegro.pl with Python?
The whole setup runs on three Python packages. Install them first, then we will build a session that Allegro actually accepts.
```shell
pip install requests beautifulsoup4 lxml
```

The requests library handles HTTP requests, beautifulsoup4 parses the HTML, and lxml gives BeautifulSoup a faster parsing backend.
The most important part of the setup is creating a session that looks like a real browser. Allegro expects Polish language headers, a modern User-Agent string, and standard browser request headers. Here is a simple session helper that covers the basics.
```python
import requests
from bs4 import BeautifulSoup
import re
import time
import random
from typing import List, Dict, Optional


def create_session() -> requests.Session:
    """Create a requests session with browser-like headers"""
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "pl-PL,pl;q=0.9,en;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "DNT": "1",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Cache-Control": "max-age=0",
    })
    return session


def make_request(session: requests.Session, url: str, retries: int = 3) -> Optional[requests.Response]:
    """Make a GET request with retry logic and random delay"""
    for attempt in range(retries):
        try:
            time.sleep(random.uniform(1, 3))
            response = session.get(url, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            if attempt == retries - 1:
                print(f"Failed after {retries} attempts for {url}: {e}")
                return None
            time.sleep(random.uniform(2, 5))
    return None
```

The Accept-Language header is set to pl-PL because Allegro serves Polish content and expects Polish-speaking visitors. The User-Agent string matches a recent Chrome release. The retry helper adds random delays between requests, which helps avoid triggering rate limits.
How Do You Scrape Allegro Product Listings?
Let's start with the most common scraping target on Allegro: the category listing page. We will use the smartphone category as our example.
When you open a category page like https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165, you see a grid of product cards. Each card contains the product title, price, a link to the product page, the seller type, rating information, and a thumbnail image.
![Allegro smartphone category page showing product listing cards with titles, prices, ratings, and seller information](./allegro-listings.webp)

Allegro uses obfuscated CSS class names like mb54_5r and mgn2_14 that can change between site updates. This is important to understand because your selectors may need updating over time.
Here is the extraction function for parsing product cards from a category page.
```python
def extract_product_listings(html: str) -> List[Dict]:
    """Parse product listing cards from an Allegro category page"""
    soup = BeautifulSoup(html, "lxml")
    listings = []
    product_cards = soup.select("div[data-box-id] article")
    if not product_cards:
        product_cards = soup.select("div[data-box-id] section a[href*='/oferta/']")
    for card in product_cards:
        try:
            title_el = card.select_one("h2") or card.select_one("h3")
            title = title_el.get_text(strip=True) if title_el else None
            link_el = card.select_one("a[href*='/oferta/']") or card.find("a", href=True)
            link = link_el["href"] if link_el else None
            if link and not link.startswith("http"):
                link = "https://allegro.pl" + link
            price_el = card.select_one("span[aria-label*='cena']") or card.select_one("span[class*='mli8']")
            if not price_el:
                for span in card.find_all("span"):
                    text = span.get_text(strip=True)
                    if re.search(r"\d+[.,]\d{2}\s*(zł|PLN)", text):
                        price_el = span
                        break
            price = price_el.get_text(strip=True) if price_el else None
            rating = None
            review_count = None
            rating_container = card.select_one("div[aria-label*='ocen']") or card.select_one("span[class*='m9qz']")
            if rating_container:
                rating_text = rating_container.get_text(strip=True)
                rating_match = re.search(r"(\d+[.,]\d+)", rating_text)
                if rating_match:
                    rating = rating_match.group(1)
                count_match = re.search(r"\((\d+)\)", rating_text)
                if count_match:
                    review_count = count_match.group(1)
            seller_type = None
            for span in card.find_all("span"):
                text = span.get_text(strip=True)
                if text in ("Business", "Private", "Firma", "Osoba prywatna"):
                    seller_type = text
                    break
            img_el = card.select_one("img[src*='allegroimg']") or card.find("img")
            image = img_el.get("src") if img_el else None
            condition = None
            for span in card.find_all("span"):
                text = span.get_text(strip=True).lower()
                if text in ("nowy", "używany", "new", "used"):
                    condition = span.get_text(strip=True)
                    break
            if title:
                listings.append({
                    "title": title,
                    "price": price,
                    "link": link,
                    "rating": rating,
                    "review_count": review_count,
                    "seller_type": seller_type,
                    "condition": condition,
                    "image": image,
                })
        except Exception:
            continue
    return listings
```
The function tries multiple selector strategies for each field. It starts with aria-label attributes and data- attributes when possible, because those tend to be more stable than obfuscated class names. For price detection, it falls back to regex matching against the Polish currency format.
Notice that we only extract fields that are reliably present on listing cards. Trying to pull every possible data point from a listing card leads to brittle code with dozens of fallback chains. Keep the listings scraper focused on the essentials, and use the product detail scraper for deeper data.
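If you need numeric prices for analysis, the extracted strings still carry the Polish formatting: a space as the thousands separator and a comma as the decimal mark, as in "1 299,00 zł". A small helper can normalize them to floats — a sketch, the normalize_price name is ours:

```python
import re
from typing import Optional


def normalize_price(price_text: str) -> Optional[float]:
    """Convert a Polish-formatted price like '1 299,00 zł' to a float."""
    if not price_text:
        return None
    # Grab digits with optional space separators and a two-digit decimal part
    match = re.search(r"(\d[\d\s]*[.,]\d{2})", price_text)
    if not match:
        return None
    # Strip all whitespace (including non-breaking spaces), use '.' as decimal
    number = re.sub(r"\s", "", match.group(1)).replace(",", ".")
    return float(number)


print(normalize_price("1 299,00 zł"))  # 1299.0
print(normalize_price("899,00 zł"))    # 899.0
```

Keeping prices as raw strings in the scraper itself and normalizing in a separate step means a formatting surprise breaks one helper, not the whole extraction pass.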
Paginating Through Results
Allegro handles pagination with a simple p query parameter in the URL. The first page loads without it, and every page after that just increments the number.
```
https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165      # page 1
https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165?p=2  # page 2
https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165?p=3  # page 3
```

The scraper below loops through pages, collects listings from each one, and stops automatically if a page fails to load or returns no results. This prevents the scraper from spinning endlessly when it hits the last page.
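One edge case worth noting: if the category URL already carries query parameters (for example a filter or sort order), naively appending ?p= produces a malformed URL. A small helper using urllib.parse handles both cases — a sketch, the build_page_url name is ours:

```python
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse


def build_page_url(category_url: str, page: int) -> str:
    """Set the p query parameter, preserving any existing query string."""
    if page <= 1:
        return category_url  # first page loads without the parameter
    parts = urlparse(category_url)
    query = parse_qs(parts.query)
    query["p"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


base = "https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165"
print(build_page_url(base, 2))
# https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165?p=2
```

This drops into the pagination loop as a replacement for the inline f-string without changing anything else.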
```python
def scrape_allegro_listings(category_url: str, max_pages: int = 5) -> List[Dict]:
    """Scrape product listings across multiple pages of an Allegro category"""
    session = create_session()
    all_listings = []
    for page in range(1, max_pages + 1):
        url = f"{category_url}?p={page}" if page > 1 else category_url
        print(f"Scraping page {page}...")
        response = make_request(session, url)
        if not response:
            print(f"Failed to fetch page {page}, stopping pagination.")
            break
        listings = extract_product_listings(response.text)
        if not listings:
            print(f"No listings found on page {page}, likely reached the last page.")
            break
        all_listings.extend(listings)
        print(f"Collected {len(listings)} listings from page {page}")
    print(f"\nTotal listings scraped: {len(all_listings)}")
    return all_listings


# Example usage
if __name__ == "__main__":
    url = "https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165"
    results = scrape_allegro_listings(url, max_pages=3)
    for i, product in enumerate(results[:5], 1):
        print(f"\n{i}. {product['title']}")
        print(f"   Price: {product['price']}")
        print(f"   Rating: {product['rating']} ({product['review_count']} reviews)")
        print(f"   Seller: {product['seller_type']}")
        print(f"   Condition: {product['condition']}")
```

Each page on Allegro returns around 60 product cards, so scraping 3 pages gives you roughly 180 listings. You can adjust max_pages based on how deep you need to go into the category.
Example Output
```
Scraping page 1...
Collected 60 listings from page 1
Scraping page 2...
Collected 60 listings from page 2
Scraping page 3...
Collected 60 listings from page 3

Total listings scraped: 180

1. Smartfon Motorola Edge 50 Neo 8 GB / 256 GB 5G szary
   Price: 1 299,00 zł
   Rating: 4,95 (42 reviews)
   Seller: Business
   Condition: Nowy

2. Smartfon Samsung Galaxy S24 FE 8 GB / 128 GB 5G czarny
   Price: 2 299,00 zł
   Rating: 4,87 (156 reviews)
   Seller: Business
   Condition: Nowy

3. Smartfon Xiaomi Redmi Note 13 Pro 8 GB / 256 GB 5G fioletowy
   Price: 899,00 zł
   Rating: 4,91 (312 reviews)
   Seller: Business
   Condition: Nowy
```
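Since each listing is a plain dict, persisting the results takes only the stdlib csv module. A small sketch — the save_listings_csv name and the sample row are ours:

```python
import csv
from typing import List, Dict

# Matches the keys produced by extract_product_listings
FIELDS = ["title", "price", "link", "rating", "review_count",
          "seller_type", "condition", "image"]


def save_listings_csv(listings: List[Dict], path: str) -> None:
    """Write scraped listing dicts to a CSV file, one row per product."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(listings)


# Hypothetical row in the shape the listings scraper produces
save_listings_csv(
    [{"title": "Smartfon X", "price": "899,00 zł", "link": None, "rating": "4,9",
      "review_count": "12", "seller_type": "Firma", "condition": "Nowy", "image": None}],
    "allegro_listings.csv",
)
```

extrasaction="ignore" keeps the writer from crashing if a future version of the scraper adds fields the CSV schema does not know about.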
How Do You Scrape Allegro Product Detail Pages?
Product detail pages are where the real depth is on Allegro. A single product page gives you structured metadata, a full specifications table, seller reputation signals, variant options, and images. This is the data that matters for price monitoring and competitive analysis.
The good news is that Allegro embeds itemprop meta tags in every product page. These follow the Schema.org standard, and Allegro needs them for SEO, so they are far more stable than CSS class names. We will use those as our primary data source and only fall back to HTML parsing when the meta tags do not cover a field.
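To make this concrete, here is a stripped-down sketch of what itemprop markup looks like and how the content attributes can be read with nothing but the stdlib parser. The HTML snippet is illustrative of the Schema.org style, not copied from Allegro:

```python
from html.parser import HTMLParser

# Illustrative Schema.org-style markup (not real Allegro source)
SAMPLE = """
<div itemscope itemtype="http://schema.org/Product">
  <meta itemprop="sku" content="18376191779">
  <meta itemprop="brand" content="Xiaomi">
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <meta itemprop="price" content="2300.00">
    <meta itemprop="priceCurrency" content="PLN">
  </div>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect meta tag content attributes keyed by their itemprop name."""

    def __init__(self):
        super().__init__()
        self.props = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "itemprop" in attrs and "content" in attrs:
            self.props[attrs["itemprop"]] = attrs["content"]


collector = ItempropCollector()
collector.feed(SAMPLE)
print(collector.props["price"], collector.props["priceCurrency"])  # 2300.00 PLN
```

In the real scraper we use BeautifulSoup for this, as shown next, but the principle is the same: the data lives in attributes, not in styled, class-named HTML.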
![Allegro product detail page showing the title, price, specifications table, and seller information sections](./allegro-product.webp)

Extracting Basic Product Information
The first function pulls the core product data. It reads itemprop meta tags for the title, price, SKU, GTIN, brand, availability, and condition. For the rating, it uses itemprop="ratingValue" and itemprop="ratingCount" instead of trying to parse the rating from the visible HTML.
```python
def extract_basic_info(soup: BeautifulSoup) -> Dict:
    """Extract basic product information from meta tags and page structure"""
    basic_info = {}

    # Structured data from itemprop meta tags
    meta_url = soup.find("meta", attrs={"itemprop": "url"})
    meta_sku = soup.find("meta", attrs={"itemprop": "sku"})
    meta_gtin = soup.find("meta", attrs={"itemprop": "gtin"})
    meta_brand = soup.find("meta", attrs={"itemprop": "brand"})

    # Offer-level structured data
    offer_price = soup.find("meta", attrs={"itemprop": "price"})
    offer_currency = soup.find("meta", attrs={"itemprop": "priceCurrency"})
    offer_availability = soup.find("link", attrs={"itemprop": "availability"})
    offer_condition = soup.find("meta", attrs={"itemprop": "itemCondition"})

    # Product title from h1
    title_elem = soup.find("h1")
    basic_info["title"] = title_elem.get_text(strip=True) if title_elem else "N/A"

    # Price from structured data first, HTML fallback second
    if offer_price:
        price_value = offer_price.get("content", "")
        currency = offer_currency.get("content", "PLN") if offer_currency else "PLN"
        basic_info["price"] = f"{price_value} {currency}"
    else:
        price_elem = soup.select_one("span[aria-label*='cena']")
        basic_info["price"] = price_elem.get_text(strip=True) if price_elem else "N/A"

    # Structured metadata
    basic_info["sku"] = meta_sku.get("content", "N/A") if meta_sku else "N/A"
    basic_info["gtin"] = meta_gtin.get("content", "N/A") if meta_gtin else "N/A"
    basic_info["brand"] = meta_brand.get("content", "N/A") if meta_brand else "N/A"
    basic_info["product_url"] = meta_url.get("content", "N/A") if meta_url else "N/A"
    basic_info["availability"] = offer_availability.get("href", "N/A") if offer_availability else "N/A"
    basic_info["condition"] = offer_condition.get("content", "N/A") if offer_condition else "N/A"

    # Rating from aggregate rating meta tags
    rating_value = soup.find("meta", attrs={"itemprop": "ratingValue"})
    rating_count = soup.find("meta", attrs={"itemprop": "ratingCount"})
    if not rating_count:
        rating_count = soup.find("meta", attrs={"itemprop": "reviewCount"})
    basic_info["rating"] = rating_value.get("content", "N/A") if rating_value else "N/A"
    basic_info["ratings_count"] = rating_count.get("content", "N/A") if rating_count else "N/A"

    # Product images
    images = []
    for img in soup.find_all("img"):
        src = img.get("src", "")
        if "allegroimg.com" in src and not src.startswith("data:"):
            images.append(src)
    basic_info["images"] = list(set(images))

    return basic_info
```

All the core fields come straight from meta tag content attributes, so there is no HTML class name that can break this. The only field that touches the visible DOM is the product title from the h1 element.
Extracting Specifications and Features
Allegro product pages have two separate areas with technical data. The specifications table near the top of the page holds structured key-value pairs like brand, model, condition, and EAN. The features section sits lower inside the seller's product description and usually lists hardware specs in bullet form.
One thing to watch out for in the specifications table is that some value cells contain hidden tooltip text.
```python
def extract_specifications(soup: BeautifulSoup) -> Dict:
    """Extract the product specifications table"""
    specifications = {}
    specs_table = soup.find("table")
    if specs_table:
        for row in specs_table.find_all("tr"):
            cells = row.find_all("td")
            if len(cells) >= 2:
                name = cells[0].get_text(strip=True)
                # Get clean value, avoiding tooltip text in nested elements
                value_cell = cells[1]
                link = value_cell.find("a")
                if link:
                    value = link.find(string=True, recursive=False)
                    value = value.strip() if value else link.get_text(strip=True)
                else:
                    value = value_cell.find(string=True, recursive=False)
                    value = value.strip() if value else value_cell.get_text(strip=True)
                if name and value:
                    specifications[name] = value
    return specifications
```

The features function scans the seller's description area for list items. These are typically the technical highlights that sellers add to their listings, things like processor model, RAM size, screen specs, and battery capacity.
```python
def extract_features(soup: BeautifulSoup) -> List[str]:
    """Extract product features from the description section"""
    features = []
    description_sections = soup.find_all("div", class_=re.compile(r"_0d3bd"))
    for section in description_sections:
        for li in section.find_all("li"):
            text = li.get_text(strip=True)
            if text and len(text) > 10:
                features.append(text)
    return features
```

The _0d3bd class prefix is one of Allegro's obfuscated names. It could change in a future update, so keep an eye on it if your scraper stops finding features.
Extracting Seller Information
Allegro product pages also show seller and purchase signals that are useful for market analysis. This includes delivery promises, invoice availability, manufacturer codes, the Allegro Smart badge, and best price guarantee status.
```python
def extract_seller_info(soup: BeautifulSoup) -> Dict:
    """Extract seller and purchase information from the product page"""
    seller_info = {}

    # Recent purchases
    purchase_elem = soup.find("span", string=re.compile(r"\d+\s*(osób|people)\s*(kupiło|have)"))
    seller_info["recent_purchases"] = purchase_elem.get_text(strip=True) if purchase_elem else "N/A"

    # Invoice availability from the specs table
    invoice_elem = soup.find("td", string=re.compile(r"Faktura|Invoice"))
    if invoice_elem:
        invoice_value = invoice_elem.find_next_sibling("td")
        seller_info["invoice"] = invoice_value.get_text(strip=True) if invoice_value else "N/A"
    else:
        seller_info["invoice"] = "N/A"

    # Manufacturer code from the specs table
    code_elem = soup.find("td", string=re.compile(r"Kod producenta|Manufacturer code"))
    if code_elem:
        code_value = code_elem.find_next_sibling("td")
        seller_info["manufacturer_code"] = code_value.get_text(strip=True) if code_value else "N/A"
    else:
        seller_info["manufacturer_code"] = "N/A"

    # EAN/GTIN from the specs table
    ean_elem = soup.find("td", string=re.compile(r"EAN"))
    if ean_elem:
        ean_value = ean_elem.find_next_sibling("td")
        seller_info["ean"] = ean_value.get_text(strip=True) if ean_value else "N/A"
    else:
        seller_info["ean"] = "N/A"

    # Delivery information
    delivery_elem = soup.find("span", string=re.compile(r"(darmowa\s+)?dostawa"))
    seller_info["delivery_info"] = delivery_elem.get_text(strip=True) if delivery_elem else "N/A"

    # Installment information
    installment_elem = soup.find("span", string=re.compile(r"x\s*\d+\s*rat"))
    seller_info["installment_info"] = installment_elem.get_text(strip=True) if installment_elem else "N/A"

    # Allegro Smart badge
    smart_badge = soup.find("img", alt="Allegro Smart!")
    seller_info["allegro_smart"] = "Yes" if smart_badge else "No"

    # Best price guarantee
    bpg_elem = soup.find("span", string=re.compile(r"Gwarancja najniższej ceny"))
    seller_info["best_price_guarantee"] = "Yes" if bpg_elem else "No"

    return seller_info
```

The invoice, manufacturer code, and EAN fields come from the same specifications table. We look them up by matching the label text in Polish or English, then grab the value from the sibling cell.
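The label-then-sibling pattern is worth isolating, because it is the most reusable trick here: match on visible text, which rarely changes, instead of class names, which do. A minimal standalone illustration (the table markup below is ours, not real Allegro source):

```python
from bs4 import BeautifulSoup
import re

# Illustrative specs table in the layout described above
sample = """
<table>
  <tr><td>Faktura</td><td>Wystawiam fakturę VAT</td></tr>
  <tr><td>Kod producenta</td><td>6941812789353</td></tr>
</table>
"""

soup = BeautifulSoup(sample, "html.parser")

# Find the label cell by its text, then read the adjacent value cell
label = soup.find("td", string=re.compile(r"Faktura|Invoice"))
value = label.find_next_sibling("td").get_text(strip=True)
print(value)  # Wystawiam fakturę VAT
```

One caveat: string= only matches cells whose text is a direct child of the tag, so if Allegro ever wraps the label in a span, you would switch to iterating cells and comparing get_text() instead.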
Putting It All Together
Now we combine all four extraction functions into a single scraper. It fetches the product page, parses the HTML, and returns one flat dictionary with everything.
```python
def scrape_product_details(url: str) -> Optional[Dict]:
    """Scrape comprehensive product data from an Allegro product page"""
    session = create_session()
    response = make_request(session, url)
    if not response:
        return None
    soup = BeautifulSoup(response.content, "lxml")
    return {
        "url": url,
        **extract_basic_info(soup),
        "specifications": extract_specifications(soup),
        "features": extract_features(soup),
        "seller": extract_seller_info(soup),
    }


# Example usage
if __name__ == "__main__":
    url = "https://allegro.pl/oferta/smartfon-xiaomi-14t-pro-12-gb-512-gb-5g-niebieski-17386285003"
    product = scrape_product_details(url)
    if product:
        print(f"Title: {product['title']}")
        print(f"Price: {product['price']}")
        print(f"Brand: {product['brand']}")
        print(f"SKU: {product['sku']}")
        print(f"GTIN: {product['gtin']}")
        print(f"Condition: {product['condition']}")
        print(f"Availability: {product['availability']}")
        print(f"Rating: {product['rating']} ({product['ratings_count']} reviews)")
        if product["specifications"]:
            print("\nSpecifications:")
            for key, value in product["specifications"].items():
                print(f"  {key}: {value}")
        if product["features"]:
            print("\nFeatures:")
            for feat in product["features"][:10]:
                print(f"  - {feat}")
        if product["seller"]:
            seller = product["seller"]
            print("\nSeller Info:")
            print(f"  Delivery: {seller['delivery_info']}")
            print(f"  Invoice: {seller['invoice']}")
            print(f"  Allegro Smart: {seller['allegro_smart']}")
            print(f"  Best Price Guarantee: {seller['best_price_guarantee']}")
```

Example Output
```
Title: Smartfon Xiaomi 14T Pro 12 GB / 512 GB 5G niebieski
Price: 2300.00 PLN
Brand: Xiaomi
SKU: 18376191779
GTIN: 6941812789353
Condition: http://schema.org/NewCondition
Availability: http://schema.org/InStock
Rating: 4.76 (146 reviews)

Specifications:
  Stan: Nowy
  Faktura: Wystawiam fakturę VAT
  Kod producenta: 6941812789353
  Marka: Xiaomi
  Model telefonu: 14T Pro
  Typ: Smartfon
  EAN (GTIN): 6941812789353
  Kolor: niebieski

Features:
  - Telefon komórkowy*1
  - Wtyczka ładowania*1
  - Służy do transmisji danych*1

Seller Info:
  Delivery: darmowa dostawa
  Invoice: Wystawiam fakturę VAT
  Allegro Smart: No
  Best Price Guarantee: Yes
```
The product detail scraper pulls everything from a single page in one pass. The structured itemprop meta tags handle the core fields like price, brand, SKU, and rating reliably. The specifications table gives you seller-provided attributes, and the seller info function picks up delivery and trust signals.
When Should You Use Scrapfly for Allegro Scraping?
The DIY approach above works well for small-scale scraping and for learning how Allegro pages are structured. But if you need to scrape Allegro reliably at any real volume, the anti-bot layer becomes the main bottleneck.
DataDome will eventually block your requests no matter how good your headers are. You will need residential Polish proxies, proper TLS fingerprinting, and potentially JavaScript rendering to keep scraping consistently. Managing all of that yourself takes real engineering effort.
Scrapfly handles the anti-bot infrastructure for you. It provides residential proxies with Polish geolocation, automatic DataDome bypass, and JavaScript rendering when needed. You send a request and you get clean HTML back.

Here is what the same Allegro scraping looks like with Scrapfly.
```python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")

# Scrape an Allegro category page with anti-bot bypass and Polish geolocation
result = client.scrape(ScrapeConfig(
    url="https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165",
    asp=True,        # Anti Scraping Protection bypass
    render_js=True,  # Full JavaScript rendering
    country="pl",    # Polish geolocation
))

# Use the same parsing functions from earlier
html = result.scrape_result["content"]
listings = extract_product_listings(html)
print(f"Scraped {len(listings)} listings via Scrapfly")
```

The asp=True flag enables anti-bot bypass, render_js=True handles JavaScript-rendered content, and country="pl" routes the request through Polish infrastructure. You can plug the same parsing functions from earlier in this guide right into the Scrapfly response.
For more on anti-bot strategies in general, see our guide on bypassing anti-bot protection.
FAQ
Does Allegro use Cloudflare or DataDome?
Allegro uses DataDome for its anti-bot protection, not Cloudflare. DataDome analyzes request fingerprints including TLS signatures, HTTP headers, and behavioral patterns. You can confirm this by checking the network requests in your browser's developer tools when visiting Allegro.
Do you need Polish proxies to scrape Allegro?
Polish proxies significantly improve your success rate. Allegro is a Polish marketplace and its anti-bot system treats non-Polish traffic with more suspicion. Residential Polish proxies work best because datacenter IPs are commonly flagged by DataDome.
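In practice that means routing the session from earlier through a proxy. requests takes a mapping of scheme to proxy URL via session.proxies; here is a sketch of building that mapping — the helper name, host, and credentials are placeholders, not a real provider:

```python
def build_proxy_config(username: str, password: str, host: str, port: int) -> dict:
    """Build the scheme-to-URL mapping that requests expects in session.proxies."""
    proxy_url = f"http://{username}:{password}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical Polish residential proxy gateway; substitute your provider's details
proxies = build_proxy_config("your_user", "your_pass", "pl.example-proxy.net", 8080)
print(proxies["https"])  # http://your_user:your_pass@pl.example-proxy.net:8080

# Attach to the session from create_session() with:
#   session.proxies.update(proxies)
```

Rotating proxy providers typically handle IP rotation behind a single gateway hostname, so this one mapping is usually all the session needs.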
Can you use the Allegro API instead of scraping?
Allegro offers a REST API for registered developers, but it requires OAuth authentication and has strict rate limits. The API is designed for sellers and integrators, not for large-scale market research. For most scraping use cases like price monitoring or competitive analysis, direct scraping gives you more flexibility and access to the full page content.
Conclusion
Scraping Allegro comes down to two things. Getting past DataDome to receive clean HTML, and then parsing the product data from that HTML using the structured metadata and page elements.
The DIY approach in this guide gives you a working scraper for both listings and product detail pages. If you need reliable, high-volume scraping without managing proxies and anti-bot infrastructure yourself, Scrapfly handles that layer so you can focus on the data.
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens protected by GDPR.
- Do not repurpose entire public datasets which can be illegal in some countries.
Scrapfly does not offer legal advice but these are good general rules to follow. For more you should consult a lawyer.
