How to Scrape Allegro.pl Without Getting Blocked

Allegro.pl is Poland's largest e-commerce marketplace, with millions of product listings across every category. If you are doing price monitoring, market research, or competitive analysis in the Polish market, Allegro is one of the most valuable data sources available.

The catch is that Allegro does not make it easy. The site runs DataDome anti-bot protection, and basic HTTP requests will get blocked fast on category and search pages. In this guide, we will walk through how to scrape both product listings and individual product pages using Python with requests and BeautifulSoup4, and how to handle the anti-bot challenges along the way.

Why Is Allegro.pl Hard to Scrape?

Allegro uses DataDome, one of the most common commercial anti-bot systems on the web today. If you send a plain requests.get() to an Allegro category page, you will almost certainly get a 403 response or a CAPTCHA challenge instead of the actual page content.

There are a few reasons scraping Allegro is harder than most e-commerce sites:

  • DataDome analyzes your request fingerprint. It checks your TLS signature, HTTP headers, and behavioral patterns. A simple Python request looks nothing like a real browser, and DataDome catches that immediately.

  • Allegro is a Polish marketplace, so requests coming from non-Polish IP addresses raise suspicion. If you are scraping from outside Poland without a Polish residential proxy, your success rate drops significantly.

  • Listing and search pages are more aggressively protected than product detail pages. You might get lucky scraping a single product URL, but category pages with pagination are where most scrapers fail.

For a deeper look at how DataDome works and how to get around it, check out our dedicated guide.

What Do You Need to Scrape Allegro.pl with Python?

The whole setup runs on three Python packages. Install them first, then we will build a session that Allegro actually accepts.

bash
pip install requests beautifulsoup4 lxml

The requests library handles HTTP requests, beautifulsoup4 parses the HTML, and lxml gives BeautifulSoup a faster parsing backend.

The most important part of the setup is creating a session that looks like a real browser. Allegro expects Polish language headers, a modern User-Agent string, and standard browser request headers. Here is a simple session helper that covers the basics.

python
import requests
from bs4 import BeautifulSoup
import re
import time
import random
from typing import List, Dict, Optional

def create_session() -> requests.Session:
    """Create a requests session with browser-like headers"""
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "pl-PL,pl;q=0.9,en;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "DNT": "1",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Cache-Control": "max-age=0",
    })
    return session

def make_request(session: requests.Session, url: str, retries: int = 3) -> Optional[requests.Response]:
    """Make a GET request with retry logic and random delay"""
    for attempt in range(retries):
        try:
            time.sleep(random.uniform(1, 3))
            response = session.get(url, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            if attempt == retries - 1:
                print(f"Failed after {retries} attempts for {url}: {e}")
                return None
            time.sleep(random.uniform(2, 5))
    return None

The Accept-Language header is set to pl-PL because Allegro serves Polish content and expects Polish-speaking visitors. The User-Agent string matches a recent Chrome release. The retry helper adds random delays between requests, which helps avoid triggering rate limits.
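Before adding more logic, it helps to detect when a response is actually a block page rather than real content. The helper below is a hypothetical sketch: it flags the status codes DataDome typically uses and scans the body for common block-page markers. Adjust the markers to what you observe in practice.

```python
def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic check for a DataDome block page (hypothetical helper --
    tune the markers to what you actually observe)."""
    # DataDome typically answers blocked requests with 403; 429 signals rate limiting
    if status_code in (403, 429):
        return True
    lowered = body.lower()
    # Block pages commonly embed a captcha widget or reference DataDome directly
    return "captcha" in lowered or "datadome" in lowered
```

Call this on every response and back off (or rotate proxies) when it returns True, instead of parsing HTML that contains no product data.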

How Do You Scrape Allegro Product Listings?

Let's start with the most common scraping target on Allegro, the category listing page. We will use the smartphone category as our example.

When you open a category page like https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165, you see a grid of product cards. Each card contains the product title, price, a link to the product page, the seller type, rating information, and a thumbnail image.

![Allegro smartphone category page showing product listing cards with titles, prices, ratings, and seller information](./allegro-listings.webp)

Allegro uses obfuscated CSS class names like mb54_5r and mgn2_14 that can change between site updates. This is important to understand because your selectors may need updating over time.

Here is the extraction function for parsing product cards from a category page.

python
def extract_product_listings(html: str) -> List[Dict]:
    """Parse product listing cards from an Allegro category page"""
    soup = BeautifulSoup(html, "lxml")
    listings = []

    product_cards = soup.select("div[data-box-id] article")
    if not product_cards:
        product_cards = soup.select("div[data-box-id] section a[href*='/oferta/']")

    for card in product_cards:
        try:
            title_el = card.select_one("h2") or card.select_one("h3")
            title = title_el.get_text(strip=True) if title_el else None

            link_el = card.select_one("a[href*='/oferta/']") or card.find("a", href=True)
            link = link_el["href"] if link_el else None
            if link and not link.startswith("http"):
                link = "https://allegro.pl" + link

            price_el = card.select_one("span[aria-label*='cena']") or card.select_one("span[class*='mli8']")
            if not price_el:
                for span in card.find_all("span"):
                    text = span.get_text(strip=True)
                    if re.search(r"\d+[.,]\d{2}\s*(zł|PLN)", text):
                        price_el = span
                        break
            price = price_el.get_text(strip=True) if price_el else None

            rating = None
            review_count = None
            rating_container = card.select_one("div[aria-label*='ocen']") or card.select_one("span[class*='m9qz']")
            if rating_container:
                rating_text = rating_container.get_text(strip=True)
                rating_match = re.search(r"(\d+[.,]\d+)", rating_text)
                if rating_match:
                    rating = rating_match.group(1)
                count_match = re.search(r"\((\d+)\)", rating_text)
                if count_match:
                    review_count = count_match.group(1)

            seller_type = None
            for span in card.find_all("span"):
                text = span.get_text(strip=True)
                if text in ("Business", "Private", "Firma", "Osoba prywatna"):
                    seller_type = text
                    break

            img_el = card.select_one("img[src*='allegroimg']") or card.find("img")
            image = img_el.get("src") if img_el else None

            condition = None
            for span in card.find_all("span"):
                text = span.get_text(strip=True).lower()
                if text in ("nowy", "używany", "new", "used"):
                    condition = span.get_text(strip=True)
                    break

            if title:
                listings.append({
                    "title": title,
                    "price": price,
                    "link": link,
                    "rating": rating,
                    "review_count": review_count,
                    "seller_type": seller_type,
                    "condition": condition,
                    "image": image,
                })
        except Exception:
            continue

    return listings

The function tries multiple selector strategies for each field. It starts with aria-label attributes and data- attributes when possible, because those tend to be more stable than obfuscated class names. For price detection, it falls back to regex matching against the Polish currency format.

Notice that we only extract fields that are reliably present on listing cards. Trying to pull every possible data point from a listing card leads to brittle code with dozens of fallback chains. Keep the listings scraper focused on the essentials, and use the product detail scraper for deeper data.
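The price field comes back as raw text like "1 299,00 zł". For price monitoring you usually want a number, so here is a small hypothetical helper that normalizes the Polish format: spaces (or non-breaking spaces) as thousands separators, a comma as the decimal mark.

```python
import re

def parse_pln_price(price_text):
    """Convert a scraped price string like '1 299,00 zł' to a float.
    A hypothetical helper for the Polish number format."""
    if not price_text:
        return None
    # Match digits with optional space/non-breaking-space separators and a 2-digit decimal part
    match = re.search(r"(\d[\d\s\u00a0]*[.,]\d{2})", price_text)
    if not match:
        return None
    digits = match.group(1).replace(" ", "").replace("\u00a0", "").replace(",", ".")
    return float(digits)
```

Run it over the `price` field after extraction; listings without a parseable price come back as None, which keeps downstream aggregation simple.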

Paginating Through Results

Allegro handles pagination with a simple p query parameter in the URL. The first page loads without it, and every page after that just increments the number.

https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165       # page 1
https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165?p=2   # page 2
https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165?p=3   # page 3
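If your category URL already carries filters or a sort order in its query string, naively appending `?p=2` breaks it. This small sketch builds the page URL while preserving any existing parameters.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def page_url(base_url: str, page: int) -> str:
    """Build a paginated Allegro URL, preserving any query params already
    present on the base URL (e.g. filters or sort order). A small sketch."""
    if page <= 1:
        return base_url  # page 1 loads without the p parameter
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    query["p"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```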

The scraper below loops through pages, collects listings from each one, and stops automatically if a page fails to load or returns no results. This prevents the scraper from spinning endlessly when it hits the last page.

python
def scrape_allegro_listings(category_url: str, max_pages: int = 5) -> List[Dict]:
    """Scrape product listings across multiple pages of an Allegro category"""
    session = create_session()
    all_listings = []

    for page in range(1, max_pages + 1):
        url = f"{category_url}?p={page}" if page > 1 else category_url
        print(f"Scraping page {page}...")

        response = make_request(session, url)
        if not response:
            print(f"Failed to fetch page {page}, stopping pagination.")
            break

        listings = extract_product_listings(response.text)
        if not listings:
            print(f"No listings found on page {page}, likely reached the last page.")
            break

        all_listings.extend(listings)
        print(f"Collected {len(listings)} listings from page {page}")

    print(f"\nTotal listings scraped: {len(all_listings)}")
    return all_listings

# Example usage
if __name__ == "__main__":
    url = "https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165"
    results = scrape_allegro_listings(url, max_pages=3)

    for i, product in enumerate(results[:5], 1):
        print(f"\n{i}. {product['title']}")
        print(f"   Price: {product['price']}")
        print(f"   Rating: {product['rating']} ({product['review_count']} reviews)")
        print(f"   Seller: {product['seller_type']}")
        print(f"   Condition: {product['condition']}")

Each page on Allegro returns around 60 product cards, so scraping 3 pages gives you roughly 180 listings. You can adjust max_pages based on how deep you need to go into the category.
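One caveat worth checking against your own crawls: promoted offers can appear on more than one page, so the combined list may contain duplicates. A quick dedupe by product link keeps the dataset clean.

```python
def dedupe_listings(listings):
    """Remove duplicate cards by product link, falling back to the title.
    Promoted offers may repeat across pages (an assumption -- verify
    against your own crawls)."""
    seen = set()
    unique = []
    for item in listings:
        key = item.get("link") or item.get("title")
        if key and key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```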

Example Output

Scraping page 1...
Collected 60 listings from page 1
Scraping page 2...
Collected 60 listings from page 2
Scraping page 3...
Collected 60 listings from page 3

Total listings scraped: 180

1. Smartfon Motorola Edge 50 Neo 8 GB / 256 GB 5G szary
   Price: 1 299,00 zł
   Rating: 4,95 (42 reviews)
   Seller: Business
   Condition: Nowy

2. Smartfon Samsung Galaxy S24 FE 8 GB / 128 GB 5G czarny
   Price: 2 299,00 zł
   Rating: 4,87 (156 reviews)
   Seller: Business
   Condition: Nowy

3. Smartfon Xiaomi Redmi Note 13 Pro 8 GB / 256 GB 5G fioletowy
   Price: 899,00 zł
   Rating: 4,91 (312 reviews)
   Seller: Business
   Condition: Nowy

How Do You Scrape Allegro Product Detail Pages?

Product detail pages are where the real depth is on Allegro. A single product page gives you structured metadata, a full specifications table, seller reputation signals, variant options, and images. This is the data that matters for price monitoring and competitive analysis.

The good news is that Allegro embeds itemprop meta tags in every product page. These follow the Schema.org standard, and Allegro needs them for SEO, so they are far more stable than CSS class names. We will use those as our primary data source and only fall back to HTML parsing when the meta tags do not cover a field.

![Allegro product detail page showing the title, price, specifications table, and seller information sections](./allegro-product.webp)

Extracting Basic Product Information

The first function pulls the core product data. It reads itemprop meta tags for the title, price, SKU, GTIN, brand, availability, and condition. For the rating, it uses itemprop="ratingValue" and itemprop="ratingCount" instead of trying to parse the rating from the visible HTML.

python
def extract_basic_info(soup: BeautifulSoup) -> Dict:
    """Extract basic product information from meta tags and page structure"""
    basic_info = {}

    # Structured data from itemprop meta tags
    meta_url = soup.find("meta", attrs={"itemprop": "url"})
    meta_sku = soup.find("meta", attrs={"itemprop": "sku"})
    meta_gtin = soup.find("meta", attrs={"itemprop": "gtin"})
    meta_brand = soup.find("meta", attrs={"itemprop": "brand"})

    # Offer-level structured data
    offer_price = soup.find("meta", attrs={"itemprop": "price"})
    offer_currency = soup.find("meta", attrs={"itemprop": "priceCurrency"})
    offer_availability = soup.find("link", attrs={"itemprop": "availability"})
    offer_condition = soup.find("meta", attrs={"itemprop": "itemCondition"})

    # Product title from h1
    title_elem = soup.find("h1")
    basic_info["title"] = title_elem.get_text(strip=True) if title_elem else "N/A"

    # Price from structured data first, HTML fallback second
    if offer_price:
        price_value = offer_price.get("content", "")
        currency = offer_currency.get("content", "PLN") if offer_currency else "PLN"
        basic_info["price"] = f"{price_value} {currency}"
    else:
        price_elem = soup.select_one("span[aria-label*='cena']")
        basic_info["price"] = price_elem.get_text(strip=True) if price_elem else "N/A"

    # Structured metadata
    basic_info["sku"] = meta_sku.get("content", "N/A") if meta_sku else "N/A"
    basic_info["gtin"] = meta_gtin.get("content", "N/A") if meta_gtin else "N/A"
    basic_info["brand"] = meta_brand.get("content", "N/A") if meta_brand else "N/A"
    basic_info["product_url"] = meta_url.get("content", "N/A") if meta_url else "N/A"
    basic_info["availability"] = offer_availability.get("href", "N/A") if offer_availability else "N/A"
    basic_info["condition"] = offer_condition.get("content", "N/A") if offer_condition else "N/A"

    # Rating from aggregate rating meta tags
    rating_value = soup.find("meta", attrs={"itemprop": "ratingValue"})
    rating_count = soup.find("meta", attrs={"itemprop": "ratingCount"})
    if not rating_count:
        rating_count = soup.find("meta", attrs={"itemprop": "reviewCount"})

    basic_info["rating"] = rating_value.get("content", "N/A") if rating_value else "N/A"
    basic_info["ratings_count"] = rating_count.get("content", "N/A") if rating_count else "N/A"

    # Product images
    images = []
    for img in soup.find_all("img"):
        src = img.get("src", "")
        if "allegroimg.com" in src and not src.startswith("data:"):
            images.append(src)
    basic_info["images"] = list(set(images))

    return basic_info

All the core fields come straight from meta tag content attributes, so there is no HTML class name that can break this. The only field that touches the visible DOM is the product title from the h1 element.
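The availability and condition fields come back as schema.org enum URLs like http://schema.org/InStock. If you want readable labels in your output, a small cleanup sketch like this works:

```python
import re

def humanize_schema_value(value):
    """Turn a schema.org enum URL such as 'http://schema.org/NewCondition'
    into a readable label ('New Condition'). A small cleanup sketch for the
    availability and condition fields."""
    if not value or "schema.org/" not in value:
        return value
    label = value.rsplit("/", 1)[-1]               # e.g. "NewCondition"
    # Insert a space before each interior uppercase letter
    return re.sub(r"(?<!^)(?=[A-Z])", " ", label)
```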

Extracting Specifications and Features

Allegro product pages have two separate areas with technical data. The specifications table near the top of the page holds structured key-value pairs like brand, model, condition, and EAN. The features section sits lower inside the seller's product description and usually lists hardware specs in bullet form.

One thing to watch out for in the specifications table is that some value cells contain hidden tooltip text.

python
def extract_specifications(soup: BeautifulSoup) -> Dict:
    """Extract the product specifications table"""
    specifications = {}

    specs_table = soup.find("table")
    if specs_table:
        for row in specs_table.find_all("tr"):
            cells = row.find_all("td")
            if len(cells) >= 2:
                name = cells[0].get_text(strip=True)
                # Get clean value, avoiding tooltip text in nested elements
                value_cell = cells[1]
                link = value_cell.find("a")
                if link:
                    value = link.find(string=True, recursive=False)
                    value = value.strip() if value else link.get_text(strip=True)
                else:
                    value = value_cell.find(string=True, recursive=False)
                    value = value.strip() if value else value_cell.get_text(strip=True)
                if name and value:
                    specifications[name] = value

    return specifications

The features function scans the seller's description area for list items. These are typically the technical highlights that sellers add to their listings, things like processor model, RAM size, screen specs, and battery capacity.

python
def extract_features(soup: BeautifulSoup) -> List[str]:
    """Extract product features from the description section"""
    features = []

    description_sections = soup.find_all("div", class_=re.compile(r"_0d3bd"))
    for section in description_sections:
        for li in section.find_all("li"):
            text = li.get_text(strip=True)
            if text and len(text) > 10:
                features.append(text)

    return features

The _0d3bd class prefix is one of Allegro's obfuscated names. It could change in a future update, so keep an eye on it if your scraper stops finding features.
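When you depend on obfuscated selectors, it pays to structure fallbacks explicitly rather than chaining `or` expressions. Here is a generic sketch (not Allegro-specific) that tries strategies in order and swallows selector failures:

```python
def first_match(*strategies):
    """Try selector strategies (zero-argument callables) in order and return
    the first truthy result. Useful for ordered fallbacks when obfuscated
    class names change -- a generic sketch."""
    for strategy in strategies:
        try:
            result = strategy()
        except Exception:
            continue  # a broken selector just falls through to the next one
        if result:
            return result
    return None
```

With this helper, the features lookup could become `first_match(lambda: soup.select("div[class*='_0d3bd'] li"), lambda: soup.select("div[itemprop='description'] li"))`, where the second selector is a hypothetical fallback you would verify against the live page.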

Extracting Seller Information

Allegro product pages also show seller and purchase signals that are useful for market analysis. This includes delivery promises, invoice availability, manufacturer codes, the Allegro Smart badge, and best price guarantee status.

python
def extract_seller_info(soup: BeautifulSoup) -> Dict:
    """Extract seller and purchase information from the product page"""
    seller_info = {}

    # Recent purchases
    purchase_elem = soup.find("span", string=re.compile(r"\d+\s*(osób|people)\s*(kupiło|have)"))
    seller_info["recent_purchases"] = purchase_elem.get_text(strip=True) if purchase_elem else "N/A"

    # Invoice availability from the specs table
    invoice_elem = soup.find("td", string=re.compile(r"Faktura|Invoice"))
    if invoice_elem:
        invoice_value = invoice_elem.find_next_sibling("td")
        seller_info["invoice"] = invoice_value.get_text(strip=True) if invoice_value else "N/A"
    else:
        seller_info["invoice"] = "N/A"

    # Manufacturer code from the specs table
    code_elem = soup.find("td", string=re.compile(r"Kod producenta|Manufacturer code"))
    if code_elem:
        code_value = code_elem.find_next_sibling("td")
        seller_info["manufacturer_code"] = code_value.get_text(strip=True) if code_value else "N/A"
    else:
        seller_info["manufacturer_code"] = "N/A"

    # EAN/GTIN from the specs table
    ean_elem = soup.find("td", string=re.compile(r"EAN"))
    if ean_elem:
        ean_value = ean_elem.find_next_sibling("td")
        seller_info["ean"] = ean_value.get_text(strip=True) if ean_value else "N/A"
    else:
        seller_info["ean"] = "N/A"

    # Delivery information
    delivery_elem = soup.find("span", string=re.compile(r"(darmowa\s+)?dostawa"))
    seller_info["delivery_info"] = delivery_elem.get_text(strip=True) if delivery_elem else "N/A"

    # Installment information
    installment_elem = soup.find("span", string=re.compile(r"x\s*\d+\s*rat"))
    seller_info["installment_info"] = installment_elem.get_text(strip=True) if installment_elem else "N/A"

    # Allegro Smart badge
    smart_badge = soup.find("img", alt="Allegro Smart!")
    seller_info["allegro_smart"] = "Yes" if smart_badge else "No"

    # Best price guarantee
    bpg_elem = soup.find("span", string=re.compile(r"Gwarancja najniższej ceny"))
    seller_info["best_price_guarantee"] = "Yes" if bpg_elem else "No"

    return seller_info

The invoice, manufacturer code, and EAN fields come from the same specifications table. We look them up by matching the label text in Polish or English, then grab the value from the sibling cell.

Putting It All Together

Now we combine all four extraction functions into a single scraper. It fetches the product page, parses the HTML, and returns one flat dictionary with everything.

python
def scrape_product_details(url: str) -> Optional[Dict]:
    """Scrape comprehensive product data from an Allegro product page"""
    session = create_session()
    response = make_request(session, url)

    if not response:
        return None

    soup = BeautifulSoup(response.content, "lxml")

    return {
        "url": url,
        **extract_basic_info(soup),
        "specifications": extract_specifications(soup),
        "features": extract_features(soup),
        "seller": extract_seller_info(soup),
    }

# Example usage
if __name__ == "__main__":
    url = "https://allegro.pl/oferta/smartfon-xiaomi-14t-pro-12-gb-512-gb-5g-niebieski-17386285003"
    product = scrape_product_details(url)

    if product:
        print(f"Title: {product['title']}")
        print(f"Price: {product['price']}")
        print(f"Brand: {product['brand']}")
        print(f"SKU: {product['sku']}")
        print(f"GTIN: {product['gtin']}")
        print(f"Condition: {product['condition']}")
        print(f"Availability: {product['availability']}")
        print(f"Rating: {product['rating']} ({product['ratings_count']} reviews)")

        if product["specifications"]:
            print("\nSpecifications:")
            for key, value in product["specifications"].items():
                print(f"  {key}: {value}")

        if product["features"]:
            print("\nFeatures:")
            for feat in product["features"][:10]:
                print(f"  - {feat}")

        if product["seller"]:
            seller = product["seller"]
            print("\nSeller Info:")
            print(f"  Delivery: {seller['delivery_info']}")
            print(f"  Invoice: {seller['invoice']}")
            print(f"  Allegro Smart: {seller['allegro_smart']}")
            print(f"  Best Price Guarantee: {seller['best_price_guarantee']}")

Example Output

Title: Smartfon Xiaomi 14T Pro 12 GB / 512 GB 5G niebieski
Price: 2300.00 PLN
Brand: Xiaomi
SKU: 18376191779
GTIN: 6941812789353
Condition: http://schema.org/NewCondition
Availability: http://schema.org/InStock
Rating: 4.76 (146 reviews)

Specifications:
  Stan: Nowy
  Faktura: Wystawiam fakturę VAT
  Kod producenta: 6941812789353
  Marka: Xiaomi
  Model telefonu: 14T Pro
  Typ: Smartfon
  EAN (GTIN): 6941812789353
  Kolor: niebieski

Features:
  - Telefon komórkowy*1
  - Wtyczka ładowania*1
  - Służy do transmisji danych*1

Seller Info:
  Delivery: darmowa dostawa
  Invoice: Wystawiam fakturę VAT
  Allegro Smart: No
  Best Price Guarantee: Yes

The product detail scraper pulls everything from a single page in one pass. The structured itemprop meta tags handle the core fields like price, brand, SKU, and rating reliably. The specifications table gives you seller-provided attributes, and the seller info function picks up delivery and trust signals.
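For downstream analysis you will usually want the results on disk. A minimal sketch using the standard library csv module: it writes the scalar fields and skips the nested dicts and lists (specifications, features, seller) to keep the file flat. Serialize those to JSON columns if you need them.

```python
import csv

def save_products_csv(products, path="allegro_products.csv"):
    """Write scalar product fields to CSV; nested dicts/lists are skipped
    to keep the file flat. A minimal sketch."""
    if not products:
        return
    # Derive the header from the first product's non-nested fields
    fieldnames = [k for k, v in products[0].items()
                  if not isinstance(v, (dict, list))]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for product in products:
            writer.writerow({k: product.get(k, "") for k in fieldnames})
```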

When Should You Use Scrapfly for Allegro Scraping?

The DIY approach above works well for small-scale scraping and for learning how Allegro pages are structured. But if you need to scrape Allegro reliably at any real volume, the anti-bot layer becomes the main bottleneck.

DataDome will eventually block your requests no matter how good your headers are. You will need residential Polish proxies, proper TLS fingerprinting, and potentially JavaScript rendering to keep scraping consistently. Managing all of that yourself takes real engineering effort.

Scrapfly handles the anti-bot infrastructure for you. It provides residential proxies with Polish geolocation, automatic DataDome bypass, and JavaScript rendering when needed. You send a request and you get clean HTML back. Here is what the same Allegro scraping looks like with Scrapfly.

python
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")

# Scrape an Allegro category page with anti-bot bypass and Polish geolocation
result = client.scrape(ScrapeConfig(
    url="https://allegro.pl/kategoria/smartfony-i-telefony-komorkowe-165",
    asp=True,          # Anti Scraping Protection bypass
    render_js=True,    # Full JavaScript rendering
    country="pl",      # Polish geolocation
))

# Use the same parsing functions from earlier
html = result.scrape_result["content"]
listings = extract_product_listings(html)
print(f"Scraped {len(listings)} listings via Scrapfly")

The asp=True flag enables anti-bot bypass, render_js=True handles JavaScript-rendered content, and country="pl" routes the request through Polish infrastructure. You can plug the same parsing functions from earlier in this guide right into the Scrapfly response.

For more on anti-bot strategies in general, see our guide on bypassing anti-bot protection.

FAQ

Does Allegro use Cloudflare or DataDome?

Allegro uses DataDome for its anti-bot protection, not Cloudflare. DataDome analyzes request fingerprints including TLS signatures, HTTP headers, and behavioral patterns. You can confirm this by checking the network requests in your browser's developer tools when visiting Allegro.

Do you need Polish proxies to scrape Allegro?

Polish proxies significantly improve your success rate. Allegro is a Polish marketplace and its anti-bot system treats non-Polish traffic with more suspicion. Residential Polish proxies work best because datacenter IPs are commonly flagged by DataDome.
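If you go the DIY route, wiring a Polish residential proxy into the requests session is a one-liner. The endpoint below is hypothetical; substitute your provider's gateway and credentials.

```python
import requests

# Hypothetical endpoint -- substitute your provider's Polish residential gateway
PL_PROXY = "http://username:password@pl.residential.example.com:8000"

session = requests.Session()
session.proxies = {"http": PL_PROXY, "https": PL_PROXY}
# Every request made through this session now exits from the proxy IP
```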

Can you use the Allegro API instead of scraping?

Allegro offers a REST API for registered developers, but it requires OAuth authentication and has strict rate limits. The API is designed for sellers and integrators, not for large-scale market research. For most scraping use cases like price monitoring or competitive analysis, direct scraping gives you more flexibility and access to the full page content.

Conclusion

Scraping Allegro comes down to two things: getting past DataDome to receive clean HTML, and then parsing the product data from that HTML using the structured metadata and page elements.

The DIY approach in this guide gives you a working scraper for both listings and product detail pages. If you need reliable, high-volume scraping without managing proxies and anti-bot infrastructure yourself, Scrapfly handles that layer so you can focus on the data.
