How to Scrape AutoScout24
AutoScout24 is Europe's largest online car marketplace, offering millions of vehicle listings with comprehensive data including prices, specifications, features, and seller information. Its catalog spans everything from compact city cars to luxury vehicles, making it a valuable target for web scraping projects focused on automotive market research, price monitoring, and vehicle analysis.

In this comprehensive guide, we'll explore how to scrape AutoScout24 effectively using Python. We'll cover the technical challenges, implement robust scraping solutions, and provide practical code examples for extracting automotive data at scale.

Why Scrape AutoScout24?

AutoScout24 serves as a critical data source for various business applications in the automotive industry. Car dealers can analyze pricing trends across different vehicle categories and markets, while manufacturers can monitor competitor pricing strategies. Additionally, market researchers can track vehicle availability and popularity across different body types and specifications.

The platform's extensive catalog includes detailed vehicle information, pricing data, technical specifications, and seller details, making it an ideal target for data-driven decision making in the automotive industry.

Understanding AutoScout24's Structure

Before diving into the scraping implementation, it's essential to understand AutoScout24's website architecture. The platform uses a modern JavaScript-based frontend that dynamically loads vehicle data, requiring careful handling of asynchronous content loading.

AutoScout24 employs robust anti-bot measures including IP tracking and JavaScript-rendered content, which makes traditional scraping approaches challenging. Understanding these defenses is crucial for developing effective scraping strategies.
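
A useful consequence of this JavaScript-heavy architecture is that much of the page data is often serialized as JSON inside the HTML itself. Before reaching for a headless browser, it's worth checking for an embedded __NEXT_DATA__ script tag, which Next.js sites commonly ship. Below is a minimal probe you can run once the dependencies from the next section are installed; it assumes such a tag exists, and its exact key layout is not guaranteed:

import json

import requests
from bs4 import BeautifulSoup

response = requests.get(
    "https://www.autoscout24.com/lst/c/compact",
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
)
soup = BeautifulSoup(response.content, "html.parser")

# Next.js apps typically serialize page state into this script tag
next_data = soup.find("script", id="__NEXT_DATA__")
if next_data and next_data.string:
    data = json.loads(next_data.string)
    # the key structure below is an assumption and may change between deployments
    print(list(data.get("props", {}).keys()))
else:
    print("No embedded JSON found - fall back to HTML parsing")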

Project Setup

To scrape AutoScout24 effectively, we'll use several Python libraries designed for modern web scraping:

  • requests - HTTP library for making web requests
  • BeautifulSoup - HTML parsing library
  • json - For parsing JSON data embedded in pages

Install the required dependencies:

$ pip install requests beautifulsoup4

Example 1: Scraping Car Listings by Body Type

Our first example focuses on scraping car listings filtered by body type, specifically compact cars. This approach allows us to extract multiple vehicle listings from a category page, providing valuable market insights.

Setting Up the Listings Scraper

Let's start by setting up the basic structure and dependencies for our AutoScout24 listings scraper.

1. Prerequisites

First, install the required dependencies:

$ pip install requests beautifulsoup4

2. Basic Setup and User Agent Rotation

Create a file called scrape_autoscout24_listings.py and start with the basic setup:

import requests
from bs4 import BeautifulSoup
import json
import re
import random
import time

# Simple list of user agents to rotate
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

# Target URL for compact cars
url = "https://www.autoscout24.com/lst/c/compact"

# Create session with random user agent
session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})

3. Request Handling Function

This function handles the HTTP requests and validates that we can successfully access the target pages.

def make_request(url):
    """Make a request to the AutoScout24 listings page"""
    try:
        # Add random delay to avoid detection
        time.sleep(random.uniform(1, 3))
        
        response = session.get(url, timeout=15)
        
        # Check if blocked
        if response.status_code == 403:
            print("  ❌ Blocked (403 Forbidden)")
            return None
        
        # Check if successful
        if response.status_code == 200:
            print("  βœ… Successfully accessed page")
            return response
        else:
            print(f"  ❌ Error: Status code {response.status_code}")
            return None
            
    except Exception as e:
        print(f"  ❌ Error: {e}")
        return None

4. Extracting Car Listings

This function extracts individual car listings from the page, including basic information like title, price, and link.

def extract_car_listings(soup):
    """Extract car listings from the search results page"""
    listings = []
    
    # Find all car listing containers
    # AutoScout24 uses article tags with specific classes for car listings
    car_articles = soup.find_all('article', class_='cldt-summary-full-item')
    
    print(f"  Found {len(car_articles)} car listings")
    
    for article in car_articles:
        try:
            # Extract car title from the title link
            title_link = article.find('a', class_='ListItem_title__ndA4s')
            if title_link:
                title_elem = title_link.find('h2')
                if title_elem:
                    # Combine all span elements to get full title
                    title_spans = title_elem.find_all('span')
                    title = ' '.join([span.get_text().strip() for span in title_spans if span.get_text().strip()])
                else:
                    title = title_link.get_text().strip()
            else:
                title = "N/A"
            
            # Extract price from the price element
            price_elem = article.find('p', class_='Price_price__APlgs')
            price = price_elem.get_text().strip() if price_elem else "N/A"
            
            # Extract mileage from the vehicle details table
            mileage_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-mileage_road'})
            mileage = mileage_elem.get_text().strip() if mileage_elem else "N/A"
            
            # Extract registration year from the vehicle details table
            year_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-calendar'})
            year = year_elem.get_text().strip() if year_elem else "N/A"
            
            # Extract fuel type from the vehicle details table
            fuel_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-gas_pump'})
            fuel_type = fuel_elem.get_text().strip() if fuel_elem else "N/A"
            
            # Extract transmission from the vehicle details table
            transmission_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-transmission'})
            transmission = transmission_elem.get_text().strip() if transmission_elem else "N/A"
            
            # Extract power from the vehicle details table
            power_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-speedometer'})
            power = power_elem.get_text().strip() if power_elem else "N/A"
            
            # Extract link to detailed page
            link_elem = article.find('a', class_='ListItem_title__ndA4s')
            link = "https://www.autoscout24.com" + link_elem['href'] if link_elem else None
            
            # Extract seller information
            seller_name_elem = article.find('span', class_='SellerInfo_name__nR9JH')
            seller_name = seller_name_elem.get_text().strip() if seller_name_elem else "N/A"
            
            seller_address_elem = article.find('span', class_='SellerInfo_address__leRMu')
            seller_address = seller_address_elem.get_text().strip() if seller_address_elem else "N/A"
            
            listing_data = {
                'title': title,
                'price': price,
                'mileage': mileage,
                'year': year,
                'fuel_type': fuel_type,
                'transmission': transmission,
                'power': power,
                'seller_name': seller_name,
                'seller_address': seller_address,
                'link': link
            }
            
            listings.append(listing_data)
            
            print(f"    β€’ {title} - {price} - {mileage} - {year} - {fuel_type}")
            
        except Exception as e:
            print(f"    ❌ Error extracting listing: {e}")
            continue
    
    return listings

5. Main Scraping Function

This function orchestrates the entire scraping process for the listings page.

def scrape_listings(url):
    """Main function to scrape car listings from AutoScout24"""
    print(f"\nScraping listings from: {url}")
    
    # Make request
    response = make_request(url)
    if not response:
        return None
    
    # Parse HTML
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Extract listings
    listings = extract_car_listings(soup)
    
    return listings

6. Main Execution

The main execution function manages the overall scraping workflow and handles the results.

def main():
    """Main execution function"""
    print("πŸš— Starting AutoScout24 Compact Cars Scraper")
    
    # Scrape listings
    listings = scrape_listings(url)
    
    if listings:
        print(f"\nβœ… Successfully scraped {len(listings)} car listings!")
        
        return listings
    else:
        print("❌ Failed to scrape listings")
        return None

# Run the scraper
if __name__ == "__main__":
    main()

Example Output

πŸš— Starting AutoScout24 Compact Cars Scraper

Scraping listings from: https://www.autoscout24.com/lst/c/compact
βœ… Successfully accessed page
Found 19 car listings
β€’ Peugeot 207 Filou MOTORSCHADEN!!!!! - € 499 - 174,000 km - 01/2008 - Gasoline
β€’ Renault Clio 1.2 RN - € 999 - 142,875 km - 07/2000 - Gasoline
β€’ Volkswagen Polo 1.4-16V Highline - € 6,350 - 116,950 km - 10/2009 - Gasoline
β€’ Peugeot 208 GTi - € 4,990 - 111,846 km - 11/2013 - Gasoline
β€’ Peugeot 208 1.2 VTi Active 1e Eigenaar,Airco,Cruise,PDC,Trekha - € 4,449 - 124,752 km - 03/2014 - Gasoline
β€’ Peugeot 207 1.4-16V Color-line - € 1,249 - 228,423 km - 02/2008 - Gasoline
β€’ Volkswagen Polo 1.0 Comfortline - € 6,450 - 182,454 km - 10/2016 - Gasoline
β€’ Fiat 500 1.2 Naked Panodak Clima Lmv Koopje! - € 1,995 - 207,112 km - 01/2008 - Gasoline
β€’ Renault Twingo 1.2 PrivilΓ¨ge | Handelsauto | Recent nieuwe distri - € 1,250 - 75,562 km - 06/2005 - Gasoline
β€’ Nissan Micra 1.2 - € 980 - 190,582 km - 03/2004 - Gasoline
β€’ Kia Picanto 1.0 CVVT ISG Comfort Pack 2e Eigenaar,Airco,Elektr - € 4,749 - 93,864 km - 08/2013 - Gasoline
β€’ Kia Picanto 1.0 CVVT Design Edition Airco 5-Deurs Origineel NL - € 4,900 - 121,292 km - 01/2013 - Gasoline
β€’ Ford Fiesta 1.6 Ghia 120PK,Stoelverwarming,Airco,ElektrischeRa - € 4,749 - 153,420 km - 03/2009 - Gasoline
β€’ Peugeot 107 1.0-12V XR | Airco | Toerenteller | 5drs | - € 2,450 - 170,306 km - 10/2009 - Gasoline
β€’ Volkswagen Golf R-line|Clima|Stoelverwarming|PDC - € 6,950 - 154,629 km - 09/2012 - Gasoline
β€’ Volkswagen Golf 1.2 TSI BlueMotion, airco, navi, bleutooth, APK 07 - € 3,995 - 240,200 km - 01/2012 - Gasoline
β€’ Volkswagen Polo 1.2 TSI BlueMotion Highline - € 4,949 - 220,565 km - 01/2014 - Gasoline
β€’ Ford Fiesta 1.25 Trend Trekhaak,Airco,Stoelverwarming,Elektris - € 3,499 - 150,133 km - 09/2010 - Gasoline
β€’ Fiat 500 0.9 TwinAir Lounge | Wit Parelmoer | PanoDak | Air - € 4,950 - 96,486 km - 11/2011 - Gasoline

βœ… Successfully scraped 19 car listings!
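
The scraper above only covers the first results page. AutoScout24 paginates listing results, so you can extend coverage by iterating a page query parameter. Here is a minimal sketch, assuming the site accepts ?page=N on listing URLs (it also caps how many pages a single filter returns, so narrower filters yield better coverage):

def scrape_all_pages(base_url, max_pages=5):
    """Scrape multiple listing pages by iterating the page parameter"""
    all_listings = []
    for page in range(1, max_pages + 1):
        page_url = f"{base_url}?page={page}"
        listings = scrape_listings(page_url)
        if not listings:
            # stop on an empty page or a blocked request
            break
        all_listings.extend(listings)
    return all_listings

# Example: first three pages of compact cars
# results = scrape_all_pages("https://www.autoscout24.com/lst/c/compact", max_pages=3)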

Example 2: Scraping Individual Car Details

Our second example focuses on scraping detailed information from individual car pages. This approach allows us to extract comprehensive vehicle specifications, features, and seller information.

Setting Up the Individual Car Scraper

Let's create a scraper for individual car pages that extracts detailed vehicle information.

1. Prerequisites

The same dependencies as before:

$ pip install requests beautifulsoup4

2. Basic Setup for Individual Car Scraping

Create a file called scrape_autoscout24_car.py and start with the basic setup:

import requests
from bs4 import BeautifulSoup
import json
import re
import random
import time

# Simple list of user agents to rotate
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

# Target URL for individual car
url = "https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d"

# Create session with random user agent
session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})

3. Request Handling Function

This function handles the HTTP requests for individual car pages.

def make_request(url):
    """Make a request to the AutoScout24 car detail page"""
    try:
        # Add random delay to avoid detection
        time.sleep(random.uniform(2, 4))
        
        response = session.get(url, timeout=15)
        
        # Check if blocked
        if response.status_code == 403:
            print("  ❌ Blocked (403 Forbidden)")
            return None
        
        # Check if successful
        if response.status_code == 200:
            print("  βœ… Successfully accessed page")
            return response
        else:
            print(f"  ❌ Error: Status code {response.status_code}")
            return None
            
    except Exception as e:
        print(f"  ❌ Error: {e}")
        return None

4. Extracting Basic Car Information

This function extracts the basic car information including title, price, and key details.

def extract_basic_info(soup):
    """Extract basic car information from the detail page"""
    car_data = {}
    
    # Extract car title from the stage title
    title_elem = soup.find('h1', class_='StageTitle_title__ROiR4')
    if title_elem:
        # Get the make and model from the bold classified info
        make_model_elem = title_elem.find('span', class_='StageTitle_boldClassifiedInfo__sQb0l')
        model_version_elem = title_elem.find('div', class_='StageTitle_modelVersion__Yof2Z')
        
        if make_model_elem and model_version_elem:
            car_data['title'] = f"{make_model_elem.get_text().strip()} {model_version_elem.get_text().strip()}"
        elif make_model_elem:
            car_data['title'] = make_model_elem.get_text().strip()
        else:
            car_data['title'] = title_elem.get_text().strip()
        
        print(f"  Car: {car_data['title']}")
    else:
        car_data['title'] = "Not found"
        print("  Car: Not found")
    
    # Extract price from the price section
    price_elem = soup.find('span', class_='PriceInfo_price__XU0aF')
    if price_elem:
        car_data['price'] = price_elem.get_text().strip()
        print(f"  Price: {car_data['price']}")
    else:
        car_data['price'] = "Not found"
        print("  Price: Not found")
    
    # Extract mileage from the vehicle overview
    # Find the mileage item by matching its title text (for/else: the else
    # branch runs only if no item matched and broke out of the loop)
    mileage_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in mileage_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        if title_elem and 'Mileage' in title_elem.get_text():
            text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
            if text_elem:
                car_data['mileage'] = text_elem.get_text().strip()
                print(f"  Mileage: {car_data['mileage']}")
                break
    else:
        car_data['mileage'] = "Not found"
        print("  Mileage: Not found")
    
    # Extract registration year from the vehicle overview
    registration_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in registration_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        if title_elem and 'First registration' in title_elem.get_text():
            text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
            if text_elem:
                car_data['year'] = text_elem.get_text().strip()
                print(f"  Year: {car_data['year']}")
                break
    else:
        car_data['year'] = "Not found"
        print("  Year: Not found")
    
    return car_data

5. Extracting Technical Specifications

This function extracts detailed technical specifications from the car's specification section.

def extract_specifications(soup):
    """Extract technical specifications from the car detail page"""
    specifications = {}
    
    # Extract specifications from the vehicle overview section
    overview_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    
    for item in overview_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
        
        if title_elem and text_elem:
            title = title_elem.get_text().strip()
            value = text_elem.get_text().strip()
            
            if 'Fuel type' in title:
                specifications['fuel_type'] = value
                print(f"  Fuel Type: {value}")
            elif 'Gearbox' in title:
                specifications['transmission'] = value
                print(f"  Transmission: {value}")
            elif 'Power' in title:
                specifications['power'] = value
                print(f"  Power: {value}")
    
    # Extract additional specifications from the technical data section
    tech_section = soup.find('section', attrs={'data-cy': 'technical-details-section'})
    if tech_section:
        # Find all dt/dd pairs in the technical data
        dt_elements = tech_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = tech_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        
        for dt, dd in zip(dt_elements, dd_elements):
            title = dt.get_text().strip()
            value = dd.get_text().strip()
            
            if 'Engine size' in title:
                specifications['engine_size'] = value
                print(f"  Engine Size: {value}")
            elif 'Cylinders' in title:
                specifications['cylinders'] = value
                print(f"  Cylinders: {value}")
            elif 'Power' in title and 'power' not in specifications:
                specifications['power'] = value
                print(f"  Power: {value}")
            elif 'Gearbox' in title and 'transmission' not in specifications:
                specifications['transmission'] = value
                print(f"  Transmission: {value}")
    
    # Extract color information from the color section
    color_section = soup.find('section', attrs={'data-cy': 'color-section'})
    if color_section:
        dt_elements = color_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = color_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        
        for dt, dd in zip(dt_elements, dd_elements):
            title = dt.get_text().strip()
            value = dd.get_text().strip()
            
            if 'Manufacturer colour' in title:
                specifications['color'] = value
                print(f"  Color: {value}")
            elif 'Paint' in title:
                specifications['paint_type'] = value
                print(f"  Paint Type: {value}")
    
    return specifications

6. Extracting Features and Equipment

This function extracts the car's features and equipment list.

def extract_features(soup):
    """Extract car features and equipment from the detail page"""
    features = []
    
    # Find equipment section
    equipment_section = soup.find('section', attrs={'data-cy': 'equipment-section'})
    if equipment_section:
        # Find all dt/dd pairs in the equipment section
        dt_elements = equipment_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = equipment_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        
        for dt, dd in zip(dt_elements, dd_elements):
            category = dt.get_text().strip()
            # Find all li elements in the dd
            feature_items = dd.find_all('li')
            
            if feature_items:
                print(f"    {category}:")
                for item in feature_items:
                    feature_text = item.get_text().strip()
                    if feature_text:
                        features.append(f"{category}: {feature_text}")
                        print(f"      β€’ {feature_text}")
    
    if not features:
        print("  Features: Not found")
    
    return features

7. Extracting Seller Information

This function extracts seller details and contact information.

def extract_seller_info(soup):
    """Extract seller information from the car detail page"""
    seller_data = {}
    
    # Extract seller type from the vehicle overview
    overview_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in overview_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
        
        if title_elem and text_elem and 'Seller' in title_elem.get_text():
            seller_data['type'] = text_elem.get_text().strip()
            print(f"  Seller Type: {seller_data['type']}")
            break
    
    # Extract location from the location link
    location_link = soup.find('a', class_='LocationWithPin_locationItem__tK1m5')
    if location_link:
        seller_data['location'] = location_link.get_text().strip()
        print(f"  Location: {seller_data['location']}")
    else:
        seller_data['location'] = "Not found"
        print("  Location: Not found")
    
    # Extract seller description from the seller notes section
    seller_notes_section = soup.find('section', attrs={'data-cy': 'seller-notes-section'})
    if seller_notes_section:
        content_div = seller_notes_section.find('div', class_='SellerNotesSection_content__te2EB')
        if content_div:
            seller_data['description'] = content_div.get_text().strip()
            print(f"  Description: {seller_data['description'][:100]}...")
        else:
            seller_data['description'] = "Not found"
            print("  Description: Not found")
    else:
        seller_data['description'] = "Not found"
        print("  Description: Not found")
    
    return seller_data

8. Main Scraping Function

This function combines all the individual extraction methods into a comprehensive scraper for individual car pages.

def scrape_car_details(url):
    """Main function to scrape detailed information from a single car page"""
    print(f"\nScraping car details from: {url}")
    
    # Make request
    response = make_request(url)
    if not response:
        return None
    
    # Parse HTML
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Extract all data
    basic_info = extract_basic_info(soup)
    specifications = extract_specifications(soup)
    features = extract_features(soup)
    seller_info = extract_seller_info(soup)
    
    # Combine all data
    result = {
        'url': url,
        **basic_info,
        'specifications': specifications,
        'features': features,
        'seller': seller_info
    }
    
    return result

9. Main Execution

The main execution function manages the overall scraping workflow for individual car pages.

def main():
    """Main execution function"""
    print("πŸš— Starting AutoScout24 Individual Car Scraper")
    
    # Scrape car details
    car_data = scrape_car_details(url)
    
    if car_data:
        print(f"\nβœ… Successfully scraped car details!")
        
        # Save results to file
        with open('autoscout24_car_details.json', 'w') as f:
            json.dump(car_data, f, indent=2)
        print("πŸ’Ύ Results saved to autoscout24_car_details.json")
        
        return car_data
    else:
        print("❌ Failed to scrape car details")
        return None

# Run the scraper
if __name__ == "__main__":
    main()

Example Output

πŸš— Starting AutoScout24 Individual Car Scraper

Scraping car details from: https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d
βœ… Successfully accessed page
Car: Peugeot 207 Filou MOTORSCHADEN!!!!!
Price: € 499
Mileage: 174,000 km
Year: 01/2008
Transmission: Manual
Fuel Type: Gasoline
Power: 70 kW (95 hp)
Engine Size: 1,397 cc
Cylinders: 4
Color: BLEU NEYSHA
Paint Type: Metallic
Comfort & Convenience:
β€’ Power windows
Safety & Security:
β€’ ABS
β€’ Central door lock
β€’ Driver-side airbag
β€’ Passenger-side airbag
β€’ Power steering
β€’ Side airbag
Extras:
β€’ Alloy wheels
Seller Type: Dealer
Location: Berlin
Description: Sonderausstattung:MOTOR DREHT NICHT!!!!!!Metallic-Lackierung, ALUFELGEN, u.s.w.Weitere Ausstattung:A...

βœ… Successfully scraped car details!
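
With both scripts in place, you can chain them: collect listing links with Example 1, then visit each link with Example 2's detail scraper. A sketch, assuming the functions from both files are combined into one module:

def scrape_category_with_details(category_url, limit=5):
    """Scrape a listings page, then fetch full details for the first results"""
    listings = scrape_listings(category_url)
    if not listings:
        return []

    detailed_cars = []
    for listing in listings[:limit]:
        if listing['link']:
            details = scrape_car_details(listing['link'])
            if details:
                detailed_cars.append(details)
    return detailed_cars

# cars = scrape_category_with_details("https://www.autoscout24.com/lst/c/compact")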

Handling Anti-Bot Protection

AutoScout24 employs sophisticated anti-bot measures including IP tracking and JavaScript-rendered content. Let's explore different approaches to handle these challenges.

1. User Agent Rotation

The scraper randomly selects from a pool of realistic user agents to mimic different browsers. This helps avoid detection by making requests appear to come from various browsers.

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5"
})

2. Session Management

Using a requests session maintains cookies and connection pooling, making requests appear more natural. This approach helps maintain consistency across multiple requests.

session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})

3. Rate Limiting

Add delays between requests to avoid overwhelming the server. This helps prevent detection and ensures respectful scraping practices.

import time

for url in urls:
    # Add random delay between requests
    time.sleep(random.uniform(1, 3))
    
    # ... scraping code ...

For more advanced anti-blocking techniques, check out our comprehensive guide 5 Tools to Scrape Without Blocking and How it All Works, which covers JavaScript and TLS (JA3) fingerprinting, the role request headers play in blocking, IP rotation, and other detection methods.

Advanced Scraping Techniques

For more robust scraping, consider these additional techniques. These methods help improve reliability and scalability for production environments.

1. Proxy Rotation

For large-scale scraping, use rotating proxies. This technique helps distribute requests across multiple IP addresses to avoid blocking.

proxies = {
    'http': 'http://proxy1:port',
    'https': 'https://proxy1:port'
}

response = session.get(url, proxies=proxies, timeout=15)
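
To rotate rather than pin a single proxy, pick a different endpoint per request. A minimal sketch; the proxy URLs below are placeholders for your own pool:

import random

# placeholder endpoints - substitute your own proxy pool
proxy_pool = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

def get_with_rotating_proxy(session, url):
    """Send the request through a randomly chosen proxy from the pool"""
    proxy = random.choice(proxy_pool)
    return session.get(url, proxies={"http": proxy, "https": proxy}, timeout=15)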

2. Data Storage and Analysis

Save scraped data to files for analysis. This allows you to process and analyze the collected data efficiently.

import json
import csv

def save_data_json(data, filename):
    """Save data to JSON file"""
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)

def save_data_csv(data, filename):
    """Save data to CSV file"""
    if data and len(data) > 0:
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)

# Collect data
scraped_data = []
for url in urls:
    # ... scraping code ...
    car_data = {
        'title': title,
        'price': price,
        'mileage': mileage,
        'year': year,
        'location': location
    }
    scraped_data.append(car_data)

# Persist the results in both formats
save_data_json(scraped_data, 'autoscout24_listings.json')
save_data_csv(scraped_data, 'autoscout24_listings.csv')

3. Error Handling and Retry Logic

Implement robust error handling with retry logic for better reliability.

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session_with_retries():
    """Create a session with retry logic"""
    session = requests.Session()
    
    # Configure retry strategy
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    
    return session
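
Usage is identical to a plain session; transient 429 and 5xx responses are retried automatically with exponential backoff:

session = create_session_with_retries()
response = session.get("https://www.autoscout24.com/lst/c/compact", timeout=15)
print(response.status_code)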

For more advanced data processing and analysis techniques, see our guide How to Observe E-Commerce Trends using Web Scraping, an example project that monitors e-commerce trends using Python, web scraping, and data visualization tools.

Scraping with Scrapfly

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

For reliable and scalable AutoScout24 scraping, consider using Scrapfly's web scraping API. Scrapfly handles anti-bot measures, provides rotating proxies, and ensures high success rates for data extraction.

Here's how to use Scrapfly for scraping AutoScout24:

from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR-SCRAPFLY-KEY")

# Scrape car listings
result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
    tags=["autoscout24", "car-listings"],
    format="json",
    asp=True,
    render_js=True,
    url="https://www.autoscout24.com/lst/c/compact"
))

print(result)

# Scrape individual car details
car_result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
    tags=["autoscout24", "car-details"],
    format="json",
    asp=True,
    render_js=True,
    url="https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d"
))

print(car_result)
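
The response object exposes the rendered HTML, so you can reuse the same BeautifulSoup extraction logic from the earlier examples. A short sketch, assuming extract_car_listings from Example 1 is in scope and that result.content holds the page HTML (as in the Python SDK):

from bs4 import BeautifulSoup

# feed the rendered HTML into the parsing functions from Example 1
soup = BeautifulSoup(result.content, 'html.parser')
listings = extract_car_listings(soup)
print(f"Extracted {len(listings)} listings via Scrapfly")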

Best Practices and Tips

When scraping AutoScout24, follow these best practices. These guidelines help ensure successful and ethical web scraping operations.

  1. Respect robots.txt: Always check and follow the website's robots.txt file
  2. Implement delays: Use random delays between requests to avoid detection
  3. Handle errors gracefully: Implement proper error handling for network issues
  4. Monitor success rates: Track scraping success rates and adjust strategies accordingly
  5. Use proxies: Consider using rotating proxies for large-scale scraping
  6. Validate data: Always validate extracted data for completeness and accuracy
  7. Respect rate limits: Don't overwhelm the server with too many requests
  8. Update selectors: Regularly check and update CSS selectors as the site evolves (see the fallback sketch below)
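
On that last point: AutoScout24's class names are build-generated hashes (e.g. Price_price__APlgs), so they can break with any site deployment. One defensive pattern is to try several candidate selectors in order. A sketch; the fallback selectors here are illustrative assumptions, not selectors confirmed on the live site:

def find_with_fallback(soup, selectors):
    """Try CSS selectors in order and return the first match (or None)"""
    for selector in selectors:
        elem = soup.select_one(selector)
        if elem:
            return elem
    return None

# current hashed class first, then a looser fallback
price_elem = find_with_fallback(soup, [
    'p.Price_price__APlgs',       # class observed at time of writing
    'p[class*="Price_price"]',    # substring match survives hash changes
])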

For more comprehensive web scraping best practices, see our guide Everything to Know to Start Web Scraping in Python Today, a complete introduction to web scraping using Python covering HTTP, parsing, AI, scaling, and deployment.

If you're interested in scraping other automotive or e-commerce platforms, check out these related guides, which provide additional techniques and approaches for different types of websites:

  β€’ How to Scrape Amazon.com Product Data and Reviews - how to scrape product data and reviews from the biggest e-commerce platform in the US, plus common challenges, tips, and tricks.
  β€’ How to Scrape Ebay Using Python (2025 Update) - scraping product details and product search on the world's biggest peer-to-peer e-commerce portal.
  β€’ How to Scrape Walmart.com Product Data (2025 Update) - scraping walmart.com product and review data with Python and avoiding blocking at scale.
  β€’ How to Scrape Etsy.com Product, Shop and Search Data - scraping search and product data from the popular marketplace for handcrafted and vintage items using Python and HTML parsing.

FAQ

Now let's answer some frequently asked questions.

What are the main challenges when scraping AutoScout24?

AutoScout24 uses sophisticated anti-bot protection including IP tracking and JavaScript-rendered content, which can block automated requests. The main challenges include 403 Forbidden errors, IP-based blocking, and dynamic content loading, all of which can make traditional scraping approaches unreliable. The site also uses build-generated CSS class names that change over time, so selectors need regular maintenance.

What data can I extract from individual AutoScout24 car pages?

You can extract detailed car information including specifications (engine, fuel type, transmission, power, color), features and equipment lists, seller information (name, location, phone), and comprehensive vehicle details. The site provides rich data for automotive market research and analysis.

What can I do to avoid getting blocked?

Rotate user agents, add random delays between requests, and use rotating proxies for larger jobs, as shown in the anti-bot section above. For a managed solution, Scrapfly's web scraping API provides residential proxies and automatic bot detection bypass.

Summary

This comprehensive guide covered the essential techniques for scraping AutoScout24 effectively. We explored the website's structure, implemented two working scraping solutions using requests and BeautifulSoup, and discussed anti-blocking strategies. The provided code examples demonstrate how to extract car listings by body type and detailed individual car information.

The simple approach using requests and BeautifulSoup provides a good balance of reliability and ease of use, while the anti-blocking techniques help avoid detection. For production use, consider implementing additional features like rate limiting, proxy rotation, and data storage.
