
In this comprehensive guide, we'll explore how to scrape AutoScout24 effectively using Python. We'll cover the technical challenges, implement robust scraping solutions, and provide practical code examples for extracting automotive data at scale.
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect, and here's a good summary of what not to do:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens who are protected by GDPR.
- Do not repurpose entire public datasets, which can be illegal in some countries.
Why Scrape AutoScout24?
AutoScout24 serves as a critical data source for various business applications in the automotive industry. Car dealers can analyze pricing trends across different vehicle categories and markets, while manufacturers can monitor competitor pricing strategies. Additionally, market researchers can track vehicle availability and popularity across different body types and specifications.
The platform's extensive catalog includes detailed vehicle information, pricing data, technical specifications, and seller details, making it an ideal target for data-driven decision making in the automotive industry.
Understanding AutoScout24's Structure
Before diving into the scraping implementation, it's essential to understand AutoScout24's website architecture. The platform uses a modern JavaScript-based frontend that dynamically loads vehicle data, requiring careful handling of asynchronous content loading.
AutoScout24 employs robust anti-bot measures including IP tracking and JavaScript-rendered content, which makes traditional scraping approaches challenging. Understanding these defenses is crucial for developing effective scraping strategies.
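Because much of the page is rendered by JavaScript, it's worth checking whether the listing data is also embedded as JSON in the initial HTML before reaching for a headless browser. Below is a minimal sketch that looks for a Next.js-style __NEXT_DATA__ script tag; the tag id is an assumption about the page structure, so inspect the raw HTML first:
import json
import requests
from bs4 import BeautifulSoup

# A hedged sketch: the "__NEXT_DATA__" id assumes a Next.js frontend -
# verify it exists in the page source before relying on it
response = requests.get(
    "https://www.autoscout24.com/lst/c/compact",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=15,
)
soup = BeautifulSoup(response.text, "html.parser")
script = soup.find("script", id="__NEXT_DATA__")
if script:
    data = json.loads(script.string)
    print(list(data.keys()))  # inspect the structure before parsing further
else:
    print("No embedded JSON found - fall back to HTML parsing")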
Project Setup
To scrape AutoScout24 effectively, we'll use several Python libraries designed for modern web scraping:
- requests - HTTP library for making web requests
- BeautifulSoup - HTML parsing library
- json - For parsing JSON data embedded in pages
Install the required dependencies:
$ pip install requests beautifulsoup4
Example 1: Scraping Car Listings by Body Type
Our first example focuses on scraping car listings filtered by body type, specifically compact cars. This approach allows us to extract multiple vehicle listings from a category page, providing valuable market insights.
Setting Up the Listings Scraper
Let's start by setting up the basic structure and dependencies for our AutoScout24 listings scraper.
1. Prerequisites
First, install the required dependencies:
$ pip install requests beautifulsoup4
2. Basic Setup and User Agent Rotation
Create a file called scrape_autoscout24_listings.py and start with the basic setup:
import requests
from bs4 import BeautifulSoup
import json
import re
import random
import time

# Simple list of user agents to rotate
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

# Target URL for compact cars
url = "https://www.autoscout24.com/lst/c/compact"

# Create session with random user agent
session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})
3. Request Handling Function
This function handles the HTTP requests and validates that we can successfully access the target pages.
def make_request(url):
    """Make a request to the AutoScout24 listings page"""
    try:
        # Add random delay to avoid detection
        time.sleep(random.uniform(1, 3))
        response = session.get(url, timeout=15)
        # Check if blocked
        if response.status_code == 403:
            print(" ❌ Blocked (403 Forbidden)")
            return None
        # Check if successful
        if response.status_code == 200:
            print(" ✅ Successfully accessed page")
            return response
        else:
            print(f" ❌ Error: Status code {response.status_code}")
            return None
    except Exception as e:
        print(f" ❌ Error: {e}")
        return None
4. Extracting Car Listings
This function extracts individual car listings from the page, including basic information like title, price, and link.
def extract_car_listings(soup):
    """Extract car listings from the search results page"""
    listings = []
    # Find all car listing containers
    # AutoScout24 uses article tags with specific classes for car listings
    car_articles = soup.find_all('article', class_='cldt-summary-full-item')
    print(f" Found {len(car_articles)} car listings")
    for article in car_articles:
        try:
            # Extract car title from the title link
            title_link = article.find('a', class_='ListItem_title__ndA4s')
            if title_link:
                title_elem = title_link.find('h2')
                if title_elem:
                    # Combine all span elements to get full title
                    title_spans = title_elem.find_all('span')
                    title = ' '.join([span.get_text().strip() for span in title_spans if span.get_text().strip()])
                else:
                    title = title_link.get_text().strip()
            else:
                title = "N/A"
            # Extract price from the price element
            price_elem = article.find('p', class_='Price_price__APlgs')
            price = price_elem.get_text().strip() if price_elem else "N/A"
            # Extract mileage from the vehicle details table
            mileage_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-mileage_road'})
            mileage = mileage_elem.get_text().strip() if mileage_elem else "N/A"
            # Extract registration year from the vehicle details table
            year_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-calendar'})
            year = year_elem.get_text().strip() if year_elem else "N/A"
            # Extract fuel type from the vehicle details table
            fuel_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-gas_pump'})
            fuel_type = fuel_elem.get_text().strip() if fuel_elem else "N/A"
            # Extract transmission from the vehicle details table
            transmission_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-transmission'})
            transmission = transmission_elem.get_text().strip() if transmission_elem else "N/A"
            # Extract power from the vehicle details table
            power_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-speedometer'})
            power = power_elem.get_text().strip() if power_elem else "N/A"
            # Extract link to detailed page
            link_elem = article.find('a', class_='ListItem_title__ndA4s')
            link = ("https://www.autoscout24.com" + link_elem['href']) if link_elem else None
            # Extract seller information
            seller_name_elem = article.find('span', class_='SellerInfo_name__nR9JH')
            seller_name = seller_name_elem.get_text().strip() if seller_name_elem else "N/A"
            seller_address_elem = article.find('span', class_='SellerInfo_address__leRMu')
            seller_address = seller_address_elem.get_text().strip() if seller_address_elem else "N/A"
            listing_data = {
                'title': title,
                'price': price,
                'mileage': mileage,
                'year': year,
                'fuel_type': fuel_type,
                'transmission': transmission,
                'power': power,
                'seller_name': seller_name,
                'seller_address': seller_address,
                'link': link
            }
            listings.append(listing_data)
            print(f" • {title} - {price} - {mileage} - {year} - {fuel_type}")
        except Exception as e:
            print(f" ❌ Error extracting listing: {e}")
            continue
    return listings
5. Main Scraping Function
This function orchestrates the entire scraping process for the listings page.
def scrape_listings(url):
    """Main function to scrape car listings from AutoScout24"""
    print(f"\nScraping listings from: {url}")
    # Make request
    response = make_request(url)
    if not response:
        return None
    # Parse HTML
    soup = BeautifulSoup(response.content, 'html.parser')
    # Extract listings
    listings = extract_car_listings(soup)
    return listings
6. Main Execution
The main execution function manages the overall scraping workflow and handles the results.
def main():
    """Main execution function"""
    print("🚀 Starting AutoScout24 Compact Cars Scraper")
    # Scrape listings
    listings = scrape_listings(url)
    if listings:
        print(f"\n✅ Successfully scraped {len(listings)} car listings!")
        return listings
    else:
        print("❌ Failed to scrape listings")
        return None

# Run the scraper
if __name__ == "__main__":
    main()
Example Output
🚀 Starting AutoScout24 Compact Cars Scraper
Scraping listings from: https://www.autoscout24.com/lst/c/compact
 ✅ Successfully accessed https://www.autoscout24.com/lst/c/compact
 Found 19 car listings
 • Peugeot 207 Filou MOTORSCHADEN!!!!! - € 499 - 174,000 km - 01/2008 - Gasoline
 • Renault Clio 1.2 RN - € 999 - 142,875 km - 07/2000 - Gasoline
 • Volkswagen Polo 1.4-16V Highline - € 6,350 - 116,950 km - 10/2009 - Gasoline
 • Peugeot 208 GTi - € 4,990 - 111,846 km - 11/2013 - Gasoline
 • Peugeot 208 1.2 VTi Active 1e Eigenaar,Airco,Cruise,PDC,Trekha - € 4,449 - 124,752 km - 03/2014 - Gasoline
 • Peugeot 207 1.4-16V Color-line - € 1,249 - 228,423 km - 02/2008 - Gasoline
 • Volkswagen Polo 1.0 Comfortline - € 6,450 - 182,454 km - 10/2016 - Gasoline
 • Fiat 500 1.2 Naked Panodak Clima Lmv Koopje! - € 1,995 - 207,112 km - 01/2008 - Gasoline
 • Renault Twingo 1.2 Privilège | Handelsauto | Recent nieuwe distri - € 1,250 - 75,562 km - 06/2005 - Gasoline
 • Nissan Micra 1.2 - € 980 - 190,582 km - 03/2004 - Gasoline
 • Kia Picanto 1.0 CVVT ISG Comfort Pack 2e Eigenaar,Airco,Elektr - € 4,749 - 93,864 km - 08/2013 - Gasoline
 • Kia Picanto 1.0 CVVT Design Edition Airco 5-Deurs Origineel NL - € 4,900 - 121,292 km - 01/2013 - Gasoline
 • Ford Fiesta 1.6 Ghia 120PK,Stoelverwarming,Airco,ElektrischeRa - € 4,749 - 153,420 km - 03/2009 - Gasoline
 • Peugeot 107 1.0-12V XR | Airco | Toerenteller | 5drs | - € 2,450 - 170,306 km - 10/2009 - Gasoline
 • Volkswagen Golf R-line|Clima|Stoelverwarming|PDC - € 6,950 - 154,629 km - 09/2012 - Gasoline
 • Volkswagen Golf 1.2 TSI BlueMotion, airco, navi, bleutooth, APK 07 - € 3,995 - 240,200 km - 01/2012 - Gasoline
 • Volkswagen Polo 1.2 TSI BlueMotion Highline - € 4,949 - 220,565 km - 01/2014 - Gasoline
 • Ford Fiesta 1.25 Trend Trekhaak,Airco,Stoelverwarming,Elektris - € 3,499 - 150,133 km - 09/2010 - Gasoline
 • Fiat 500 0.9 TwinAir Lounge | Wit Parelmoer | PanoDak | Air - € 4,950 - 96,486 km - 11/2011 - Gasoline
✅ Successfully scraped 19 car listings!
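The output above covers a single results page. To collect more of the category, you can iterate over result pages; the sketch below assumes AutoScout24 accepts a page query parameter (verify the exact parameter in your browser) and reuses the scrape_listings() function defined above:
# Hedged pagination sketch - the "page" parameter is an assumption
# about AutoScout24's URL scheme
all_listings = []
for page in range(1, 4):  # first three result pages
    page_url = f"https://www.autoscout24.com/lst/c/compact?page={page}"
    page_listings = scrape_listings(page_url)
    if not page_listings:
        break  # stop on a block or an empty page
    all_listings.extend(page_listings)
print(f"Collected {len(all_listings)} listings in total")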
Example 2: Scraping Individual Car Details
Our second example focuses on scraping detailed information from individual car pages. This approach allows us to extract comprehensive vehicle specifications, features, and seller information.
Setting Up the Individual Car Scraper
Let's create a scraper for individual car pages that extracts detailed vehicle information.
1. Prerequisites
The same dependencies as before:
$ pip install requests beautifulsoup4
2. Basic Setup for Individual Car Scraping
Create a file called scrape_autoscout24_car.py and start with the basic setup:
import requests
from bs4 import BeautifulSoup
import json
import re
import random
import time

# Simple list of user agents to rotate
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

# Target URL for individual car
url = "https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d"

# Create session with random user agent
session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})
3. Request Handling Function
This function handles the HTTP requests for individual car pages.
def make_request(url):
    """Make a request to the AutoScout24 car detail page"""
    try:
        # Add random delay to avoid detection
        time.sleep(random.uniform(2, 4))
        response = session.get(url, timeout=15)
        # Check if blocked
        if response.status_code == 403:
            print(" ❌ Blocked (403 Forbidden)")
            return None
        # Check if successful
        if response.status_code == 200:
            print(" ✅ Successfully accessed page")
            return response
        else:
            print(f" ❌ Error: Status code {response.status_code}")
            return None
    except Exception as e:
        print(f" ❌ Error: {e}")
        return None
4. Extracting Basic Car Information
This function extracts the basic car information including title, price, and key details.
def extract_basic_info(soup):
    """Extract basic car information from the detail page"""
    car_data = {}
    # Extract car title from the stage title
    title_elem = soup.find('h1', class_='StageTitle_title__ROiR4')
    if title_elem:
        # Get the make and model from the bold classified info
        make_model_elem = title_elem.find('span', class_='StageTitle_boldClassifiedInfo__sQb0l')
        model_version_elem = title_elem.find('div', class_='StageTitle_modelVersion__Yof2Z')
        if make_model_elem and model_version_elem:
            car_data['title'] = f"{make_model_elem.get_text().strip()} {model_version_elem.get_text().strip()}"
        elif make_model_elem:
            car_data['title'] = make_model_elem.get_text().strip()
        else:
            car_data['title'] = title_elem.get_text().strip()
        print(f" Car: {car_data['title']}")
    else:
        car_data['title'] = "Not found"
        print(" Car: Not found")
    # Extract price from the price section
    price_elem = soup.find('span', class_='PriceInfo_price__XU0aF')
    if price_elem:
        car_data['price'] = price_elem.get_text().strip()
        print(f" Price: {car_data['price']}")
    else:
        car_data['price'] = "Not found"
        print(" Price: Not found")
    # Extract mileage from the vehicle overview
    mileage_elem = soup.find('div', class_='VehicleOverview_itemContainer__XSLWi')
    if mileage_elem:
        # Find the mileage item by looking for the mileage icon and text
        mileage_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
        for item in mileage_items:
            title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
            if title_elem and 'Mileage' in title_elem.get_text():
                text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
                if text_elem:
                    car_data['mileage'] = text_elem.get_text().strip()
                    print(f" Mileage: {car_data['mileage']}")
                break
        else:
            car_data['mileage'] = "Not found"
            print(" Mileage: Not found")
    else:
        car_data['mileage'] = "Not found"
        print(" Mileage: Not found")
    # Extract registration year from the vehicle overview
    registration_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in registration_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        if title_elem and 'First registration' in title_elem.get_text():
            text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
            if text_elem:
                car_data['year'] = text_elem.get_text().strip()
                print(f" Year: {car_data['year']}")
            break
    else:
        car_data['year'] = "Not found"
        print(" Year: Not found")
    return car_data
5. Extracting Technical Specifications
This function extracts detailed technical specifications from the car's specification section.
def extract_specifications(soup):
    """Extract technical specifications from the car detail page"""
    specifications = {}
    # Extract specifications from the vehicle overview section
    overview_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in overview_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
        if title_elem and text_elem:
            title = title_elem.get_text().strip()
            value = text_elem.get_text().strip()
            if 'Fuel type' in title:
                specifications['fuel_type'] = value
                print(f" Fuel Type: {value}")
            elif 'Gearbox' in title:
                specifications['transmission'] = value
                print(f" Transmission: {value}")
            elif 'Power' in title:
                specifications['power'] = value
                print(f" Power: {value}")
    # Extract additional specifications from the technical data section
    tech_section = soup.find('section', attrs={'data-cy': 'technical-details-section'})
    if tech_section:
        # Find all dt/dd pairs in the technical data
        dt_elements = tech_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = tech_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        for dt, dd in zip(dt_elements, dd_elements):
            title = dt.get_text().strip()
            value = dd.get_text().strip()
            if 'Engine size' in title:
                specifications['engine_size'] = value
                print(f" Engine Size: {value}")
            elif 'Cylinders' in title:
                specifications['cylinders'] = value
                print(f" Cylinders: {value}")
            elif 'Power' in title and 'power' not in specifications:
                specifications['power'] = value
                print(f" Power: {value}")
            elif 'Gearbox' in title and 'transmission' not in specifications:
                specifications['transmission'] = value
                print(f" Transmission: {value}")
    # Extract color information from the color section
    color_section = soup.find('section', attrs={'data-cy': 'color-section'})
    if color_section:
        dt_elements = color_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = color_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        for dt, dd in zip(dt_elements, dd_elements):
            title = dt.get_text().strip()
            value = dd.get_text().strip()
            if 'Manufacturer colour' in title:
                specifications['color'] = value
                print(f" Color: {value}")
            elif 'Paint' in title:
                specifications['paint_type'] = value
                print(f" Paint Type: {value}")
    return specifications
6. Extracting Features and Equipment
This function extracts the car's features and equipment list.
def extract_features(soup):
    """Extract car features and equipment from the detail page"""
    features = []
    # Find equipment section
    equipment_section = soup.find('section', attrs={'data-cy': 'equipment-section'})
    if equipment_section:
        # Find all dt/dd pairs in the equipment section
        dt_elements = equipment_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = equipment_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        for dt, dd in zip(dt_elements, dd_elements):
            category = dt.get_text().strip()
            # Find all li elements in the dd
            feature_items = dd.find_all('li')
            if feature_items:
                print(f" {category}:")
                for item in feature_items:
                    feature_text = item.get_text().strip()
                    if feature_text:
                        features.append(f"{category}: {feature_text}")
                        print(f" • {feature_text}")
    if not features:
        print(" Features: Not found")
    return features
7. Extracting Seller Information
This function extracts seller details and contact information.
def extract_seller_info(soup):
    """Extract seller information from the car detail page"""
    seller_data = {}
    # Extract seller type from the vehicle overview
    overview_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in overview_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
        if title_elem and text_elem and 'Seller' in title_elem.get_text():
            seller_data['type'] = text_elem.get_text().strip()
            print(f" Seller Type: {seller_data['type']}")
            break
    # Extract location from the location link
    location_link = soup.find('a', class_='LocationWithPin_locationItem__tK1m5')
    if location_link:
        seller_data['location'] = location_link.get_text().strip()
        print(f" Location: {seller_data['location']}")
    else:
        seller_data['location'] = "Not found"
        print(" Location: Not found")
    # Extract seller description from the seller notes section
    seller_notes_section = soup.find('section', attrs={'data-cy': 'seller-notes-section'})
    if seller_notes_section:
        content_div = seller_notes_section.find('div', class_='SellerNotesSection_content__te2EB')
        if content_div:
            seller_data['description'] = content_div.get_text().strip()
            print(f" Description: {seller_data['description'][:100]}...")
        else:
            seller_data['description'] = "Not found"
            print(" Description: Not found")
    else:
        seller_data['description'] = "Not found"
        print(" Description: Not found")
    return seller_data
8. Main Scraping Function
This function combines all the individual extraction methods into a comprehensive scraper for individual car pages.
def scrape_car_details(url):
    """Main function to scrape detailed information from a single car page"""
    print(f"\nScraping car details from: {url}")
    # Make request
    response = make_request(url)
    if not response:
        return None
    # Parse HTML
    soup = BeautifulSoup(response.content, 'html.parser')
    # Extract all data
    basic_info = extract_basic_info(soup)
    specifications = extract_specifications(soup)
    features = extract_features(soup)
    seller_info = extract_seller_info(soup)
    # Combine all data
    result = {
        'url': url,
        **basic_info,
        'specifications': specifications,
        'features': features,
        'seller': seller_info
    }
    return result
9. Main Execution
The main execution function manages the overall scraping workflow for individual car pages.
def main():
    """Main execution function"""
    print("🚀 Starting AutoScout24 Individual Car Scraper")
    # Scrape car details
    car_data = scrape_car_details(url)
    if car_data:
        print("\n✅ Successfully scraped car details!")
        # Save results to file
        with open('autoscout24_car_details.json', 'w') as f:
            json.dump(car_data, f, indent=2)
        print("💾 Results saved to autoscout24_car_details.json")
        return car_data
    else:
        print("❌ Failed to scrape car details")
        return None

# Run the scraper
if __name__ == "__main__":
    main()
Example Output
🚀 Starting AutoScout24 Individual Car Scraper
Scraping car details from: https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d
 ✅ Successfully accessed https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d
 Car: Peugeot 207 Filou MOTORSCHADEN!!!!!
 Price: € 499
 Mileage: 174,000 km
 Year: 01/2008
 Transmission: Manual
 Fuel Type: Gasoline
 Power: 70 kW (95 hp)
 Engine Size: 1,397 cc
 Cylinders: 4
 Color: BLEU NEYSHA
 Paint Type: Metallic
 Comfort & Convenience:
  • Power windows
 Safety & Security:
  • ABS
  • Central door lock
  • Driver-side airbag
  • Passenger-side airbag
  • Power steering
  • Side airbag
 Extras:
  • Alloy wheels
 Seller Type: Dealer
 Location: Berlin
 Description: Sonderausstattung:MOTOR DREHT NICHT!!!!!!Metallic-Lackierung, ALUFELGEN, u.s.w.Weitere Ausstattung:A...
✅ Successfully scraped car details!
Handling Anti-Bot Protection
AutoScout24 employs sophisticated anti-bot measures including IP tracking and JavaScript-rendered content. Let's explore different approaches to handle these challenges.
1. User Agent Rotation
The scraper randomly selects from a pool of realistic user agents to mimic different browsers. This helps avoid detection by making requests appear to come from various browsers.
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5"
})
2. Session Management
Using a requests session maintains cookies and connection pooling, making requests appear more natural. This approach helps maintain consistency across multiple requests.
session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})
3. Rate Limiting
Add delays between requests to avoid overwhelming the server. This helps prevent detection and ensures respectful scraping practices.
import random
import time

for url in urls:
    # Add random delay between requests
    time.sleep(random.uniform(1, 3))
    # ... scraping code ...
For more advanced anti-blocking techniques, check out our comprehensive guide 5 Tools to Scrape Without Blocking and How it All Works, which covers TLS (JA3) fingerprinting, IP rotation, request headers, and other detection methods.
Advanced Scraping Techniques
For more robust scraping, consider these additional techniques. These methods help improve reliability and scalability for production environments.
1. Proxy Rotation
For large-scale scraping, use rotating proxies. This technique helps distribute requests across multiple IP addresses to avoid blocking.
proxies = {
    'http': 'http://proxy1:port',
    'https': 'https://proxy1:port'
}
response = session.get(url, proxies=proxies, timeout=15)
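To actually rotate rather than reuse a single proxy, pick a random address from a pool on every request. This is a minimal sketch; the proxy addresses are placeholders for your own endpoints:
import random

# Placeholder proxy pool - substitute your own proxy endpoints
proxy_pool = [
    "http://proxy1:port",
    "http://proxy2:port",
    "http://proxy3:port",
]

def get_with_random_proxy(session, url):
    """Send a request through a randomly chosen proxy"""
    proxy = random.choice(proxy_pool)
    proxies = {"http": proxy, "https": proxy}
    return session.get(url, proxies=proxies, timeout=15)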
2. Data Storage and Analysis
Save scraped data to files for analysis. This allows you to process and analyze the collected data efficiently.
import json
import csv

def save_data_json(data, filename):
    """Save data to JSON file"""
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)

def save_data_csv(data, filename):
    """Save data to CSV file"""
    if data and len(data) > 0:
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)

# Collect data
scraped_data = []
for url in urls:
    # ... scraping code ...
    car_data = {
        'title': title,
        'price': price,
        'mileage': mileage,
        'year': year,
        'location': location
    }
    scraped_data.append(car_data)
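With these helpers in place, persisting the collected listings is a single call each (the filenames here are arbitrary examples):
# Example usage - filenames are arbitrary
save_data_json(scraped_data, 'autoscout24_listings.json')
save_data_csv(scraped_data, 'autoscout24_listings.csv')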
3. Error Handling and Retry Logic
Implement robust error handling with retry logic for better reliability.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # modern import path for Retry

def create_session_with_retries():
    """Create a session with retry logic"""
    session = requests.Session()
    # Configure retry strategy
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
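You can then use the retry-enabled session as a drop-in replacement for the plain session created earlier, reusing the user_agents list from the setup code:
# Drop-in replacement for the plain session created earlier
session = create_session_with_retries()
session.headers.update({"User-Agent": random.choice(user_agents)})
response = session.get("https://www.autoscout24.com/lst/c/compact", timeout=15)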
For more advanced data processing and analysis techniques, see our guide How to Observe E-Commerce Trends using Web Scraping, which monitors e-commerce trends using Python, web scraping, and data visualization tools.
Scraping with Scrapfly
Scrapfly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - extract web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- LLM prompts - extract data or ask questions using LLMs
- Extraction models - automatically find objects like products, articles, jobs, and more.
- Extraction templates - extract data using your own specification.
- Python and TypeScript SDKs, as well as Scrapy and no-code tool integrations.
For reliable and scalable AutoScout24 scraping, consider using Scrapfly's web scraping API. Scrapfly handles anti-bot measures, provides rotating proxies, and ensures high success rates for data extraction.
Here's how to use Scrapfly for scraping AutoScout24:
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR-SCRAPFLY-KEY")

# Scrape car listings
result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
    tags=["autoscout24", "car-listings"],
    format="json",
    asp=True,
    render_js=True,
    url="https://www.autoscout24.com/lst/c/compact"
))
print(result)

# Scrape individual car details
car_result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
    tags=["autoscout24", "car-details"],
    format="json",
    asp=True,
    render_js=True,
    url="https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d"
))
print(car_result)
Best Practices and Tips
When scraping AutoScout24, follow these best practices. These guidelines help ensure successful and ethical web scraping operations.
- Respect robots.txt: Always check and follow the website's robots.txt file (see the sketch after this list)
- Implement delays: Use random delays between requests to avoid detection
- Handle errors gracefully: Implement proper error handling for network issues
- Monitor success rates: Track scraping success rates and adjust strategies accordingly
- Use proxies: Consider using rotating proxies for large-scale scraping
- Validate data: Always validate extracted data for completeness and accuracy
- Respect rate limits: Don't overwhelm the server with too many requests
- Update selectors: Regularly check and update CSS selectors as the site evolves
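The robots.txt check from the first point can be automated with Python's built-in urllib.robotparser module; a minimal sketch:
from urllib.robotparser import RobotFileParser

# Minimal robots.txt check using the standard library
robots = RobotFileParser()
robots.set_url("https://www.autoscout24.com/robots.txt")
robots.read()

url = "https://www.autoscout24.com/lst/c/compact"
if robots.can_fetch("*", url):
    print("Allowed by robots.txt - proceed politely")
else:
    print("Disallowed by robots.txt - skip this URL")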
For more comprehensive web scraping best practices, see our guide Everything to Know to Start Web Scraping in Python Today, a complete introduction to web scraping using Python: HTTP, parsing, AI, scaling, and deployment.
Related E-commerce Scraping Guides
If you're interested in scraping other automotive or e-commerce platforms, check out these related guides. These resources provide additional techniques and approaches for different types of websites.
- How to Scrape Amazon.com Product Data and Reviews - a comprehensive guide to scraping product data and reviews from the biggest e-commerce platform in the US.
- How to Scrape Ebay Using Python (2025 Update) - extracting product details and search listings from the world's biggest peer-to-peer e-commerce portal.
- How to Scrape Walmart.com Product Data (2025 Update) - techniques for scraping Walmart product and review data while avoiding blocking at scale.
- How to Scrape Etsy.com Product, Shop and Search Data - extracting product, shop, and search data from the popular marketplace for handcrafted and vintage items.
FAQ
Now let's answer some frequently asked questions.
What are the main challenges when scraping AutoScout24?
AutoScout24 uses sophisticated anti-bot protection including IP tracking and JavaScript-rendered content, which can block automated requests. The main challenges include 403 Forbidden errors, IP-based blocking, and dynamic content loading which can make traditional scraping approaches unreliable. The site also uses complex CSS selectors that may change over time.
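One way to soften the selector problem is to prefer the data-testid attributes used in the examples above over hashed class names like 'Price_price__APlgs', which change between site deployments. A small hypothetical helper (safe_text is not part of any library) keeps that pattern in one place:
# Hypothetical helper: prefer stable data-testid attributes over
# hashed class names, which change between deployments
def safe_text(parent, tag, testid):
    """Return stripped text for a data-testid match, or 'N/A'"""
    elem = parent.find(tag, attrs={'data-testid': testid})
    return elem.get_text().strip() if elem else "N/A"

# e.g. inside the listings loop from Example 1:
# mileage = safe_text(article, 'span', 'VehicleDetails-mileage_road')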
What data can I extract from individual AutoScout24 car pages?
You can extract detailed car information including specifications (engine, fuel type, transmission, power, color), features and equipment lists, seller information (name, location, phone), and comprehensive vehicle details. The site provides rich data for automotive market research and analysis.
What can I do to avoid getting blocked?
Rotate user agents and proxies, add randomized delays between requests, and respect rate limits. For a managed solution, you can use Scrapfly's web scraping API, which provides residential proxies and automatic bot-detection bypass.
Summary
This comprehensive guide covered the essential techniques for scraping AutoScout24 effectively. We explored the website's structure, implemented two working scraping solutions using requests and BeautifulSoup, and discussed anti-blocking strategies. The provided code examples demonstrate how to extract car listings by body type and detailed individual car information.
The simple approach using requests and BeautifulSoup provides a good balance of reliability and ease of use, while the anti-blocking techniques help avoid detection. For production use, consider implementing additional features like rate limiting, proxy rotation, and data storage.