
This guide shows how to scrape AutoScout24 with Python. We'll keep it practical: what works, what can break, and code you can run right away.
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect, and here's a good summary of what not to do:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens who are protected by GDPR.
- Do not repurpose entire public datasets, which can be illegal in some countries.
Why Scrape AutoScout24?
Car dealers watch prices and inventory across markets. Makers and resellers track competitors. Researchers follow availability by body type and trim. AutoScout24 exposes a lot of the info you need: title, price, mileage, year, specs, and seller details.
Understanding AutoScout24's Structure
AutoScout24 is a modern JavaScript-heavy site, so much of the content loads after the initial HTML. It also has strong bot protection. Expect occasional 403s and frequently changing CSS class names, and plan for both.
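Before writing selectors, it helps to check what the initial HTML actually contains. Here's a quick diagnostic sketch; the __NEXT_DATA__ script tag is an assumption (Next.js-style sites often embed page state there as JSON), so verify it against the live page source:

import requests
from bs4 import BeautifulSoup

# Quick diagnostic: is the data in the initial HTML, or rendered later by JS?
resp = requests.get(
    "https://www.autoscout24.com/lst/c/compact",
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    timeout=15,
)
print("Status:", resp.status_code)

soup = BeautifulSoup(resp.text, "html.parser")
# Assumption: a Next.js-style __NEXT_DATA__ script tag with embedded JSON state;
# check the live page source, as this can change.
state = soup.find("script", id="__NEXT_DATA__")
print("Embedded JSON state present:", state is not None)
print("Listing <article> tags in raw HTML:", len(soup.find_all("article")))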
Project Setup
We'll use a few Python libraries:
- requests - HTTP library for making web requests
- BeautifulSoup - HTML parsing library
- json - For parsing JSON data embedded in pages
Install the required dependencies:
$ pip install requests beautifulsoup4
Example 1: Scraping Car Listings by Body Type
First, we'll scrape compact car listings from a category page and pull the basics (title, price, mileage, year, etc.).
Setting Up the Listings Scraper
Set up a simple listings scraper.
1. Prerequisites
First, install the required dependencies:
$ pip install requests beautifulsoup4
2. Basic Setup and User Agent Rotation
Create a file called scrape_autoscout24_listings.py and start with the basic setup:
import requests
from bs4 import BeautifulSoup
import json
import re
import random
import time
# Simple list of user agents to rotate
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

# Target URL for compact cars
url = "https://www.autoscout24.com/lst/c/compact"

# Create session with random user agent
session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})
3. Request Handling Function
This function makes the request and checks if the page is reachable.
def make_request(url):
    """Make a request to the AutoScout24 listings page"""
    try:
        # Add random delay to avoid detection
        time.sleep(random.uniform(1, 3))
        response = session.get(url, timeout=15)
        # Check if blocked
        if response.status_code == 403:
            print("  ❌ Blocked (403 Forbidden)")
            return None
        # Check if successful
        if response.status_code == 200:
            print("  ✅ Successfully accessed page")
            return response
        else:
            print(f"  ❌ Error: Status code {response.status_code}")
            return None
    except Exception as e:
        print(f"  ❌ Error: {e}")
        return None
4. Extracting Car Listings
This pulls individual car listings from the page: title, price, link, and key details.
def extract_car_listings(soup):
    """Extract car listings from the search results page"""
    listings = []
    # Find all car listing containers
    # AutoScout24 uses article tags with specific classes for car listings
    car_articles = soup.find_all('article', class_='cldt-summary-full-item')
    print(f"  Found {len(car_articles)} car listings")
    for article in car_articles:
        try:
            # Extract car title from the title link
            title_link = article.find('a', class_='ListItem_title__ndA4s')
            if title_link:
                title_elem = title_link.find('h2')
                if title_elem:
                    # Combine all span elements to get full title
                    title_spans = title_elem.find_all('span')
                    title = ' '.join([span.get_text().strip() for span in title_spans if span.get_text().strip()])
                else:
                    title = title_link.get_text().strip()
            else:
                title = "N/A"

            # Extract price from the price element
            price_elem = article.find('p', class_='Price_price__APlgs')
            price = price_elem.get_text().strip() if price_elem else "N/A"

            # Extract mileage from the vehicle details table
            mileage_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-mileage_road'})
            mileage = mileage_elem.get_text().strip() if mileage_elem else "N/A"

            # Extract registration year from the vehicle details table
            year_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-calendar'})
            year = year_elem.get_text().strip() if year_elem else "N/A"

            # Extract fuel type from the vehicle details table
            fuel_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-gas_pump'})
            fuel_type = fuel_elem.get_text().strip() if fuel_elem else "N/A"

            # Extract transmission from the vehicle details table
            transmission_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-transmission'})
            transmission = transmission_elem.get_text().strip() if transmission_elem else "N/A"

            # Extract power from the vehicle details table
            power_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-speedometer'})
            power = power_elem.get_text().strip() if power_elem else "N/A"

            # Extract link to the detailed page (reuse the title link found above)
            link = "https://www.autoscout24.com" + title_link['href'] if title_link else None

            # Extract seller information
            seller_name_elem = article.find('span', class_='SellerInfo_name__nR9JH')
            seller_name = seller_name_elem.get_text().strip() if seller_name_elem else "N/A"
            seller_address_elem = article.find('span', class_='SellerInfo_address__leRMu')
            seller_address = seller_address_elem.get_text().strip() if seller_address_elem else "N/A"

            listing_data = {
                'title': title,
                'price': price,
                'mileage': mileage,
                'year': year,
                'fuel_type': fuel_type,
                'transmission': transmission,
                'power': power,
                'seller_name': seller_name,
                'seller_address': seller_address,
                'link': link
            }
            listings.append(listing_data)
            print(f"  • {title} - {price} - {mileage} - {year} - {fuel_type}")
        except Exception as e:
            print(f"  ❌ Error extracting listing: {e}")
            continue
    return listings
5. Main Scraping Function
This ties the request, parsing, and extraction together for the listings page.
def scrape_listings(url):
    """Main function to scrape car listings from AutoScout24"""
    print(f"\nScraping listings from: {url}")
    # Make request
    response = make_request(url)
    if not response:
        return None
    # Parse HTML
    soup = BeautifulSoup(response.content, 'html.parser')
    # Extract listings
    listings = extract_car_listings(soup)
    return listings
6. Main Execution
The main execution function manages the overall scraping workflow and handles the results.
def main():
    """Main execution function"""
    print("🚀 Starting AutoScout24 Compact Cars Scraper")
    # Scrape listings
    listings = scrape_listings(url)
    if listings:
        print(f"\n✅ Successfully scraped {len(listings)} car listings!")
        return listings
    else:
        print("❌ Failed to scrape listings")
        return None

# Run the scraper
if __name__ == "__main__":
    main()
Example Output
🚀 Starting AutoScout24 Compact Cars Scraper

Scraping listings from: https://www.autoscout24.com/lst/c/compact
  ✅ Successfully accessed page
  Found 19 car listings
  • Peugeot 207 Filou MOTORSCHADEN!!!!! - € 499 - 174,000 km - 01/2008 - Gasoline
  • Renault Clio 1.2 RN - € 999 - 142,875 km - 07/2000 - Gasoline
  • Volkswagen Polo 1.4-16V Highline - € 6,350 - 116,950 km - 10/2009 - Gasoline
  • Peugeot 208 GTi - € 4,990 - 111,846 km - 11/2013 - Gasoline
  • Peugeot 208 1.2 VTi Active 1e Eigenaar,Airco,Cruise,PDC,Trekha - € 4,449 - 124,752 km - 03/2014 - Gasoline
  • Peugeot 207 1.4-16V Color-line - € 1,249 - 228,423 km - 02/2008 - Gasoline
  • Volkswagen Polo 1.0 Comfortline - € 6,450 - 182,454 km - 10/2016 - Gasoline
  • Fiat 500 1.2 Naked Panodak Clima Lmv Koopje! - € 1,995 - 207,112 km - 01/2008 - Gasoline
  • Renault Twingo 1.2 Privilège | Handelsauto | Recent nieuwe distri - € 1,250 - 75,562 km - 06/2005 - Gasoline
  • Nissan Micra 1.2 - € 980 - 190,582 km - 03/2004 - Gasoline
  • Kia Picanto 1.0 CVVT ISG Comfort Pack 2e Eigenaar,Airco,Elektr - € 4,749 - 93,864 km - 08/2013 - Gasoline
  • Kia Picanto 1.0 CVVT Design Edition Airco 5-Deurs Origineel NL - € 4,900 - 121,292 km - 01/2013 - Gasoline
  • Ford Fiesta 1.6 Ghia 120PK,Stoelverwarming,Airco,ElektrischeRa - € 4,749 - 153,420 km - 03/2009 - Gasoline
  • Peugeot 107 1.0-12V XR | Airco | Toerenteller | 5drs | - € 2,450 - 170,306 km - 10/2009 - Gasoline
  • Volkswagen Golf R-line|Clima|Stoelverwarming|PDC - € 6,950 - 154,629 km - 09/2012 - Gasoline
  • Volkswagen Golf 1.2 TSI BlueMotion, airco, navi, bleutooth, APK 07 - € 3,995 - 240,200 km - 01/2012 - Gasoline
  • Volkswagen Polo 1.2 TSI BlueMotion Highline - € 4,949 - 220,565 km - 01/2014 - Gasoline
  • Ford Fiesta 1.25 Trend Trekhaak,Airco,Stoelverwarming,Elektris - € 3,499 - 150,133 km - 09/2010 - Gasoline
  • Fiat 500 0.9 TwinAir Lounge | Wit Parelmoer | PanoDak | Air - € 4,950 - 96,486 km - 11/2011 - Gasoline

✅ Successfully scraped 19 car listings!
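The example above covers a single results page. To collect more listings, you can iterate over result pages; below is a minimal sketch, assuming the listing URL accepts a page query parameter (verify the exact parameter by paging through results in your browser, as it may differ):

# Minimal pagination sketch. Assumption: listing pages accept a "page"
# query parameter - confirm this in your browser before relying on it.
all_listings = []
for page in range(1, 4):  # first three result pages as an example
    page_url = f"{url}?page={page}"
    page_listings = scrape_listings(page_url)
    if page_listings:
        all_listings.extend(page_listings)
print(f"Collected {len(all_listings)} listings across pages")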
Example 2: Scraping Individual Car Details
Next, we'll scrape a single car page to get detailed specs, features, and seller info.
Setting Up the Individual Car Scraper
We'll create a small scraper for individual car pages to extract detailed vehicle information.
1. Prerequisites
The same dependencies as before:
$ pip install requests beautifulsoup4
2. Basic Setup for Individual Car Scraping
Create a file called scrape_autoscout24_car.py and start with the basic setup:
import requests
from bs4 import BeautifulSoup
import json
import re
import random
import time
# Simple list of user agents to rotate
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

# Target URL for individual car
url = "https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d"

# Create session with random user agent
session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})
3. Request Handling Function
This function requests an individual car page and checks for blocks.
def make_request(url):
    """Make a request to the AutoScout24 car detail page"""
    try:
        # Add random delay to avoid detection
        time.sleep(random.uniform(2, 4))
        response = session.get(url, timeout=15)
        # Check if blocked
        if response.status_code == 403:
            print("  ❌ Blocked (403 Forbidden)")
            return None
        # Check if successful
        if response.status_code == 200:
            print("  ✅ Successfully accessed page")
            return response
        else:
            print(f"  ❌ Error: Status code {response.status_code}")
            return None
    except Exception as e:
        print(f"  ❌ Error: {e}")
        return None
4. Extracting Basic Car Information
This extracts the title, price, mileage, and registration year.
def extract_basic_info(soup):
    """Extract basic car information from the detail page"""
    car_data = {}

    # Extract car title from the stage title
    title_elem = soup.find('h1', class_='StageTitle_title__ROiR4')
    if title_elem:
        # Get the make and model from the bold classified info
        make_model_elem = title_elem.find('span', class_='StageTitle_boldClassifiedInfo__sQb0l')
        model_version_elem = title_elem.find('div', class_='StageTitle_modelVersion__Yof2Z')
        if make_model_elem and model_version_elem:
            car_data['title'] = f"{make_model_elem.get_text().strip()} {model_version_elem.get_text().strip()}"
        elif make_model_elem:
            car_data['title'] = make_model_elem.get_text().strip()
        else:
            car_data['title'] = title_elem.get_text().strip()
        print(f"  Car: {car_data['title']}")
    else:
        car_data['title'] = "Not found"
        print("  Car: Not found")

    # Extract price from the price section
    price_elem = soup.find('span', class_='PriceInfo_price__XU0aF')
    if price_elem:
        car_data['price'] = price_elem.get_text().strip()
        print(f"  Price: {car_data['price']}")
    else:
        car_data['price'] = "Not found"
        print("  Price: Not found")

    # Extract mileage from the vehicle overview
    mileage_elem = soup.find('div', class_='VehicleOverview_itemContainer__XSLWi')
    if mileage_elem:
        # Find the mileage item by looking for the mileage icon and text
        mileage_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
        for item in mileage_items:
            title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
            if title_elem and 'Mileage' in title_elem.get_text():
                text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
                if text_elem:
                    car_data['mileage'] = text_elem.get_text().strip()
                    print(f"  Mileage: {car_data['mileage']}")
                    break
        else:
            # Loop finished without a break, so no mileage entry was found
            car_data['mileage'] = "Not found"
            print("  Mileage: Not found")
    else:
        car_data['mileage'] = "Not found"
        print("  Mileage: Not found")

    # Extract registration year from the vehicle overview
    registration_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in registration_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        if title_elem and 'First registration' in title_elem.get_text():
            text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
            if text_elem:
                car_data['year'] = text_elem.get_text().strip()
                print(f"  Year: {car_data['year']}")
                break
    else:
        # Loop finished without a break, so no registration entry was found
        car_data['year'] = "Not found"
        print("  Year: Not found")

    return car_data
5. Extracting Technical Specifications
This collects the technical specs from the overview and technical sections.
def extract_specifications(soup):
    """Extract technical specifications from the car detail page"""
    specifications = {}

    # Extract specifications from the vehicle overview section
    overview_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in overview_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
        if title_elem and text_elem:
            title = title_elem.get_text().strip()
            value = text_elem.get_text().strip()
            if 'Fuel type' in title:
                specifications['fuel_type'] = value
                print(f"  Fuel Type: {value}")
            elif 'Gearbox' in title:
                specifications['transmission'] = value
                print(f"  Transmission: {value}")
            elif 'Power' in title:
                specifications['power'] = value
                print(f"  Power: {value}")

    # Extract additional specifications from the technical data section
    tech_section = soup.find('section', attrs={'data-cy': 'technical-details-section'})
    if tech_section:
        # Find all dt/dd pairs in the technical data
        dt_elements = tech_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = tech_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        for dt, dd in zip(dt_elements, dd_elements):
            title = dt.get_text().strip()
            value = dd.get_text().strip()
            if 'Engine size' in title:
                specifications['engine_size'] = value
                print(f"  Engine Size: {value}")
            elif 'Cylinders' in title:
                specifications['cylinders'] = value
                print(f"  Cylinders: {value}")
            elif 'Power' in title and 'power' not in specifications:
                specifications['power'] = value
                print(f"  Power: {value}")
            elif 'Gearbox' in title and 'transmission' not in specifications:
                specifications['transmission'] = value
                print(f"  Transmission: {value}")

    # Extract color information from the color section
    color_section = soup.find('section', attrs={'data-cy': 'color-section'})
    if color_section:
        dt_elements = color_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = color_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        for dt, dd in zip(dt_elements, dd_elements):
            title = dt.get_text().strip()
            value = dd.get_text().strip()
            if 'Manufacturer colour' in title:
                specifications['color'] = value
                print(f"  Color: {value}")
            elif 'Paint' in title:
                specifications['paint_type'] = value
                print(f"  Paint Type: {value}")

    return specifications
6. Extracting Features and Equipment
This gathers the features and equipment list.
def extract_features(soup):
    """Extract car features and equipment from the detail page"""
    features = []
    # Find equipment section
    equipment_section = soup.find('section', attrs={'data-cy': 'equipment-section'})
    if equipment_section:
        # Find all dt/dd pairs in the equipment section
        dt_elements = equipment_section.find_all('dt', class_='DataGrid_defaultDtStyle__soJ6R')
        dd_elements = equipment_section.find_all('dd', class_='DataGrid_defaultDdStyle__3IYpG')
        for dt, dd in zip(dt_elements, dd_elements):
            category = dt.get_text().strip()
            # Find all li elements in the dd
            feature_items = dd.find_all('li')
            if feature_items:
                print(f"  {category}:")
                for item in feature_items:
                    feature_text = item.get_text().strip()
                    if feature_text:
                        features.append(f"{category}: {feature_text}")
                        print(f"    • {feature_text}")
    if not features:
        print("  Features: Not found")
    return features
7. Extracting Seller Information
This pulls seller details and location.
def extract_seller_info(soup):
    """Extract seller information from the car detail page"""
    seller_data = {}

    # Extract seller type from the vehicle overview
    overview_items = soup.find_all('div', class_='VehicleOverview_itemContainer__XSLWi')
    for item in overview_items:
        title_elem = item.find('div', class_='VehicleOverview_itemTitle__S2_lb')
        text_elem = item.find('div', class_='VehicleOverview_itemText__AI4dA')
        if title_elem and text_elem and 'Seller' in title_elem.get_text():
            seller_data['type'] = text_elem.get_text().strip()
            print(f"  Seller Type: {seller_data['type']}")
            break

    # Extract location from the location link
    location_link = soup.find('a', class_='LocationWithPin_locationItem__tK1m5')
    if location_link:
        seller_data['location'] = location_link.get_text().strip()
        print(f"  Location: {seller_data['location']}")
    else:
        seller_data['location'] = "Not found"
        print("  Location: Not found")

    # Extract seller description from the seller notes section
    seller_notes_section = soup.find('section', attrs={'data-cy': 'seller-notes-section'})
    if seller_notes_section:
        content_div = seller_notes_section.find('div', class_='SellerNotesSection_content__te2EB')
        if content_div:
            seller_data['description'] = content_div.get_text().strip()
            print(f"  Description: {seller_data['description'][:100]}...")
        else:
            seller_data['description'] = "Not found"
            print("  Description: Not found")
    else:
        seller_data['description'] = "Not found"
        print("  Description: Not found")

    return seller_data
8. Main Scraping Function
This combines all the extraction steps for a single car page.
def scrape_car_details(url):
    """Main function to scrape detailed information from a single car page"""
    print(f"\nScraping car details from: {url}")
    # Make request
    response = make_request(url)
    if not response:
        return None
    # Parse HTML
    soup = BeautifulSoup(response.content, 'html.parser')
    # Extract all data
    basic_info = extract_basic_info(soup)
    specifications = extract_specifications(soup)
    features = extract_features(soup)
    seller_info = extract_seller_info(soup)
    # Combine all data
    result = {
        'url': url,
        **basic_info,
        'specifications': specifications,
        'features': features,
        'seller': seller_info
    }
    return result
9. Main Execution
The main execution function manages the overall scraping workflow for individual car pages.
def main():
    """Main execution function"""
    print("🚀 Starting AutoScout24 Individual Car Scraper")
    # Scrape car details
    car_data = scrape_car_details(url)
    if car_data:
        print("\n✅ Successfully scraped car details!")
        # Save results to file
        with open('autoscout24_car_details.json', 'w') as f:
            json.dump(car_data, f, indent=2)
        print("💾 Results saved to autoscout24_car_details.json")
        return car_data
    else:
        print("❌ Failed to scrape car details")
        return None

# Run the scraper
if __name__ == "__main__":
    main()
Example Output
🚀 Starting AutoScout24 Individual Car Scraper

Scraping car details from: https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d
  ✅ Successfully accessed page
  Car: Peugeot 207 Filou MOTORSCHADEN!!!!!
  Price: € 499
  Mileage: 174,000 km
  Year: 01/2008
  Transmission: Manual
  Fuel Type: Gasoline
  Power: 70 kW (95 hp)
  Engine Size: 1,397 cc
  Cylinders: 4
  Color: BLEU NEYSHA
  Paint Type: Metallic
  Comfort & Convenience:
    • Power windows
  Safety & Security:
    • ABS
    • Central door lock
    • Driver-side airbag
    • Passenger-side airbag
    • Power steering
    • Side airbag
  Extras:
    • Alloy wheels
  Seller Type: Dealer
  Location: Berlin
  Description: Sonderausstattung:MOTOR DREHT NICHT!!!!!!Metallic-Lackierung, ALUFELGEN, u.s.w.Weitere Ausstattung:A...

✅ Successfully scraped car details!
Handling Anti-Bot Protection
AutoScout24 has strong anti-bot checks (IP-based rules and JS content). Here are a few simple ways to reduce blocks.
1. User Agent Rotation
Rotate a few realistic user agents to avoid sending every request with the exact same fingerprint.
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.2227.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.3497.92 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
]

session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5"
})
2. Session Management
Use a session to keep cookies and reuse connections so your traffic looks more like a real browser session.
session = requests.Session()
session.headers.update({
    "User-Agent": random.choice(user_agents),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
})
3. Rate Limiting
Add small random delays between requests to avoid hammering the server.
import time

for url in urls:
    # Add random delay between requests
    time.sleep(random.uniform(1, 3))
    # ... scraping code ...
For more advanced anti-blocking techniques, see our guide 5 Tools to Scrape Without Blocking and How it All Works, which covers TLS (JA3) fingerprinting, the role of request headers in blocking, IP rotation, and other detection methods.
Advanced Scraping Techniques
For bigger jobs, consider these additions.
1. Proxy Rotation
For large-scale scraping, use rotating proxies. This technique helps distribute requests across multiple IP addresses to avoid blocking.
proxies = {
    'http': 'http://proxy1:port',
    'https': 'https://proxy1:port'
}
response = session.get(url, proxies=proxies, timeout=15)
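To rotate rather than reuse a single proxy, keep a small pool and pick one per request. A minimal sketch; the proxy addresses below are placeholders you'd replace with your own endpoints:

import random
import requests

# Placeholder proxy pool - substitute your own endpoints
proxy_pool = [
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://proxy3:8080",
]

def get_with_random_proxy(session, url):
    """Send a request through a randomly chosen proxy from the pool."""
    proxy = random.choice(proxy_pool)
    proxies = {"http": proxy, "https": proxy}
    return session.get(url, proxies=proxies, timeout=15)

response = get_with_random_proxy(requests.Session(), "https://www.autoscout24.com/lst/c/compact")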
2. Data Storage and Analysis
Save scraped data to files so you can process and analyze it later.
import json
import csv

def save_data_json(data, filename):
    """Save data to JSON file"""
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)

def save_data_csv(data, filename):
    """Save data to CSV file"""
    if data and len(data) > 0:
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)

# Collect data
scraped_data = []
for url in urls:
    # ... scraping code ...
    car_data = {
        'title': title,
        'price': price,
        'mileage': mileage,
        'year': year,
        'location': location
    }
    scraped_data.append(car_data)
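With those helpers defined, persisting the run is two calls:

# Write the collected listings to disk in both formats
save_data_json(scraped_data, 'autoscout24_listings.json')
save_data_csv(scraped_data, 'autoscout24_listings.csv')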
3. Error Handling and Retry Logic
Add simple retries with backoff to handle temporary errors.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session_with_retries():
    """Create a session with retry logic"""
    session = requests.Session()
    # Configure retry strategy
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
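The returned session is a drop-in replacement for a plain requests.Session; urllib3 retries transparently on the listed status codes with exponential backoff:

# Use it exactly like a regular session
session = create_session_with_retries()
response = session.get("https://www.autoscout24.com/lst/c/compact", timeout=15)
print(response.status_code)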
For more advanced data processing and analysis techniques, see our guide How to Observe E-Commerce Trends using Web Scraping, which monitors e-commerce trends using Python, web scraping, and data visualization tools.
Scraping with Scrapfly
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - extract web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- LLM prompts - extract data or ask questions using LLMs
- Extraction models - automatically find objects like products, articles, jobs, and more.
- Extraction templates - extract data using your own specification.
- Python and TypeScript SDKs, as well as Scrapy and no-code tool integrations.
If you don't want to manage proxies and blocks yourself, Scrapfly's API can handle the heavy lifting.
Here's how to use Scrapfly for AutoScout24:
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR-SCRAPFLY-KEY")

# Scrape car listings
result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
    tags=["autoscout24", "car-listings"],
    format="json",
    asp=True,
    render_js=True,
    url="https://www.autoscout24.com/lst/c/compact"
))
print(result)

# Scrape individual car details
car_result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
    tags=["autoscout24", "car-details"],
    format="json",
    asp=True,
    render_js=True,
    url="https://www.autoscout24.com/offers/peugeot-207-filou-motorschaden-gasoline-0b93e496-1f1b-475d-a972-fa4bd490031d"
))
print(car_result)
Best Practices and Tips
A few practical tips:
- Respect robots.txt: Always check and follow the website's robots.txt file (see the sketch after this list)
- Implement delays: Use random delays between requests to avoid detection
- Handle errors gracefully: Implement proper error handling for network issues
- Monitor success rates: Track scraping success rates and adjust strategies accordingly
- Use proxies: Consider using rotating proxies for large-scale scraping
- Validate data: Always validate extracted data for completeness and accuracy
- Respect rate limits: Don't overwhelm the server with too many requests
- Update selectors: Regularly check and update CSS selectors as the site evolves
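Python's standard library can handle the robots.txt check from the first tip. A small sketch using urllib.robotparser:

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt
rp = RobotFileParser("https://www.autoscout24.com/robots.txt")
rp.read()

# Check whether a given URL may be fetched by a generic crawler
print(rp.can_fetch("*", "https://www.autoscout24.com/lst/c/compact"))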
For more comprehensive web scraping best practices, see our guide Everything to Know to Start Web Scraping in Python Today, a complete introduction covering HTTP, parsing, AI, scaling, and deployment.
Related E-commerce Scraping Guides
If you're interested in scraping other automotive or e-commerce platforms, check out these related guides, which provide additional techniques and approaches for different types of websites.
- How to Scrape Amazon.com Product Data and Reviews - scraping product data and reviews from the biggest e-commerce platform in the US, with common challenges, tips, and tricks.
- How to Scrape Ebay Using Python (2025 Update) - scraping product details and product search from the world's biggest peer-to-peer e-commerce portal.
- How to Scrape Walmart.com Product Data (2025 Update) - scraping Walmart product and review data with Python, plus how to avoid blocking at scale.
- How to Scrape Etsy.com Product, Shop and Search Data - scraping search and product data from the popular marketplace for handcrafted and vintage items using Python and HTML parsing.
FAQ
A few common questions:
What are the main challenges when scraping AutoScout24?
AutoScout24 has strong bot protection and lots of JS-rendered content. Common issues are 403 errors, IP-based blocks, and changing selectors.
What data can I extract from individual AutoScout24 car pages?
Title, price, mileage, year, specs (fuel, transmission, power, color), features, and seller info (name, location, notes).
What can I do to avoid getting blocked?
Rotate user agents, use a session, add delays, and consider proxies. If you want an easier path, use Scrapfly's API.
Summary
We covered how the site is built, two working examples (listings and a single car), and simple anti-blocking steps. Start with requests + BeautifulSoup, add small delays and selector checks, and use proxies if needed. For hands-off scaling, try Scrapfly.