Cloudflare is mostly known for its CDN service, but in the web scraping context, it's the Cloudflare bot protection that hinders the data extraction process. To bypass Cloudflare when web scraping, we have to start by reverse engineering its challenges and how it detects HTTP requests.
In this guide, we'll start by defining what the Cloudflare challenge is and how to identify its presence on web pages by exploring its common error tracebacks. Then, we'll explain how to bypass Cloudflare by exploring its fingerprinting methods and the best way to avoid each. Let's dive in!
Key Takeaways
Learn to bypass Cloudflare anti-scraping protection by understanding its 2025 fingerprinting methods, implementing proper browser headers, and using stealth tools with residential proxies to avoid detection.
- Identify Cloudflare challenges including Turnstile CAPTCHA, error codes (403, 429, 502, 1003, 1009, 1010, 1015, 1020), and specific error messages
- Bypass browser fingerprinting by mimicking real browser headers, TLS signatures, and HTTP2 protocols
- Use residential or IPv6 proxies with proper IP rotation to avoid geographical blocking and rate limiting
- Handle Cloudflare Turnstile challenges using CAPTCHA solver services or prevention through good stealth
- Implement stealth tools: Nodriver (2025 recommended), SeleniumBase UC Mode, or Camoufox for Python projects
- Avoid deprecated tools like puppeteer-stealth (discontinued February 2025) and migrate to actively maintained alternatives
- Counter per-customer ML models with varied behavioral patterns, random timing, and natural navigation flows
- Use tools like ScrapFly for automated Cloudflare and Turnstile bypassing at scale
What Is Cloudflare Bot Management?
Cloudflare Bot Management is a web service that tries to detect and block web scrapers and other bots from accessing the website.
It's a complex multi-tier service that is usually used in legitimate bot and spam prevention but it's becoming an increasingly popular way to block web scrapers from accessing public data.
To start, let's take a look at some common Cloudflare errors that scrapers encounter and what they mean.
Popular Cloudflare Errors
Most Cloudflare bot detection errors result in HTTP status codes 401, 403, 429, or 502, with the 403 error being the most commonly encountered.
Every HTTP status code represents a unique blocking incident. Hence, knowing how to get past Cloudflare relies on identifying and understanding the error encountered.
Cloudflare "please unblock challenges.cloudflare.com to proceed" Error
The "please unblock challenges.cloudflare.com to proceed" error has been commonly encountered across different web pages recently. It blocks the target resource from loading with the following message:
The above error message prevents the web page from correctly loading by blocking the Cloudflare JS challenge host "challenges.cloudflare.com".
There are different causes for this error, one of them being an internal Cloudflare incident or outage, which usually gets resolved shortly. Other factors contributing to this error may be present locally and require manual debugging, such as firewalls, browser extensions, VPNs, or other security tools.
Cloudflare 1020 Error
The Cloudflare 1020: access denied error is commonly encountered on various web pages, with the popular "Access Denied" message. It doesn't indicate the exact blocking cause, as it's affected by various reasons, as we'll explain. The Cloudflare 1020 bypass can be approached using complete scraper obfuscation by mimicking a real user behavior, which we'll explore later.
Cloudflare 1009 Error
The Cloudflare error 1009 comes with a popular error message, "... has banned the country or region of your IP address". As described in the message, this error represents a geographical-based blocking when attempting to access a domain that's restricted to a specific country or region. Bypassing the Cloudflare 1009 error requires using a proxy server to change the IP address to one in the allowed region.
Cloudflare 1015 Error
The Cloudflare error 1015: you are being rate limited represents an IP address blocking, which occurs when the HTTP requests' rate exceeds a specified threshold within a specific time frame. Splitting the requests' traffic across multiple IP addresses using proxies is crucial to prevent the IP address from getting blocked by Cloudflare protection.
Cloudflare 1010 Error
The Cloudflare error 1010: access denied occurs when the browser fingerprint is detected to be automated through automation libraries. To avoid the Cloudflare bot detection behind the 1010 error, obfuscate the headless browser against JavaScript fingerprinting.
Cloudflare 1003 Error
The Cloudflare error 1003 comes with the message "Direct IP access not allowed." This error occurs when attempting to access a Cloudflare-protected website directly via its IP address instead of using the domain name. Cloudflare requires requests to use the proper domain hostname to function correctly. To bypass this error, always use the full domain URL instead of the IP address when making requests.
Additionally, here's a list of error traces indicating Cloudflare blocks content on its web page:
- Response headers might have a `cf-ray` field value.
- The `Server` header field has the value `cloudflare`.
- The `Set-Cookie` response headers include the `__cfduid` cookie field.
- "Attention Required!" or "Cloudflare Ray ID:" in HTML.
- "DDoS protection by Cloudflare" in HTML.
- Encountering `CLOUDFLARE_ERROR_500S_BOX` when requesting invalid URLs.
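The markers above can be combined into a simple block-detection helper. This is a minimal sketch: the header names, HTML strings, and status codes are taken from the lists in this section.

```python
# Minimal sketch: detect a Cloudflare block from a response's
# status code, headers, and HTML body, using the markers listed above.
CLOUDFLARE_HTML_MARKERS = (
    "Attention Required!",
    "Cloudflare Ray ID:",
    "DDoS protection by Cloudflare",
)

def is_cloudflare_blocked(status_code, headers, body):
    """Return True if the response looks like a Cloudflare block page."""
    lowered = {k.lower(): str(v).lower() for k, v in headers.items()}
    served_by_cf = lowered.get("server") == "cloudflare" or "cf-ray" in lowered
    if served_by_cf and status_code in (401, 403, 429, 502, 503):
        return True
    return any(marker in body for marker in CLOUDFLARE_HTML_MARKERS)
```

A scraper can call this on every response to distinguish real content from block pages before parsing.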
Some of the above Cloudflare anti-bot protection measures require solving CAPTCHA challenges. However, the best way to bypass Cloudflare CAPTCHA is to prevent it from occurring in the first place!
Cloudflare Turnstile: The Modern CAPTCHA Challenge
Cloudflare Turnstile is Cloudflare's CAPTCHA replacement introduced in 2022 and now widely used in 2024-2025. Unlike traditional image-based CAPTCHAs like reCAPTCHA, Turnstile works differently.
How Turnstile Differs from Traditional CAPTCHAs
Turnstile operates through three distinct modes:
Non-interactive (Invisible)
Runs completely in the background using browser fingerprinting and behavioral analysis. Users never see a challenge box - the verification happens silently through JavaScript execution and cryptographic proof-of-work.
Invisible (Brief Check)
Shows a brief "Verifying you are human" message for 1-2 seconds while running background checks. No user interaction required unless the trust score is low.
Interactive
Requires user action like clicking a checkbox, similar to reCAPTCHA v2. This mode is triggered when the initial trust score is insufficient or when suspicious patterns are detected.
Turnstile Detection Methods
Turnstile uses advanced techniques to verify visitors:
- JavaScript-based cryptographic challenges that require browser execution
- Browser fingerprinting analyzing canvas, WebGL, audio context, and hardware details
- Behavioral biometrics including mouse movements, timing patterns, and interaction sequences
- Network analysis examining IP reputation, TLS fingerprints, and connection characteristics
- Token verification with time-limited, single-use challenge tokens
Bypassing Turnstile Challenges
Turnstile is significantly harder to bypass than legacy CAPTCHAs. Here are the main approaches:
CAPTCHA Solver Services - Services like 2Captcha and CapSolver use human solvers or automated models to solve Turnstile challenges. Cost is about $1.45 per 1,000 solves. These services receive the challenge parameters, solve it, and return a token for your scraper to use.
Good Fingerprinting - Avoid triggering Turnstile by maintaining proper browser fingerprints, using residential proxies, and showing human-like behavior patterns. Prevention works better than solving.
Token Management - Some scrapers extract and manage Turnstile tokens, though this requires understanding the challenge flow and breaks when Cloudflare updates their system.
Headless Browser Stealth - Tools like Nodriver and SeleniumBase UC Mode can sometimes pass Turnstile's non-interactive mode by emulating real browsers, though success rates vary.
Turnstile is designed to be expensive and difficult to bypass at scale. For most web scraping projects, use a managed service like ScrapFly that handles Turnstile challenges automatically, or make your scraper stealthy enough to avoid triggering interactive challenges.
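To illustrate the solver-service flow described above, here is a sketch of building a Turnstile task payload in the style of the `createTask` APIs used by services like 2Captcha and CapSolver. The task type, field names, and endpoint conventions are assumptions for illustration - verify the exact shapes against your provider's documentation.

```python
# Hypothetical sketch of the CAPTCHA-solver flow for Turnstile.
# Field names follow the createTask-style APIs of solver services,
# but treat them as assumptions and check your provider's docs.
def build_turnstile_task(api_key, page_url, site_key):
    """Build a solver-service request payload for a Turnstile challenge."""
    return {
        "clientKey": api_key,
        "task": {
            "type": "TurnstileTaskProxyless",
            "websiteURL": page_url,
            "websiteKey": site_key,  # the data-sitekey attribute on the page
        },
    }

# Placeholder values for illustration only
payload = build_turnstile_task("YOUR_API_KEY", "https://example.com", "0x4AAAAAAA")
# The payload is POSTed to the provider's createTask endpoint; the returned
# token is then submitted with the scraper's form or XHR request.
```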
How Does Cloudflare Detect Web Scrapers?
To detect web scrapers, Cloudflare uses different technologies to determine whether traffic is coming from a real user or an automated script for data extraction.
Anti-bot systems like Cloudflare combine the results of many different analyses and fingerprinting methods into an overall trust score. This score determines whether the HTTP requests are allowed to reach the target web pages.
Based on the final trust score, the request has three possible fates:
- Proceed to the resource origin behind the firewall.
- Solve a CAPTCHA or computational JavaScript challenge.
- Get blocked entirely.
In addition to the above analyses, Cloudflare continuously tracks the HTTP requests' behavior and compares them with real users using machine learning and statistical models. This means the request may bypass Cloudflare a few times before getting blocked, as the trust score is likely to change.
The above complex operations make data extraction challenging. However, by exploring each component individually, we'll find that bypassing Cloudflare for data extraction is very much possible!
TLS Fingerprinting
The TLS handshake is the initial procedure when a request is sent to a web server with an SSL certificate over the HTTPS protocol. During this process, the client and server negotiate the encryption details, and the specifics of this negotiation form a fingerprint called JA3.
Since HTTP clients differ in their capabilities and configuration, they create a unique JA3 fingerprint, which anti-bot solutions use to distinguish automated clients like web scrapers from real users using web browsers.
It's crucial to use HTTP clients performing TLS handshake similar to normal browsers and avoid those with easy-to-distinguish TLS patterns, as they can be instantly detected. For this, refer to ScrapFly's JA3 tool to calculate and adjust your TLS fingerprint.
For further details on TLS fingerprinting, refer to our dedicated guide.
How TLS Fingerprint is Used to Block Web Scrapers?
TLS fingerprinting is a popular way to identify web scrapers that not many developers are aware of. What is it and how can we fortify our scrapers to avoid being detected?
IP Address Fingerprinting
There are different factors affecting the IP address analysis process. This process starts with the IP address type, which can be either of the following:
Residential
Represents IP addresses assigned by ISPs to consumers browsing from home networks. Residential IP addresses have a positive trust score, as they are mostly associated with real users and are expensive to acquire.
Mobile
Mobile IP addresses are assigned by cellular network towers. They have a positive trust score since they are associated with human traffic. Moreover, mobile IP addresses are automatically rotated to new ones at specified intervals, making them harder for anti-bot services to track.
Datacenter
Represents IP addresses assigned by data centers, such as AWS, Google Cloud, and Azure. Data center IPs have a significant negative trust score, as they are mostly associated with automated scripts.
With IP address fingerprinting, Cloudflare can estimate the likelihood of the connecting client being a genuine real user. For example, human users rarely browse the internet through data center proxies. Hence, web scrapers using such IPs are very likely to get blocked.
Another aspect of IP address analysis is the request rate. Anti-bot systems can detect IP addresses that exceed the defined threshold of requests and block them.
Therefore, rotate residential or mobile proxies from trusted proxy providers to prevent IP address fingerprinting. For further details on IP addresses and their trust score calculation process, refer to our dedicated guide.
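Proxy rotation itself is straightforward: distribute requests across a pool so no single address exceeds the rate threshold. A minimal sketch (the pool addresses are placeholders; real pools come from your proxy provider):

```python
import itertools
import random

# Placeholder residential proxy pool for illustration only.
PROXY_POOL = [
    "http://user:pass@res-proxy-1.example.com:8000",
    "http://user:pass@res-proxy-2.example.com:8000",
    "http://user:pass@res-proxy-3.example.com:8000",
]

def proxy_cycle(pool):
    """Yield proxies round-robin, shuffled once so every run
    doesn't always start from the same address."""
    pool = pool[:]
    random.shuffle(pool)
    return itertools.cycle(pool)

rotation = proxy_cycle(PROXY_POOL)
# Each outgoing request then uses the next proxy: next(rotation)
```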
How to Avoid Web Scraper IP Blocking?
How IP addresses are used in web scraping blocking. Understanding IP metadata and fingerprinting techniques to avoid web scraper blocks.
HTTP Details
Most users browse the web through a few popular browsers, such as Chrome, Firefox, or Edge. These browsers ship with well-known, consistent configurations, so their HTTP request details are highly repetitive, making it easy for anti-bot solutions to spot any outliers.
Headers
Request headers are an essential part of any HTTP request details. Anti-bot systems use them to distinguish web scraping requests from those of normal browsers. Hence, it's necessary to reverse engineer and replicate browser headers to avoid being blocked by Cloudflare protection. Here are common request headers to observe with HTTP requests.
Accept
Represents the response data type accepted by the HTTP client on the given request. It should match a common headless browser when scraping HTML pages. In other cases, it should match the resource's data type, such as application/json when scraping hidden APIs or text/xml for sitemaps:
# Chrome/Safari
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
# Firefox
text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language
Indicates the supported browser language. Setting this header not only helps mimic real browser configuration but also helps set the web scraper localization settings:
# Firefox
en-US,en;q=0.5
# Chrome
en-US,en;q=0.9
User-Agent
The most popular header for web scrapers. It represents the client's rendering capabilities, including the device type, operating system, browser name, and version. This header is prone to identification, and it's important to rotate the User-Agent header for further scraper obfuscation:
# Chrome
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
# Firefox
Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0
Cookie
The cookie header carries the cookie values sent with the request to the web server. While cookies don't play a critical role in HTTP fingerprinting, they ensure the website behaves the same during scraping as it would during normal browser navigation. The header can also contain specific values that authorize the requests:
Cookie: key=value; key2=value2; key3=value3
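Putting the above together, a scraper can assemble a Chrome-like header set. A minimal sketch using the example values from this section (header order also matters with some HTTP clients, so keep it browser-like):

```python
# Chrome-like header set assembled from the header examples above.
def chrome_headers(user_agent=None):
    """Return a browser-like header dict for scraping HTML pages."""
    return {
        "User-Agent": user_agent or (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/115.0.0.0 Safari/537.36"
        ),
        "Accept": (
            "text/html,application/xhtml+xml,application/xml;q=0.9,"
            "image/webp,image/apng,*/*;q=0.8"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    }

headers = chrome_headers()
# These can then be passed to an HTTP client,
# e.g. httpx.Client(headers=headers, http2=True)
```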
HTTP2
Another aspect of the HTTP details to observe is the protocol used. Most current websites and browsers operate over the HTTP2 protocol, while many HTTP clients are still tied to HTTP1.1, marking their sent requests as suspicious.
That being said, HTTP2 is provided in many HTTP clients, such as httpx and cURL, but it's not enabled by default. Use the HTTP2 fingerprint testing tool to ensure the data extraction requests use HTTP2.
So, enable HTTP2 and make sure that the headers used match a common web browser to bypass Cloudflare while web scraping. For further details on HTTP headers and their role, refer to our dedicated guide.
How Headers Are Used to Block Web Scrapers and How to Fix It
Introduction to web scraping headers - what do they mean, how to configure them in web scrapers and how to avoid being blocked.
JavaScript Fingerprinting
JavaScript provides comprehensive details about connecting clients, which are used by Cloudflare's fingerprinting mechanisms. Since JavaScript allows arbitrary code to be executed on the client side, it can be used to extract different details about the client, such as:
- Javascript runtime details.
- Hardware details and capabilities.
- Operating system details.
- Web browser details.
It seems like anti-bot services already know a lot about their clients!
Fortunately, JavaScript execution is time-consuming and prone to false positives. This means Cloudflare's bot protection can't rely too heavily on JavaScript fingerprinting.
Theoretically, it's possible to reverse engineer the computational JavaScript challenges and solve them using scripts. However, such a solution requires many debugging hours, even for experienced developers. Moreover, any modifications to the challenge algorithms will make the solving script outdated.
On the other hand, a much more accessible and common solution is to use a real web browser for web scraping. This can be approached using browser automation libraries, such as Selenium, Puppeteer, or Playwright.
So, introduce browser automation for the scraping pipeline to increase the trust score for a higher chance of Cloudflare bypass.
More advanced scraping tools can combine the capabilities of HTTP clients and web browsers to bypass Cloudflare. First, the browser requests the target web page to retrieve its session values and establish a trust score. Then, session values are reused with regular HTTP clients, such as httpx in Python and ScrapFly sessions feature.
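The browser-to-HTTP-client handoff described above is essentially a cookie transfer. Here is a sketch; `browser_cookies` stands in for whatever your automation library returns (for example, Selenium's `driver.get_cookies()` yields name/value dicts like these):

```python
# Sketch of reusing a browser-established Cloudflare session with a
# plain HTTP client. browser_cookies mimics the name/value dicts that
# automation libraries return from their cookie APIs.
def cookies_to_header(browser_cookies):
    """Flatten browser cookie dicts into a Cookie request header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in browser_cookies)

browser_cookies = [
    {"name": "cf_clearance", "value": "abc123"},  # Cloudflare clearance token
    {"name": "__cf_bm", "value": "xyz789"},
]
cookie_header = cookies_to_header(browser_cookies)
# Send this header (plus the exact same User-Agent the browser used)
# with subsequent HTTP-client requests, e.g. httpx.
```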
For a more in-depth look, see our guide on JavaScript fingerprint role in terms of web scraping blocking.
How Javascript is Used to Block Web Scrapers? In-Depth Guide
Introduction to how javascript is used to detect web scrapers. What's in javascript fingerprint and how to correctly spoof it for web scraping.
Behavior Analysis & Machine Learning (2025 Update)
With all the different Cloudflare anti-bot detection techniques, the trust score is not a constant number and is constantly adjusted as the connection continues.
For example, a client can start the connection with a Cloudflare protected website with a trust score of 80. Then, the client requests 100 pages in just a few seconds. This will decrease the trust score, as it's not likely for normal users to request at such a high rate.
On the other hand, bots with human-like behavior can get a high trust score that can remain steady or even increase.
Per-Customer Machine Learning Models (2025)
As of 2024-2025, Cloudflare introduced per-customer defense systems for Enterprise Bot Management customers. These systems use machine learning models that automatically tune detection based on each website's specific traffic patterns:
- Custom Trust Scores - Each domain gets its own baseline of "normal" traffic, making generic bypass techniques less effective
- Adaptive Thresholds - The system learns what constitutes suspicious behavior for each specific website
- Residential Proxy Detection - Traffic pattern analysis can detect residential proxies used by scrapers, even if they have good IP reputation
- Session Consistency Checks - Cloudflare tracks behavioral consistency across requests within the same session
- Pattern Recognition - Machine learning models continuously update to detect new evasion techniques
Traffic Pattern Analysis
Modern Cloudflare protection analyzes:
- Request sequencing - Do requests follow logical navigation patterns?
- Timing consistency - Are delays between requests perfectly regular (bot-like) or naturally variable?
- Resource loading patterns - Does the client load CSS, images, and fonts like a real browser?
- Mouse and keyboard interaction - Are there natural pauses, movements, and interaction patterns?
- Session behavior - Does the session exhibit exploration behavior or targeted data extraction?
Evading Enhanced Behavioral Detection
To bypass 2025 behavioral analysis, it's critical to distribute web scraper traffic through multiple agents and techniques:
- Add random timeouts between requests (2-5 seconds with ±500ms randomness)
- Rotate User-Agent headers with matching TLS/HTTP fingerprints
- Randomize viewport and browser settings across sessions
- Mimic mouse moves and keyboard clicks when using headless browsers
- Follow natural navigation flows (homepage → category → search → product)
- Maintain session consistency - don't change fingerprints mid-session
- Load page resources naturally - fetch CSS, JS, images like a real browser
- Vary request timing - avoid perfectly regular intervals
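The timing advice above (2-5 seconds with ±500ms randomness) can be sketched as a small delay generator:

```python
import random

def human_delay(base_min=2.0, base_max=5.0, jitter=0.5):
    """Pick a delay of base_min..base_max seconds plus +/-jitter randomness,
    so intervals between requests are never perfectly regular."""
    delay = random.uniform(base_min, base_max) + random.uniform(-jitter, jitter)
    return max(delay, 0.0)

# Between requests:
# time.sleep(human_delay())
```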
How to Bypass Cloudflare Bot Protection?
Now that we are familiar with the different fingerprinting factors Cloudflare uses to detect HTTP requests, we can conclude that bypassing Cloudflare comes down to earning a high trust score.
Let's explore practical approaches for fortifying web scrapers against the different Cloudflare protection mechanisms!
Start With Headless Browsers
Since Cloudflare uses JavaScript challenges and fingerprinting mechanisms to detect web scrapers, using headless browsers is often necessary.
Such an approach is available through browser automation libraries like Selenium, Playwright, and Puppeteer, which allow running real web browsers without GUI elements, known as headless browsers.
These headless browsers can automatically solve JavaScript fingerprinting challenges to bypass antibot systems instead of reverse engineering them.
How to Scrape Dynamic Websites Using Headless Web Browsers
Introduction to using web automation tools such as Puppeteer, Playwright, Selenium and ScrapFly to render dynamic websites for web scraping
Use High Quality Residential Proxies
As Cloudflare uses IP address analysis methods to calculate a trust score, using residential proxies helps bypass Cloudflare's IP address fingerprinting.
Moreover, web scraping at scale requires rotating proxies. This prevents IP address blocking when the request rate exceeds the defined limits by splitting the load across multiple IP addresses.
The Complete Guide To Using Proxies For Web Scraping
Introduction to proxy usage in web scraping. What types of proxies are there? How to evaluate proxy providers and avoid common issues.
Try Nodriver (2025 Recommended)
Nodriver is the successor to undetected-chromedriver, created by the same developer. It's designed for stealth browser automation in 2025 to bypass Cloudflare protection.
Why Nodriver?
Unlike traditional Selenium-based solutions, Nodriver uses a different approach:
- No WebDriver patches needed - Built from scratch to avoid automation detection
- Direct Chrome DevTools Protocol - Communicates directly with Chrome without leaving traces
- Native stealth capabilities - Designed to be undetectable
- Active development - Continuously updated to counter new detection methods
- Better performance - Lower overhead than patched Selenium drivers
Nodriver Example for Cloudflare Bypass
```python
import asyncio
import nodriver as uc

async def scrape_cloudflare_protected_site():
    # Launch undetected Chrome browser
    browser = await uc.start()
    # Navigate to Cloudflare-protected page
    page = await browser.get('https://example.com/cloudflare-protected')
    # Wait for Cloudflare challenge to complete automatically
    await page.sleep(5)
    # Extract content after bypass
    content = await page.get_content()
    print(content)
    # Interact with the page like a real user (select() takes a CSS selector)
    element = await page.select('button.search')
    await element.click()
    browser.stop()

# Run the scraper
asyncio.run(scrape_cloudflare_protected_site())
```
Nodriver's async architecture and native stealth make it a good choice for new web scraping projects in 2025, especially when dealing with Cloudflare and Turnstile challenges.
Try SeleniumBase UC Mode (2025 Production-Ready)
SeleniumBase UC Mode builds on undetected-chromedriver with added features and active maintenance. It's a good option for teams that need reliability and support.
SeleniumBase Advantages
- Tested in production - Used by many companies
- Built-in CAPTCHA helpers - Methods for handling Turnstile and other challenges
- Good documentation - Extensive examples and tutorials
- Regular updates - Rapid response to Cloudflare changes
- Easy migration - Drop-in replacement for standard Selenium
- Additional features - Automatic retries, session management, proxy rotation
SeleniumBase Example
```python
from seleniumbase import SB

with SB(uc=True) as sb:  # uc=True enables undetected mode
    # Open Cloudflare-protected page
    sb.uc_open_with_reconnect("https://example.com/cloudflare-protected", 5)
    # SeleniumBase automatically handles Cloudflare challenges
    sb.sleep(3)
    # Extract data after bypass
    title = sb.get_title()
    content = sb.get_page_source()
    # Navigate like a real user
    sb.uc_click("button#search")
    sb.type("input[name='query']", "web scraping")
    # Take screenshot for debugging
    sb.save_screenshot("after_cloudflare.png")
```
SeleniumBase UC Mode is good for production scraping operations where reliability and maintainability are important.
Try Camoufox (Firefox-Based Alternative)
Camoufox is an open-source anti-detect browser based on Firefox designed for web scraping. Unlike Chrome-based solutions, it offers a different fingerprint profile that can be helpful.
Camoufox Benefits
- Firefox-based - Different fingerprint than Chrome-based tools
- Python-native - Built specifically for Python automation
- Anti-fingerprinting - Patches Firefox to prevent detection
- Lightweight - Lower resource usage than Chrome
- Real browser profile - Uses actual Firefox user profiles
Camoufox Example
```python
from camoufox.sync_api import Camoufox

# Launch Camoufox with stealth settings
with Camoufox(
    headless=False,  # Use False for better success with Cloudflare
    humanize=True,   # Enable human-like behavior
) as browser:
    # Create a new page
    page = browser.new_page()
    # Navigate to Cloudflare-protected site
    page.goto('https://example.com/cloudflare-protected')
    # Wait for Cloudflare challenge
    page.wait_for_timeout(5000)
    # Extract content
    content = page.content()
    # Interact naturally
    page.click('text=Search')
    page.fill('input[type="search"]', 'products')
```
Camoufox is particularly useful when Chrome-based solutions are being detected or when you want to diversify your browser fingerprints across multiple scrapers.
Try undetected-chromedriver (Legacy - Use Nodriver Instead)
Note: While undetected-chromedriver is still functional, it's recommended to migrate to Nodriver for new projects. Nodriver is the official successor from the same creator with better architecture and detection evasion.
There are a few key differences between headless browsers and regular ones. Antibot solutions, like Cloudflare, rely on these differences to detect headless browser usage. For example, the navigator.webdriver value is set to true with automated browsers only:
The undetected-chromedriver is a community-patched web driver that allows Selenium to bypass Cloudflare. These patches include headless browser fixes for TLS, HTTP, and JavaScript fingerprints.
Web Scraping Without Blocking With Undetected ChromeDriver
In this tutorial we'll be taking a look at a new popular web scraping tool Undetected ChromeDriver which is a Selenium extension that allows to bypass many scraper blocking techniques.
Try Puppeteer Stealth Plugin (DEPRECATED - Use Playwright Stealth Instead)
IMPORTANT DEPRECATION NOTICE: As of February 2025, puppeteer-extra-stealth is no longer actively maintained. The original maintainer announced the project will not receive further updates. While existing code may still work for basic cases, migrate to actively maintained alternatives like Playwright with stealth plugins or the Python-based solutions above.
Puppeteer is a popular NodeJS library for headless browser automation, with the common fingerprinting leaks found in other automation libraries.
Historically, puppeteer-stealth was a plugin that patched Puppeteer to prevent anti bot detection through multiple evasion techniques, including:
- Modifying the `navigator.plugins` properties to match common browser plugins.
- Mimicking common permissions as if they were enabled by a real user.
- Removing the common `navigator.webdriver` value.
- Preventing canvas and WebGL fingerprinting.
- Rotating the User-Agent to match real browser ones.
Migration Recommendations (2025)
For NodeJS/JavaScript projects:
- Playwright with Stealth - Actively maintained with similar capabilities to puppeteer-stealth
- Playwright Extra Stealth - Community-maintained stealth plugins for Playwright
For Python projects (recommended):
- Nodriver - Best-in-class stealth capabilities
- SeleniumBase UC Mode - Production-ready with CAPTCHA handling
- Camoufox - Firefox-based alternative
The stealth plugin capabilities are implemented in other browser automation libraries, such as selenium-stealth and playwright-stealth, to make Playwright Cloudflare resistant.
Try FlareSolverr
FlareSolverr is a popular community tool for bypassing Cloudflare by combining the power of both headless browsers and HTTP clients. It provides comprehensive methods for managing bypass sessions.
FlareSolverr's workflow can be explained through the following steps:
- The target web page is requested with an undetected-chromedriver instance.
- The Cloudflare challenge on the web page is bypassed, and the successful request session values get saved.
- The saved session values of the successful request, including its headers and cookies, are re-used with regular HTTP requests.
Using FlareSolverr to bypass Cloudflare makes scaling web scrapers resource-effective. This is due to the smart session usage, which decreases the need to run a headless browser with each request.
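FlareSolverr runs as a local HTTP service; scrapers talk to it with small JSON commands. A sketch of building such a request, following FlareSolverr's v1 API (the default port 8191 and the `request.get` command come from its documentation, but verify against your installed version):

```python
# Sketch of preparing a request for a locally running FlareSolverr instance.
FLARESOLVERR_URL = "http://localhost:8191/v1"

def build_flaresolverr_request(target_url, max_timeout_ms=60000):
    """Build the JSON command FlareSolverr expects for a GET request."""
    return {
        "cmd": "request.get",
        "url": target_url,
        "maxTimeout": max_timeout_ms,
    }

payload = build_flaresolverr_request("https://example.com/cloudflare-protected")
# POST the payload as JSON to FLARESOLVERR_URL; the response contains the
# solved page HTML plus session cookies to reuse with a plain HTTP client.
```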
FlareSolverr Guide: Bypass Cloudflare While Scraping
In this article, we'll explore the FlareSolverr tool and how to use it to get around Cloudflare while scraping. We'll start by explaining what FlareSolverr is, how it works, how to install and use it. Let's get started!
Try curl-impersonate
curl-impersonate is a community tool that fortifies the libcurl HTTP client library to mimic the behavior of a real web browser. It patches the TLS and HTTP fingerprints to make HTTP requests look like they're coming from a real web browser.
While curl-impersonate itself (like cURL or Postman) limits the web scraping process due to the lack of parsing capabilities, its patches can be reused by HTTP clients in different programming languages. One such example is curl_cffi, an interface for curl-impersonate in Python.
Use Curl Impersonate to scrape as Chrome or Firefox
Learn how to prevent TLS fingerprinting by impersonating normal web browser configurations. We'll start by explaining what the Curl Impersonate is, how it works, how to install and use it. Finally, we'll explore using it with Python to avoid web scraping blocking.
Try Warming Up Scrapers
To bypass behavior analysis, adjusting scraper behavior to appear more natural can drastically increase Cloudflare trust scores. In reality, most real users don't visit product URLs directly. They often explore websites in steps like:
- Start with the homepage.
- Browse product categories.
- Search for a product.
- View the product page.
Prefixing scraping logic with this warm-up behavior can make the scraper appear more human-like and improve its standing with behavior analysis.
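The warm-up steps above can be sketched as an ordered sequence of visits before the target page (the shop URL and product ID are placeholders):

```python
import random

def warmup_path(base="https://example-shop.com", product_id="1234"):
    """Build a human-like navigation sequence ending at the target product.
    All URLs here are placeholders for illustration."""
    category = random.choice(["electronics", "home", "sports"])
    return [
        f"{base}/",                                # 1. start with the homepage
        f"{base}/category/{category}",             # 2. browse a category
        f"{base}/search?q=product+{product_id}",   # 3. search for the product
        f"{base}/product/{product_id}",            # 4. view the product page
    ]

# A scraper would visit each URL in order (with human-like delays)
# before extracting data from the final product page.
```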
Rotate Real User Fingerprints
For sustained web scraping and Cloudflare bypass, headless browsers should constantly be blended with different, realistic fingerprint profiles: screen resolution, operating system, and browser type all play an essential role in bypassing Cloudflare.
Each headless browser library can be configured to use different resolution and rendering capabilities. Distributing scraping through multiple realistic browser configurations can prevent Cloudflare from detecting the scraper.
For further details, see ScrapFly's browser fingerprint tool to observe how your browser looks to Cloudflare. It collects different details about the browser, which helps make web scrapers look like regular browsers.
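Rotating fingerprints means picking one coherent profile per session rather than mixing random values mid-session (which is itself a detection signal, as noted in the behavioral-analysis section). A sketch with two example profiles:

```python
import random

# Each profile keeps User-Agent, platform, and viewport coherent;
# mixing values from different profiles mid-session is a red flag.
PROFILES = [
    {
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/115.0.0.0 Safari/537.36",
        "viewport": (1920, 1080),
        "platform": "Win32",
    },
    {
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/115.0.0.0 Safari/537.36",
        "viewport": (1440, 900),
        "platform": "MacIntel",
    },
]

def new_session_profile():
    """Pick one coherent profile and use it for the whole session."""
    return random.choice(PROFILES)
```

The chosen profile is then applied to the headless browser's User-Agent, window size, and navigator properties together.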
Leverage IPv6 Proxies (2025 Technique)
IPv6 addresses are gaining use as a bypass technique in 2025. IPv6 addresses can sometimes bypass Cloudflare detection more easily than traditional IPv4 addresses.
Why IPv6 Works for Bypassing Cloudflare
- Less tracked by IP reputation systems - Many anti-bot systems focus on IPv4 reputation databases, while IPv6 tracking is less complete
- Large address space - IPv6's address space (340 undecillion addresses) makes building reputation databases and blocking individual IPs difficult
- Lower scrutiny - Some Cloudflare configurations don't monitor IPv6 traffic as strictly as IPv4
- Newer infrastructure - IPv6 reputation scoring is less developed than IPv4 systems
Considerations When Using IPv6
While IPv6 can be advantageous, there are important limitations:
- Not universally supported - Not all proxy providers offer IPv6 addresses
- Target compatibility - The target website must support IPv6 connectivity (check with ping -6 example.com)
- Still requires other evasion - IPv6 alone isn't enough; combine it with proper fingerprinting, headers, and behavioral patterns
- Quality matters - Datacenter IPv6 addresses still have lower trust scores than residential ones
IPv6 Proxy Configuration Example
# Using IPv6 proxies with httpx
import httpx

# Note: newer httpx releases (0.26+) deprecate `proxies=` in favor of
# the `proxy=` / `mounts=` arguments.
proxies = {
    "http://": "http://[2001:db8::1]:8080",  # IPv6 proxies use the bracketed address format
    "https://": "http://[2001:db8::1]:8080",
}

# Configure the client for IPv6
client = httpx.Client(
    proxies=proxies,
    headers={"User-Agent": "Mozilla/5.0..."},
    http2=True,
)

response = client.get("https://example.com/cloudflare-protected")
IPv6 proxies work best when combined with residential proxy pools and proper browser fingerprinting. They're particularly effective for distributing load across a larger IP space to avoid rate limiting.
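Distributing requests across a large IPv6 space can be sketched with the standard library's ipaddress module. This assumes you control a routed prefix (the documentation prefix 2001:db8::/64 is used here as a placeholder):

```python
import ipaddress
import random

def random_ipv6_in_prefix(prefix: str) -> str:
    """Pick a random address inside an IPv6 prefix you control (e.g. a /64).

    A /64 contains 2**64 addresses, so each request can realistically use
    a fresh source address, making per-IP rate limiting far less effective.
    """
    net = ipaddress.IPv6Network(prefix)
    offset = random.randrange(net.num_addresses)  # uniform index into the prefix
    return str(net[offset])                       # network indexing yields an address
```

Each generated address could then be plugged into the proxy URL format shown above. Actually sourcing traffic from these addresses requires the prefix to be routed to your proxy infrastructure; the function only handles the address selection.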
Bypass Cloudflare with ScrapFly
Bypassing Cloudflare protection, while possible, is very difficult. Let Scrapfly do it for you.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale. Each product is equipped with an automatic bypass for any anti-bot system and we achieve this by:
- Maintaining a fleet of real, reinforced web browsers with real fingerprint profiles.
- Millions of self-healing proxies of the highest possible trust score.
- Constantly evolving and adapting to new anti-bot systems.
- We've been doing this publicly since 2020 with the best bypass on the market!
It takes Scrapfly several full-time engineers to maintain this system, so you don't have to!
Here's how to scrape Cloudflare-protected pages using ScrapFly.
All we have to do is enable the asp parameter and select a proxy pool and country:
# standard web scraping code
import httpx
from parsel import Selector

response = httpx.get("some web page with cloudflare challenge URL")
selector = Selector(response.text)

# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient

# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")

response = scrapfly.scrape(ScrapeConfig(
    url="web page URL",
    asp=True,  # enable the anti-scraping protection to bypass blocking
    country="US",  # set the proxy location to a specific country
    proxy_pool="public_residential_pool",  # select a proxy pool
    render_js=True,  # render JavaScript (like a headless browser) to scrape dynamic content if needed
))

# use the built-in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']
Scrapfly is easily accessible using Python and TypeScript SDKs.
FAQs
What is Cloudflare Turnstile and how is it different from traditional CAPTCHAs?
Cloudflare Turnstile is a CAPTCHA replacement introduced in 2022 that runs in the background using browser fingerprinting and cryptographic challenges. Unlike traditional image-based CAPTCHAs, Turnstile has three modes: non-interactive (completely invisible), invisible (brief verification), and interactive (user action required). It's harder to bypass than older CAPTCHAs because it uses JavaScript challenges, behavioral biometrics, and real-time token verification. The best approach is using CAPTCHA solver services like 2Captcha or CapSolver, or preventing Turnstile from triggering through good stealth techniques.
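When a solver service returns a Turnstile token, it is submitted alongside the normal form data in a field named cf-turnstile-response (the name Turnstile's client-side widget uses for its hidden input). A minimal sketch of preparing such a submission; the form fields here are illustrative assumptions:

```python
def build_turnstile_payload(form_fields: dict, token: str) -> dict:
    """Merge normal form fields with a solved Turnstile token.

    Turnstile's widget places the token in a hidden input named
    'cf-turnstile-response', so server-side validation expects that key.
    """
    return {**form_fields, "cf-turnstile-response": token}
```

The resulting dict can then be POSTed with any HTTP client (e.g. httpx.post(url, data=payload)). Note that tokens are short-lived and single-use, so they must be solved and submitted within the same session.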
Which tool should I use for Cloudflare bypass in 2025?
For Python projects:
- Nodriver - Good choice for new projects, stealth capabilities
- SeleniumBase UC Mode - Good for production environments, reliable
- Camoufox - Good for diversifying fingerprints with Firefox-based automation
For JavaScript/NodeJS projects:
- Playwright with Stealth plugins - Most actively maintained
- Avoid puppeteer-stealth - Deprecated as of February 2025
For all projects at scale:
- ScrapFly or similar services - Good for production scraping, handles all bypass logic automatically
Choose based on your needs: Nodriver for better evasion, SeleniumBase for stability, or managed services for simpler solutions.
Why did my Puppeteer-stealth script stop working in 2025?
As of February 2025, the puppeteer-extra-stealth library is no longer actively maintained. The original maintainer announced that the project will not receive further updates. Cloudflare and other anti-bot systems have likely updated their detection methods to identify puppeteer-stealth users. You should migrate to actively maintained alternatives like:
- Nodriver (Python) - Stealth browser from the undetected-chromedriver creator
- SeleniumBase UC Mode (Python) - Includes built-in CAPTCHA handling
- Playwright with Stealth (JavaScript/Python) - Actively maintained stealth plugins
- Camoufox (Python) - Firefox-based anti-detect browser
Summary
In this article, we've looked at how to get around Cloudflare anti-bot systems when web scraping in 2025. We have seen that Cloudflare identifies automated traffic through fingerprinting techniques including IP reputation analysis, TLS/HTTP2 fingerprinting, JavaScript challenges, Turnstile CAPTCHAs, and per-customer machine learning models.
We have explored Cloudflare bypass strategies, including:
- Using residential or IPv6 proxies to avoid IP-based blocking and distribute load effectively
- Implementing stealth tools like Nodriver, SeleniumBase UC Mode, and Camoufox that are actively maintained for 2025
- Avoiding deprecated tools like puppeteer-stealth (discontinued February 2025) and migrating to supported alternatives
- Handling Cloudflare Turnstile challenges through CAPTCHA solver services or good stealth techniques
- Using web scraping libraries resistant to JA3 fingerprinting with proper TLS and HTTP2 configurations
- Mimicking natural user behavior with varied timing, realistic navigation flows, and consistent session fingerprints
- Using managed services like ScrapFly that automatically handle all bypass logic, including Turnstile challenges
The landscape of Cloudflare bypass continues to evolve with per-customer ML models and behavioral analysis in 2025, making it important to stay updated with the latest tools and techniques.
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect, and here's a good summary of what not to do:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens who are protected by GDPR.
- Do not repurpose entire public datasets, which can be illegal in some countries.