Web Scraping Without Blocking With Undetected ChromeDriver

Web scraping blocking can happen for many different reasons and requires attention to various details. But what about simple tools that can help avoid web scraping blocking?

In this article, we'll explain the Undetected ChromeDriver and how to use it to avoid web scraping blocking. Let's dive in!

What is the Undetected ChromeDriver?

The Undetected ChromeDriver is a modified WebDriver for Selenium. It mimics regular browsers' behavior using various techniques, such as:

  • Changing Selenium's variable names so the browser appears as a regular web browser.
  • Randomizing User-Agent strings.
  • Adding randomized delays between sending requests or executing actions.
  • Maintaining cookies and sessions correctly while browsing a website.
  • Simulating mouse clicks and moves, which makes browsing behavior appear natural.
  • Allowing for adding proxies, which prevents IP blocking and rate limiting.

The Undetected ChromeDriver uses the above techniques to avoid specific anti-scraping challenges, such as Cloudflare, Imperva, and DataDome.
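
For comparison, a couple of these tweaks can be applied by hand to plain Selenium. The sketch below sets a custom User-Agent and adds a randomized delay; the User-Agent string and delay range are placeholder values, and this only scratches the surface of what the Undetected ChromeDriver patches automatically:

import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Spoof the User-Agent (placeholder string - use a current, real browser value)
options = Options()
options.add_argument(
    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
)

driver = webdriver.Chrome(options=options)
driver.get("https://nowsecure.nl/")

# Randomized delay between actions to appear less bot-like
time.sleep(random.uniform(2, 5))

driver.quit()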

How to Scrape Without Getting Blocked? In-Depth Tutorial

Learn about web scraping blocking and how websites can recognize web scrapers as bots. You will also learn how to optimize your web scrapers to avoid web scraping blocking.

Before proceeding to web scraping with Undetected ChromeDriver, let's install it.

Setup

In this Undetected ChromeDriver for web scraping guide, we'll use nowsecure.nl as our target website, which uses Cloudflare to detect web scrapers, and bypass its protection with the Undetected ChromeDriver. It can be installed using the following pip command:

pip install undetected-chromedriver

The above command will also install Selenium as it's used by the undetected-chromedriver under the hood.

How to Avoid Blocking using Undetected ChromeDriver?

Undetected ChromeDriver is a modified version of the web driver used by Selenium, which can avoid web scraping detection - let's take a look at it.

Comparing with Selenium

To see the difference, let's first try our target website with standard Selenium code. For this, we'll use a common Selenium installation:

$ pip install selenium webdriver-manager

Then, let's scrape a secured web page that uses Cloudflare to detect web scrapers:

# pip install webdriver-manager
from webdriver_manager.chrome import ChromeDriverManager
from selenium import webdriver 
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options 
import time

# Add selenium option
options = Options()
options.headless = False

# Configure Selenium options and download the default web driver automatically
driver = webdriver.Chrome(options=options, service=ChromeService(ChromeDriverManager().install()))
# Maximize the browser window size
driver.maximize_window()

# Go to the target website
driver.get("https://nowsecure.nl/")
# Wait for security check
time.sleep(4)
# Take screenshot
driver.save_screenshot('screenshot.png')
# Close the driver
driver.close()

Here, we initialize a Selenium browser with the default web driver that's installed using webdriver-manager. Then, we send a request to the target website and take a screenshot. Here is what we got:

Scraper being detected and blocked by Cloudflare

We can see that the website detected us and requested a Cloudflare challenge to solve before proceeding to the website.

Bypass with Undetected Chromedriver

Now that we know what failure looks like, let's bypass this challenge using the Undetected ChromeDriver:

import undetected_chromedriver as uc
import time

# Add the driver options
options = uc.ChromeOptions() 
options.headless = False

# Configure the undetected_chromedriver options
driver = uc.Chrome(options=options) 

# Go to the target website
driver.get("https://nowsecure.nl/")
# Wait for the Cloudflare security check to finish
time.sleep(4)

# Take a screenshot
driver.save_screenshot('screenshot.png')
# Close the browser
driver.quit()

We initialize an undetected_chromedriver object, go to the target website and take a screenshot. Here is the screenshot we got:

Driver passing Cloudflare detection
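
Besides eyeballing a screenshot, the result can also be checked programmatically before closing the driver. A minimal sketch, assuming Cloudflare's interstitial page still titles itself "Just a moment...":

# Run this right after the wait, while the driver is still open
if "just a moment" in driver.title.lower():
    print("Still stuck on the Cloudflare challenge")
else:
    print(f"Challenge passed, page title: {driver.title}")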

Shortcomings

We successfully avoided web scraping detection using the Undetected ChromeDriver for Cloudflare. However, it can still struggle against more advanced anti-bot systems.

For example, let's try to bypass OpenSea.io's protection:

import undetected_chromedriver as uc
import time

# Add the driver options
options = uc.ChromeOptions() 
options.headless = False

# Configure the driver with the options
driver = uc.Chrome(options=options) 

# Go to the target website
driver.get("https://opensea.io/")
# Wait for security check
time.sleep(4)

# Take a screenshot
driver.save_screenshot('opensea.png')
# Close the browser
driver.quit()

In this example, OpenSea instantly detects and blocks our scraper:

OpenSea.io blocking the Undetected ChromeDriver scraper

There are a few more things we can do to get more out of the Undetected ChromeDriver for web scraping. Let's take a look at them next!

How to Add Proxies to Undetected ChromeDriver?

Proxies are essential for avoiding IP blocking while scraping by splitting the traffic between multiple IP addresses. Here is how you can add proxies to the Undetected ChromeDriver:

import undetected_chromedriver as uc

# Add the driver options
options = uc.ChromeOptions() 
options.headless = False
# For proxies without authentication (set only one --proxy-server argument)
options.add_argument('--proxy-server=https://proxy_ip:proxy_port')
# For proxies with authentication (note that Chrome may ignore credentials embedded in the URL)
# options.add_argument('--proxy-server=https://proxy_username:proxy_password@proxy_ip:proxy_port')
# Configure the driver with the options
driver = uc.Chrome(options=options) 

Although you can add proxies to the Undetected ChromeDriver, there is no built-in support for proxy rotation.
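
As a workaround, you can rotate proxies yourself by launching a fresh driver per proxy. Here is a minimal sketch, assuming a hypothetical pool of unauthenticated proxy URLs:

import random

import undetected_chromedriver as uc

# Hypothetical proxy pool - replace with your own proxy URLs
PROXY_POOL = [
    "http://203.0.113.1:8080",
    "http://203.0.113.2:8080",
    "http://203.0.113.3:8080",
]

def scrape_with_random_proxy(url: str) -> str:
    """Start a fresh driver with a randomly picked proxy and return the page HTML"""
    proxy = random.choice(PROXY_POOL)
    options = uc.ChromeOptions()
    options.add_argument(f"--proxy-server={proxy}")
    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()

html = scrape_with_random_proxy("https://nowsecure.nl/")

Note that starting a new browser for every proxy is slow, so this pattern is best suited to scrapers that make relatively few requests per session.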

How to Rotate Proxies in Web Scraping

Discover how using proxy rotation can prevent blocking and learn about common rotation patterns, tips, and tricks.

We have seen that the Undetected ChromeDriver can't avoid advanced anti-scraping challenges and can't rotate proxies by itself. Let's look at a better solution!

Powering up with ScrapFly

ScrapFly is a web scraping API that bypasses all types of anti-scraping protections. It automatically handles proxy rotation, which prevents rate limiting and IP address blocking. It also supports running headless browsers in the cloud, allowing you to scrape JavaScript-loaded content without running browsers yourself.

ScrapFly service does the heavy lifting for you

By using ScrapFly's ASP (anti-scraping protection) feature, we can easily scrape websites without getting blocked:

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="Your API key")

api_response: ScrapeApiResponse = scrapfly.scrape(
    scrape_config=ScrapeConfig(
        url="https://opensea.io",
        # Enable JavaScript rendering to fully load the page
        render_js=True,
        # Enable ASP to bypass anti-scraping protections
        asp=True,
        # Take a screenshot
        screenshots={"opensea": "fullpage"},
    )
)

# Save the screenshot
scrapfly.save_screenshot(api_response=api_response, name="opensea")

# Use the built-in selector to parse the HTML
selector = api_response.selector

# Empty list to save the data into
data = []

# Loop through the trending NFT names
for nft_title in selector.xpath("//span[@data-id='TextBody']/div/div[1]/text()"):
    data.append(nft_title.get())

print(data)
# ['Akumu Dragonz', 'Nakamigos-CLOAKS', 'RTFKT x Nike Dunk Genesis CRYPTOKICKS', 'Skyborne - Genesis Immortals', 'Arbitrum Odyssey NFT', 'Parallel Alpha', 'Skyborne - Nexian Gems', 'Milady Maker', 'Otherside Vessels', 'BlockGames Dice']

Common Errors

Using ChromeDriver with undetection patches can introduce new errors. Here are some common ones and how to fix them.

ModuleNotFoundError: No module named 'undetected_chromedriver'

This error means something went wrong with the undetected_chromedriver installation. It usually happens when pip installs the package into a different Python environment than the one running your script. So, ensure that python and pip point to the same Python environment.
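
One quick way to diagnose this is to print the interpreter your script actually runs on, then install the package into that exact interpreter. A minimal sketch:

import sys

# The interpreter running this script - install packages into this exact
# interpreter with: python -m pip install undetected-chromedriver
print(sys.executable)

try:
    import undetected_chromedriver as uc
    print("undetected_chromedriver found at:", uc.__file__)
except ModuleNotFoundError:
    print("undetected_chromedriver is not installed in this environment")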

Alternatively, use a package manager like Poetry:

# create a project
$ mkdir my_project && cd my_project
$ poetry init --dependency undetected-chromedriver
$ poetry install
$ touch myscript.py   # create your python script
$ poetry run python myscript.py

This will ensure that undetected_chromedriver is installed in the same environment as your script.

This version of ChromeDriver only supports Chrome version X

Another common error is related to the browser version availability.

Message: unknown error: cannot connect to chrome at 127.0.0.1:33505
from session not created: This version of ChromeDriver only supports Chrome version 118
Current browser version is 117.0.5938.149

This error means the Chrome browser available on the current machine is too old for the Undetected ChromeDriver to work with. To fix this, simply update your Chrome browser to a newer version. Alternatively, if you have multiple Chrome browser versions available (such as a beta release), you can specify one using the browser_executable_path parameter:

import undetected_chromedriver as uc
driver = uc.Chrome(
    browser_executable_path="/usr/bin/google-chrome-beta"
)
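
If updating the browser isn't an option, you can also pin the driver to the installed browser's major version. A minimal sketch, assuming your undetected_chromedriver release supports the version_main argument:

import undetected_chromedriver as uc

# Pin chromedriver to the browser's major version (117 matches the
# "Current browser version" reported in the error message above)
driver = uc.Chrome(version_main=117)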

FAQ

To wrap up this guide on Undetected ChromeDriver for web scraping, let's look at some frequently asked questions.

Are there alternatives to the Undetected ChromeDriver?

Yes, Puppeteer Stealth is a library that can bypass anti-scraping challenges, similar to the Undetected ChromeDriver. However, it's based on Node.js and isn't available in Python.

How to avoid web scraping blocking?

There are multiple factors that lead to web scraping blocking, including headers, IP addresses, security handshakes, and JavaScript execution. To scrape without getting blocked, you must pay attention to these details. For more information, refer to our dedicated article on scraping without getting blocked.

Summary on Undetected Chromedriver use for Scraping

In this article, we explained how to use the Undetected ChromeDriver to scrape without getting blocked. It works by mimicking normal web browser configurations to appear natural.

We have seen that the Undetected ChromeDriver can avoid specific anti-scraping challenges like Cloudflare. However, it can be detected by advanced anti-bot systems.

Related Posts

How to Know What Anti-Bot Service a Website is Using?

In this article we'll take a look at two popular tools, WhatWaf and Wafw00f, which can identify what WAF service a website is using.

Use Curl Impersonate to scrape as Chrome or Firefox

Learn how to prevent TLS fingerprinting by impersonating normal web browser configurations. We'll start by explaining what Curl Impersonate is, how it works, and how to install and use it. Finally, we'll explore using it with Python to avoid web scraping blocking.

FlareSolverr Guide: Bypass Cloudflare While Scraping

In this article, we'll explore the FlareSolverr tool and how to use it to get around Cloudflare while scraping. We'll start by explaining what FlareSolverr is, how it works, how to install and use it. Let's get started!