Akamai Bot Manager is a popular web service that protects websites from bots and scrapers. It's used by many major websites, including Amazon, eBay, and Airbnb.
Akamai is often associated with AI-powered bot detection, but its system is primarily powered by traditional bot detection methods like fingerprinting and connection analysis. This means that, with careful engineering, Akamai can be bypassed when web scraping.
In this article, we'll be taking a look at how to bypass Akamai Bot Manager and how to detect when a request has been blocked by Akamai. We'll also cover common Akamai errors and signs that indicate that requests have been blocked. Let's dive in!
What is Akamai Bot Manager?
Akamai offers a suite of web services, and the Bot Manager service is used to determine whether a connecting user is a human or an automated process. While it has the legitimate use of protecting websites from malicious bots, it also blocks web scrapers from accessing public data.
Akamai Bot Manager is primarily used by big websites like eBay.com, Airbnb.com, and Amazon.com, making web scraping of these targets difficult but possible. Next, let's take a look at some popular Akamai errors and how the whole thing works.
How to Identify an Akamai Block?
Most Akamai bot blocks result in HTTP status codes in the 400-500 range. Most commonly, status code 403 is returned with the message "Pardon Our Interruption" or "Access Denied". Though, to throw off bots, Akamai can also return status code 200 with the same messages.
This error is mostly encountered on the first request, as Akamai is particularly good at detecting bots in the first stages of the connection. However, Akamai's AI behavior analysis can block connections at any point.
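As a rough illustration, a scraper can flag likely Akamai blocks by combining the status code with the known block messages. This is a minimal sketch, not an exhaustive detector; real block pages vary per website:

```python
# Messages that commonly appear on Akamai block pages
BLOCK_MARKERS = ["Pardon Our Interruption", "Access Denied"]

def is_akamai_block(status_code, body):
    """Return True if a response looks like an Akamai bot block."""
    has_marker = any(marker in body for marker in BLOCK_MARKERS)
    # blocks are usually 4XX, but Akamai can hide them behind a 200
    if 400 <= status_code < 500:
        return True if has_marker else status_code == 403
    return has_marker

print(is_akamai_block(403, "<h1>Access Denied</h1>"))   # True
print(is_akamai_block(200, "Pardon Our Interruption"))  # True
print(is_akamai_block(200, "<h1>Products</h1>"))        # False
```

Checking every response this way makes it easy to trigger retries with a different proxy or profile as soon as a block appears.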
Let's take a look at how exactly Akamai detects web scrapers and bots next.
How Does Akamai Detect Web Scrapers?
Akamai Bot Manager uses many different web technologies to determine whether a user is a human or a bot. Not only that, but Akamai also continuously tracks user behavior to adjust the detection result, also known as the trust score.
The trust score is calculated in many different stages. The final score is a weighted average of all the stages and determines whether the user is allowed through.
This complex process makes web scraping difficult, as developers have to manage many different factors to bypass Akamai. However, if we take a look at the individual stages, we can see that bypassing Akamai is very much possible!
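The weighted-average idea can be sketched in a few lines. Note that the stage names, weights, and threshold below are purely hypothetical, since Akamai's actual scoring details are not public; this only illustrates the mechanism:

```python
# Hypothetical stage scores (0-1, higher = more human-like) and weights.
# Akamai's real stages and weights are not public -- illustration only.
stage_scores = {"tls": 0.9, "ip": 0.4, "http": 0.8, "javascript": 0.7}
stage_weights = {"tls": 1.0, "ip": 2.0, "http": 1.0, "javascript": 1.5}

# weighted average over all detection stages
weighted_sum = sum(stage_scores[s] * stage_weights[s] for s in stage_scores)
total_weight = sum(stage_weights.values())
trust_score = weighted_sum / total_weight

# the connection is allowed only above some trust threshold
is_allowed = trust_score >= 0.6
print(round(trust_score, 3), is_allowed)
```

The takeaway for scrapers: a weak result in one heavily weighted stage (like a datacenter IP) can sink an otherwise clean fingerprint, so every stage needs attention.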
TLS Fingerprinting

TLS (or SSL) handshaking is the first step of every HTTPS connection, providing the end-to-end encryption that https relies on.
To start, both the client and the server have to negotiate an encryption method. As there are many different ciphers and encryption options, both sides have to agree on the same one. This is where TLS fingerprinting comes into play.
Since different computers, programs, and even programming libraries have different TLS capabilities, the negotiation step reveals a lot about the client. A hash of these negotiation details is generally referred to as the JA3 fingerprint.
So, if a web scraper uses a library with TLS capabilities that differ from those of a regular web browser, it can be identified through this method.
To avoid being JA3 fingerprinted, ensure that the libraries and tools used for the HTTP connection are JA3 resistant.
For that, see ScrapFly's JA3 fingerprint web tool, which shows your current fingerprint.
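To see what this fingerprint actually consists of, here is a sketch of how a JA3 string and its hash are derived from TLS ClientHello fields. The numeric values are made-up examples; real browsers offer dozens of ciphers and extensions:

```python
import hashlib

# Hypothetical TLS ClientHello values (illustration only)
tls_version = 771              # TLS 1.2 as seen on the wire
ciphers = [4865, 4866, 4867]   # offered cipher suite IDs
extensions = [0, 23, 65281]    # extension IDs, in their original order
curves = [29, 23, 24]          # supported elliptic curves
point_formats = [0]            # EC point format IDs

# JA3 joins each list with dashes and the five fields with commas,
# then hashes the resulting string with MD5
ja3_string = ",".join([
    str(tls_version),
    "-".join(map(str, ciphers)),
    "-".join(map(str, extensions)),
    "-".join(map(str, curves)),
    "-".join(map(str, point_formats)),
])
ja3_hash = hashlib.md5(ja3_string.encode()).hexdigest()
print(ja3_string)  # 771,4865-4866-4867,0-23-65281,29-23-24,0
print(ja3_hash)
```

Because the hash changes if any field changes, even a single unusual cipher in a scraping library's handshake produces a JA3 value that stands out from real browser traffic.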
IP Address Fingerprinting

The next step in Akamai's detection is IP address analysis and fingerprinting.
To start, there are a few different types of IP addresses:
Residential addresses are home addresses assigned by internet providers to everyday people. Residential IPs provide a positive trust score, as they are mostly used by humans and are expensive to acquire.
Mobile addresses are assigned by mobile carriers to phone users. Mobile IPs also provide a positive trust score, as they are mostly used by humans. In addition, since mobile towers share and recycle IP addresses, it's much more difficult to rely on them for identification.
Datacenter addresses are assigned to data centers and server platforms like Amazon's AWS, Google Cloud, etc. Datacenter IPs carry a significant negative trust score, as they are likely to be used by bots.
Using IP analysis, Akamai can determine whether an IP address is residential, mobile, or datacenter. This is done by comparing the address against a database of known IP addresses and inspecting public IP provider details.
For example, since real users rarely browse from datacenter IPs, a web scraper using one is a dead giveaway that it's a bot.
So, use high-quality residential or mobile proxies to avoid being blocked by Akamai at this stage.
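In practice this usually means rotating through a pool of proxies rather than using a single address. A minimal round-robin sketch (the proxy endpoints are placeholders; substitute your provider's URLs):

```python
from itertools import cycle

# Hypothetical residential proxy endpoints -- replace with real ones
PROXIES = [
    "http://user:pass@residential-1.example.com:8000",
    "http://user:pass@residential-2.example.com:8000",
    "http://user:pass@residential-3.example.com:8000",
]

# round-robin rotation spreads requests evenly across the proxy pool,
# so no single IP accumulates suspicious volume
rotation = cycle(PROXIES)
assigned = [next(rotation) for _ in range(5)]
for proxy in assigned:
    print(proxy)
```

Each scrape request would then be sent through `next(rotation)`; most HTTP clients accept the proxy URL per request, so the rotation slots in with a one-line change.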
HTTP Details

The next step is the HTTP connection itself. The HTTP protocol is becoming increasingly complex, and Akamai uses this complexity to detect bots.
To start, most of the web runs on HTTP2 and HTTP3, while many web scraping libraries still use HTTP1.1. So, a web scraper using HTTP1.1 is a clear giveaway that it's a bot.
While many newer HTTP libraries like cURL and httpx support HTTP2, they can still be detected by Akamai through HTTP2 fingerprinting. See ScrapFly's HTTP2 fingerprint test page for more info.
HTTP request headers also play an important role. Akamai looks for specific headers that are used by web browsers but not by web scrapers. So, it's important to ensure that request headers and their order match those of a real web browser and the context of the website.
For example, headers like Origin and Referer appear on some pages of a website but not on others. Other identity headers like User-Agent and encoding headers like Accept-Encoding can also be used to identify bots.
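One practical approach is keeping a browser-like header template and adjusting the context-dependent headers per page. Python dicts preserve insertion order, so the order defined below is the order most HTTP clients will send. The values are an example Chrome-on-Windows header set, copied from a real browser session as a baseline:

```python
# Example header template mimicking Chrome on Windows. Dict order matters:
# it matches the order a real Chrome browser sends these headers in.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.google.com/",
}

def with_referer(referer):
    """Copy the template, overriding Referer to match the page context."""
    headers = dict(BROWSER_HEADERS)
    headers["Referer"] = referer
    return headers

print(list(with_referer("https://example.com/shop")))
```

The template would be passed as the `headers=` argument of the HTTP client; updating it whenever the impersonated browser version changes keeps it consistent with the TLS and HTTP2 fingerprints.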
JavaScript Fingerprinting

Akamai can also use JavaScript execution in the client to collect details such as:

Hardware details and capabilities
Operating system information
Web browser context information
All of this data is used to create a unique fingerprint for tracking users and identifying bots.
Alternatively, we can run a real web browser using browser automation libraries like Selenium, Puppeteer, or Playwright, which can start a headless browser and navigate it for web scraping.
This approach can even be mixed with traditional HTTP libraries: we can establish a trust score using a real web browser and then switch the session over to an HTTP library for faster scraping (this feature is also available through ScrapFly sessions).
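The hand-off itself boils down to exporting the browser's cookies and reusing them in the HTTP client. A minimal sketch of that conversion step, assuming cookies were exported from an automated browser (e.g. via Playwright's `context.cookies()`); the values are fake, while `ak_bmsc` and `bm_sv` are cookie names commonly associated with Akamai:

```python
# Cookies as exported from a trusted, browser-driven session
# (values are fake; the export format mirrors Playwright's cookie dicts)
browser_cookies = [
    {"name": "ak_bmsc", "value": "trusted-session-token", "domain": ".example.com"},
    {"name": "bm_sv", "value": "secondary-token", "domain": ".example.com"},
]

# Fold them into a single Cookie header an HTTP library can reuse as-is
cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in browser_cookies)
print(cookie_header)
```

Subsequent fast HTTP requests would then send `{"Cookie": cookie_header}` together with browser-like headers, inheriting the trust score established by the real browser until the session expires.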
Behavior Analysis

Even with all of the above methods bypassed, Akamai can still detect bots using behavior analysis. As Akamai tracks everything that happens on the website, it can detect scrapers and bots by spotting abnormal behavior.
So, it's important to distribute web scraper traffic through multiple agents.
This is done by creating multiple profiles with distinct proxies, header details, and other settings. If browser automation is used, each profile should also use a different browser version and configuration (like screen size, etc.).
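Generating such profiles can be as simple as combining the available building blocks. A sketch, with made-up proxies, user agents, and viewport sizes:

```python
import itertools

# Hypothetical building blocks for agent profiles
proxies = ["http://proxy-1.example.com:8000", "http://proxy-2.example.com:8000"]
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/115.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/116.0.0.0 Safari/537.36",
]
viewports = [(1920, 1080), (1366, 768)]

# every combination becomes a distinct agent profile
profiles = [
    {"proxy": p, "user_agent": ua, "viewport": vp}
    for p, ua, vp in itertools.product(proxies, user_agents, viewports)
]
print(len(profiles))  # 8 distinct profiles

# rotate profiles per request so no single fingerprint carries all traffic
request_number = 11
profile = profiles[request_number % len(profiles)]
print(profile["proxy"])
```

Keeping a profile's proxy, headers, and browser configuration pinned together matters: mixing, say, one profile's cookies with another's user agent creates exactly the kind of inconsistency behavior analysis looks for.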
How to Bypass Akamai Bot Management?
Now that we're familiar with all of the methods used to detect bots, we have a general understanding of how to bypass Akamai bot protection by avoiding these detection techniques.
There are many ways to approach this challenge, but to bypass Akamai in 2023 we can summarize the general approach as follows:
Use high-quality residential or mobile proxies
Patch browser automation libraries with fingerprint resistance patches (like puppeteer-stealth)
Distribute web scraper traffic through multiple agents
Bypass Akamai with ScrapFly
While bypassing Akamai is possible, maintaining the bypass strategies can be very time-consuming. This is where services like the ScrapFly web scraping API come in!
Using ScrapFly, we can hand over all of the web scraping complexity and bypass logic to ScrapFly!
ScrapFly is not only an Akamai bypasser but also offers many other web scraping features:
from scrapfly import ScrapflyClient, ScrapeConfig

scrapfly = ScrapflyClient(key="YOUR API KEY")
result = scrapfly.scrape(ScrapeConfig(
    url="https://www.example.com/",
    # enable the anti scraping protection bypass
    asp=True,
    # and set proxies by country like Japan
    country="JP",
    # and proxy type like residential:
    proxy_pool="public_residential_pool",
))
print(result.content)
To wrap up this article, let's take a look at some frequently asked questions regarding web scraping Akamai-protected pages:
Is it legal to scrape Akamai-protected pages?
Yes. Web scraping publicly available data is generally considered legal in most jurisdictions, as long as the scraper does not cause damage to the website.
Is it possible to bypass Akamai using cache services?
Yes, public page caching services like Google Cache or Archive.org can be used to access Akamai-protected pages, as Google and Archive.org crawlers tend to be whitelisted. However, since caching takes time, the cached page data is often outdated and not suitable for web scraping. Cached pages can also be missing parts of content that are loaded dynamically.
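For example, a Google Cache URL for a target page can be constructed like this (cache availability varies per page, so a fallback is still needed):

```python
from urllib.parse import quote

def google_cache_url(url):
    """Build the Google Cache lookup URL for a target page."""
    # the target URL is percent-encoded and appended after the cache: operator
    return "https://webcache.googleusercontent.com/search?q=cache:" + quote(url, safe="")

print(google_cache_url("https://www.example.com/product/1"))
```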
Is it possible to skip Akamai entirely and scrape the real website directly?
This treads closer to security research, and it's not advised when web scraping. While scraping and bypassing Akamai-protected pages is perfectly legal, abusing security flaws can be illegal in many countries.
In this article, we've taken a look at how to bypass Akamai Bot Management when web scraping.
We started by identifying all of the ways Akamai develops a trust score for each new connection and the role this score plays in web scraping. Then we took a look at each detection method and what we can do to bypass it.
Finally, we looked at how to bypass Akamai using the ScrapFly web scraping API and how to use ScrapFly to scrape Akamai-protected pages, so give it a shot for free!