How to Bypass Kasada Anti-Bot When Web Scraping in 2024

article feature image

Kasada is a popular web application firewall service used by many websites like Realestate, Hyatt and Scheels. It detects and blocks bots from accessing web, mobile, and API applications.

In this article, we'll explain what Kasada is and how it's used to block bots such as web scrapers. Then, we'll go over techniques and tools to use in order to avoid Kasada scraping blocking. Let's get started!

What is Kasada?

Kasada is a WAF service used to protect websites from malicious activities, such as fraud, spam, and brute forcing. It can detect requests coming from bots by studying and analyzing the configuration differences between bots and normal users' behavior.

Kasada can affect us as web scrapers if the traffic is detected as coming from bots. So, let's have a look at common Kasada detection errors and how they work in action.

How to Identify Kasada Detection?

Unlike other WAF services, Kasada doesn't suspect a group of requests and then identify them by challenging them with CAPTCHAs. Instead, they suspect all requests, even normal users, and challenge them with hidden protection layers located on both the server and client sides. These challenges exhaust bots and bad actors while passing normal users. Leading the firewall to learn from the failed requests' traces to enhance the protection algorithm.

The most common detection errors encountered while getting blocked by Kasada are 400-500 status codes. The 400 errors represent client-side issues, while the 500 represent server-side ones. This reflects the Kasada protection layers implemented on both the client and server sides. The Kasada blocking errors are usually associated with some custom headers used by Kasada, such as the X-Kpsdk-Ct header.

How Does Kasada Detect Web Scrapers?

Before Kasada decides whether the request sender is a bot or not, it uses different techniques to analyze the request fingerprint to calculate a score called the trust score.

fingerprint technologies used by Kasada

The trust score is calculated after going through a few stages. Each stage has a score, and the final score is a weighted average of the previous stages. Kasada decides whether to block or allow the request depending on this final score.

trust score evaluation flow of Akamai anti bot service

This process seems complicated and overwhelming for developers to manage. However, if we look at the details of these stages and implement the best practices in each one, we'll have a high chance of bypassing Kasada bot protection. So, let's go through each stage!

TLS Fingerprinting

The TLS is a protocol used to establish a secure and encrypted HTTPS channel between a client and a web server. Before this channel is initialized, both the client and server have to go through a process called the TLS handshake. During this process, both parties have to negotiate certain values in order to establish the connection. These are:

  • Cipher Suites
    List of encryption algorithms supported by both client and server, ordered by priority. The server and client agree on the first matching value.
  • TLS Versions
    The TLS version used by the client browser, typically it's either 1.2 or 1.3.
  • Enabled Extensions
    List of features the client supports alongside some metadata, such as the server domain name.

The above TLS details are combined together to create a JA3 fingerprint. A string token separated by a - character:

772,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,16-18-13-5-23-45-10-35-65281-27-51-0-11-65037-43-17513,29-23-24,0

To make your request fingerprint appear normal and avoid blocking, ensure that the libraries and tools used while establishing the HTTP connection are JA3-resistant. To do that, you can use the ScrapFly JA3 fingerprint web tool, which identifies and shows your fingerprint.

How TLS Fingerprint is Used to Block Web Scrapers?

Learn how TLS can leak the fact that connecting client is a web scraper and how can it be used to establish fingerprint to track the client across the web.

How TLS Fingerprint is Used to Block Web Scrapers?

IP Address Fingerprinting

The next stage of calculating the trust score is the IP address fingerprinting. Firewalls like Kasada can identify the IP address of the request sender. In which they get information about the client's location, ISP and other related details.

The most viable metric here is the IP address type, which falls into three categories:

  • Residential
    IP addresses are assigned to home networks by ISPs. Residential IPs have a positive trust score as they are most likely used by real users. However, these IPs are few and expensive to acquire.
  • Mobile
    IP addresses are assigned to phones using mobile network towers. Mobile IPs have a positive trust score as they are used by real users too. These IPs are also dynamic that change over a specific period of time, making it hard for firewalls to identify and track them.
  • Datacenter
    IP addresses are assigned by cloud data centers, such as AWS and Google Cloud. Datacenter IPs have a negative trust score as they are likely used by scripts and bots.

To avoid Kasada scraping blocking, you need to hide your IP address by splitting the requests over multiple IP addresses. This will make it harder for the firewall to detect and block your IP address.

How to Avoid Web Scraper IP Blocking?

Learn what are Internet Protocol addresses and how IP tracking technologies are used to block web scrapers.

How to Avoid Web Scraper IP Blocking?

HTTP Details

The next step of the trust calculation process is analyzing the request HTTP details themselves. As the HTTP protocol gets more complex, it has become easier for firewalls to detect connections from bots and web scrapers. Kasada can identify the request as coming from bots by comparing the HTTP version and headers with those used by normal browsers.

  • HTTP version

Most of the web currently runs over HTTP2 or even HTTP3, while many HTTP clients still use HTTP1.1. So, if the HTTP version of the request is HTTP1.1, Kasada will likely detect and block this request.

Some clients support HTTP2, like httpx and cURL, though it's not enabled by default. HTTP2 is also exposed to HTTP2 fingerprinting, which can be used to identify web scrapers. Try out the ScrapFly HTTP2 fingerprint test page for more details.

  • HTTP headers

HTTP headers are key-value pairs used to transfer essential information about the request between the client and server. Many firewalls like Kasada usually look for missing or miss-configured headers like the User-Agent, Referrer and Origin

To avoid getting blocked by Kasada, ensure your requests use HTTP2 and match your headers with those of normal users.

How Headers Are Used to Block Web Scrapers and How to Fix It

See our full introduction article to request headers in the context of web scraper blocking. What do common headers mean, how are they used in web scraper identification and how can we fix this.

How Headers Are Used to Block Web Scrapers and How to Fix It

JavaScript Fingerprinting

The final stage of calculating the trust score is the JavaScript fingerprinting, though it's the most complex step. Kasada analyzes the client's JavaScript for details like:

  • Hardware details and capabilities
  • JavaScript runtime details
  • Web browser infomation
  • Operating system information

The above data are combined together to create a unique fingerprint. This data seems overwhelming to manage. But luckily for us, the JavaScript fingerprint isn't a reliable method and its results aren't taken with a grain of salt by firewalls. So, regardless of this stage score, requests can avoid Kasada blocking if the trust score of the previous stages is high.

To bypass Kasada JavaScript fingerprinting, we can follow two different methods.

  • Reverse engineer the JavaScript fingerprint

By following this step, we can counter the Kasada fingerprint by feeding the client with fake data. However, this method is complex and time-consuming. Moreover, it requires continuous maintenance and updates, as the methods used are constantly changing.

  • Use headless browsers

This approach is much easier and straightforward - simply run headless browsers such as Selenium, Playwright and Puppeteer. However, this method is considered slow, as headless browsers require a lot of resources.

To avoid Kasada scraping blocking from JavaScript fingerprinting, use headless browsers to navigate and scrape the web pages.

How to Scrape Dynamic Websites Using Headless Web Browsers

Learn how to use headless browsers to scrape data from dynamic web pages. What are existing available tools and how to use them? And what are some common challenges, tips and shortcuts when it comes to scraping using web browsers.

How to Scrape Dynamic Websites Using Headless Web Browsers

Behavior Analysis

While following the previous steps and implementing the best practices of each one, Kasada can still detect and block web scraping requests. This is because the detection algorithm learns from the requests and analyzes patterns constantly.

This means that the trust score can be decreased over time. For that, changing the web scraping traffic is necessary to ensure the highest success rate against Kasada bot detection. For example, you need to rotate proxies, User-Agents and header values. The same idea can be applied to headless browsers by changing the browsing profile and capabilities, such as the browser name, version, and screen size.

Bypass Kasada With ScrapFly

We have seen that bypassing Kasada anti-scraping can be challenging with different details that need attention. This is why we created ScrapFly - a web scraping API that allows for scraping at scale by providing:

scrapfly middleware
ScrapFly service does the heavy lifting for you!

Here is how we can bypass Kasada anti-scraping using the ScrapFly. All we have to do is send a request to the target website while enaling the asp feature:

from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse

scrapfly = ScrapflyClient(key="Your ScrapFly API key")

result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
    url="the target website URL",
    # select a the proxy country
    country="us",
    # enable the ASP to bypass any website's blocking
    asp=True,
    # enable JS rendering, similar to headless browsers
    render_js=True,
))

# get the page HTML content
print(result.scrape_result['content'])

Sign-up for FREE to get your API key!

FAQ

To wrap up this guide, let's take a look at some frequently asked questions about bypassing Kasada bot detection.

Yes, as long as the data is public, then it's legal to scrape them. However, you should keep your scraping rate reasonable to avoid damaging the website.

Is it possible to bypass Kasada using Cache services?

Yes, you can use the cached pages provided by public cache services such as Google Cache and Archive.org to bypass Kasada. However, these pages might not always be up-to-date, resulting in scraping obsolete data.

Is it possible to bypass Kasada entirely and scrape the website directly?

This would fall into security flaws and vulnerabilities, which isn't advised to do while scraping as it may lead to legal concerns.

What are other anti-bot service?

There are many WAF services used to protect websites from bots and cyber-attacks, such as Cloudlfare, Akami, Datadome, PerimeterX and Imperva Incapsula. These anti-bots function almost in the same way and the technical aspects described in this article can applied to them too.

Summary

Kasada is an anti-bot. A WAF service that's used to detect and block web scrapers by trapping them with hidden challenges. Kasada detects web scrapers using various techniques, such as HTTP details, TLS, IP, and JavaScript fingerprinting.

We have explained how to bypass Kasada while scraping using different steps. In a nutshell, these are:

  • Use a resistant JA3 fingerprint.
  • Use proxies to hide your IP address.
  • Use headers similar to normal users and enable HTTP2.
  • Use headless browsers to avoid Javascript fingerprinting.

Related Posts

Use Curl Impersonate to scrape as Chrome or Firefox

Learn how to prevent TLS fingerprinting by impersonating normal web browser configurations. We'll start by explaining what the Curl Impersonate is, how it works, how to install and use it. Finally, we'll explore using it with Python to avoid web scraping blocking.

FlareSolverr Guide: Bypass Cloudflare While Scraping

In this article, we'll explore the FlareSolverr tool and how to use it to get around Cloudflare while scraping. We'll start by explaining what FlareSolverr is, how it works, how to install and use it. Let's get started!

How to Bypass CAPTCHA While Web Scraping in 2024

Captchas can ruin web scrapers but we don't have to teach our robots how to solve them - we can just get around it all!