Kasada is a popular web application firewall service used by many websites like Realestate, Hyatt and Scheels. It detects and blocks bots from accessing web, mobile, and API applications.
In this article, we'll explain what Kasada is and how it's used to block bots such as web scrapers. Then, we'll go over techniques and tools to use in order to avoid Kasada scraping blocking. Let's get started!
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect; here's a good summary of what not to do:
Do not scrape at rates that could damage the website.
Do not scrape data that's not available publicly.
Do not store PII of EU citizens who are protected by GDPR.
Do not repurpose entire public datasets, which can be illegal in some countries.
Scrapfly does not offer legal advice, but these are good general rules to follow in web scraping; for more, you should consult a lawyer.
What is Kasada?
Kasada is a WAF service used to protect websites from malicious activities, such as fraud, spam, and brute forcing. It can detect requests coming from bots by studying and analyzing the configuration differences between bots and normal users' behavior.
Kasada can affect us as web scrapers if the traffic is detected as coming from bots. So, let's have a look at common Kasada detection errors and how they work in action.
How to Identify Kasada Detection?
Unlike other WAF services, Kasada doesn't suspect a group of requests and then identify them by challenging them with CAPTCHAs. Instead, it suspects all requests, even those from normal users, and challenges them with hidden protection layers located on both the server and client sides. These challenges exhaust bots and bad actors while letting normal users pass, and the firewall learns from the traces of failed requests to improve its protection algorithm.
The most common errors encountered while getting blocked by Kasada are 400-500 status codes: errors in the 400 range represent client-side issues, while those in the 500 range represent server-side ones. This reflects the Kasada protection layers implemented on both the client and server sides. Kasada blocking errors are usually accompanied by custom Kasada headers, such as the X-Kpsdk-Ct header.
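To spot a Kasada block programmatically, we can look for the combination of an error status code and Kasada's custom headers. Below is a minimal sketch assuming the httpx package; the target URL is a placeholder and the exact header names can differ between deployments:

# A minimal sketch for spotting a likely Kasada block in a response.
import httpx

response = httpx.get("https://example.com")  # placeholder target URL

# Kasada's custom headers typically start with "x-kpsdk" (e.g. X-Kpsdk-Ct)
kasada_headers = [name for name in response.headers if name.lower().startswith("x-kpsdk")]

if response.status_code >= 400 and kasada_headers:
    print(f"Likely blocked by Kasada: status {response.status_code}, headers {kasada_headers}")
else:
    print("No obvious Kasada block markers found")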
How Does Kasada Detect Web Scrapers?
Before Kasada decides whether the request sender is a bot, it uses different techniques to analyze the request fingerprint and calculate a score called the trust score.
The trust score is calculated after going through a few stages. Each stage has a score, and the final score is a weighted average of the previous stages. Kasada decides whether to block or allow the request depending on this final score.
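To make the idea concrete, here is a purely illustrative sketch of a weighted-average trust score in Python. The stage names, scores, and weights are invented for the example; Kasada's real stages, weights, and thresholds are not public.

# Purely illustrative - Kasada's real weights and thresholds are not public.
stage_scores = {"tls": 0.9, "ip": 0.4, "http": 0.8, "javascript": 0.7}
weights = {"tls": 0.25, "ip": 0.30, "http": 0.20, "javascript": 0.25}

# weighted average of the per-stage scores
trust_score = sum(stage_scores[stage] * weights[stage] for stage in stage_scores)
print(round(trust_score, 2))  # 0.68 - allowed or blocked depending on the threshold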
This process seems complicated and overwhelming for developers to manage. However, if we look at the details of these stages and implement the best practices in each one, we'll have a high chance of bypassing Kasada bot protection. So, let's go through each stage!
TLS Fingerprinting
TLS is a protocol used to establish a secure and encrypted HTTPS channel between a client and a web server. Before this channel is initialized, both the client and server have to go through a process called the TLS handshake. During this process, both parties negotiate certain values in order to establish the connection. These are:
Cipher Suites
The list of encryption algorithms the client supports, ordered by preference. The server picks the first value it also supports.
TLS Versions
The TLS version used by the client browser, typically either 1.2 or 1.3.
Enabled Extensions
List of features the client supports alongside some metadata, such as the server domain name.
The above TLS details are combined to create a JA3 fingerprint: a string of these values where fields are separated by commas and the values within each field by a - character, which is then hashed to produce the final fingerprint token.
To make your request fingerprint appear normal and avoid blocking, ensure that the libraries and tools used to establish the HTTPS connection produce a browser-like JA3 fingerprint. To verify this, you can use the ScrapFly JA3 fingerprint web tool, which identifies and displays your fingerprint.
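One practical way to send a browser-like JA3 fingerprint from Python is to use a TLS-impersonating HTTP client. The sketch below assumes the curl_cffi package is installed; the target URL is a placeholder, and the set of supported impersonation targets depends on the installed curl_cffi version:

# A minimal sketch using curl_cffi, which ships real browser TLS profiles,
# so the resulting JA3 fingerprint matches an actual Chrome build.
from curl_cffi import requests

response = requests.get(
    "https://example.com",  # placeholder - replace with the target URL
    impersonate="chrome",  # mimic Chrome's cipher suites, TLS version and extensions
)
print(response.status_code)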
IP Address Fingerprinting
The next stage of calculating the trust score is IP address fingerprinting. Firewalls like Kasada can analyze the IP address of the request sender to get information about the client's location, ISP, and other related details.
The most important signal here is the IP address type, which falls into three categories:
Residential
IP addresses are assigned to home networks by ISPs. Residential IPs have a positive trust score as they are most likely used by real users. However, these IPs are few and expensive to acquire.
Mobile
IP addresses are assigned to phones using mobile network towers. Mobile IPs have a positive trust score as they are used by real users too. These IPs are also dynamic and change periodically, making it hard for firewalls to identify and track them.
Datacenter
IP addresses are assigned by cloud data centers, such as AWS and Google Cloud. Datacenter IPs have a negative trust score as they are likely used by scripts and bots.
To avoid Kasada scraping blocking, you need to hide your IP address by splitting the requests over multiple IP addresses. This will make it harder for the firewall to detect and block your IP address.
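In practice, this means routing requests through a rotating proxy pool, preferably residential or mobile proxies. Here's a minimal sketch assuming the requests package; the proxy URLs and target URL are placeholders:

# A minimal sketch of splitting requests over a pool of proxies.
import random
import requests

# placeholder proxy URLs - substitute your own residential or mobile proxies
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

def get_with_random_proxy(url):
    proxy = random.choice(PROXY_POOL)  # pick a different exit IP per request
    return requests.get(url, proxies={"http": proxy, "https": proxy})

response = get_with_random_proxy("https://example.com")  # placeholder target URL
print(response.status_code)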
HTTP Details
The next step of the trust score calculation is analyzing the HTTP details of the request itself. As the HTTP protocol has grown more complex, it has become easier for firewalls to detect connections coming from bots and web scrapers. Kasada can identify a request as coming from a bot by comparing its HTTP version and headers with those used by normal browsers.
HTTP version
Most of the web currently runs over HTTP2 or even HTTP3, while many HTTP clients still use HTTP1.1. So, if the HTTP version of the request is HTTP1.1, Kasada will likely detect and block this request.
Some clients, like httpx and cURL, support HTTP2, though it's not enabled by default. HTTP2 is also subject to HTTP2 fingerprinting, which can be used to identify web scrapers. Try out the ScrapFly HTTP2 fingerprint test page for more details.
HTTP headers
HTTP headers are key-value pairs used to transfer essential information about the request between the client and server. Firewalls like Kasada usually look for missing or misconfigured headers, such as User-Agent, Referer, and Origin.
To avoid getting blocked by Kasada, ensure your requests use HTTP2 and match your headers with those of normal users.
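Here's a minimal sketch of both recommendations, assuming the httpx package with its http2 extra installed (pip install "httpx[http2]"); the header values and URL are example placeholders:

# A minimal sketch: enable HTTP2 and send browser-like headers.
import httpx

# example browser-like headers - match them to the browser profile you imitate
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.google.com/",
}

with httpx.Client(http2=True, headers=headers) as client:
    response = client.get("https://example.com")  # placeholder target URL
    print(response.http_version)  # expect "HTTP/2"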
JavaScript Fingerprinting
The final stage of calculating the trust score is JavaScript fingerprinting, which is also the most complex step. Kasada runs JavaScript on the client to collect details such as:
Hardware details and capabilities
JavaScript runtime details
Web browser information
Operating system information
The above data is combined to create a unique fingerprint. This seems overwhelming to manage, but luckily for us, JavaScript fingerprinting isn't fully reliable, so firewalls take its results with a grain of salt. So, regardless of this stage's score, requests can avoid Kasada blocking if the trust score of the previous stages is high.
To bypass Kasada JavaScript fingerprinting, we can follow two different methods.
Reverse engineer the JavaScript fingerprint
By following this step, we can counter the Kasada fingerprint by feeding the client with fake data. However, this method is complex and time-consuming. Moreover, it requires continuous maintenance and updates, as the methods used are constantly changing.
Use headless browsers
This approach is much easier and more straightforward - simply run a headless browser such as Selenium, Playwright, or Puppeteer. However, this method is slower, as headless browsers require a lot of resources.
To avoid Kasada scraping blocking from JavaScript fingerprinting, use headless browsers to navigate and scrape the web pages.
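Here's a minimal sketch using Playwright's Chromium (pip install playwright, then playwright install chromium); the URL is a placeholder. Note that plain headless mode can still be flagged, so stealth patches or a headed browser may be needed:

# A minimal sketch: let a real browser engine execute the page's JavaScript
# so Kasada's client-side challenges run in a browser environment.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")  # placeholder target URL
    page.wait_for_load_state("networkidle")  # let challenge scripts finish running
    html = page.content()
    browser.close()

print(len(html))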
Behavior Analysis
While following the previous steps and implementing the best practices of each one, Kasada can still detect and block web scraping requests. This is because the detection algorithm learns from the requests and analyzes patterns constantly.
This means that the trust score can decrease over time. Because of that, varying your web scraping traffic is necessary to ensure the highest success rate against Kasada bot detection. For example, rotate proxies, User-Agent strings, and header values. The same idea applies to headless browsers: change the browsing profile and capabilities, such as the browser name, version, and screen size, as shown in the sketch below.
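Here's a minimal sketch of rotating browsing profiles with Playwright; the User-Agent strings, viewport sizes, and URL are illustrative placeholders, not a vetted list:

# A minimal sketch of varying the browsing profile between scraping sessions.
import random
from playwright.sync_api import sync_playwright

# illustrative example profiles - use profiles matching real, current browsers
PROFILES = [
    {
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "viewport": {"width": 1920, "height": 1080},
    },
    {
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
        "viewport": {"width": 1440, "height": 900},
    },
]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(**random.choice(PROFILES))  # rotate the profile per session
    page = context.new_page()
    page.goto("https://example.com")  # placeholder target URL
    print(page.title())
    browser.close()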
Bypass Kasada With ScrapFly
Bypassing the Kasada anti-bot, while possible, is very difficult - let Scrapfly do it for you!
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale. Each product is equipped with an automatic bypass for any anti-bot system and we achieve this by:
Maintaining a fleet of real, reinforced web browsers with real fingerprint profiles.
Millions of self-healing proxies of the highest possible trust score.
Constantly evolving and adapting to new anti-bot systems.
We've been doing this publicly since 2020 with the best bypass on the market!
It takes Scrapfly several full-time engineers to maintain this system, so you don't have to!
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse
scrapfly = ScrapflyClient(key="Your ScrapFly API key")
result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
url="the target website URL",
# select the proxy country
country="us",
# enable the ASP to bypass any website's blocking
asp=True,
# enable JS rendering, similar to headless browsers
render_js=True,
))
# get the page HTML content
print(result.scrape_result['content'])
FAQ
To wrap up this guide, let's take a look at some frequently asked questions about bypassing Kasada bot detection.
Is it legal to scrape Kasada-protected pages?
Yes, as long as the data is publicly available, it's legal to scrape it. However, you should keep your scraping rate reasonable to avoid damaging the website.
Is it possible to bypass Kasada using Cache services?
Yes, you can use the cached pages provided by public cache services such as Google Cache and Archive.org to bypass Kasada. However, these pages might not always be up-to-date, resulting in scraping obsolete data.
Is it possible to bypass Kasada entirely and scrape the website directly?
Bypassing Kasada entirely would require exploiting security flaws or vulnerabilities, which isn't advised while scraping, as it may lead to legal consequences.
What are other anti-bot services?
There are many WAF services used to protect websites from bots and cyber attacks, such as Cloudflare, Akamai, Datadome, PerimeterX, and Imperva Incapsula. These anti-bots function in much the same way, and the technical aspects described in this article can be applied to them too.
Summary
Kasada is an anti-bot WAF service that detects and blocks web scrapers by trapping them with hidden challenges. It detects web scrapers using various techniques, such as HTTP details, TLS, IP, and JavaScript fingerprinting.
We have explained how to bypass Kasada while scraping using different steps. In a nutshell, these are:
Use a browser-like JA3 fingerprint.
Use proxies to hide your IP address.
Use headers similar to normal users and enable HTTP2.
Use headless browsers to avoid JavaScript fingerprinting.