Alternatives to Cloudscraper to Bypass Cloudflare
Learn why Cloudscraper is outdated and explore modern alternatives for bypassing Cloudflare protections effectively and ethically.
Kasada is a popular web application firewall service used by many websites like Realestate, Hyatt and Scheels. It detects and blocks bots from accessing web, mobile, and API applications.
In this article, we'll explain what Kasada is and how it's used to block bots such as web scrapers. Then, we'll go over techniques and tools to use in order to avoid Kasada scraping blocking. Let's get started!
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect and here's a good summary of what not to do:
Scrapfly does not offer legal advice but these are good general rules to follow in web scraping
and for more you should consult a lawyer.
Kasada is a WAF service used to protect websites from malicious activities, such as fraud, spam, and brute forcing. It can detect requests coming from bots by studying and analyzing the configuration differences between bots and normal users' behavior.
Kasada can affect us as web scrapers if the traffic is detected as coming from bots. So, let's have a look at common Kasada detection errors and how they work in action.
Unlike other WAF services, Kasada doesn't suspect a group of requests and then identify them by challenging them with CAPTCHAs. Instead, they suspect all requests, even normal users, and challenge them with hidden protection layers located on both the server and client sides. These challenges exhaust bots and bad actors while passing normal users. Leading the firewall to learn from the failed requests' traces to enhance the protection algorithm.
The most common detection errors encountered while getting blocked by Kasada are 400-500 status codes. The 400 errors represent client-side issues, while the 500 represent server-side ones. This reflects the Kasada protection layers implemented on both the client and server sides. The Kasada blocking errors are usually associated with some custom headers used by Kasada, such as the X-Kpsdk-Ct
header.
Before Kasada decides whether the request sender is a bot or not, it uses different techniques to analyze the request fingerprint to calculate a score called the trust score.
The trust score is calculated after going through a few stages. Each stage has a score, and the final score is a weighted average of the previous stages. Kasada decides whether to block or allow the request depending on this final score.
This process seems complicated and overwhelming for developers to manage. However, if we look at the details of these stages and implement the best practices in each one, we'll have a high chance of bypassing Kasada bot protection. So, let's go through each stage!
The TLS is a protocol used to establish a secure and encrypted HTTPS channel between a client and a web server. Before this channel is initialized, both the client and server have to go through a process called the TLS handshake. During this process, both parties have to negotiate certain values in order to establish the connection. These are:
The above TLS details are combined together to create a JA3 fingerprint. A string token separated by a -
character:
772,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,16-18-13-5-23-45-10-35-65281-27-51-0-11-65037-43-17513,29-23-24,0
To make your request fingerprint appear normal and avoid blocking, ensure that the libraries and tools used while establishing the HTTP connection are JA3-resistant. To do that, you can use the ScrapFly JA3 fingerprint web tool, which identifies and shows your fingerprint.
The next stage of calculating the trust score is the IP address fingerprinting. Firewalls like Kasada can identify the IP address of the request sender. In which they get information about the client's location, ISP and other related details.
The most viable metric here is the IP address type, which falls into three categories:
To avoid Kasada scraping blocking, you need to hide your IP address by splitting the requests over multiple IP addresses. This will make it harder for the firewall to detect and block your IP address.
The next step of the trust calculation process is analyzing the request HTTP details themselves. As the HTTP protocol gets more complex, it has become easier for firewalls to detect connections from bots and web scrapers. Kasada can identify the request as coming from bots by comparing the HTTP version and headers with those used by normal browsers.
Most of the web currently runs over HTTP2 or even HTTP3, while many HTTP clients still use HTTP1.1. So, if the HTTP version of the request is HTTP1.1, Kasada will likely detect and block this request.
Some clients support HTTP2, like httpx and cURL, though it's not enabled by default. HTTP2 is also exposed to HTTP2 fingerprinting, which can be used to identify web scrapers. Try out the ScrapFly HTTP2 fingerprint test page for more details.
HTTP headers are key-value pairs used to transfer essential information about the request between the client and server. Many firewalls like Kasada usually look for missing or miss-configured headers like the User-Agent
, Referrer
and Origin
To avoid getting blocked by Kasada, ensure your requests use HTTP2 and match your headers with those of normal users.
The final stage of calculating the trust score is the JavaScript fingerprinting, though it's the most complex step. Kasada analyzes the client's JavaScript for details like:
The above data are combined together to create a unique fingerprint. This data seems overwhelming to manage. But luckily for us, the JavaScript fingerprint isn't a reliable method and its results aren't taken with a grain of salt by firewalls. So, regardless of this stage score, requests can avoid Kasada blocking if the trust score of the previous stages is high.
To bypass Kasada JavaScript fingerprinting, we can follow two different methods.
By following this step, we can counter the Kasada fingerprint by feeding the client with fake data. However, this method is complex and time-consuming. Moreover, it requires continuous maintenance and updates, as the methods used are constantly changing.
This approach is much easier and straightforward - simply run headless browsers such as Selenium, Playwright and Puppeteer. However, this method is considered slow, as headless browsers require a lot of resources.
To avoid Kasada scraping blocking from JavaScript fingerprinting, use headless browsers to navigate and scrape the web pages.
While following the previous steps and implementing the best practices of each one, Kasada can still detect and block web scraping requests. This is because the detection algorithm learns from the requests and analyzes patterns constantly.
This means that the trust score can be decreased over time. For that, changing the web scraping traffic is necessary to ensure the highest success rate against Kasada bot detection. For example, you need to rotate proxies, User-Agents and header values. The same idea can be applied to headless browsers by changing the browsing profile and capabilities, such as the browser name, version, and screen size.
Bypassing Kasada anti-bot while possible is very difficult - let Scrapfly do it for you!
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale. Each product is equipped with an automatic bypass for any anti-bot system and we achieve this by:
It takes Scrapfly several full-time engineers to maintain this system, so you don't have to!
For example, to scrape pages protected by Kasada using ScrapFly's Python SDK all we need to do is enable the Anti Scraping Protection bypass feature:
from scrapfly import ScrapflyClient, ScrapeConfig, ScrapeApiResponse
scrapfly = ScrapflyClient(key="Your ScrapFly API key")
result: ScrapeApiResponse = scrapfly.scrape(ScrapeConfig(
url="the target website URL",
# select a the proxy country
country="us",
# enable the ASP to bypass any website's blocking
asp=True,
# enable JS rendering, similar to headless browsers
render_js=True,
))
# get the page HTML content
print(result.scrape_result['content'])
To wrap up this guide, let's take a look at some frequently asked questions about bypassing Kasada bot detection.
Yes, as long as the data is public, then it's legal to scrape them. However, you should keep your scraping rate reasonable to avoid damaging the website.
Yes, you can use the cached pages provided by public cache services such as Google Cache and Archive.org to bypass Kasada. However, these pages might not always be up-to-date, resulting in scraping obsolete data.
This would fall into security flaws and vulnerabilities, which isn't advised to do while scraping as it may lead to legal concerns.
There are many WAF services used to protect websites from bots and cyber-attacks, such as Cloudlfare, Akami, Datadome, PerimeterX and Imperva Incapsula. These anti-bots function almost in the same way and the technical aspects described in this article can applied to them too.
Kasada is an anti-bot. A WAF service that's used to detect and block web scrapers by trapping them with hidden challenges. Kasada detects web scrapers using various techniques, such as HTTP details, TLS, IP, and JavaScript fingerprinting.
We have explained how to bypass Kasada while scraping using different steps. In a nutshell, these are: