Anti Scraping Protection (ASP)
Service interruptions may occasionally occur for reasons outside our control. As anti-bot technology evolves, we continuously work to identify issues and adjust our solutions, but implementing a reliable, production-grade remedy may take several days to weeks. It is essential to bear this in mind and design your software accordingly when using this feature.
Please note that we do not offer estimated times of arrival (ETAs) or guarantees regarding workarounds. Our dedicated team works diligently on solutions, but it is impractical for us to provide such information in advance.
Our Anti-Scraping Protection is designed to unblock protected websites that are inaccessible to bots. We accomplish this by incorporating various concepts that help maintain a coherent fingerprint, making it as close to that of a real user as possible when scraping a website. If you are interested in understanding the technical aspects of how we achieve this undetectability, we have published a series of articles on the subject available in the learning resources section below.
It is important to note that while our anti-scraping protection solution is able to bypass protection measures, there are certain actions that are prohibited and will result in the suspension of your account. These actions include:
- Automated online payments
- Account creation
- Spam posting
- Vote falsification
- Credit card testing
- Login brute-forcing
- Abuse of referral / gifting systems
- Ad fraud
- Automated ticket purchasing
- Betting, casino, and gambling automation
Please be aware that cybersecurity firms (red teams) may be authorized to use our solution only after obtaining approval from the relevant parties for the specific domains they wish to test.
Scrapfly is capable of identifying and resolving obstacles posed by commonly used anti-scraping measures. Our platform also provides support for custom anti-scraping measures implemented by popular websites. Our anti-bot bypasses do not require any extra input from you, and you will receive successful responses directly.
It's important to keep in mind that anti-bot solutions frequently change and evolve, so our bypass techniques may need to be adjusted accordingly. When depending on ASP, make sure your code handles ASP errors correctly.
If you're attempting to scrape a protected website, it's advisable to try multiple configurations, with or without a browser, and to use residential proxies. If you're sending a POST request with a body, you must configure the headers and content type to mimic a genuine call.
If you're still experiencing issues despite all your efforts, please contact us via chat to explore further options. Custom bypasses are available through custom plans.
When ASP is enabled, anti-bot solution vendors are automatically detected and the bypass is managed for you.
curl -G \
  --request "GET" \
  --url "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://httpbin.dev/anything" \
  --data-urlencode "tags=project:default" \
  --data-urlencode "asp=true"
"https://api.scrapfly.io/scrape?key=&url=https%3A%2F%2Fhttpbin.dev%2Fanything&tags=project%3Adefault&asp=true" "api.scrapfly.io" "/scrape" key = "" url = "https://httpbin.dev/anything" tags = "project:default" asp = "true"
ASP is magic but has some limitations
While popular anti-bot vendors can be bypassed, some areas still require manual configuration of your calls. Websites that publicly expose their internal APIs, or that use form POSTs with authentication or proof-of-browser checks, cannot be guessed or fixed automatically by anti-bot bypasses. To scrape such websites successfully, you need to investigate and put in the effort to configure your calls correctly. The most difficult part of defeating anti-bot measures has already been done for you; you only need to go the last mile by configuring your calls correctly.
How to avoid anti-bot detection on POST requests
Avoiding anti-bot detection on a POST request can be tricky, but there are some key areas to focus on:
- Mimic a real user's behavior: Anti-bot systems often check for unusual behavior that may indicate a bot, such as a high number of requests from the same IP address or at the same time. You can mimic a real user's behavior by visiting a few pages first to pick up navigation cookies and referer URLs.
- Handle CSRF: Cross-Site Request Forgery (CSRF) protection is a common anti-bot measure used by websites. To bypass it, you need to include the correct CSRF token in your POST request. You can usually find this token by inspecting the page source code or network traffic. Once you have the token, you can include it in your request headers.
For a deeper look, you can check out our dedicated blog post about CSRF.
- Use realistic headers: Anti-bot systems can detect bots by looking at request headers. You should replicate the headers of a real user's request as closely as possible, including the Origin header. Make sure to correctly configure the Content-Type value for the content you expect (JSON, HTML).
For a deeper look, you can check out our dedicated blog post about headers.
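The header advice above can be sketched as a small helper. The exact values depend on the target site; copy them from a successful browser request in dev tools rather than reusing these defaults verbatim:

```python
def build_post_headers(origin: str, referer: str) -> dict:
    """Headers mimicking a browser JSON POST; values are illustrative defaults."""
    return {
        "Origin": origin,                      # must match the site you POST to
        "Referer": referer,                    # the page a real user would come from
        "Content-Type": "application/json",    # match the body you actually send
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",  # often sent by front-end JS; verify per site
    }

headers = build_post_headers("https://example.com", "https://example.com/search")
```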
- Authentication: If the website requires authentication, make sure you include the correct credentials in your request. This might involve logging in to the website first, then including the session cookie or token in your POST request. ASP cannot manage authentication for you; if the API/website requires it, you must handle it on your side.
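Once you have obtained a session cookie (for example, by logging in through your own HTTP client), it must travel with every later request. A minimal sketch; the cookie name `sessionid` is an assumption, check dev tools for the real one:

```python
def with_session_cookie(headers: dict, cookie_name: str, cookie_value: str) -> dict:
    """Return a copy of `headers` with the session cookie appended to the Cookie header."""
    merged = dict(headers)
    pair = f"{cookie_name}={cookie_value}"
    existing = merged.get("Cookie", "")
    merged["Cookie"] = f"{existing}; {pair}" if existing else pair
    return merged

# Hypothetical usage after your own login flow returned sessionid=abc123:
headers = with_session_cookie({"Accept": "text/html"}, "sessionid", "abc123")
```

In practice a cookie-aware HTTP session (one that stores cookies from the login response automatically) does this for you; the helper just makes the mechanism explicit.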
Overall, the key to bypassing anti-bot measures on a POST request is to replicate the headers, cookies, and authentication of a regular browser request as closely as possible. This requires careful inspection of the website's code and network traffic to identify the required elements.
Website with Private/Hidden API
Scraping a private API can be a bit more challenging than scraping public APIs. Here are some recommendations to follow:
- Make sure you have permission: Before scraping any private API, make sure you have the necessary permission from the website owner or API provider. Scraping a private API without permission can result in legal consequences, depending on the type of data exposed.
- Mimic a real user: When scraping a private API, it's important to mimic a real user as closely as possible. This means sending the same headers and parameters that a real user would send when accessing the API.
- Use authentication: Most private APIs require some form of authentication, such as a token or API key. Make sure you obtain the necessary credentials and use them in your requests.
- Monitor for changes: Private APIs can change over time, so it's important to monitor for any changes in the API's structure or authentication requirements. If you notice any changes, update your scraping code accordingly.
Overall, scraping private APIs requires more attention to detail and careful configuration of requests. Following these recommendations can help ensure a successful and ethical scraping process.
Maximize Your Success Rate
In many cases, datacenter IPs are sufficient. However, when protecting websites, anti-bot vendors may check the origin of an IP to determine whether the traffic comes from a datacenter or a regular connection. In such cases, residential networks provide a better IP reputation, as their addresses are registered under regular consumer ASNs.
- Introduction To Proxies in Web Scraping
- How to Avoid Web Scraping Blocking: IP Address Guide
- Learn how to change the network type
Use proxy_pool=public_residential_pool; check out the related documentation
Use a Browser
Use render_js=true; check out the related documentation
Verify Cookies and Headers
Observe the headers and cookies of successful regular calls to figure out whether you need to add extra headers or retrieve specific cookies to authenticate. You can use the browser dev tools to inspect network activity.
Use headers[referer]=https%3A%2F%2Fexample.com (value is URL-encoded); check out the related documentation
Use session=my-unique-session-name; check out the related documentation
Use render_js=true; check out the related documentation
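As a sketch, the parameters above can be combined into a single API call; `urlencode` takes care of encoding the referer value. The endpoint and parameter names follow the earlier examples, and the target URL here is a placeholder:

```python
from urllib.parse import urlencode

params = {
    "key": "__API_KEY__",
    "url": "https://example.com/page",          # placeholder target to scrape
    "headers[referer]": "https://example.com",  # extra header; value gets URL-encoded
    "session": "my-unique-session-name",        # reuse cookies across calls
    "render_js": "true",                        # enable the browser
    "asp": "true",
}
query = urlencode(params)
api_url = f"https://api.scrapfly.io/scrape?{query}"
```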
When browsing certain websites, users may encounter blocks based on their IP location. Scrapfly can bypass this issue by default, as it selects a random country from its pool. However, specifying the country based on the location of the website can be a helpful way to avoid geo-blocking.
Use country=us; check out the related documentation
Our Anti-Scraping Protection (ASP) solution is a sophisticated tool for bypassing advanced anti-scraping measures, designed to adapt to the various protections implemented on different websites. To achieve this, the ASP dynamically mutates your configuration parameters based on the target and the anti-scraping solution in place, and this can have an impact on pricing.
The ASP can change your configuration parameters, such as enabling the browser (render_js) or switching to a residential network (proxy_pool).
This dynamic mutation can lead to pricing variations, particularly due to the usage of browsers and residential networks.
Therefore, scraping protected websites at scale can be costly. You can check our pricing grid.
To ensure predictability and control of your spending, we recommend creating an account and monitoring usage costs as you gradually increase your volume. Once you have determined the actual cost, you can check our set of tools to make it more predictable and ensure you stay within budget.
It is essential to note that if no blocking solution is detected, there is no extra charge. Furthermore, our ASP solution is designed to lower the cost and improve response time, making it an advanced and efficient solution for your scraping needs.
Some specific highly protected domains have special pricing. You can ask through support, or test via the player to see how many credits are billed.
All related errors are listed below. You can see full description and example of error response on the Errors section.
- ERR::ASP::CAPTCHA_ERROR - Something went wrong with the captcha. We will work on fixing the problem as soon as possible
- ERR::ASP::CAPTCHA_TIMEOUT - The time budgeted to solve the captcha was reached
- ERR::ASP::SHIELD_ERROR - The ASP encountered an unexpected problem. We will fix it as soon as possible. Our team has been alerted
- ERR::ASP::SHIELD_EXPIRED - The ASP shield previously set has expired; you must retry
- ERR::ASP::SHIELD_PROTECTION_FAILED - The ASP shield failed to solve the challenge against the anti-scraping protection
- ERR::ASP::TIMEOUT - The ASP took too long to solve the challenge or to respond
- ERR::ASP::UNABLE_TO_SOLVE_CAPTCHA - Despite our efforts, we were unable to solve the captcha. This can happen sporadically; please retry
- ERR::ASP::UPSTREAM_UNEXPECTED_RESPONSE - The response given by the upstream after challenge resolution was not expected. Our team has been alerted