Anti Scraping Protection (ASP)
Service disruptions on this feature can occur regardless of our efforts. Protections evolve, and when they do we need to analyze the change and adapt our solution, which can take from a few days up to several weeks before a production-grade fix is available. If you use this feature, keep this in mind and design your software accordingly.
Our Anti Scraping Protection is an in-house solution for scraping protected websites that do not let bots access them. It combines many techniques to keep a coherent fingerprint and to replicate a real user's fingerprint as closely as possible when you scrape a website. If you are interested in how it technically stays undetected, we have written a series of articles.
Although we are able to bypass protections, many usages are prohibited* and any attempt will result in account suspension:
* Cybersecurity (red team) firms can be allowed once we have received authorization from the blue team for the domain(s) they approved
- Automated Online Payment
- Account Creation
- Spam Post
- Falsify Vote
- Credit Card Testing
- Login Brute Force
- Referral / Gifting systems
- Ads fraud
- Ticketing (Automated Buying System)
Scrapfly detects and resolves challenges from the well-known anti-scraping solutions on the market. Scrapfly also supports custom solutions on popular websites. The anti-bot bypass is transparent and requires no extra work on your side: you directly retrieve the successful response.
Keep in mind that anti-bot solutions evolve and we may need to adapt our bypass techniques; this is why you should handle ASP errors correctly when relying on ASP.
If you plan to target a protected website, you should try several configurations: with or without a browser, and with or without residential proxies. If you send a request with a body, you must configure the headers and content type to mimic a real call.
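One way to work through those configurations systematically is to enumerate them and try each in turn. The sketch below is only an illustration: it reuses the `asp`, `render_js`, and `proxy_pool` parameters shown on this page, while the helper names and the success check passed as a block are hypothetical and left to you.

```ruby
# Candidate configurations, from the lightest to the most heavyweight.
# Parameter names (asp, render_js, proxy_pool) are the ones used on this page.
def asp_configurations
  pools = [nil, "public_residential_pool"] # nil = leave proxy_pool unset
  browsers = [false, true]
  pools.flat_map do |pool|
    browsers.map do |browser|
      params = { asp: true }
      params[:render_js] = true if browser
      params[:proxy_pool] = pool if pool
      params
    end
  end
end

# Tries each configuration in order. The block performs the actual scrape
# and returns a truthy value when the result is acceptable (caller-defined).
def first_working_configuration
  asp_configurations.find { |params| yield(params) }
end
```

In practice the block would send the scrape with the given parameters and check the response; `first_working_configuration` returns `nil` when none of the four combinations succeeds.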
If you are still blocked despite all your attempts, you can contact us through the chat to investigate. Custom bypasses are available on custom plans with a minimum engagement of one year.
When ASP is enabled, anti-bot vendors are automatically detected and everything needed to bypass them is handled for you.
require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.org%2Fanything&tags=project%3Adefault&asp=true")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

request = Net::HTTP::Get.new(url)
response = https.request(request)
puts response.read_body

"https://api.scrapfly.io/scrape?key=&url=https%3A%2F%2Fhttpbin.org%2Fanything&tags=project%3Adefault&asp=true"
- key = ""
- url = "https://httpbin.org/anything"
- tags = "project:default"
- asp = "true"
ASP is magic but has some limitations
Popular anti-bot vendors are bypassed, but there are still some areas that we cannot cover in a fully automated way and that require you to configure the calls correctly. Websites that expose their internal API publicly, or that accept form POSTs, usually protect them as well (authentication / proof of browser), and ASP cannot guess or fix that for you. You need to investigate this yourself: if your scrapes (POST / private API) are rejected and you put no effort in that direction, they will stay unsuccessful. The most complicated part (defeating the anti-bot) is already done for you. If you are in this case, here is the last mile:
Posting a form requires passing the correct headers and information, just like the website does. Most of the time, headers such as Content-Type require special attention.
The best way to replicate the headers correctly is to trigger the call from your browser and inspect the related request in the dev tools.
Website Private API
Basically the same as for a POST request. Most of the time, a private API requires authentication. If the basic cookies from normal user navigation are not enough, you might need to reverse engineer the process to figure out how it authenticates, retrieve the token, and pass it along with your scrape to be authorized.
Maximize Your Success Rate
Most of the time, datacenter IPs are good enough. However, websites protected by anti-bot vendors check the origin of the IP to tell whether the traffic comes from a datacenter or a regular connection. With the residential network, you get an IP with a better reputation that is registered under a regular ASN (the ASN is what is used to determine the origin of the IP).
- Introduction To Proxies in Web Scraping
- How to Avoid Web Scraping Blocking: IP Address Guide
- Learn how to change the network type
proxy_pool=public_residential_pool; check out the related documentation
Use a Browser
render_js=true; check out the related documentation
Verify Cookies and Headers
Observe the headers and cookies of regular calls that succeed; from them you can figure out whether you need to add extra headers or retrieve specific cookies to authenticate. You can use the dev tools and inspect the network activity.
headers[referer]=https%3A%2F%2Fexample.com (the value is URL encoded); check out the related documentation
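As a minimal sketch of how such header parameters could be assembled, the helper below turns a hash of headers into `headers[name]=value` query parameters and lets the standard library handle the URL encoding. The function name and the example target URL are hypothetical; only the `headers[referer]` form and the endpoint come from this page.

```ruby
require "uri"

# Builds a Scrape API URL where each custom header is passed as a
# headers[<name>] query parameter, with values URL encoded.
def scrape_url_with_headers(base_params, headers)
  header_params = headers.map do |name, value|
    ["headers[#{name.downcase}]", value]
  end
  uri = URI("https://api.scrapfly.io/scrape")
  # encode_www_form also percent-encodes the brackets in the parameter name
  # (headers%5Breferer%5D), which decodes back to headers[referer].
  uri.query = URI.encode_www_form(base_params.to_a + header_params)
  uri
end

url = scrape_url_with_headers(
  { key: "__API_KEY__", url: "https://example.com/api", asp: true },
  { "Referer" => "https://example.com" }
)
puts url
```

The resulting URI can be passed to `Net::HTTP::Get.new` in the same way as in the request example earlier on this page.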
You might need to retrieve cookies from regular navigation before calling an unofficial API. The easiest way to achieve that is to scrape with a session enabled and JS rendering on to collect the cookies; you can then scrape without JS rendering, because the cookies are stored in your session and applied to subsequent calls.
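The two-step flow described above could look like the following sketch, which only builds the two request URLs. It assumes the session is selected with a `session` query parameter; that name is an assumption not shown on this page, so check the session documentation before relying on it. The helper name and the example URLs are hypothetical.

```ruby
require "uri"

# Returns [warm_up, follow_up] URLs sharing the same session:
# 1. warm_up renders the page in a browser so cookies get collected,
# 2. follow_up calls the unofficial API without rendering, reusing them.
# NOTE: the `session` parameter name is an assumption, not taken from this page.
def session_scrape_urls(key:, session:, page_url:, api_url:)
  endpoint = "https://api.scrapfly.io/scrape"
  common = { key: key, asp: true, session: session }

  warm_up = URI(endpoint)
  warm_up.query = URI.encode_www_form(common.merge(url: page_url, render_js: true))

  follow_up = URI(endpoint)
  follow_up.query = URI.encode_www_form(common.merge(url: api_url))

  [warm_up, follow_up]
end

warm_up, follow_up = session_scrape_urls(
  key: "__API_KEY__",
  session: "demo",
  page_url: "https://example.com/",
  api_url: "https://example.com/api/items"
)
puts warm_up
puts follow_up
```

Requesting the first URL and then the second, in that order, mirrors the "collect cookies, then reuse them" sequence.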
Some websites block navigation based on IP location. By default, Scrapfly selects a random country from the pool; specifying a country that matches the location of the website can help.
country=us; check out the related documentation
Pricing is not easy to predict with Anti Scraping Protection. Everything is designed and optimized to reduce cost and maximize the reuse of authenticated sessions (which is free even if protection is activated). Be aware that scraping protected sites at scale has a real cost; the best way to budget your volume is to create an account and monitor your cost usage while increasing the volume, to avoid surprises. We try to be as transparent as we can on this subject because we know you need to predict and budget the price of the solution. If you have any questions or feedback, please contact us.
| Scenario | API Call Cost |
|---|---|
| ASP + Residential Proxies* | 25 |
| ASP + Residential Proxies + Browser* | 25 + 5 = 30 |
| ASP + Datacenter Proxies | 1 |
| ASP + Datacenter Proxies + Browser | 6 |
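For rough budgeting, the table can be read as a base cost per proxy type plus a browser surcharge of 5. The sketch below encodes just that reading; the function names are hypothetical, and it does not account for domains with special pricing, so treat the result as an estimate only.

```ruby
# Per-call costs taken from the table above.
RESIDENTIAL_COST = 25
DATACENTER_COST = 1
BROWSER_SURCHARGE = 5

# Cost of a single ASP call for a given combination of options.
def asp_call_cost(residential:, browser:)
  base = residential ? RESIDENTIAL_COST : DATACENTER_COST
  base + (browser ? BROWSER_SURCHARGE : 0)
end

# Rough estimate for a batch of successful calls with the same options.
def estimated_credits(calls, residential:, browser:)
  calls * asp_call_cost(residential: residential, browser: browser)
end

puts estimated_credits(1_000, residential: true, browser: true)
```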
Failed requests (status code >= 400) are not billed, except for the following status codes:
400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 422, 424, 426, 428, 456.
To prevent abuse, this is subject to our fair use policy: if more than 30% of your traffic returns the HTTP codes above, fair use is disabled and failed requests are billed.
If your account falls under a 60% success rate and/or you deliberately scrape protected websites without ASP or keep scraping failing targets, your account will be suspended.
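When monitoring your own usage, the status-code rule above can be expressed as a small check. This is a sketch under two assumptions: that any response below 400 counts as a normally billed call, and that the fair use policy has not been disabled for your account (that case is not modeled here). The names are hypothetical.

```ruby
# Failure status codes that are billed, copied from the list above.
BILLED_FAILURE_CODES = [
  400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413,
  414, 415, 416, 417, 418, 422, 424, 426, 428, 456
].freeze

# True when a response with this status code is expected to be billed.
# Assumption: responses below 400 are regular, billed calls.
def billed?(status)
  status < 400 || BILLED_FAILURE_CODES.include?(status)
end

puts billed?(404) # a listed failure code
puts billed?(503) # a failure code that is not listed
```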
Therefore, if you are not sure if a website is protected, you can enable ASP. If nothing is blocking, no extra calls are counted.
You can try to target the desired website through our API player by creating a free account.
The API response contains the header X-Scrapfly-Api-Cost, which indicates the billed amount, and X-Scrapfly-Remaining-Api-Credit, which indicates the remaining amount of Scrape API calls.
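To track spending per call, both headers can be read from the response. The helper below is a sketch that accepts any object supporting `[]` lookups by header name, such as a `Net::HTTPResponse`. It assumes the header values are plain integers, which this page does not state explicitly, and returns `nil` for a missing or non-numeric value.

```ruby
# Extracts billing information from the response headers named above.
# Assumption: both header values are integer strings.
def credit_usage(response)
  {
    cost: Integer(response["X-Scrapfly-Api-Cost"], exception: false),
    remaining: Integer(response["X-Scrapfly-Remaining-Api-Credit"], exception: false)
  }
end

# A Hash stands in for a real response here; values are made up.
sample = {
  "X-Scrapfly-Api-Cost" => "30",
  "X-Scrapfly-Remaining-Api-Credit" => "970"
}
usage = credit_usage(sample)
puts "billed: #{usage[:cost]}, remaining: #{usage[:remaining]}"
```

With the request example from earlier on this page, the call would be `credit_usage(response)`, since `Net::HTTPResponse` supports header lookup through `[]`.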
Some specific protected domains have a special price due to their high level of protection. You can ask support or test via the player to see how many credits are billed.
All related errors are listed below. You can see the full description and an example error response in the Errors section.
- ERR::ASP::CAPTCHA_ERROR - Something went wrong with the captcha. We will fix the problem as soon as possible
- ERR::ASP::CAPTCHA_TIMEOUT - The time budgeted to solve the captcha has been reached
- ERR::ASP::PROTECTION_FAILED - The attempt to solve or bypass the bot protection failed this time. Unfortunately this happens sometimes, and you should retry on this error if it is sporadic. If the issue happens every time, check your configuration and contact support
- ERR::ASP::SHIELD_ERROR - The ASP encountered an unexpected problem. We will fix it as soon as possible. Our team has been alerted
- ERR::ASP::SHIELD_EXPIRED - The ASP shield previously set has expired; you must retry
- ERR::ASP::SHIELD_PROTECTION_FAILED - The ASP shield failed to solve the challenge against the anti-scraping protection
- ERR::ASP::TIMEOUT - The ASP took too much time to solve or respond
- ERR::ASP::UNABLE_TO_SOLVE_CAPTCHA - Despite our efforts, we were unable to solve the captcha. This can happen sporadically, please retry
- ERR::ASP::UPSTREAM_UNEXPECTED_RESPONSE - The response given by the upstream after challenge resolution is not the expected one. Our team has been alerted
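Three of the descriptions above explicitly suggest retrying: PROTECTION_FAILED, SHIELD_EXPIRED, and UNABLE_TO_SOLVE_CAPTCHA. A retry wrapper limited to those codes could be sketched as follows. The helper name, the attempt limit, and the backoff delays are illustrative assumptions, and extracting the error code from the API response is left to the block you pass in, since the response format is described in the Errors section rather than here.

```ruby
# Error codes whose descriptions above explicitly suggest retrying.
RETRYABLE_ASP_ERRORS = %w[
  ERR::ASP::PROTECTION_FAILED
  ERR::ASP::SHIELD_EXPIRED
  ERR::ASP::UNABLE_TO_SOLVE_CAPTCHA
].freeze

# Calls the block until it stops returning a retryable error code or the
# attempts run out. The block receives the attempt number and must return
# [error_code_or_nil, result]. Returns the last [code, result] pair.
def with_asp_retry(max_attempts: 3, base_delay: 1.0)
  attempt = 0
  loop do
    attempt += 1
    code, result = yield(attempt)
    retryable = RETRYABLE_ASP_ERRORS.include?(code)
    return [code, result] unless retryable && attempt < max_attempts

    sleep(base_delay * (2**(attempt - 1))) # exponential backoff between tries
  end
end
```

The remaining codes are returned to the caller on the first occurrence, so persistent failures surface quickly and can be reported to support.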