Anti Scraping Protection (ASP)
Service disruptions on this feature can occur through no fault of ours. Protections evolve, and when they do we need to adapt our solution, which can take from days to weeks before it is production-grade again. If you rely on this feature, keep this in mind and design your software accordingly.
ASP is a technology we developed to bypass anti-scraping protection.
When ASP is triggered, it takes control of the resolution: it decides whether to enable or disable JS rendering, lets the session solve the captcha, and so forth.
Once the protection's challenge is resolved, ASP is triggered each time you revisit the site and injects the correct mechanisms to avoid a new challenge. As a result, the first request, which solves the challenge, can take from 30 to 120 seconds depending on the challenge type. Once the first scrape is done, subsequent scrapes are as fast as usual.
You have nothing to do: our services manage ASP automatically and start it when a captcha or anti-bot solution is detected on the website. We won't play cat and mouse games with anti-bot solutions. We will not explicitly enumerate the services we can handle. We pass many solutions, from simple captchas to the most advanced anti-bot solutions on the market. We also develop specific solutions for dedicated popular websites. If you want to know more, you can ask us via the chat at the bottom-left of the screen.
Each time ASP detects and resolves a challenge (captcha or anti-bot solution), a session is created, even if you have not enabled sessions. This ensures all cookies are applied correctly without you having to manage them. It is invisible from your point of view.
Scrapfly ASP resolves many captcha systems automatically
The following captcha systems are currently supported:
- Google Recaptcha
We continuously add more captcha providers and update our solution over time.
Scrapfly detects and resolves challenges from well-known anti-scraping solutions on the market. Scrapfly also supports custom solutions on popular websites. Anti-bot bypass is transparent and requires no extra work on your side: you directly retrieve the successful response.
Keep in mind that anti-bot solutions evolve, and we may need to adapt our bypass techniques; this is why you should handle ASP errors correctly when relying on ASP.
When ASP is enabled, captcha and anti-bot vendors are automatically detected. Each kind of protection has a different implementation of
curl -G \
  --request "GET" \
  --url "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://www.google.com/recaptcha/api2/demo" \
  --data-urlencode "asp=true"
"https://api.scrapfly.io/scrape?key=&url=https%3A%2F%2Fwww.google.com%2Frecaptcha%2Fapi2%2Fdemo&asp=true"
key = ""
url = "https://www.google.com/recaptcha/api2/demo"
asp = "true"
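The same request can be built from a script. The sketch below is a minimal illustration using only the Python standard library: it encodes the three query parameters shown above (key, url, asp) into the GET URL. The helper names build_scrape_url and fetch are hypothetical, and the 150 second timeout is an assumption chosen to leave headroom over the 30 to 120 seconds a first challenge may take.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SCRAPE_ENDPOINT = "https://api.scrapfly.io/scrape"


def build_scrape_url(api_key: str, target_url: str, asp: bool = True) -> str:
    """Return the GET URL for a scrape request (hypothetical helper)."""
    params = {"key": api_key, "url": target_url}
    if asp:
        # Only send the flag when ASP is wanted, as in the curl example.
        params["asp"] = "true"
    return f"{SCRAPE_ENDPOINT}?{urlencode(params)}"


def fetch(api_key: str, target_url: str, timeout: float = 150.0) -> str:
    """Perform the request and return the raw response body as text.

    The timeout value is an assumption, not a documented recommendation.
    """
    with urlopen(build_scrape_url(api_key, target_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_scrape_url("__API_KEY__", "https://www.google.com/recaptcha/api2/demo"))
```

Building the URL separately from sending it makes the encoding easy to check against the URL shown above before any API call is spent.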
Example of response
All related errors are listed below. You can find the full description and an example error response for each in the Errors section.
- 520 - ERR::ASP::PROTECTION_FAILED
- 500 - ERR::ASP::CAPTCHA_ERROR
- 520 - ERR::ASP::UNABLE_TO_SOLVE_CAPTCHA
- 408 - ERR::ASP::CAPTCHA_TIMEOUT
- 520 - ERR::ASP::UPSTREAM_UNEXPECTED_RESPONSE
- 520 - ERR::ASP::SHIELD_ERROR
- 419 - ERR::ASP::SHIELD_EXPIRED
- 520 - ERR::ASP::SHIELD_FAILED
- 408 - ERR::ASP::TIMEOUT
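Since ASP errors have to be handled by the caller, one possible approach is to recognise the codes listed above and decide per code whether another attempt is worthwhile. The sketch below copies the codes and HTTP statuses from the list. Which codes are worth retrying is purely an assumption made for illustration (the two timeouts and the expired shield), and the helper names find_asp_error and should_retry are hypothetical.

```python
import re
from typing import Optional

# HTTP status for each ASP error code, copied from the list above.
ASP_ERROR_STATUS = {
    "ERR::ASP::PROTECTION_FAILED": 520,
    "ERR::ASP::CAPTCHA_ERROR": 500,
    "ERR::ASP::UNABLE_TO_SOLVE_CAPTCHA": 520,
    "ERR::ASP::CAPTCHA_TIMEOUT": 408,
    "ERR::ASP::UPSTREAM_UNEXPECTED_RESPONSE": 520,
    "ERR::ASP::SHIELD_ERROR": 520,
    "ERR::ASP::SHIELD_EXPIRED": 419,
    "ERR::ASP::SHIELD_FAILED": 520,
    "ERR::ASP::TIMEOUT": 408,
}

# ASSUMPTION: treating these as transient is an illustrative choice,
# not something stated by the documentation.
RETRYABLE = {
    "ERR::ASP::CAPTCHA_TIMEOUT",
    "ERR::ASP::TIMEOUT",
    "ERR::ASP::SHIELD_EXPIRED",
}

_CODE_PATTERN = re.compile(r"ERR::ASP::[A-Z_]+")


def find_asp_error(text: str) -> Optional[str]:
    """Return the first known ASP error code found in text, if any."""
    match = _CODE_PATTERN.search(text)
    if match and match.group(0) in ASP_ERROR_STATUS:
        return match.group(0)
    return None


def should_retry(code: str, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether to try again after the given 1-based attempt."""
    return code in RETRYABLE and attempt < max_attempts
```

A caller would typically pass the error body of a failed response to find_asp_error, then loop while should_retry returns True, ideally with a pause between attempts.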
Maximize Your Success Rate
Most of the time, datacenter IPs are good enough. However, websites protected by anti-bot vendors check the origin of the IP, that is, whether the traffic comes from a datacenter or a regular connection. With a residential network, you get an IP with a better reputation that is registered under a regular ASN (the ASN is what is used to check the origin of the IP). Learn how to change the network type.
Verify Cookies and Headers
Observe the headers and cookies of regular calls that succeed; from them you can figure out whether you need to add extra headers or retrieve specific cookies to be authenticated. You can use your browser's dev tools and inspect the network activity.
You might need to retrieve cookies from regular navigation before calling an unofficial API. The easiest way to do that is to scrape with a session and JS rendering enabled to retrieve the cookies; you can then scrape without JS rendering, because the cookies are stored in your session and applied back automatically.
Some websites block navigation based on IP location. By default, Scrapfly selects a random country from the pool; specifying a country that matches the location of the website could help. Learn how to do it.
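The two-step flow described above (a first scrape with session and JS rendering, then cheaper scrapes that reuse the session) can be sketched as a pair of parameter sets. The parameter names session and render_js are assumptions here, since this page does not spell them out; check the Session and Javascript Rendering documentation for the exact names. The helper session_scrape_params and the example URLs are hypothetical.

```python
from typing import Dict


def session_scrape_params(
    api_key: str, target_url: str, session_name: str, warm_up: bool
) -> Dict[str, str]:
    """Build query parameters for one scrape inside a named session.

    warm_up=True  -> first call: render JS so the site can set its cookies.
    warm_up=False -> later calls: reuse the session cookies, no JS rendering.
    """
    params = {
        "key": api_key,
        "url": target_url,
        "session": session_name,  # ASSUMED parameter name
    }
    if warm_up:
        params["render_js"] = "true"  # ASSUMED parameter name
    return params


if __name__ == "__main__":
    # Step 1: browse the page like a regular visitor to collect cookies.
    first = session_scrape_params("__API_KEY__", "https://example.com/", "my-session", True)
    # Step 2: call the unofficial API with the same session, without JS.
    second = session_scrape_params("__API_KEY__", "https://example.com/api/items", "my-session", False)
    print(first)
    print(second)
```

Using the same session name in both calls is what lets the cookies from the first scrape be applied to the second.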
Pricing is not easy to predict with Anti Scraping Protection. Everything is designed and optimized to reduce the cost and maximize the reuse of authenticated sessions (which is free even if protection is activated). Be aware that scraping protected sites at scale has a real cost; the best way to budget your volume is to create an account and monitor the cost while you increase the volume, to avoid surprises. We try to be as transparent as we can on this subject because we know you need to predict and budget the price of the solution; if you have any questions or feedback, please contact us.
The following rules apply when ASP is activated:
- If ASP does not detect protection: no extra Scrape API calls are counted
- If an ASP session already exists and is still valid: no extra Scrape API calls are counted
- If ASP needs to switch to the residential network: the proxy network cost is counted as 25 API calls
- ASP is only billed on successful responses (404 and 410 are not considered failures)
Therefore, if you are not sure whether a website is protected, you can enable ASP. If nothing is blocking, no extra calls are counted.
The API response contains the header X-Scrapfly-Api-Cost, which indicates the billed amount.
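To monitor cost while ramping up volume, the X-Scrapfly-Api-Cost header can be read from each response and summed. The sketch below is a minimal illustration that works on plain header mappings. It assumes the header value is an integer number of API calls, which this page does not state explicitly, and the helper names billed_cost and total_cost are hypothetical.

```python
from typing import Iterable, Mapping, Optional

COST_HEADER = "X-Scrapfly-Api-Cost"


def billed_cost(headers: Mapping[str, str]) -> Optional[int]:
    """Return the billed amount from response headers, or None if absent.

    Header names are compared case-insensitively.
    ASSUMPTION: the value parses as an integer count of API calls.
    """
    for name, value in headers.items():
        if name.lower() == COST_HEADER.lower():
            return int(value)
    return None


def total_cost(all_headers: Iterable[Mapping[str, str]]) -> int:
    """Sum the billed amount over several responses, skipping missing headers."""
    return sum(billed_cost(headers) or 0 for headers in all_headers)


if __name__ == "__main__":
    sample = [{"X-Scrapfly-Api-Cost": "1"}, {"x-scrapfly-api-cost": "25"}, {}]
    print(total_cost(sample))
```

Logging this running total next to the number of requests gives a simple per-target cost figure to budget against.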