Anti Scraping Protection (ASP)

[Screenshot: overview page of the web interface]
[Screenshot: ASP tab of the log inspection]
Service disruption can occur on this feature, despite our best efforts. Protections evolve; when they do, we must find and adapt our solution, which can take days up to weeks to reach production grade. If you use this feature, you must keep this in mind and design your software accordingly.

Anti Bot

Our Anti Scraping Protection is an internal solution to unblock and scrape protected websites that won't let bots scrape them. It combines many techniques to keep a coherent fingerprint and replicate a real user's fingerprint as closely as possible when you scrape a website. If you are interested in how it technically works to remain undetected, we have written a series of articles.

Although we are able to bypass protections, many usages are prohibited* and any attempt will result in account suspension:
  • Automated Online Payment
  • Account Creation
  • Spam Post
  • Falsify Vote
  • Credit Card Testing
  • Login Brute Force
  • Referral / Gifting systems
  • Ads fraud
  • Banks
  • Ticketing (Automated Buying System)
  • Betting, Casino, Gambling
* Cybersecurity (red team) firms can be allowed once we receive authorization from the blue team for the domain(s) they approved

Scrapfly detects and resolves challenges from well-known anti-scraping solutions on the market. Scrapfly also supports custom solutions on popular websites. Anti-bot bypasses are transparent and require no extra work on your side; you directly retrieve the successful response.

Keep in mind that anti-bot solutions evolve, and we may need to adapt our bypass techniques; this is why you should handle ASP errors correctly when relying on ASP.
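For example, a minimal retry sketch in shell; the retry count, the backoff, and the assumption that a blocked scrape surfaces as a non-200 HTTP status are illustrative, and the Errors section describes the real error format:

# Hypothetical retry loop: retry up to 3 times when the API call
# does not return HTTP 200 (assumed failure signal; see Errors section).
for attempt in 1 2 3; do
  status=$(curl -s -o response.json -w "%{http_code}" -G \
    "https://api.scrapfly.io/scrape" \
    --data-urlencode "key=__API_KEY__" \
    --data-urlencode "url=https://httpbin.dev/anything" \
    --data-urlencode "asp=true")
  [ "$status" = "200" ] && break      # success: response.json holds the result
  echo "attempt $attempt failed with HTTP $status" >&2
  sleep $((attempt * 5))              # simple linear backoff between retries
done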

If you plan to target a protected website, you should try several configurations: with or without a browser, and with residential proxies. If you send a POST request with a body, you must configure the headers and the content type to mimic a real call.

If you are still blocked despite all your attempts, you can contact us through the chat so we can investigate. Custom bypasses are available on custom plans with a minimum engagement of one year.

Usage

When ASP is enabled, anti-bot solution vendors are automatically detected and everything is managed to bypass them.

curl -G \
--request "GET" \
--url "https://api.scrapfly.io/scrape" \
--data-urlencode "key=__API_KEY__" \
--data-urlencode "url=https://httpbin.dev/anything" \
--data-urlencode "tags=project:default" \
--data-urlencode "proxy_pool=public_datacenter" \
--data-urlencode "asp=true"
"https://api.scrapfly.io/scrape?key=&url=https%3A%2F%2Fhttpbin.dev%2Fanything&tags=project%3Adefault&proxy_pool=public_datacenter&asp=true"

"api.scrapfly.io"
"/scrape"

key         = "" 
url         = "https://httpbin.dev/anything" 
tags        = "project:default" 
proxy_pool  = "public_datacenter" 
asp         = "true" 

ASP is magic, but it has some limitations

Popular anti-bot vendors are bypassed, but there are still areas that we can't cover in a fully automated way; they still require your attention to configure the calls correctly. Websites that expose their internal API publicly, or that use form POSTs, will also protect those endpoints (authentication / proof of browser), and the ASP can't guess or fix that for you. You need to investigate it yourself: if you put no effort in that direction while your scrapes (POST / private API) are rejected, you will stay unsuccessful. The most complicated part (defeating the anti-bot) is already done for you; if you are in this case, here is the last mile:

POST form

Posting a form requires passing the correct headers and information, just like the website does. Most of the time, the Accept and Content-Type headers require special attention. The best way to replicate the headers correctly is to trigger the call from your browser, then inspect the call and its related resources with the dev tools.
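As a sketch, assuming the Scrape API forwards the method and body of your call to the target URL (verify the exact mechanism in the API specification); the target URL, form fields, and content type below are illustrative:

# Hypothetical form POST through the API; headers[content-type] is passed
# URL-encoded in the query string, the form body goes in the request body.
curl -X POST \
  "https://api.scrapfly.io/scrape?key=__API_KEY__&asp=true&url=https%3A%2F%2Fexample.com%2Fsearch&headers%5Bcontent-type%5D=application%2Fx-www-form-urlencoded" \
  --data "query=shoes&page=1"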

Website Private API

Basically, the same as for POST requests. Most of the time, a private API requires authentication; if basic cookies from a normal user navigation are not enough, you might need to reverse the process to figure out how it is authenticated, so you can retrieve the token and pass it along with your scrape to be authorized.
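For instance, a token recovered from your own navigation could be passed along as a header; this is a hedged sketch where the header name, token value, and target URL are illustrative, and the real authentication scheme depends on the website:

# Hypothetical call to a private API with a recovered bearer token.
curl -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://example.com/api/private/data" \
  --data-urlencode "asp=true" \
  --data-urlencode "headers[authorization]=Bearer YOUR_TOKEN"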

Maximize Your Success Rate

Network Quality

Most of the time, datacenter IPs are good enough, but websites protected by anti-bot vendors check whether the IP originates from a datacenter or from a regular connection. With the residential network, you get an IP with a better reputation, registered under a regular ASN (which is what is used to check the origin of the IP).

API Usage: proxy_pool=public_residential_pool, check out the related documentation
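For example, the usage call from above switched to the residential pool:

curl -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://httpbin.dev/anything" \
  --data-urlencode "asp=true" \
  --data-urlencode "proxy_pool=public_residential_pool"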

Use a Browser

Most anti-bots check the browser fingerprint / JavaScript engine to generate a proof of legitimacy.

API Usage: render_js=true, check out the related documentation
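For example, the same call with JavaScript rendering enabled:

curl -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://httpbin.dev/anything" \
  --data-urlencode "asp=true" \
  --data-urlencode "render_js=true"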

Verify Cookies and Headers

Observe the headers/cookies of regular calls that are successful; from them you can figure out whether you need to add extra headers or retrieve specific cookies to authenticate. You can use the dev tools and inspect the network activity.

API Usage: headers[referer]=https%3A%2F%2Fexample.com (value is URL encoded), check out the related documentation
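For example, with curl, --data-urlencode takes care of encoding the header value for you (URLs are illustrative):

curl -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://example.com/page" \
  --data-urlencode "asp=true" \
  --data-urlencode "headers[referer]=https://example.com"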

Navigation Coherence

You might need to retrieve cookies from regular navigation before calling an unofficial API. The easiest way to achieve that is to scrape with session enabled and JavaScript rendering to retrieve the cookies; then you can scrape without rendering JS, as the cookies are now stored in your session and applied back.
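A sketch of this two-step flow, assuming the session parameter from the sessions documentation; the session name and URLs are illustrative:

# Step 1: establish cookies with browser rendering, stored in a named session.
curl -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://example.com/" \
  --data-urlencode "asp=true" \
  --data-urlencode "render_js=true" \
  --data-urlencode "session=my_session"

# Step 2: reuse the stored cookies without rendering JS.
curl -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://example.com/api/items" \
  --data-urlencode "asp=true" \
  --data-urlencode "session=my_session"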

Geo Blocking

Some websites block navigation based on the IP location. By default, Scrapfly selects a random country from the pool; specifying a country matching the location of the website could help.

API Usage: country=us, check out the related documentation
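For example, pinning the proxy country to the United States:

curl -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://httpbin.dev/anything" \
  --data-urlencode "asp=true" \
  --data-urlencode "country=us"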

Pricing

Pricing is not easy to predict with Anti Scraping Protection. Everything is designed and optimized to reduce the cost and maximize the reuse of authenticated sessions (which is free, even when protection is activated).

Be aware that scraping protected sites at scale has a real cost; the best way to budget your volume is to create an account and monitor the cost while increasing the volume, to avoid surprises. We try to be as transparent as we can on this subject because we know you need to predict and budget the price of the solution; if you have any questions or feedback, please contact us.

Pricing Grid

Scenario                               API Credits Cost
ASP + Residential Proxies*             25
ASP + Residential Proxies + Browser*   25 + 5 = 30
ASP + Datacenter Proxies               1
ASP + Datacenter Proxies + Browser     6

* Most protected websites require residential proxies and a browser to pass.

Pricing Predictability and Control of Your Spending

Since the ASP dynamically updates the configuration to be able to pass protections, the price is dynamic for the following reasons:

  • The first try always respects your configuration.
  • When a low-cost first try fails, we upgrade the configuration according to the protection (browser, proxy quality); most well-known anti-bots require a browser.
  • Some targets/shields have special fees due to the resources required to pass them.
  • We can optimize and reduce the cost when we reuse a bypass and avoid the challenge.
  • Sometimes no protection is triggered; if no protection/block is involved, you don't pay any extra.

To help you make your cost more predictable, the following options are available:

  • Project: Allow or disallow extra quota, an extra-usage spending limit, and a concurrency limit
  • Throttler: Define a per-target speed limit (request rate, concurrency) and an API Credit budget per period (hour, day, month)
  • API: Use the cost_budget parameter to define the maximum budget the ASP should respect (see the example below). If the budget interrupts the configuration mutation, the web scrape already performed is billed regardless of the status code. Make sure to define the correct minimum budget for your target; if the budget is too low, you will never be able to pass and will pay for blocked results.

Therefore, you can enable ASP even if you are unsure whether a website is protected; if nothing is blocking, no extra credits are counted.
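For example, a call capped with cost_budget; the value 30 is illustrative, so pick a minimum that fits your target:

curl -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://httpbin.dev/anything" \
  --data-urlencode "asp=true" \
  --data-urlencode "cost_budget=30"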

Success Rate and Fair use

Failed requests (status >= 400) are not billed, except for the following codes: 400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 422, 424, 426, 428, 456. To prevent abuse, this is subject to our fair use policy: if more than 30% of your traffic hits the previous HTTP codes, the fair use is disabled and failed requests are billed.

If your account falls below a 60% success rate, and/or you deliberately scrape protected websites without ASP or keep hitting failing targets, your account will be suspended.

You can try to target the desired website through our API player by creating a free account.
The API response contains the header X-Scrapfly-Api-Cost, which indicates the billed amount, and X-Scrapfly-Remaining-Api-Credit, which indicates the remaining amount of Scrape API Credits.
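For example, you can dump the response headers and inspect both values:

# Save response headers to a file, discard the body, then grep the billing headers.
curl -s -D headers.txt -o /dev/null -G "https://api.scrapfly.io/scrape" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://httpbin.dev/anything" \
  --data-urlencode "asp=true"
grep -iE "x-scrapfly-(api-cost|remaining-api-credit)" headers.txt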

Some specific protected domains have a special price due to high protection; you can ask through support, or test via the player to see how many credits are billed.

Integration

All related errors are listed below. You can see the full description and an example of the error response in the Errors section.