Anti Scraping Protection - Unblock Protected Websites

Screenshots: overview page of the web interface; ASP tab of the log inspection.
Service disruptions on this feature can occur regardless of our efforts. Protections evolve, and finding and adapting a bypass until it is production-grade can take from days to weeks. If you use this feature, keep this in mind and design your software accordingly.

Anti Bot

Our Anti Scraping Protection (ASP) is an in-house solution for scraping protected websites that block bots. It combines many techniques to keep a coherent fingerprint and to replicate a real user's fingerprint as closely as possible while you scrape a website. If you are interested in how it technically stays undetected, we have written a series of articles.

Although we can bypass these protections, many uses are prohibited*, and any attempt will result in account suspension:
  • Automated Online Payment
  • Account Creation
  • Spam Posting
  • Vote Falsification
  • Credit Card Testing
  • Login Brute Force
  • Referral / Gifting systems
  • Ad Fraud
  • Banks
* Cybersecurity firms may be allowed once we have received authorization from the pen-tested "victim", limited to the domain(s) they approved.

Scrapfly detects and resolves challenges from well-known anti-scraping solutions on the market. Scrapfly also supports custom solutions on popular websites. Anti-bot bypasses are transparent and require no extra work on your side: you directly retrieve the successful response.

Keep in mind that anti-bot solutions evolve and we may need to adapt our bypass techniques; this is why you should handle ASP errors correctly when relying on ASP.
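One common way to handle ASP errors is to retry with exponential backoff and give up cleanly after a few attempts. The sketch below is a minimal illustration, not Scrapfly's SDK: `fetch` is a hypothetical zero-argument callable wrapping your own API call, and `AspError` is a hypothetical exception your code would raise when a response reports a failed bypass.

```python
import time


class AspError(Exception):
    """Hypothetical error raised by your own code when an ASP bypass fails."""


def scrape_with_retry(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially on AspError.

    fetch is any zero-argument callable (hypothetical) that returns a result
    or raises AspError. The last AspError is re-raised once attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except AspError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller degrade gracefully
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` keeps the function testable; in production you would leave the default `time.sleep`.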

If you plan to target a protected website, try several configurations: with or without a browser, and with residential proxies. If you send a POST request with a body, configure the headers and content type to mimic a real call.
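A minimal sketch for enumerating those configurations. It assumes the Scrapfly query parameters `render_js` (browser on/off) and `proxy_pool` with the values `public_datacenter_pool` and `public_residential_pool`; verify these names against the API reference before relying on them.

```python
from itertools import product

# Assumed Scrapfly parameter values for the two proxy networks
PROXY_POOLS = ["public_datacenter_pool", "public_residential_pool"]


def candidate_configs(api_key, target_url):
    """Yield query-parameter dicts covering proxy pool x browser on/off."""
    for pool, render_js in product(PROXY_POOLS, [False, True]):
        params = {"key": api_key, "url": target_url, "asp": "true", "proxy_pool": pool}
        if render_js:
            params["render_js"] = "true"  # enable the headless browser
        yield params
```

The order produced (datacenter, datacenter + browser, residential, residential + browser) follows the pricing grid from cheapest to most expensive, so you can pass each dict as `params=` to `requests.get("https://api.scrapfly.io/scrape", ...)` and stop at the first successful response.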

If you are still blocked despite all your attempts, contact us through the chat so we can investigate. Custom bypasses are available on custom plans with a minimum one-year commitment.

Usage

When ASP is enabled, the anti-bot vendor is detected automatically and the bypass is handled for you.

import requests

# Query parameters for the Scrapfly Scrape API; requests URL-encodes them
params = {
    "key": "__API_KEY__",                   # your Scrapfly API key
    "url": "https://httpbin.org/anything",  # page to scrape
    "tags": "project:default",              # tag attached to this call
    "asp": "true",                          # enable Anti Scraping Protection
}

response = requests.get("https://api.scrapfly.io/scrape", params=params)

print(response.text)

# The body is JSON; uncomment to print the scraped content and upstream status code
# print(response.json()['result']['content'])
# print(response.json()['result']['status_code'])

ASP is magic, but it has some limitations

Popular anti-bot vendors are bypassed, but there are still some areas we can't cover in a fully automated way, and they require your attention to configure the calls correctly. Websites that publicly expose their internal API, or that use POST forms, also protect those endpoints (authentication, proof of browser), and ASP can't guess or fix that for you. You need to investigate yourself: if you put no effort in that direction while your scrapes (POST / private API) are rejected, you will remain unsuccessful. The hardest part, defeating the anti-bot, is already done for you; if you are in this case, here is the last mile:

POST form

Posting a form requires passing the correct headers and information, just as the website does. Most of the time, the Accept and Content-Type headers require special attention. The best way to replicate the headers correctly is to open your browser's dev tools, trigger the call from your browser, and inspect the related requests.
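A minimal sketch of preparing such a form POST through the API. It assumes Scrapfly forwards the POST body to the target and accepts upstream headers as `headers[name]=value` query parameters; the header values and form fields are placeholders you would copy from the request in your dev tools.

```python
from urllib.parse import urlencode

import requests

SCRAPFLY_ENDPOINT = "https://api.scrapfly.io/scrape"


def build_form_post(api_key, target_url, form_fields, extra_headers=None):
    """Prepare (without sending) a form POST routed through Scrapfly with ASP."""
    # Placeholder values; copy the real ones from the browser's request
    headers = {
        "content-type": "application/x-www-form-urlencoded",
        "accept": "text/html,application/xhtml+xml",
    }
    headers.update(extra_headers or {})
    params = {"key": api_key, "url": target_url, "asp": "true"}
    # Assumed convention: upstream headers passed as headers[name]=value
    for name, value in headers.items():
        params[f"headers[{name}]"] = value
    # Encode the form body the way a browser submits a urlencoded form
    body = urlencode(form_fields)
    return requests.Request("POST", SCRAPFLY_ENDPOINT, params=params, data=body).prepare()
```

To actually send it, pass the prepared request to `requests.Session().send(...)`; preparing first lets you compare the final URL and body against what the browser sent.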

Website Private API

This is essentially the same as for POST requests. Most of the time, a private API requires authentication. If the basic cookies from normal user navigation are not enough, you might need to reverse-engineer the process to figure out how it is authenticated, so you can retrieve the token and pass it along with your scrape to be authorized.
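As one illustration of that last mile, the sketch below pulls a token out of a page and attaches it to the private API call. Everything target-specific is hypothetical: the `csrf-token` meta tag, the `x-csrf-token` header name, and the session name. It also assumes the Scrapfly `session` parameter and the `headers[name]` convention; adapt all of these to what your investigation reveals.

```python
import re

# Hypothetical pattern: some sites embed a token in a meta tag, but the tag
# name and attribute order vary from site to site
TOKEN_RE = re.compile(r'<meta\s+name="csrf-token"\s+content="([^"]+)"')


def extract_token(html):
    """Return the token found in the page HTML, or None if it is absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


def private_api_params(api_key, api_url, token, session):
    """Query params for calling the private API with the token attached."""
    return {
        "key": api_key,
        "url": api_url,
        "asp": "true",
        "session": session,              # reuse cookies from the page that held the token
        "headers[x-csrf-token]": token,  # hypothetical header name
    }
```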

Maximize Your Success Rate

Network Quality

Most of the time, datacenter IPs are good enough, but websites protected by anti-bot vendors check the origin of the IP to tell whether the traffic comes from a datacenter or a regular connection. With the residential network, you get an IP with a better reputation, registered under a regular ASN (the ASN is what these checks use to determine the IP's origin).

Verify Cookies and Headers

Observe the headers and cookies of regular, successful calls to figure out whether you need to add extra headers or retrieve specific cookies to authenticate. You can use your browser's dev tools to inspect the network activity.
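Once you have spotted the headers a successful call uses, a small helper can turn the "Name: value" lines copied from the dev tools into query parameters. This is a sketch assuming the `headers[name]=value` convention for forwarding headers; dropping `host` and `content-length` (which the HTTP client computes itself) is our own judgment call, and HTTP/2 pseudo-headers such as `:authority` are skipped.

```python
def devtools_headers_to_params(raw_headers, skip=("host", "content-length")):
    """Turn 'Name: value' lines copied from dev tools into headers[...] params."""
    params = {}
    for line in raw_headers.strip().splitlines():
        # Split on the first colon only, so values like URLs stay intact
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        if not sep or not name:
            continue  # not a header line, or an HTTP/2 pseudo-header like :authority
        if name in skip:
            continue
        params[f"headers[{name}]"] = value.strip()
    return params
```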

Navigation Coherence

You might need to retrieve cookies from regular navigation before calling an unofficial API. The easiest way is to first scrape with a session and JavaScript rendering enabled to collect the cookies; you can then scrape without rendering JS, since the cookies are stored in your session and applied automatically.
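The two-step flow can be sketched as two parameter sets that share the same session name: a JS-rendered warm-up on a regular page, then a cheaper call to the API without rendering. It assumes the Scrapfly `session` and `render_js` parameters; the page URL, API URL, and session name are placeholders.

```python
def session_warmup_params(api_key, page_url, api_url, session):
    """Return (warm-up, API call) param dicts sharing one session.

    Step 1 renders JS so the site sets its cookies in the session;
    step 2 reuses those cookies without paying for the browser again.
    """
    warmup = {"key": api_key, "url": page_url, "asp": "true",
              "session": session, "render_js": "true"}
    api_call = {"key": api_key, "url": api_url, "asp": "true",
                "session": session}
    return warmup, api_call
```

Send the two dicts in order, each as `params=` to `requests.get("https://api.scrapfly.io/scrape", ...)`.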

Geo Blocking

Some websites block navigation based on IP location. By default, Scrapfly selects a random country from the pool; specifying a country that matches the website's location can help.
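A minimal sketch of setting the country, assuming the Scrapfly `country` parameter takes ISO 3166-1 alpha-2 codes. Guessing the country from the target's top-level domain is only a heuristic, and the small mapping below is illustrative; for generic domains such as `.com`, set the country yourself.

```python
from urllib.parse import urlsplit

# Illustrative TLD -> country mapping; extend it or set the country by hand
TLD_COUNTRY = {"fr": "fr", "de": "de", "uk": "gb", "jp": "jp"}


def with_country(params, default=None):
    """Copy params, adding a 'country' guessed from the target URL's TLD."""
    host = urlsplit(params["url"]).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    country = TLD_COUNTRY.get(tld, default)
    out = dict(params)
    if country:
        out["country"] = country
    return out
```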

Pricing

Pricing is not easy to predict with Anti Scraping Protection. Everything is designed and optimized to reduce the cost and maximize the reuse of authenticated sessions (reusing them is free even when the protection is active). Be aware that scraping protected sites at scale has a real cost; the best way to budget your volume is to create an account and monitor your costs while gradually increasing the volume, to avoid surprises. We try to be as transparent as we can on this subject because we know you need to predict and budget the cost of the solution; if you have any questions or feedback, please contact us.

Pricing Grid

Scenario                               API Call Cost
ASP + Residential Proxies              25
ASP + Residential Proxies + Browser    25 + 5 = 30
ASP + Datacenter Proxies               1
ASP + Datacenter Proxies + Browser     6
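For a first budget, the grid can be turned into a rough estimate. This sketch assumes every call is billed at the grid price; it ignores free session reuse, unbilled failures, and domains with special pricing, so treat the result as a ballpark figure only.

```python
# Credits per call, copied from the pricing grid above
COST = {
    ("residential", False): 25,
    ("residential", True): 30,
    ("datacenter", False): 1,
    ("datacenter", True): 6,
}


def estimate_credits(calls, proxy="datacenter", browser=False):
    """Rough estimate: every call billed at the grid price for its scenario."""
    return calls * COST[(proxy, browser)]
```

For example, 10,000 calls with residential proxies and a browser come to `estimate_credits(10_000, "residential", True)`, i.e. 300,000 credits.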

Failed requests (HTTP status >= 400) are not billed, except for the following status codes: 400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 422, 424, 426, 428, 456. To prevent abuse, this is subject to our fair use policy: if more than 30% of your traffic ends with these failed status codes, the fair use exemption is disabled and you will pay for failed requests. If your account falls below a 60% success rate, and/or you deliberately scrape protected websites without ASP or keep scraping targets that fail, your account will be suspended.
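To monitor these rules on your side, you can track the upstream status codes of your calls. The sketch below encodes the billing exceptions listed above and computes a success rate; counting any status below 400 as a success is our assumption, and the fair use threshold is not modeled.

```python
# Status codes >= 400 that are still billed, as listed above
BILLED_ERROR_CODES = frozenset({
    400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413,
    414, 415, 416, 417, 418, 422, 424, 426, 428, 456,
})


def is_billed(status):
    """Successful calls are billed; failures only if their code is listed."""
    return status < 400 or status in BILLED_ERROR_CODES


def success_rate(statuses):
    """Share of calls with a status below 400 (assumed definition of success)."""
    if not statuses:
        return 1.0
    return sum(1 for s in statuses if s < 400) / len(statuses)


def below_suspension_threshold(statuses, threshold=0.60):
    """True when the success rate drops under the 60% limit stated above."""
    return success_rate(statuses) < threshold
```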

Therefore, if you are not sure whether a website is protected, you can enable ASP: if nothing is blocking, no extra calls are counted.

You can test the desired website in our API player by creating a free account.
The API response contains the X-Scrapfly-Api-Cost header, which indicates the billed amount, and the X-Scrapfly-Remaining-Api-Credit header, which indicates your remaining Scrape API credits.
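A small helper can read both headers after each call so you can track spending as you scale. It assumes the header values are plain integers; `response.headers` from `requests` is case-insensitive, so passing it directly works.

```python
def credit_usage(headers):
    """Return (billed cost, remaining credits) from Scrapfly response headers.

    Either value is None when its header is absent. Values are assumed to be
    integers; adjust the parsing if they are not.
    """
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return as_int("X-Scrapfly-Api-Cost"), as_int("X-Scrapfly-Remaining-Api-Credit")
```

Typical use: `cost, remaining = credit_usage(response.headers)` right after `requests.get(...)`.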

Some specific protected domains have a special price due to their high level of protection; you can ask support, or test them in the player to see how many credits are billed.

Integration

All related errors are listed below. You can find the full description and an example error response for each in the Errors section.