# On steroids
- By default the API responds in JSON; msgpack is also available. Both `accept: application/json` and `accept: application/msgpack` are supported
- Gzip compression is available when the `content-encoding: gzip` header is set
- Text content is converted to UTF-8; binary content is converted to base64
- By default everything is configured to scrape correctly without being blocked: a pre-configured user-agent and other default headers
# Quality of life
- Success rate and monitoring are automatically tracked in your dashboard. Discover Monitoring
- Multi-project/scraper management out of the box - simply use the API key from the project. Discover Project
- Replay scrapes from the log
- Experiment with our visual API playground
- Status page with notification subscription
Our API responses include the following useful headers:
- `X-Scrapfly-Api-Cost` API cost billed
- `X-Scrapfly-Remaining-Api-Credit` Remaining API credit; if 0, the call is billed as extra credit
- `X-Scrapfly-Account-Concurrent-Usage` Current concurrency usage of your account
- `X-Scrapfly-Account-Remaining-Concurrent-Usage` Maximum concurrency allowed by the account
- `X-Scrapfly-Project-Concurrent-Usage` Concurrency usage of the project
- `X-Scrapfly-Project-Remaining-Concurrent-Usage` Remaining concurrency if a concurrency limit is set on the project; otherwise equal to the account concurrency
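As a sketch, a client could read these headers after each call to track usage. The header names come from the list above; the numeric values below are made up for illustration:

```python
# Sketch: extracting Scrapfly billing/concurrency headers from a response.
# Header names are from the documentation above; the values are made up.

def read_usage_headers(headers: dict) -> dict:
    """Parse the numeric Scrapfly usage headers into ints."""
    keys = [
        "X-Scrapfly-Api-Cost",
        "X-Scrapfly-Remaining-Api-Credit",
        "X-Scrapfly-Account-Concurrent-Usage",
        "X-Scrapfly-Account-Remaining-Concurrent-Usage",
        "X-Scrapfly-Project-Concurrent-Usage",
        "X-Scrapfly-Project-Remaining-Concurrent-Usage",
    ]
    return {k: int(headers[k]) for k in keys if k in headers}

# Example with made-up values; a real HTTP response would supply them:
usage = read_usage_headers({
    "X-Scrapfly-Api-Cost": "6",
    "X-Scrapfly-Remaining-Api-Credit": "994",
    "X-Scrapfly-Account-Concurrent-Usage": "1",
})
if usage["X-Scrapfly-Remaining-Api-Credit"] == 0:
    print("further calls are billed as extra credit")
```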
If you want the total API credits billed directly, check the `X-Scrapfly-Api-Cost` response header. For the details, the JSON response includes `context.cost`, where you can find the breakdown and the total. You can check the format of the response in `result.format`, which can be `TEXT` (html, json, xml, txt, etc.) or `BINARY` (image, archive, pdf, etc.).
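For illustration, totalling a cost breakdown like the one exposed in `context.cost` might look like this. The field names inside the breakdown (`details`, `amount`, `total`) are assumptions for this sketch, not the documented schema:

```python
# Sketch: totalling a cost breakdown similar to context.cost.
# The "details"/"amount"/"total" field names are hypothetical;
# consult the API reference for the real schema.
def total_cost(cost: dict) -> int:
    return sum(item["amount"] for item in cost["details"])

example_cost = {
    "details": [
        {"description": "Datacenter proxy", "amount": 1},
        {"description": "Browser rendering", "amount": 5},
    ],
    "total": 6,
}
assert total_cost(example_cost) == example_cost["total"]
```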
|Scenario|API Call Cost|
|---|---|
|Datacenter Proxies + Browser|1 + 5 = 6|
|Residential Proxies + Browser|25 + 5 = 30|
Protected websites and ASP are billed following this grid. When ASP upgrades the network to residential or uses a browser, you are billed as described above. Some very particular websites carry extra fees; these appear in the cost details of the call.
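The grid above can be expressed as a small calculator. The per-feature costs are taken from the table (datacenter = 1, residential = 25, browser = +5); treat them as illustrative, since actual pricing may change:

```python
# Sketch: expected API call cost from the billing grid above.
# Costs taken from the table; illustrative only.
PROXY_COST = {"datacenter": 1, "residential": 25}
BROWSER_COST = 5

def call_cost(proxy: str, browser: bool) -> int:
    return PROXY_COST[proxy] + (BROWSER_COST if browser else 0)

assert call_cost("datacenter", browser=True) == 6    # matches the grid
assert call_cost("residential", browser=True) == 30  # matches the grid
```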
# Scrape Failed Protection and Fairness Policy
Scrape Failed Protection prevents failed scrapes from being billed. In addition, to prevent abuse, there is also a fairness policy.
- Status codes >= 400 that are not excluded (see below) are eligible
- Excluded status codes: 400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 422, 424, 426, 428, 456
To prevent any abuse:
- If more than 30% of your traffic fails with eligible status codes (see above) over a minimum 1-hour period, the fairness policy is disabled and usage is billed
- If an account deliberately scrapes protected websites without success and without ASP, the account can be suspended by account-manager decision
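The eligibility rules above can be sketched as a small check, with the excluded codes copied from the list:

```python
# Sketch: is a failed scrape eligible for Scrape Failed Protection?
# Eligible = status >= 400 and not in the excluded list above.
EXCLUDED = {400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414,
            415, 416, 417, 418, 422, 424, 426, 428, 456}

def eligible_for_protection(status: int) -> bool:
    return status >= 400 and status not in EXCLUDED

assert eligible_for_protection(403) is True   # eligible -> not billed
assert eligible_for_protection(404) is False  # excluded -> billed
assert eligible_for_protection(200) is False  # success -> billed normally
```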
Scrapfly uses conventional HTTP response codes to indicate the success or failure of an API request.
Codes in the 2xx range indicate success.
Codes in the 4xx range indicate a request that failed given the information provided (e.g., a required parameter was omitted, the operation was not permitted, max concurrency was reached, etc.).
Codes in the 5xx range indicate an error with Scrapfly's servers.
HTTP 422 - Request Failed provides extra headers to help as much as possible:
- `X-Scrapfly-Reject-Code`: Error code
- `X-Scrapfly-Reject-Description`: URL of the related documentation
- `X-Scrapfly-Reject-Retryable`: Indicates whether the scrape is retryable
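For example, a client could use `X-Scrapfly-Reject-Retryable` to decide whether a failed scrape is worth retrying. The exact value format of the header is an assumption here; the sketch accepts a few common truthy spellings:

```python
# Sketch: deciding whether to retry a 422 based on the reject headers.
# The accepted header values ("yes"/"true"/"1") are an assumption.
def should_retry(status: int, headers: dict) -> bool:
    if status != 422:
        return False
    value = headers.get("X-Scrapfly-Reject-Retryable", "")
    return value.lower() in ("yes", "true", "1")

assert should_retry(422, {"X-Scrapfly-Reject-Retryable": "yes"}) is True
assert should_retry(422, {"X-Scrapfly-Reject-Retryable": "no"}) is False
assert should_retry(200, {}) is False
```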
# HTTP Status Code Summary
|Code|Description|
|---|---|
|200 - OK|Everything worked as expected.|
|400 - Bad Request|The request was unacceptable, often due to a missing required parameter, a bad value, or a bad format.|
|401 - Unauthorized|No valid API key provided.|
|402 - Payment Required|A payment issue occurred and needs to be resolved.|
|403 - Forbidden|The API key doesn't have permission to perform the request.|
|422 - Request Failed|The parameters were valid but the request failed.|
|429 - Too Many Requests|All free quota used, max allowed concurrency reached, or domain throttled.|
|500, 502, 503 - Server Errors|Something went wrong on Scrapfly's end.|
|504 - Timeout|The scrape timed out.|

You can check out the full error list to learn more.
Discover and learn the full potential of our API to scrape your desired targets.
If you have any questions, check out the Frequently Asked Questions section, and ultimately ask on our chat.
The API read timeout is 155s by default, so you must configure your HTTP client's read timeout to 155s. If you don't want this value, or want to avoid a read timeout error, check the documentation on controlling the timeout.
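A minimal sketch of setting a 155-second read timeout with Python's standard library (no request is actually sent here; the timeout is only stored on the connection object):

```python
# Sketch: configuring the client read timeout to match the API's 155s default.
import http.client

conn = http.client.HTTPSConnection("api.scrapfly.io", timeout=155)
# The timeout applies to both connect and read; requests made through
# this connection raise a timeout error only after 155s of silence.
print(conn.timeout)  # 155
```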
```shell
curl -X GET "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X POST "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X PUT "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X PATCH "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -I "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
```
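Note that the target URL itself contains `?` and other reserved characters, so it is safest to percent-encode it fully when placing it in the `url` parameter. A sketch with Python's standard library (no request is sent; this only builds the URL):

```python
# Sketch: building the scrape endpoint URL with a properly encoded target.
from urllib.parse import urlencode

target = "http://httpbin.org/anything?q=I want to Scrape this"
params = {"url": target, "country": "us", "render_js": "true"}
api_url = "https://api.scrapfly.io/scrape?" + urlencode(params)
print(api_url)
# The target's own query string is percent-encoded, so its '?' and any
# '&' it contains are not confused with the API's own parameters.
```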
You will retrieve a JSON response by default; the scraped content is located in `result.content`. You will find your scrape configuration in `response.config`, and various other information regarding the activated features in `response.context`. With `proxified_response`, the content of the page is directly returned as the body, and the status code and headers are replaced by those of the target response.

You can set the `Accept-Language` HTTP header; if the website supports the language, the content will be in that language. You can't set the lang parameter and the `Accept-Language` header at the same time. The `timeout` parameter is not trivial to understand in relation to other settings; full documentation is available.

You do not need to enable it for scraping `https://` targets - it works by default, it just adds more information.