- By default the API responds in JSON; msgpack is also available. Both accept: application/json and accept: application/msgpack are supported
- Gzip compression is available when the header content-encoding: gzip is set
- Text content is converted to UTF-8; binary content is converted to base64
- By default, everything is configured to scrape correctly without being blocked, including a pre-configured user-agent and other default headers
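As a minimal sketch of the format negotiation described above, the helper below prepares a request with the matching Accept header. The function name `build_request` and its `use_msgpack` flag are illustrative and not part of the API; the endpoint is the one used by the curl examples on this page. The request is only built, not sent.

```python
import urllib.request

# Endpoint as it appears in the curl examples of this page.
API_ENDPOINT = "https://api.scrapfly.io/scrape"


def build_request(query_string: str, use_msgpack: bool = False) -> urllib.request.Request:
    """Prepare (without sending) a GET request asking for JSON or msgpack.

    `query_string` must already be URL-encoded by the caller.
    """
    accept = "application/msgpack" if use_msgpack else "application/json"
    return urllib.request.Request(
        f"{API_ENDPOINT}?{query_string}",
        headers={"Accept": accept},
        method="GET",
    )
```

Decoding a msgpack body would need a third-party decoder, which is why JSON stays the default here.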
Quality of life
- Success rate and monitoring are automatically tracked in your dashboard
- Multi project/scraper management out of the box - simply use the API key from the project
- Replay a scrape from the log
- Experiment with our visual API playground
- Status page with notification subscription
Our API responses include the following useful headers:
- X-Scrapfly-Api-Cost: API cost billed
- X-Scrapfly-Remaining-Api-Credit: Remaining API credit; if 0, the call is billed in extra credit
- X-Scrapfly-Account-Concurrent-Usage: Your account's current concurrency usage
- X-Scrapfly-Account-Remaining-Concurrent-Usage: Maximum concurrency allowed by the account
- X-Scrapfly-Project-Concurrent-Usage: Concurrency usage of the project
- X-Scrapfly-Project-Remaining-Concurrent-Usage: Applies if a concurrency limit is set on the project; otherwise equal to the account concurrency
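The headers listed above can be collected into a single structure after each call. The sketch below assumes every value is a plain integer, which the page does not state explicitly; any value that does not parse is reported as `None`. The field names on the left of `USAGE_HEADERS` and the function name are illustrative.

```python
from typing import Dict, Mapping, Optional

# Illustrative field names mapped to the header names documented above.
USAGE_HEADERS = {
    "api_cost": "X-Scrapfly-Api-Cost",
    "remaining_api_credit": "X-Scrapfly-Remaining-Api-Credit",
    "account_concurrent_usage": "X-Scrapfly-Account-Concurrent-Usage",
    "account_remaining_concurrent_usage": "X-Scrapfly-Account-Remaining-Concurrent-Usage",
    "project_concurrent_usage": "X-Scrapfly-Project-Concurrent-Usage",
    "project_remaining_concurrent_usage": "X-Scrapfly-Project-Remaining-Concurrent-Usage",
}


def read_usage_headers(headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Extract the usage headers from a response header mapping."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    usage: Dict[str, Optional[int]] = {}
    for field, header_name in USAGE_HEADERS.items():
        raw = lowered.get(header_name.lower())
        try:
            usage[field] = int(raw) if raw is not None else None
        except ValueError:
            # Assumption of integer values did not hold for this header.
            usage[field] = None
    return usage
```

A caller could, for example, pause scheduling new scrapes when `remaining_api_credit` reaches 0.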
If you only want the total API credit billed, check the X-Scrapfly-Api-Cost header. If you want the details, they are available in our JSON response under response.context.cost, where you can find both the detail and the total.
|Scenario||API Call Cost|
|Datacenter Proxies + Browser||1 + 5 = 6|
|Residential Proxies + Browser||25 + 5 = 30|
Protected websites and ASP are billed following this grid. If ASP upgrades the network to residential or uses a browser, you will be billed as described above. Some very particular websites have extra fees, which will appear in the cost details of the call.
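The arithmetic in the grid can be written out as a small helper. This is only a sketch: the per-proxy costs (1 and 25) and the browser surcharge (5) are read off the sums shown in the table, the pool names `"datacenter"` and `"residential"` are illustrative labels, and the extra fees mentioned for particular websites are not modelled.

```python
# Values inferred from the sums in the grid: "1 + 5 = 6" and "25 + 5 = 30".
PROXY_COST = {"datacenter": 1, "residential": 25}
BROWSER_COST = 5


def base_call_cost(proxy_pool: str, browser: bool) -> int:
    """Return the base API call cost for a proxy pool, with or without a browser."""
    cost = PROXY_COST[proxy_pool]  # KeyError for an unknown pool label
    if browser:
        cost += BROWSER_COST
    return cost
```

For a billed call, the authoritative figure remains the X-Scrapfly-Api-Cost header; a helper like this is only useful for estimating a budget beforehand.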
Failed requests (status >= 400) are not billed, except for the following codes:
400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 422, 424, 426, 428, 456.
To prevent abuse, this is subject to our fair use policy: if more than 30% of your traffic returns the HTTP codes above, fair use is disabled and you will pay for failed requests. If your account falls under a 60% success rate and/or you deliberately scrape protected websites without ASP or failed targets, your account will be suspended.
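The exception list above translates directly into a lookup. The sketch below covers only the first rule (which failed status codes are billed); it does not model the 30% fair use threshold or the 60% success rate condition, and the function name is illustrative.

```python
# Failed status codes that are billed, copied from the list above.
BILLED_FAILURE_CODES = frozenset({
    400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413,
    414, 415, 416, 417, 418, 422, 424, 426, 428, 456,
})


def failed_request_is_billed(status_code: int) -> bool:
    """Tell whether a failed request (status >= 400) is billed.

    Assumes the fair use policy is still active for the account.
    """
    if status_code < 400:
        raise ValueError("only failed requests (status >= 400) are covered")
    return status_code in BILLED_FAILURE_CODES
```

For instance, a 404 is billed while a 403 or a 500 is not, as long as fair use still applies.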
Scrapfly uses conventional HTTP response codes to indicate the success or failure of an API request. In general: Codes in the 2xx range indicate success. Codes in the 4xx range indicate an error that failed given the information provided (e.g., a required parameter was omitted, not permitted, max concurrency reached, etc.). Codes in the 5xx range indicate an error with Scrapfly's servers.
HTTP 422 - Request Failed responses provide extra headers to help as much as possible:
- X-Scrapfly-Reject-Code: Error code
- X-Scrapfly-Reject-Description: URL to the related documentation
- X-Scrapfly-Reject-Retryable: Indicates whether the scrape is retryable
You can check out the full error list to learn more.
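A client can surface these three headers when it receives a 422. The sketch below is written under one loud assumption: the page does not say how X-Scrapfly-Reject-Retryable is encoded, so the set of values treated as "retryable" (`yes`, `true`, `1`) is a guess that should be checked against real responses. The `Rejection` class and function name are illustrative.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# ASSUMPTION: the encoding of the retryable flag is not documented on this page.
ASSUMED_TRUTHY_VALUES = {"yes", "true", "1"}


@dataclass
class Rejection:
    code: Optional[str]         # value of X-Scrapfly-Reject-Code
    description: Optional[str]  # value of X-Scrapfly-Reject-Description
    retryable: bool             # interpreted X-Scrapfly-Reject-Retryable


def read_rejection(status_code: int, headers: Mapping[str, str]) -> Optional[Rejection]:
    """Return the rejection details of a 422 response, or None for other statuses."""
    if status_code != 422:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_raw = lowered.get("x-scrapfly-reject-retryable", "")
    return Rejection(
        code=lowered.get("x-scrapfly-reject-code"),
        description=lowered.get("x-scrapfly-reject-description"),
        retryable=retry_raw.strip().lower() in ASSUMED_TRUTHY_VALUES,
    )
```

Keeping the interpretation of the flag in one constant makes it easy to correct once the actual header values are known.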
HTTP Status Code Summary
|200 - OK||Everything worked as expected.|
|400 - Bad Request||The request was unacceptable, often due to missing a required parameter or a bad value or a bad format.|
|401 - Unauthorized||No valid API key provided.|
|402 - Payment Required||A payment issue occurred and needs to be resolved.|
|403 - Forbidden||The API key doesn't have permissions to perform the request.|
|422 - Request Failed||The parameters were valid but the request failed.|
|429 - Too Many Requests||All free quota used, max allowed concurrency reached, or domain throttled.|
|500, 502, 503 - Server Errors||Something went wrong on Scrapfly's end.|
|504 - Timeout||The scrape timed out.|
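The general rule behind the summary table (2xx success, 4xx an error tied to the request, 5xx an error on Scrapfly's side) can be expressed as a short classifier. The category labels returned here are illustrative names, not values defined by the API.

```python
def classify_status(status_code: int) -> str:
    """Map an API status code to the broad category described for each range."""
    if 200 <= status_code < 300:
        return "success"
    if 400 <= status_code < 500:
        return "request_error"  # e.g. missing parameter, not permitted, max concurrency
    if 500 <= status_code < 600:
        return "server_error"   # error with Scrapfly's servers, including 504
    return "other"
```

A client would typically branch on this category first and only then look at the specific code from the table.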
Discover and learn the full potential of our API to scrape the desired targets.
If you have any questions, you can check out the Frequently Asked Questions section and, ultimately, ask on our chat.
The API read timeout is 155s by default. You must configure your HTTP client to set the read timeout to 155s. If you don't want this value and want to avoid
a Read timeout error, you must set
curl -X GET "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X POST "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X PUT "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X PATCH "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -I "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
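For a scripted client, the same call can be sketched with Python's standard library. Two things are illustrated: percent-encoding the target URL so its own query string is not confused with the API parameters, and passing the 155s timeout mentioned above. The helper names and the injectable `opener` argument are illustrative; with `urllib`, the `timeout` value applies to each blocking socket operation, which is the closest standard-library equivalent of a read timeout.

```python
import urllib.parse
import urllib.request

API_ENDPOINT = "https://api.scrapfly.io/scrape"  # from the curl examples
READ_TIMEOUT_SECONDS = 155                       # default read timeout stated on this page


def build_scrape_url(target_url: str, **params: str) -> str:
    """Build the API URL, percent-encoding the target URL and every parameter."""
    query = urllib.parse.urlencode({"url": target_url, **params})
    return f"{API_ENDPOINT}?{query}"


def scrape(target_url: str, opener=urllib.request.urlopen, **params: str) -> bytes:
    """Send a GET scrape request and return the raw response body."""
    request = urllib.request.Request(build_scrape_url(target_url, **params), method="GET")
    with opener(request, timeout=READ_TIMEOUT_SECONDS) as response:
        return response.read()
```

A call equivalent to the first curl line would be `scrape("http://httpbin.org/anything?q=I want to Scrape this", country="us", render_js="true")`; any other parameter your account requires can be passed the same way.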
You will retrieve a JSON response, the scraped content is located in
You will find your scrape configuration in response.config and various other information regarding the activated features in response.context.
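Once the body is decoded, the parts named on this page can be pulled out as follows. This is a sketch under assumptions: `response` is taken to mean the root of the decoded JSON document, so the keys used are `config`, `context`, `context.cost` and `result.content`; the internal structure of each part is not assumed, and missing keys simply yield empty or `None` values. The function name is illustrative.

```python
import json
from typing import Any, Dict


def split_scrape_response(body: bytes) -> Dict[str, Any]:
    """Decode a JSON response body and pick out the documented sections."""
    payload = json.loads(body)
    context = payload.get("context", {})
    return {
        "config": payload.get("config", {}),   # the scrape configuration
        "context": context,                    # information on activated features
        "cost": context.get("cost"),           # cost detail and total
        "content": payload.get("result", {}).get("content"),  # scraped content
    }
```

Returning `None` for an absent section keeps the helper usable on error responses, where some of these keys may be missing.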
Accept-Language HTTP header. If the website supports the language, the content will be in that language. You can't set lang parameter and
result.content. With proxified_response the content of the page is directly returned as body and status code / headers are replaced by the target response.
https://. You do not need to enable it for scraping
https://target - it works by default; it just adds more information.