Getting Started

Overview

On steroids

  • By default the API responds in JSON; msgpack is also available. Both accept: application/json and accept: application/msgpack are supported (see the sketch after this list)
  • Gzip compression is available when the header content-encoding: gzip is set
  • Text content is converted to UTF-8; binary content is converted to base64
  • By default everything is set up to scrape correctly without being blocked: a pre-configured user-agent and other sensible default headers
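
A minimal sketch of requesting a msgpack response instead of JSON, assuming the Python requests and msgpack packages; the API key and target URL are placeholders:

    import msgpack
    import requests

    resp = requests.get(
        "https://api.scrapfly.io/scrape",
        params={"key": "YOUR_API_KEY", "url": "https://httpbin.org/anything"},
        headers={"accept": "application/msgpack"},  # ask for msgpack instead of JSON
    )
    data = msgpack.unpackb(resp.content, raw=False)  # same structure as the JSON body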

Quality of life

  • Success rate and monitoring automatically tracked in your dashboard
  • Multi-project/scraper management out of the box - simply use the API key from the project
  • Replay scrapes from the log
  • Experiment with our visual API playground
  • Status page with notification subscription
  • Our API responses include the following useful headers (see the sketch after this list):
    • X-Scrapfly-Api-Cost API cost billed for the call
    • X-Scrapfly-Remaining-Api-Credit Remaining API credits; if 0, usage is billed as extra credits
    • X-Scrapfly-Account-Concurrent-Usage Current concurrency usage of your account
    • X-Scrapfly-Account-Remaining-Concurrent-Usage Remaining concurrency allowed on the account
    • X-Scrapfly-Project-Concurrent-Usage Concurrency usage of the project
    • X-Scrapfly-Project-Remaining-Concurrent-Usage Remaining concurrency of the project if a concurrency limit is set on it; otherwise equal to the account concurrency
    Concurrency is defined by your subscription
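
A minimal sketch of reading these headers after a call, assuming the Python requests library; the API key is a placeholder:

    import requests

    resp = requests.get(
        "https://api.scrapfly.io/scrape",
        params={"key": "YOUR_API_KEY", "url": "https://httpbin.org/anything"},
    )
    for name in (
        "X-Scrapfly-Api-Cost",
        "X-Scrapfly-Remaining-Api-Credit",
        "X-Scrapfly-Account-Concurrent-Usage",
        "X-Scrapfly-Account-Remaining-Concurrent-Usage",
    ):
        print(name, resp.headers.get(name))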

Billing

If you want the total API credits billed for a call, check the X-Scrapfly-Api-Cost header. For a breakdown, the information is in our JSON response at response.context.cost, which contains both the details and the total.

Scenario                         API Call Cost
Datacenter Proxies               1
Datacenter Proxies + Browser     1 + 5 = 6
Residential Proxies              25
Residential Proxies + Browser    25 + 5 = 30
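
A minimal sketch of reading the billed cost, reusing the resp object from the earlier header example; the header name and the response.context.cost field come from the documentation above:

    total = resp.headers.get("X-Scrapfly-Api-Cost")  # total credits billed for this call
    details = resp.json()["context"]["cost"]         # per-feature breakdown plus the total
    print(total, details)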

Protected websites and ASP are billed following this grid. If ASP upgrades the network to residential or uses a browser, you will be billed as described above. Some very particular websites have extra fees; these appear in the cost details of the call.

Failed requests (HTTP code >= 400) are not billed, except for the following codes: 400, 401, 404, 405, 406, 407, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 422, 424, 426, 428, 456. To prevent abuse, this is subject to our fair use policy: if more than 30% of your traffic returns the HTTP codes above, fair use is disabled and you will pay for failed requests. If your account falls under a 60% success rate and/or you deliberately scrape protected websites without ASP or a failing target, your account will be suspended.

Errors

Scrapfly uses conventional HTTP response codes to indicate the success or failure of an API request. In general: codes in the 2xx range indicate success; codes in the 4xx range indicate a request that failed given the information provided (e.g., a required parameter was omitted, the action was not permitted, max concurrency was reached); codes in the 5xx range indicate an error with Scrapfly's servers.

HTTP 422 - Request Failed provides extra headers to help as much as possible (see the sketch after this list):

  • X-Scrapfly-Reject-Code: Error Code
  • X-Scrapfly-Reject-Description: URL to the related documentation
  • X-Scrapfly-Reject-Retryable: Indicates whether the scrape is retryable
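
A minimal sketch of inspecting these headers on a rejected call, assuming the Python requests library and the resp object from the earlier examples:

    if resp.status_code == 422:
        print(resp.headers.get("X-Scrapfly-Reject-Code"))         # error code
        print(resp.headers.get("X-Scrapfly-Reject-Description"))  # link to the related documentation
        # the exact value format of the retryable flag is an assumption; check the error list
        print(resp.headers.get("X-Scrapfly-Reject-Retryable"))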

You can check out the full error list to learn more.

HTTP Status Code Summary

200 - OK Everything worked as expected.
400 - Bad Request The request was unacceptable, often due to a missing required parameter, a bad value, or a bad format.
401 - Unauthorized No valid API key provided.
402 - Payment Required A payment issue occurred and needs to be resolved.
403 - Forbidden The API key doesn't have permissions to perform the request.
422 - Request Failed The parameters were valid but the request failed.
429 - Too Many Requests All free quota used, max allowed concurrency reached, or the domain throttled.
500, 502, 503 - Server Errors Something went wrong on Scrapfly's end.
504 - Timeout The scrape has timed out.
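
A minimal sketch of acting on these status codes with a retry loop, assuming the Python requests library; the backoff values, the retry cap, and the value format of the retryable header are illustrative assumptions:

    import time
    import requests

    def scrape(params: dict, max_attempts: int = 3) -> dict:
        for attempt in range(max_attempts):
            resp = requests.get("https://api.scrapfly.io/scrape", params=params, timeout=(10, 155))
            if resp.status_code == 200:
                return resp.json()
            retryable_422 = resp.status_code == 422 and resp.headers.get(
                "X-Scrapfly-Reject-Retryable", ""
            ).lower() in ("yes", "true", "1")  # value format assumed
            if resp.status_code in (429, 500, 502, 503, 504) or retryable_422:
                time.sleep(2 ** attempt)  # simple exponential backoff (assumption)
                continue
            resp.raise_for_status()  # other 4xx errors are not worth retrying
        raise RuntimeError("scrape failed after retries")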

Specification

Discover and learn the full potential of our API to scrape the desired targets.
If you have any questions, you can check out the Frequently Asked Questions section and, ultimately, ask on our chat.

The API read timeout is 155s by default. You must configure your HTTP client to set the read timeout to 155s (a sketch follows). If you do not want to use this value and want to avoid read timeout errors, you must set retry=false.
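
A minimal sketch of matching that read timeout on the client side, assuming the Python requests library; the API key and target URL are placeholders:

    import requests

    resp = requests.get(
        "https://api.scrapfly.io/scrape",
        params={"key": "YOUR_API_KEY", "url": "https://httpbin.org/anything"},
        timeout=(10, 155),  # (connect timeout, read timeout); 155s matches the API default
    )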
The scrape endpoint accepts GET, POST, PUT, and PATCH. Quote the URL so your shell does not interpret the & separators:

curl -X GET "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X POST "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X PUT "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"
curl -X PATCH "https://api.scrapfly.io/scrape?url=http://httpbin.org/anything?q=I%20want%20to%20Scrape%20this&country=us&render_js=true"

You will retrieve a JSON response; the scraped content is located in response.result.content. You will find your scrape configuration in response.config and, in response.context, various other information about the activated features.
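
A minimal sketch of extracting these fields, assuming the Python requests library and the resp object from the earlier examples:

    data = resp.json()
    content = data["result"]["content"]  # the scraped page content
    config = data["config"]              # the scrape configuration you sent
    context = data["context"]            # information about the activated features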

HTTP Parameters

All parameters are passed in the query string; a combined example follows the list.

url (required)
    Target URL to scrape. Must be URL-encoded.
    Example: url=https://httpbin.org/anything?q=I%20want%20to%20Scrape%20this

key (required)
    API key to authenticate the call.
    Example: key=16eae084cff64841be193a95fc8fa67dso

asp (popular, default: false)
    Anti Scraping Protection - unblock protected websites and bypass protection.
    Example: asp=true, asp=false

proxy_pool (popular, default: public_datacenter_pool)
    Select the proxy pool. A proxy pool is a network of proxies grouped by quality range and network type. The price varies based on the pool used.
    Example: proxy_pool=public_datacenter_pool, proxy_pool=public_residential_pool

headers (popular, default: [])
    Pass custom headers to the request. Must be URL-encoded.
    Example: headers[content-type]=application%2Fjson, headers[Cookie]=test%3D1%3Bauth%3D1

debug (default: false)
    Store the API result and take a screenshot if JavaScript rendering is enabled. A sharable link to the saved response is available. Useful to enable when you need to communicate an issue to our support.
    Example: debug=true, debug=false

correlation_id (default: null)
    Helps correlate a group of scrapes issued by the same worker or machine. You can use it as a filter in our monitoring dashboard.
    Example: correlation_id=e3ba784cde0d

tags (default: [])
    Add tags to your calls to easily group or filter them with many values.
    Example: tags[]=jewelery, tags[]=price

dns (default: false)
    Query and retrieve target DNS information.
    Example: dns=true, dns=false

ssl (default: false)
    Pull the remote SSL certificate and return other TLS information. Only available for https:// targets. You do not need to enable it to scrape https:// targets - that works by default; it just adds more information.
    Example: ssl=true, ssl=false

webhook (default: null)
    Queue your scrape request and receive the API response on your webhook endpoint. Takes the name of a webhook configured in your dashboard.
    Example: webhook=my-webhook-name

Javascript Rendering - all related parameters require render_js to be enabled.
Cache - all related parameters require cache to be enabled.
Session - all related parameters require session to be enabled.
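
A minimal sketch of combining several of these parameters in one call, assuming the Python requests library; requests URL-encodes the values for you, and whether the server accepts percent-encoded brackets in the headers[...] and tags[] keys is an assumption. The key and target are placeholders:

    import requests

    params = {
        "key": "YOUR_API_KEY",
        "url": "https://httpbin.org/anything?q=I want to Scrape this",  # encoded by requests
        "asp": "true",
        "proxy_pool": "public_datacenter_pool",
        "headers[content-type]": "application/json",  # bracketed keys get percent-encoded
        "tags[]": "price",
        "correlation_id": "e3ba784cde0d",
    }
    resp = requests.get("https://api.scrapfly.io/scrape", params=params, timeout=(10, 155))
    print(resp.json()["result"]["content"])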