Getting started

Basics

API Endpoint (HTTP)

    https://api.scrapfly.io

API Keys / Project

Your API keys and projects are available from the top menu bar. You can also find this information on the overview page of your dashboard, which displays the details of your project.


OpenAPI

Scrapfly provides an OpenAPI specification to facilitate integration with the standard. Everything is well documented; refer to the in-depth documentation for details.

You benefit from the whole OpenAPI open-source ecosystem, such as code generators, validators, and interactive documentation viewers:

OpenAPI Specification URL (HTTP)

    https://scrapfly.io/docs/openapi

You can load the OpenAPI spec in any compatible viewer.
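
As an illustration, the following Python sketch (not official tooling) downloads the spec and lists its documented paths. It assumes the URL above serves the spec as JSON; adjust the parsing if it is served as YAML.

    import requests

    resp = requests.get("https://scrapfly.io/docs/openapi")
    resp.raise_for_status()

    spec = resp.json()  # assumption: the spec is served as JSON
    print(spec.get("info", {}).get("title"))
    for path in spec.get("paths", {}):
        print(path)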

Postman

Postman is a collaborative platform for API development; use it to quickly test and play with the Scrapfly API.

Scrapfly Postman Definition (HTTP)

    https://scrapfly.io/api/postman

You can import the Scrapfly collection directly into your Postman application.

First API Call

Example: scrape http://httpbin.org/anything through the public proxy pool with a German exit IP:

    curl -X GET "https://api.scrapfly.io/scrape?key=__API_KEY__&url=http%3A%2F%2Fhttpbin.org%2Fanything&country=de&proxy_pool=public_pool"
HTTP call, pretty-printed:

https://api.scrapfly.io/scrape
    key        = __API_KEY__
    url        = http%3A%2F%2Fhttpbin.org%2Fanything   (http://httpbin.org/anything, URL-encoded)
    country    = de
    proxy_pool = public_pool
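
For comparison, here is the same call as a minimal Python sketch using the requests library. It relies only on what this page documents (the /scrape endpoint, its query parameters, and the response headers described under "Good to know" below); the exact JSON layout of the body is not shown here, so inspect it to locate the upstream status_code.

    import requests

    API_KEY = "__API_KEY__"  # your project's API key

    resp = requests.get(
        "https://api.scrapfly.io/scrape",
        params={
            "key": API_KEY,
            "url": "http://httpbin.org/anything",  # requests URL-encodes this
            "country": "de",
            "proxy_pool": "public_pool",
        },
    )

    # This status code reflects the Scrapfly service, not the target website.
    print("Scrapfly status:", resp.status_code)
    print("Billed cost:", resp.headers.get("X-Scrapfly-Api-Cost"))
    print("Remaining quota:", resp.headers.get("X-Remaining-Scrape"))
    # requests parses the Link header; rel=log points at the scrape log.
    print("Scrape log:", resp.links.get("log", {}).get("url"))

    # The upstream status code and data live in the response body.
    body = resp.json()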

Good to know

The HTTP status code attached to the response reflects the Scrapfly service, not the upstream website! The status_code in the response body reflects the upstream service.
  • API keys are scoped per project; use the key of the project you want to target
  • Only a successful scrape is billed and counted against the quota. The API response contains the X-Scrapfly-Api-Cost header to indicate the amount billed for the call
  • The protocol used by the scraper is HTTP/2
  • The scraper retries five times on server-error failures (HTTP status code greater than or equal to 500) and on network issues. These automatic retries are neither counted nor billed against your usage
  • Upstream connection timeout is 5 seconds
  • The maximum API read timeout is 2.5 minutes (150 seconds); Anti Scraping Protection (ASP) can take up to 60 seconds to bypass a website. Consider raising the read timeout of your HTTP client, or you may get a timeout error while the scrape is still in progress
  • The scraper follows redirects for a maximum of 25 seconds
  • The HTTP status code returned by the response reflects the Scrapfly API service, not the upstream response. All data about the upstream (status, headers, cookies, and the like) is located in the response body. This means that if the target website responds with 500 while our system is up and running, we respond with 200
  • We strongly recommend enabling the "follow redirection" feature on your HTTP client to prevent URL redirection issues
  • We send back the X-Remaining-Scrape header with your leftover quota. If you are at 0, additional requests can still be performed against your account credit, if you have any and your plan allows it
  • If the maximum number of concurrent scrapes or the throttling limit is reached, we return a 429 response with a Retry-After header indicating the optimal time to retry, expressed in seconds (e.g., Retry-After: 30); see the client sketch after this list
  • The Link header with rel=log gives you the URL to access the log of your scrape
  • Things that are automatically handled for you:
    • All dates returned by or sent to the API are expressed in UTC
    • UTF-8 encoding/decoding
    • Compression and decompression (br, gzip, deflate)
    • Binary/text format (binary content is returned base64-encoded)
    • Anti-Scraping Protection is automatic and will adjust all settings for you (retrieve/activate session, cookies, headers, JS rendering, captcha solving)
    • Accept header is automatically set to text/html by default
    • Upgrade-Insecure-Requests header is automatically set to 1 by default
    • Cache-Control header is automatically set to no-cache by default
    • User-Agent header is automatically added
    • Content-Encoding of response is automatically managed
    • Cluster of real browsers
    • Proxies
    • ... well, just send the URL to scrape; it works!
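
To close the loop on the client-side advice above, here is a Python sketch that combines a read timeout above the 150-second API maximum, redirect following, and a retry loop that honors Retry-After on 429 responses. The attempt count and timeout values are illustrative assumptions, not documented defaults.

    import time
    import requests

    def scrape(params: dict, max_attempts: int = 3) -> requests.Response:
        for _ in range(max_attempts):
            resp = requests.get(
                "https://api.scrapfly.io/scrape",
                params=params,
                timeout=(10, 160),     # (connect, read); read above the 150 s maximum
                allow_redirects=True,  # requests' default, made explicit here
            )
            if resp.status_code == 429:
                # Retry-After is expressed in seconds, e.g. "Retry-After: 30".
                time.sleep(int(resp.headers.get("Retry-After", "1")))
                continue
            return resp
        return resp  # last 429 response if all attempts were throttled

    response = scrape({
        "key": "__API_KEY__",
        "url": "http://httpbin.org/anything",
        "country": "de",
        "proxy_pool": "public_pool",
    })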

Integration