API Keys / Project
Your API keys and project are available from the top menu bar. You can also find this information on the overview page of your dashboard, which displays your project details.
Scrapfly provides an OpenAPI specification to make integration with standard tooling easy. Everything is well documented; refer to the in-depth documentation for details.
You benefit from the whole OpenAPI open-source ecosystem, such as:
- Swagger Documentation | https://swagger.io/
- SDK Generator | https://openapi.tools/#sdk
- Mocking Server for testing | https://openapi.tools/#mock
You can load the OpenAPI spec into any compatible viewer.
Postman is a collaborative platform for API development that lets you quickly test and explore the Scrapfly API. You can import the Scrapfly collection directly into your Postman application.
First API Call
Interactive Example: Retrieve account information from API
curl -X GET "https://api.scrapfly.io/scrape?key=__API_KEY__&url=http%3A%2F%2Fhttpbin.org%2Fanything&country=de&proxy_pool=public_pool"
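The same first call can be sketched in Python using only the standard library. This is a minimal, non-authoritative example: it composes the request URL exactly as the curl command above does (note that the target `url` parameter must be percent-encoded, which `urlencode` handles), with `__API_KEY__` as a placeholder for your real project key.

```python
# Sketch of the "first API call" request URL (stdlib only).
from urllib.parse import urlencode

BASE = "https://api.scrapfly.io/scrape"

def build_scrape_url(key, url, **params):
    """Compose the scrape endpoint URL; urlencode percent-encodes
    the target `url` value (e.g. "://" becomes "%3A%2F%2F")."""
    query = {"key": key, "url": url, **params}
    return BASE + "?" + urlencode(query)

request_url = build_scrape_url(
    "__API_KEY__",
    "http://httpbin.org/anything",
    country="de",
    proxy_pool="public_pool",
)
print(request_url)
```

You can then pass `request_url` to any HTTP client; the point here is only that the target URL must arrive percent-encoded, as in the curl example.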
Good to know
The HTTP code attached to the response reflects the Scrapfly service, not the upstream! The status_code in the response body reflects the upstream service.
- You have to use the appropriate API key to target the correct project
- Only a successful scrape is billed and counted against the quota. The API response contains the X-Scrapfly-Api-Cost header to indicate the amount billed for the API call
- Protocol used by the scraper is
- The scraper retries up to five times on server errors (HTTP code ≥ 500) and on network issues. Automatic retries by the system are neither counted nor billed in usage.
- The upstream connection timeout is 5 seconds
- The maximum API read timeout is 2.5 minutes (150 seconds). Anti Scraping Protection (ASP) can take up to 60 seconds to bypass websites, so consider increasing the read timeout of your HTTP client if you get a timeout error while the scrape is still in progress
- The scraper follows redirects for a maximum of 25 seconds
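As a small illustration of the billing rule above, a client can read the X-Scrapfly-Api-Cost header to track what a call cost. This is a hedged sketch: the header name comes from the docs, but the integer parsing and the "absent header means not billed" fallback are assumptions.

```python
# Hedged sketch: read the billed cost from the response headers.
def api_call_cost(headers):
    """Return the billed cost reported by X-Scrapfly-Api-Cost,
    or 0 when the header is absent (assumed to mean not billed)."""
    raw = headers.get("X-Scrapfly-Api-Cost")
    return int(raw) if raw is not None else 0

print(api_call_cost({"X-Scrapfly-Api-Cost": "5"}))  # 5
print(api_call_cost({}))  # 0
```

Summing this value across calls gives you a local view of quota consumption to reconcile against your dashboard.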
The HTTP code returned by the response reflects the Scrapfly API service, not the upstream response. All data about the upstream is located in the response body (same for headers, cookies, and the like). This means that if the target website responds with 500 and our system is up and running, we still respond with our own successful status code.
- We strongly recommend enabling the "follow redirection" feature on your HTTP client to prevent URL redirection issues
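The distinction between the service status and the upstream status can be sketched as follows. Note the `result.status_code` field path in the body is an assumption made for illustration; check the API reference for the exact response shape.

```python
# Hedged sketch: the HTTP code you receive reflects Scrapfly itself,
# while the upstream site's code lives in the response body.
def upstream_status(api_status, body):
    """Return (service_ok, upstream_code) from a Scrapfly-style response.
    The "result"/"status_code" keys are assumed for illustration."""
    service_ok = 200 <= api_status < 300
    upstream_code = body.get("result", {}).get("status_code")
    return service_ok, upstream_code

# Upstream replied 500, but the Scrapfly service itself succeeded:
print(upstream_status(200, {"result": {"status_code": 500}}))
```

A client should therefore branch on both values: the service code tells you whether the API call worked, the body code tells you what the target site answered.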
We send you back the X-Remaining-Scrape header with your leftover quota. When it reaches 0, you can still perform additional requests if you have account credit and your plan allows it.
If the maximum number of concurrent scrapes or the throttling limit is reached, we provide a Retry-After header to let you know the optimal time to retry (expressed in seconds).
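A client can honor that Retry-After hint with a small helper like the one below. This is a hedged sketch: the use of HTTP 429 as the throttling status code is an assumption for illustration, and the injectable `sleep` parameter exists only to keep the helper easy to test.

```python
# Hedged sketch: wait out the Retry-After delay when throttled.
import time

def wait_if_throttled(status_code, headers, sleep=time.sleep):
    """If throttled (429 assumed here) and Retry-After is present,
    sleep for the advertised delay; return the delay in seconds."""
    if status_code == 429 and "Retry-After" in headers:
        delay = float(headers["Retry-After"])
        sleep(delay)
        return delay
    return 0.0

# With sleeping stubbed out, the helper just reports the delay:
print(wait_if_throttled(429, {"Retry-After": "3"}, sleep=lambda s: None))
```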
The rel=log link gives you the URL to access the log of your scrape.
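Assuming the rel=log URL is delivered in a standard Link header (RFC 8288), it can be extracted with a small parser like this sketch; the example URL shown is hypothetical.

```python
# Hedged sketch: extract the rel=log URL from a Link header.
import re

def log_url(link_header):
    """Return the first URL annotated with rel=log, or None."""
    for url, rel in re.findall(r'<([^>]+)>\s*;\s*rel="?([^",]+)"?', link_header):
        if rel == "log":
            return url
    return None

print(log_url('<https://scrapfly.io/dashboard/log/123>; rel="log"'))
```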
Things that are automatically handled for you:
- All dates returned by or sent to the API are expressed in UTC
- UTF-8 encoding / decoding
- Compression and decompression (br, gzip, deflate)
- Binary / text format (if binary is returned, you will retrieve an encoded version in the response body)
- Anti-Scraping Protection is automatic and will adjust all settings for you (retrieve/activate session, cookies, headers, JS rendering, captcha solving)
- Accept-Encoding header is automatically added
- Upgrade-Insecure-Requests header is automatically added
- Cache-Control header is automatically added
- User-Agent header is automatically added
- Content-Encoding of the response is automatically managed
- Cluster of real browsers
- ... well, just send the URL to scrape; it works!