Web Scraper API Specification
Scrapfly has many features, and the best way to discover them is through the specification docs below. Alternatively, see the OpenAPI specification if you are familiar with OpenAPI structures.
If you have any questions, check the Frequently Asked Questions section or reach out via the support chat.
By default, the API has a read timeout of 155 seconds. To avoid read-timeout errors, configure your HTTP client with a read timeout of 155 seconds. If you need a different timeout value, refer to the documentation on controlling the timeout.
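As a minimal sketch of the client-side setting above (the hostname is an assumption for illustration; use the endpoint from your own integration), Python's standard library lets you set the read timeout when building the connection:

```python
import http.client

# Match the client read timeout to the API's 155-second server-side
# read timeout, so the client does not give up before the API does.
# NOTE: the hostname below is an assumption, not taken from this spec.
conn = http.client.HTTPSConnection("api.scrapfly.io", timeout=155)
print(conn.timeout)
```

Most HTTP clients expose an equivalent knob (e.g. a per-request timeout argument); the important part is that it is at least 155 seconds.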
You can try out the API directly in your terminal using curl. Want to try the API without coding? Check out our visual API player to test the API and generate ready-to-use code.
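A basic request is just a GET with your key and a percent-encoded target URL. The sketch below builds one with the standard library; the `/scrape` path and `key` parameter name are assumptions for illustration (confirm them against your dashboard), while the `url` value is the one used in the examples below:

```python
from urllib.parse import urlencode, quote

# ASSUMED endpoint and key parameter -- substitute your own values.
BASE = "https://api.scrapfly.io/scrape"
params = {
    "key": "YOUR_API_KEY",
    "url": "https://httpbin.dev/anything?q=I want to Scrape this",
}
# quote_via=quote percent-encodes spaces as %20, matching the
# url=...%20want%20to%20Scrape%20this form shown in the examples.
request_url = BASE + "?" + urlencode(params, quote_via=quote)
print(request_url)
```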
The default response format is JSON. The scraped content is available in result.content, your scrape configuration is present in config, and other activated feature information is available in context. To get the HTML page directly, refer to the proxified_response parameter.
url=https://httpbin.dev/anything?q=I%20want%20to%20Scrape%20this
proxy_pool=public_datacenter_pool
proxy_pool=public_residential_pool
headers[content-type]=application%2Fjson
headers[Cookie]=test%3D1%3Bauth%3D1
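The headers[...] values above are percent-encoded. A small sketch of producing that encoding from a plain headers dict (illustrative only; the API accepts the encoded query string however you build it):

```python
from urllib.parse import urlencode, quote

# Forwarded headers are passed as indexed query parameters,
# headers[name]=value, with the value percent-encoded.
headers = {
    "content-type": "application/json",
    "Cookie": "test=1;auth=1",
}
params = {f"headers[{name}]": value for name, value in headers.items()}
query = urlencode(params, quote_via=quote)
print(query)
```

Note that urlencode also escapes the brackets in the key; servers treat headers%5BCookie%5D and headers[Cookie] the same way.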
- country=us
- country=us,ca,mx
- country=us:1,ca:5,mx:3,-gb
- country=-gb
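The weighted syntax above mixes weights (us:1), bare codes, and exclusions (-gb). A small illustrative parser, not part of the API (the API interprets this server-side), shows how such a spec reads:

```python
def parse_country(spec: str):
    """Split a country spec like 'us:1,ca:5,mx:3,-gb' into
    (weights, excluded). Illustrative only -- the API parses
    this server-side; the default weight for bare codes is an
    assumption made for this sketch."""
    weights, excluded = {}, []
    for part in spec.split(","):
        if part.startswith("-"):
            excluded.append(part[1:])
        elif ":" in part:
            code, weight = part.split(":")
            weights[code] = int(weight)
        else:
            weights[part] = 1  # assumed default weight for bare codes
    return weights, excluded

print(parse_country("us:1,ca:5,mx:3,-gb"))
```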
The lang parameter sets the Accept-Language HTTP header. If the website supports the language, the content will be returned in that language. You cannot set both the lang parameter and the Accept-Language header.
- lang=en
- lang=ch-FR,fr-FR,en
- lang=en-IN,en-US
The os parameter sets the operating system advertised through the User-Agent header.
- os=win11
- os=mac
- os=linux
- os=chromeos
The timeout parameter is not trivial to understand in relation to other settings; full documentation is available.
- timeout=30000
- timeout=120000
- retry=true
- retry=false
By default, the scraped page is returned inside the JSON response in result.content. With proxified_response, the content of the page is returned directly as the body, and the status code and headers are replaced by those of the target response.
- proxified_response=true
- proxified_response=false
debug=true
debug=false
correlation_id=e3ba784cde0d
tags[]=jewelery
tags[]=price
The ssl parameter adds extra SSL information for https:// targets. You do not need to enable it for scraping an https:// target; that works by default, and ssl just adds more information.
ssl=true
ssl=false
webhook_name=my-webhook-name
asp=true
asp=false
cost_budget=25
cost_budget=55
- render_js=true
- render_js=false
- rendering_wait=5000
- wait_for_selector=body
- wait_for_selector=input[type="submit"]
- wait_for_selector=//button[contains(text(),"Go")]
- js=Y29uc29sZS5sb2coJ3Rlc3QnKQ
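The js parameter takes the JavaScript source in Base64. Encoding with the URL-safe alphabet and stripping the padding reproduces the documented example value (the padding convention is inferred from that example):

```python
import base64

# Encode a JS snippet for the js= parameter. URL-safe Base64 with
# the trailing '=' padding stripped matches the documented example.
script = "console.log('test')"
encoded = base64.urlsafe_b64encode(script.encode()).rstrip(b"=").decode()
print(encoded)  # Y29uc29sZS5sb2coJ3Rlc3QnKQ
```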
- screenshots[page]=fullpage
- screenshots[price]=#price
- js_scenario=eydjbGljayc6IHsnc2VsZWN0b3InOiAnI3N1Ym1pdCd9fQ
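The js_scenario value is Base64-encoded in the same way. Decoding the documented example value shows the scenario structure it carries (the padding-restoration step is inferred from the padless example):

```python
import base64

# Decode the documented js_scenario example to inspect its structure.
value = "eydjbGljayc6IHsnc2VsZWN0b3InOiAnI3N1Ym1pdCd9fQ"
padded = value + "=" * (-len(value) % 4)  # restore stripped '=' padding
decoded = base64.urlsafe_b64decode(padded).decode()
print(decoded)  # {'click': {'selector': '#submit'}}
```

To send your own scenario, serialize it, Base64-encode it as shown for the js parameter, and pass the result as js_scenario.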
The geolocation parameter takes latitude,longitude coordinates.
- geolocation=48.856614,2.3522219
- geolocation=40.712784,-74.005941
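A small illustrative helper (not part of the API) that formats coordinates in the documented latitude,longitude order and catches an accidental swap when the latitude falls outside its valid range:

```python
def geolocation_param(lat: float, lon: float) -> str:
    """Format coordinates for the geolocation parameter.
    Illustrative helper: validates the ranges so a swapped
    latitude/longitude pair is more likely to be caught."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude must be in [-90, 90], longitude in [-180, 180]")
    return f"geolocation={lat},{lon}"

print(geolocation_param(48.856614, 2.3522219))   # Paris
print(geolocation_param(40.712784, -74.005941))  # New York
```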
- auto_scroll=true
- auto_scroll=false
The rendering_stage parameter accepts complete, which is the default, or domcontentloaded if you want a fast render without waiting for the full rendering (faster scrape).
- rendering_stage=complete
- rendering_stage=domcontentloaded
- cache=true
- cache=false
- cache_ttl=60
- cache_ttl=3600
- cache_clear=true
- cache_clear=false
- session=17013313
- session_sticky_proxy=true
- session_sticky_proxy=false
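Parameters combine freely in a single request. A closing sketch that reuses the (assumed) endpoint and `key` parameter from earlier, combining session reuse with JS rendering; all parameter names and example values come from this spec:

```python
from urllib.parse import urlencode, quote

# ASSUMED endpoint and key parameter -- substitute your own values.
BASE = "https://api.scrapfly.io/scrape"
params = {
    "key": "YOUR_API_KEY",
    "url": "https://httpbin.dev/anything",
    "render_js": "true",
    "session": "17013313",
    "session_sticky_proxy": "true",
}
request_url = BASE + "?" + urlencode(params, quote_via=quote)
print(request_url)
```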