Understanding How Timeouts Work
You must configure your HTTP client read timeout to 155s to avoid any issues. Scrapfly manages the timeout according to the strategy used. If you specify a custom timeout value, set your read timeout to that value plus 5s, because screenshot, debug, and cache add some overhead.
Timeout configuration allows you to set a deadline when you start a scrape. This ensures the scrape will not take longer than the defined timeout: once the deadline is reached, Scrapfly stops and returns an error.
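The read-timeout rule above comes down to a small calculation. The following is a minimal sketch; the helper name `clientReadTimeoutMs` and its constants are illustrative assumptions, not part of any Scrapfly SDK.

```javascript
// Hypothetical helper (not a Scrapfly API): derives the HTTP client read
// timeout from the scrape timeout, following the rule described above.
const DEFAULT_READ_TIMEOUT_MS = 155_000; // used when Scrapfly manages the timeout
const OVERHEAD_MS = 5_000;               // margin for screenshot, debug, and cache

function clientReadTimeoutMs(scrapeTimeoutMs) {
  // No custom timeout: fall back to the recommended 155s read timeout.
  if (scrapeTimeoutMs === undefined) {
    return DEFAULT_READ_TIMEOUT_MS;
  }
  if (!Number.isInteger(scrapeTimeoutMs) || scrapeTimeoutMs <= 0) {
    throw new RangeError('scrapeTimeoutMs must be a positive integer (milliseconds)');
  }
  // Custom timeout: add 5s of overhead on the client side.
  return scrapeTimeoutMs + OVERHEAD_MS;
}

console.log(clientReadTimeoutMs());      // 155000
console.log(clientReadTimeoutMs(90000)); // 95000
```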
Time management is crucial in web scraping in order to recover as fast as possible. Every step is budgeted and tracked to prevent and recover from issues as quickly as possible and to provide the best reliability. Some scrapes are fast (<5s), others require more time when rendering JavaScript (~25s), or even more when running a complex user scenario (~90s).
To customize the timeout, retry must be disabled (retry=false).
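As an illustration of pairing a custom timeout with retry=false, the sketch below builds the query string with the standard `URLSearchParams` class. The function name `buildScrapeUrl` and its option names are assumptions made for this example; only the `key`, `url`, `retry`, and `timeout` query parameters come from this page.

```javascript
// Hypothetical helper: builds a scrape URL and forces retry=false whenever
// a custom timeout is requested, since retry must be disabled for that.
function buildScrapeUrl({ key, url, timeoutMs }) {
  const params = new URLSearchParams({ key, url });
  if (timeoutMs !== undefined) {
    params.set('retry', 'false');
    params.set('timeout', String(timeoutMs)); // milliseconds
  }
  return `https://api.scrapfly.io/scrape?${params.toString()}`;
}

const exampleUrl = buildScrapeUrl({
  key: '__API_KEY__',
  url: 'https://httpbin.org/anything',
  timeoutMs: 25000,
});
console.log(exampleUrl);
```

Building the query with `URLSearchParams` also takes care of percent-encoding the target URL.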
When Should I Configure the Timeout?
If you are in one of the following cases:
- I want a fast reply when something goes wrong, so I can retry or handle it myself
- I scrape a slow target that sometimes times out
- I run a JS scenario that requires a longer timeout
Always add 5s to your client read timeout when you customize the scrape timeout.
If you disable retry while using ASP, the default timeout is 30s. However, because some targets are quite slow to pass, we recommend increasing the timeout to at least 60s. Below 60s, there is a high chance that on a slow website or challenge our system will not be able to recover, rotate, and bypass again, which results in a blocked scrape on your end.
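A client-side check can flag ASP timeouts below the recommended 60s before the request is sent. This is only a sketch: `aspTimeoutWarning` and its `asp` / `timeoutMs` option names are hypothetical, and the 60s threshold is the recommendation stated above, not a limit enforced by the API.

```javascript
// Hypothetical pre-flight check (not a Scrapfly API).
const ASP_RECOMMENDED_MIN_MS = 60_000; // recommended minimum with retry=false and ASP

function aspTimeoutWarning({ asp, timeoutMs }) {
  if (asp && timeoutMs < ASP_RECOMMENDED_MIN_MS) {
    return `timeout=${timeoutMs} is below the recommended ${ASP_RECOMMENDED_MIN_MS} ms ` +
      'for ASP; slow targets or challenges may end as blocked scrapes';
  }
  return null; // nothing to report
}

console.log(aspTimeoutWarning({ asp: true, timeoutMs: 30000 }));
console.log(aspTimeoutWarning({ asp: true, timeoutMs: 60000 })); // null
```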
Usage Example
- Question: I want to run a JavaScript scenario that requires 90s in the worst case
- Answer: Specify retry=false&timeout=90000; your HTTP read timeout should be 95s
- Question: I scrape a website without JavaScript and I want the lowest possible timeout
- Answer: Set the minimum allowed (no ASP, no JS rendering), 15s: retry=false&timeout=15000; your HTTP read timeout should be 20s
API Example
// Uses the `request` npm package
var request = require('request');
var options = {
  'method': 'GET',
  // retry=false is required for the custom timeout=15000 (15s) to apply
  'url': 'https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.org%2Fanything&retry=false&tags=player%2Cproject%3Adefault&timeout=15000',
  // Client timeout in ms: scrape timeout (15s) + 5s of overhead
  'timeout': 20000
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});

Request URL: "https://api.scrapfly.io/scrape?key=&url=https%3A%2F%2Fhttpbin.org%2Fanything&retry=false&tags=player%2Cproject%3Adefault&timeout=15000"
Host: "api.scrapfly.io"
Path: "/scrape"
Query parameters:
key = ""
url = "https://httpbin.org/anything"
retry = "false"
tags = "player,project:default"
timeout = "15000"
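For runtimes with a built-in `fetch` (Node 18 or later), the same request can be sketched without the `request` package. The function name `scrapeWithTimeout` and the injectable `fetchImpl` parameter are assumptions made for illustration. Note that `AbortSignal.timeout` sets an overall deadline on the call, which only approximates a read timeout.

```javascript
// Sketch only: scrapeWithTimeout is a hypothetical name, not a Scrapfly SDK call.
async function scrapeWithTimeout({ key, url, timeoutMs }, fetchImpl = fetch) {
  const params = new URLSearchParams({
    key,
    url,
    retry: 'false',             // required to customize the timeout
    timeout: String(timeoutMs), // scrape timeout in milliseconds
  });
  // Client-side deadline: scrape timeout + 5s of overhead.
  const signal = AbortSignal.timeout(timeoutMs + 5000);
  const response = await fetchImpl(`https://api.scrapfly.io/scrape?${params}`, { signal });
  return response.text();
}

// Usage (performs a real HTTP call, so it is left commented out):
// scrapeWithTimeout({ key: '__API_KEY__', url: 'https://httpbin.org/anything', timeoutMs: 15000 })
//   .then(console.log)
//   .catch(console.error);
```

If the deadline passes first, the returned promise rejects, so callers should handle that case alongside API errors.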