Understanding Scrapfly Timeouts
For the best experience, configure the HTTP client used to reach Scrapfly with a minimum timeout of 155 seconds. If an explicit Scrapfly timeout is used, add a +5s overhead to your client read timeout.
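The rule above can be expressed as a small calculation: given the Scrapfly timeout parameter in milliseconds, the client read timeout should be that value converted to seconds plus 5. A minimal sketch (the helper name is hypothetical, not part of any Scrapfly SDK):

```python
def min_client_read_timeout(scrapfly_timeout_ms=None):
    """Minimum HTTP client read timeout (seconds) for a Scrapfly request.

    Without an explicit Scrapfly timeout, use the 155-second floor from
    the documentation; otherwise convert the timeout parameter from
    milliseconds and add the +5s overhead.
    """
    if scrapfly_timeout_ms is None:
        return 155.0
    return scrapfly_timeout_ms / 1000 + 5

print(min_client_read_timeout())        # 155.0
print(min_client_read_timeout(90000))   # 95.0
print(min_client_read_timeout(15000))   # 20.0
```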
Scrapfly's timeout configuration lets you set a deadline for each scrape request. If the scrape does not complete within the defined timeout, it is stopped and a Scrapfly error response is returned.
To customize the scrape timeout, the retry feature must be disabled (retry=false).
When Should I Configure the Timeout?
Generally, it's best to trust Scrapfly to complete scrape requests within the default timeout budget; however, some cases warrant a larger timeout budget:
- A scrape response is required as soon as possible, e.g. for real-time web scraping systems.
To better understand how Scrapfly determines the timeout budget, take a look at the following diagram. Remember to add a +5s overhead to your HTTP client read timeout when estimating the timeout budget.
Note that when using retry=false, the default timeout of 30 seconds might not be enough to bypass some anti-scraping protection systems. In that case, we recommend increasing the timeout to at least 60 seconds.
- Answer: Specify retry=false&timeout=90000; your HTTP client read timeout should be at least 95 seconds.
- Answer: Set the minimum allowed timeout (no ASP, no JavaScript rendering): retry=false&timeout=15000; your HTTP client read timeout should be at least 20 seconds.
To specify the scrape timeout, use the timeout=<milliseconds> query parameter. For example, for a 20-second timeout use:
curl -G \
  --request "GET" \
  --url "https://api.scrapfly.io/scrape" \
  --data-urlencode "retry=false" \
  --data-urlencode "timeout=20000" \
  --data-urlencode "key=__API_KEY__" \
  --data-urlencode "url=https://httpbin.dev/delay/5"
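The same request can be sketched in Python using only the standard library (the actual HTTP call is left as a comment, since it needs a real API key; the requests usage shown there is an assumption, not Scrapfly's official client). The sketch shows how the query string is assembled and how the client read timeout follows from the timeout parameter:

```python
from urllib.parse import urlencode

params = {
    "retry": "false",
    "timeout": "20000",  # 20-second scrape deadline, in milliseconds
    "key": "__API_KEY__",
    "url": "https://httpbin.dev/delay/5",
}
api_url = "https://api.scrapfly.io/scrape?" + urlencode(params)

# Client read timeout: scrape timeout + 5s overhead (here: 25 seconds).
read_timeout = int(params["timeout"]) / 1000 + 5

# Hypothetical call with an HTTP client such as requests:
# requests.get("https://api.scrapfly.io/scrape", params=params, timeout=read_timeout)
print(api_url)
print(read_timeout)  # 25.0
```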
"https://api.scrapfly.io/scrape?retry=false&timeout=20000&key=&url=https%3A%2F%2Fhttpbin.dev%2Fdelay%2F5" "api.scrapfly.io" "/scrape" retry = "false" timeout = "20000" key = "" url = "https://httpbin.dev/delay/5"