Understanding Scrapfly Timeouts
For the best experience, configure the HTTP client used to reach Scrapfly with a read timeout of at least 155 seconds. If you set an explicit Scrapfly timeout, add +5s of overhead to your client read timeout.
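As a minimal sketch of the recommendation above, the snippet below builds a Net::HTTP client with a 155-second read timeout. The helper name scrapfly_client and the constant DEFAULT_READ_TIMEOUT are illustrative and not part of any Scrapfly SDK; Net::HTTP.new does not open a connection until a request is made.

```ruby
require "net/http"

# Recommended minimum client read timeout from the text above, in seconds.
# (Constant and helper names are hypothetical, chosen for this sketch.)
DEFAULT_READ_TIMEOUT = 155

# Build a Net::HTTP client for the Scrapfly API host.
# No network connection is opened until a request is sent.
def scrapfly_client(read_timeout: DEFAULT_READ_TIMEOUT)
  http = Net::HTTP.new("api.scrapfly.io", 443)
  http.use_ssl = true
  http.read_timeout = read_timeout # seconds to wait for response data
  http
end

client = scrapfly_client
puts client.read_timeout
```

Pass a different read_timeout: value when an explicit Scrapfly timeout is used, keeping it at least 5 seconds above the scrape timeout.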
The Scrapfly timeout configuration lets you set a deadline for each scrape request. If the scrape doesn't complete within the defined timeout, it is stopped and a Scrapfly error response is returned.
Note that Scrapfly scrape speeds depend on many factors, ranging from the use of optional features like JavaScript rendering and JavaScript scenarios to anti-bot bypass. Some simple scrapes complete in less than 5 seconds, while others can take more than 90 seconds if strict anti-scraper protection is encountered.
To customize the scrape timeout, the retry feature must be disabled (retry=false).
When Should I configure the Timeout?
Generally, it's best to trust Scrapfly to complete scrape requests within the default timeout budget. However, some cases warrant a custom timeout budget:
- Scraping a slow or unresponsive website, particularly when using JavaScript rendering on JavaScript-heavy pages.
- Using the JavaScript Scenario feature to execute complex browser actions.
- Needing the scrape response as soon as possible, for example in real-time web scraping systems.
To better understand how Scrapfly determines the timeout budget, take a look at the following diagram:
Always add +5s of overhead to your HTTP client read timeout when estimating the timeout budget.
Note that when using asp=true and retry=false, the default timeout of 30 seconds might not be enough to bypass some anti-scraping protection systems. In that case, we recommend increasing the timeout to at least 60 seconds.
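The arithmetic behind these rules can be sketched in a few lines of Ruby. The helper names below are hypothetical, and the 60-second floor for asp=true is applied here as a client-side policy based on the recommendation above, not as something the API enforces.

```ruby
# Overhead to add on top of the Scrapfly timeout, per the text above.
CLIENT_OVERHEAD_SECONDS = 5

# Recommended minimum scrape timeout when asp=true and retry=false.
ASP_MIN_TIMEOUT_MS = 60_000

# Convert a Scrapfly timeout (milliseconds) into the minimum
# HTTP client read timeout (seconds), rounding up partial seconds.
def client_read_timeout(scrape_timeout_ms)
  (scrape_timeout_ms / 1000.0).ceil + CLIENT_OVERHEAD_SECONDS
end

# Raise the requested scrape timeout to the recommended floor
# when anti-scraping protection bypass (asp) is enabled.
def effective_timeout_ms(requested_ms, asp:)
  asp ? [requested_ms, ASP_MIN_TIMEOUT_MS].max : requested_ms
end

timeout_ms = effective_timeout_ms(30_000, asp: true)
puts timeout_ms                      # 60000
puts client_read_timeout(timeout_ms) # 65
```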
FAQ
- Question: I want to run a JavaScript scenario that requires 90s in the worst case.
- Answer: Specify retry=false&timeout=90000. Your HTTP client read timeout should be at least 95s.
- Question: I scrape a website without JavaScript and I want the lowest possible timeout.
- Answer: Set the minimum allowed timeout of 15s (no ASP, no JS rendering): retry=false&timeout=15000. Your HTTP client read timeout should be at least 20s.
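The query strings from the answers above can be assembled with Ruby's standard URI module, which also percent-encodes the target URL exactly once. The helper scrape_uri and the target https://example.com/ are illustrative assumptions; only the retry, timeout, key and url parameters come from this page.

```ruby
require "uri"

# Build a Scrapfly scrape URL with retries disabled and an explicit
# timeout in milliseconds. (Helper name is hypothetical.)
def scrape_uri(api_key:, target_url:, timeout_ms:)
  query = URI.encode_www_form(
    "retry"   => "false",    # required for a custom timeout
    "timeout" => timeout_ms, # milliseconds
    "key"     => api_key,
    "url"     => target_url  # percent-encoded once by encode_www_form
  )
  URI("https://api.scrapfly.io/scrape?#{query}")
end

# Worst-case 90s JavaScript scenario, and the 15s minimum.
slow = scrape_uri(api_key: "__API_KEY__", target_url: "https://example.com/", timeout_ms: 90_000)
fast = scrape_uri(api_key: "__API_KEY__", target_url: "https://example.com/", timeout_ms: 15_000)
puts slow
puts fast
```

Pair each URL with a client read timeout of at least the scrape timeout plus 5 seconds (95s and 20s here).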
Usage
To specify the scrape timeout, use the retry=false and timeout=<milliseconds> query parameters. For example, for a 20-second timeout use:
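require "uri"
require "net/http"

# retry=false is required for the timeout parameter (in milliseconds) to apply.
url = URI("https://api.scrapfly.io/scrape?retry=false&timeout=20000&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fdelay%2F5")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true
# Client read timeout in seconds: the 20s scrape timeout plus 5s of overhead.
https.read_timeout = 25

request = Net::HTTP::Get.new(url)
response = https.request(request)
puts response.read_body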
https://api.scrapfly.io/scrape?retry=false&timeout=20000&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fdelay%2F5
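If the client read timeout is set too low, Net::HTTP raises Net::ReadTimeout before Scrapfly can answer, which is a different situation from Scrapfly returning its own timeout error response. One possible way to tell the two apart on the client side is sketched below; the wrapper name with_timeout_handling and the tagged return values are assumptions made for illustration.

```ruby
require "net/http"

# Run the given block (expected to perform the HTTP request) and turn
# client-side timeouts into a tagged result instead of an exception.
# Returns [:ok, value] on success or [:client_timeout, exception_class_name].
def with_timeout_handling
  [:ok, yield]
rescue Net::OpenTimeout, Net::ReadTimeout => e
  # The client gave up first: consider raising read_timeout so it is
  # at least the Scrapfly timeout plus 5 seconds.
  [:client_timeout, e.class.name]
end

# Simulated calls; in real use the block would wrap https.request(request).
p with_timeout_handling { "response body" }
p with_timeout_handling { raise Net::ReadTimeout }
```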