Understanding Scrapfly Timeouts

For the best experience, configure the HTTP client used to reach Scrapfly with a timeout of at least 155 seconds. If you set an explicit Scrapfly timeout, set your client read timeout to that value plus 5 seconds of overhead.

Scrapfly's timeout configuration lets you set a deadline for each scrape request. If the scrape doesn't complete within the defined timeout, it is stopped and a Scrapfly error response is returned.

Note that Scrapfly scrape speeds depend on many factors, from the use of optional features such as JavaScript rendering and JavaScript scenarios to anti-bot bypass. Simple scrapes can complete in less than 5 seconds, while others can take more than 90 seconds if strict anti-scraper protection is encountered.

To customize the scrape timeout, the retry feature must be disabled (retry=false).

When Should I Configure the Timeout?

Generally, it's best to trust Scrapfly to complete scrape requests within the default timeout budget. However, some cases warrant a custom timeout budget:

Scraping a slow or unresponsive website, particularly when using JavaScript rendering on JavaScript-heavy pages.
Using the JavaScript Scenario feature to execute complex browser actions.
Needing the fastest possible scrape response for real-time web scraping systems.

To better understand how Scrapfly determines the timeout budget, take a look at the following diagram:

Always add 5 seconds of overhead to your HTTP client read timeout when estimating the timeout budget.
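
The overhead rule can be sketched as a small Ruby helper. This is only an illustration: the method name client_read_timeout and both constants are hypothetical names, not part of any Scrapfly SDK. The values 5 and 155 come from the guidance on this page.

```ruby
# Headroom to add on top of an explicit Scrapfly timeout (from this page).
CLIENT_OVERHEAD_SECONDS = 5
# Minimum client timeout recommended when no explicit Scrapfly timeout is set.
DEFAULT_CLIENT_TIMEOUT_SECONDS = 155

# Hypothetical helper: returns the HTTP client read timeout in seconds.
# scrapfly_timeout_ms is the value sent as the `timeout` query parameter,
# or nil when relying on Scrapfly's default timeout budget.
def client_read_timeout(scrapfly_timeout_ms = nil)
  return DEFAULT_CLIENT_TIMEOUT_SECONDS if scrapfly_timeout_ms.nil?

  # Round up to whole seconds, then add the overhead.
  (scrapfly_timeout_ms / 1000.0).ceil + CLIENT_OVERHEAD_SECONDS
end

puts client_read_timeout(90_000) # prints 95
puts client_read_timeout         # prints 155
```

The result is meant to be assigned to your HTTP client's read timeout, for example Net::HTTP#read_timeout.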

Note that when using asp=true and retry=false, the default timeout of 30 seconds might not be enough to bypass some anti-scraping protection systems. In that case, we recommend increasing the timeout to at least 60 seconds.

FAQ

Question: I want to run a JavaScript scenario that requires 90s in the worst case.
Answer: Specify retry=false&timeout=90000. Your HTTP client read timeout should be at least 95s.

Question: I scrape a website without JavaScript and I want the lowest possible timeout.
Answer: Set the minimum allowed (without ASP or JavaScript rendering), 15s: retry=false&timeout=15000. Your HTTP client read timeout should be at least 20s.
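
As a rough sketch, the query fragments in the two answers above can be generated from a timeout given in seconds. The helper name timeout_query is hypothetical. The parameter names retry and timeout are the ones documented on this page.

```ruby
require "uri"

# Hypothetical helper: builds the timeout-related query parameters for a
# Scrapfly scrape request. Retries must be disabled for a custom timeout,
# and the timeout value is expressed in milliseconds.
def timeout_query(timeout_seconds)
  URI.encode_www_form({
    "retry" => "false",
    "timeout" => (timeout_seconds * 1000).to_i
  })
end

puts timeout_query(90) # prints retry=false&timeout=90000
puts timeout_query(15) # prints retry=false&timeout=15000
```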

Usage

To specify the scrape timeout, use the retry=false and timeout=<milliseconds> query parameters. For example, for a 20-second timeout use:

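require "uri"
require "net/http"

# Ask Scrapfly to stop the scrape after 20 seconds (timeout is in milliseconds).
url = URI("https://api.scrapfly.io/scrape?retry=false&timeout=20000&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fdelay%2F5")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true
# Client read timeout: Scrapfly timeout (20s) plus 5s of overhead.
https.read_timeout = 25

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body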
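
A slightly more reusable variant is sketched below, assuming you want to build the URL from parts and derive the client read timeout from the Scrapfly timeout. The method names scrape_uri, build_client and scrape are hypothetical, and the 10-second open timeout is an arbitrary assumption. Only the endpoint and the retry, timeout, key and url parameters come from this page.

```ruby
require "uri"
require "net/http"

# Hypothetical helper: builds the scrape URL with an explicit timeout.
# URI.encode_www_form percent-encodes the target URL exactly once.
def scrape_uri(api_key:, target_url:, timeout_ms:)
  uri = URI("https://api.scrapfly.io/scrape")
  uri.query = URI.encode_www_form({
    "retry" => "false",
    "timeout" => timeout_ms,
    "key" => api_key,
    "url" => target_url
  })
  uri
end

# Hypothetical helper: configures Net::HTTP so the client waits
# 5 seconds longer than the Scrapfly timeout before giving up.
def build_client(uri, timeout_ms)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.open_timeout = 10 # assumption: connection setup budget, not from this page
  http.read_timeout = (timeout_ms / 1000.0).ceil + 5
  http.max_retries = 0 # Net::HTTP would otherwise re-send a failed GET once
  http
end

# Performs the scrape; returns nil if the client read timeout is hit.
def scrape(api_key:, target_url:, timeout_ms:)
  uri = scrape_uri(api_key: api_key, target_url: target_url, timeout_ms: timeout_ms)
  build_client(uri, timeout_ms).request(Net::HTTP::Get.new(uri))
rescue Net::ReadTimeout
  nil
end
```

Calling scrape(api_key: "__API_KEY__", target_url: "https://httpbin.dev/delay/5", timeout_ms: 20_000) would send the same request as the example above. Disabling Net::HTTP's automatic retry is a design choice made here so that a client-side timeout does not silently trigger a second scrape.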

