Understanding Scrapfly Timeouts

For the best experience, configure the HTTP client used to reach Scrapfly with a timeout of at least 155 seconds. If you set an explicit Scrapfly timeout, add 5 seconds of overhead to your client read timeout.

Scrapfly's timeout configuration lets you set a deadline for each scrape request. If the scrape doesn't complete within the defined timeout, it is stopped and a Scrapfly error response is returned.

Note that Scrapfly scrape speeds depend on many factors, from the use of optional features like JavaScript rendering and JavaScript scenarios to anti-bot bypass. Some simple scrapes complete in less than 5 seconds, while others can take more than 90 seconds if strict anti-scraper protection is encountered.

To customize the scrape timeout, the retry feature must be disabled (retry=false).

When Should I Configure the Timeout?

Generally, it's best to trust Scrapfly to complete scrape requests within the default timeout budget. However, some cases warrant a custom timeout budget:

  • Scraping a slow or unresponsive website, particularly when using JavaScript rendering on JavaScript-heavy pages.
  • Using the JavaScript Scenario feature to execute complex browser actions.
  • Needing the scrape response as soon as possible, as in real-time web scraping systems.

To better understand how Scrapfly determines the timeout budget, take a look at the following diagram:

Always add 5 seconds of overhead to your HTTP client read timeout when estimating the timeout budget.
Note that when using asp=true and retry=false, the default timeout of 30 seconds might not be enough to bypass some anti-scraping protection systems. In that case, we recommend increasing the timeout to at least 60 seconds.
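
The client-side arithmetic described on this page can be sketched as a small helper. This is a minimal illustration only: the function name client_read_timeout and the two constants are hypothetical and are not part of any Scrapfly SDK. It assumes the 155-second minimum applies when no explicit timeout is set, and that an explicit timeout (in milliseconds) gets 5 seconds of overhead.

```python
from typing import Optional

# Values taken from this page; the names are illustrative assumptions.
DEFAULT_CLIENT_TIMEOUT_S = 155  # minimum recommended when no explicit timeout is set
OVERHEAD_S = 5                  # overhead added on top of an explicit Scrapfly timeout


def client_read_timeout(scrapfly_timeout_ms: Optional[int] = None) -> float:
    """Return the HTTP client read timeout in seconds (hypothetical helper)."""
    if scrapfly_timeout_ms is None:
        return float(DEFAULT_CLIENT_TIMEOUT_S)
    # Scrapfly timeouts are given in milliseconds; HTTP clients usually take seconds.
    return scrapfly_timeout_ms / 1000 + OVERHEAD_S


print(client_read_timeout())       # 155.0
print(client_read_timeout(90000))  # 95.0
print(client_read_timeout(15000))  # 20.0
```

The last two calls correspond to explicit timeouts of 90 and 15 seconds.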

FAQ

  • Question: I want to run a JavaScript scenario that requires 90s in the worst case.
  • Answer: Specify retry=false&timeout=90000. Your HTTP client read timeout should be at least 95s.
  • Question: I scrape a website without JavaScript and I want the lowest possible timeout.
  • Answer: Set the minimum allowed (no asp, no JS rendering), 15s: retry=false&timeout=15000. Your HTTP client read timeout should be at least 20s.

Usage

To specify the scrape timeout, use the retry=false and timeout=<milliseconds> query parameters. For example, for a 20-second timeout use:

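import requests

# retry=false is required for the custom timeout=20000 (milliseconds) to apply
url = "https://api.scrapfly.io/scrape?retry=false&timeout=20000&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fdelay%2F5"
# Client read timeout: 20s Scrapfly timeout + 5s overhead
response = requests.get(url, timeout=25)
data = response.json()
print(data)
print(data['result'])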
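
Hand-writing the query string makes it easy to encode the target URL twice (for example `%253A` instead of `%3A`). As a hedged sketch, the request URL can instead be built from the raw values with the standard library, so each value is encoded exactly once. The helper name build_scrape_request is hypothetical; only the endpoint and the retry, timeout, key and url parameters come from this page.

```python
from typing import Tuple
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scrapfly.io/scrape"


def build_scrape_request(api_key: str, target_url: str, timeout_ms: int) -> Tuple[str, float]:
    """Return (request URL, client read timeout in seconds). Hypothetical helper."""
    params = {
        "retry": "false",       # required for a custom timeout to apply
        "timeout": timeout_ms,  # Scrapfly timeout, in milliseconds
        "key": api_key,
        "url": target_url,      # pass the raw URL; urlencode encodes it once
    }
    # Client read timeout = Scrapfly timeout (in seconds) + 5s overhead
    return f"{API_ENDPOINT}?{urlencode(params)}", timeout_ms / 1000 + 5


url, read_timeout = build_scrape_request(
    "__API_KEY__", "https://httpbin.dev/delay/5", 20000
)
print(url)
print(read_timeout)
```

The returned pair can then be passed to the HTTP client, for example requests.get(url, timeout=read_timeout).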

Related Errors

Summary