Understand How Timeout Works
You must configure your HTTP client read timeout to 155s to avoid any issues. Scrapfly manages the timeout according to the strategy used. If you specify a custom timeout value, add 5s to your read timeout, because screenshot, debug, and cache add some overhead.
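The rule above can be sketched as a small Ruby helper. This is a minimal illustration, not part of any Scrapfly SDK: the name `client_read_timeout` and the two constants are hypothetical, and the values (155s default, 5s overhead) are taken from the paragraph above.

```ruby
# Read timeout (seconds) the text recommends when no custom timeout is set.
DEFAULT_READ_TIMEOUT_SECONDS = 155
# Overhead (seconds) the text recommends adding to a custom scrape timeout.
OVERHEAD_SECONDS = 5

# Hypothetical helper: returns the HTTP client read timeout in seconds
# for a given scrape timeout expressed in milliseconds.
def client_read_timeout(scrape_timeout_ms = nil)
  return DEFAULT_READ_TIMEOUT_SECONDS if scrape_timeout_ms.nil?

  # Round up to whole seconds, then add the overhead.
  (scrape_timeout_ms / 1000.0).ceil + OVERHEAD_SECONDS
end
```

The result can be assigned to `Net::HTTP#read_timeout=`, which expects a value in seconds.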
Timeout configuration lets you set a deadline when you start a scrape. This ensures the scrape does not take longer than the defined timeout: once the deadline is reached, Scrapfly stops and returns an error.
Time management is crucial in web scraping in order to recover as fast as possible. Every step is budgeted and tracked to prevent and recover from issues as quickly as possible and to provide the best reliability. Some scrapes are fast (<5s), while others can require more time (~25s), or even more when using a complex user scenario.
To customize the timeout, retry must be disabled (retry=false).
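As a rough sketch, the query string for a scrape with a custom timeout could be built like this. The helper name `scrape_uri` is hypothetical; the endpoint and the `key`, `url`, `retry`, and `timeout` parameters are the ones that appear in the sample request on this page.

```ruby
require "uri"

# Hypothetical helper: builds a scrape URL with retry disabled, which the
# text says is required before a custom timeout (in milliseconds) applies.
def scrape_uri(api_key:, target_url:, timeout_ms:)
  uri = URI("https://api.scrapfly.io/scrape")
  uri.query = URI.encode_www_form(
    "key" => api_key,
    "url" => target_url,       # percent-encoded by encode_www_form
    "retry" => "false",        # must be disabled to customize the timeout
    "timeout" => timeout_ms.to_s
  )
  uri
end
```

Building the query with `URI.encode_www_form` avoids hand-encoding the target URL.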
When Should I Configure a Timeout?
Configure a timeout if one of the following cases applies to you:
- I want a fast reply when something goes wrong so I can retry or handle it myself
- I scrape a slow target that sometimes times out
- I run a JS scenario that requires a longer timeout
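For the first case, where you handle retries yourself, a client-side wrapper might look like the sketch below. The helper `with_client_retries` is hypothetical and only reacts to timeouts raised by Ruby's `Net::HTTP` client; it does not inspect errors returned by the API.

```ruby
require "net/http"

# Hypothetical helper: runs the block and retries it when the HTTP client
# raises a connect or read timeout, up to max_attempts in total.
def with_client_retries(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue Net::OpenTimeout, Net::ReadTimeout
    retry if attempts < max_attempts
    raise # give up and surface the last timeout to the caller
  end
end
```

A call such as `with_client_retries { https.request(request) }` would wrap the request from a Net::HTTP client configured with your read timeout.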
Add +5s to your client read timeout when you customize the scrape timeout.
If you disable retry while using ASP, the default timeout is 30s. However, because some targets are quite slow to pass, we recommend increasing the timeout to at least 60s. Below 60s, there is a high chance that on a slow website or challenge our system is not able to recover, rotate, and bypass again, which results in a blocked scrape on your end.
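The recommendation above can be expressed as a small guard. This is a sketch under assumptions: the helper names are hypothetical, and the `asp` query parameter name is assumed here since it does not appear in the sample request on this page. The 30s default and 60s recommended minimum come from the paragraph above.

```ruby
# Default timeout (ms) the text gives for ASP with retry disabled.
ASP_DEFAULT_TIMEOUT_MS = 30_000
# Minimum timeout (ms) the text recommends for ASP with retry disabled.
ASP_RECOMMENDED_MIN_MS = 60_000

# Hypothetical helper: raises the requested timeout (or the 30s default)
# to the recommended 60s minimum when it falls below it.
def asp_timeout_ms(requested_ms = nil)
  requested = requested_ms || ASP_DEFAULT_TIMEOUT_MS
  [requested, ASP_RECOMMENDED_MIN_MS].max
end

# Hypothetical helper: query parameters for an ASP scrape with retry disabled.
# The "asp" parameter name is an assumption.
def asp_params(requested_ms = nil)
  { "asp" => "true", "retry" => "false", "timeout" => asp_timeout_ms(requested_ms).to_s }
end
```

Clamping rather than rejecting low values is a design choice for this sketch; you may prefer to raise an error so that a too-short timeout is noticed early.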
- Answer: Specify retry=false&timeout=90000; your HTTP read timeout should be 95s (90s + 5s overhead)
- Answer: Set the minimum allowed (no ASP, no JS rendering) with retry=false&timeout=15000; your HTTP read timeout should be 20s (15s + 5s overhead)
require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.org%2Fanything&retry=false&tags=player%2Cproject%3Adefault&timeout=15000")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true
https.read_timeout = 20 # 15s scrape timeout + 5s overhead

request = Net::HTTP::Get.new(url)
response = https.request(request)
puts response.read_body
Request breakdown:
- URL: "https://api.scrapfly.io/scrape?key=&url=https%3A%2F%2Fhttpbin.org%2Fanything&retry=false&tags=player%2Cproject%3Adefault&timeout=15000"
- Host: "api.scrapfly.io"
- Path: "/scrape"
- key = ""
- url = "https://httpbin.org/anything"
- retry = "false"
- tags = "player,project:default"
- timeout = "15000"