Session

Scrapfly's session feature keeps a consistent navigation profile across multiple scrape requests. In other words, requests with the same session key share the same cookies, referrer, navigation history, and proxy (unless session_sticky_proxy is disabled).

Sessions can be shared between scrape requests with JavaScript Rendering and without it, so you can mix both navigation modes and improve performance by avoiding JavaScript rendering when it is not necessary. When JavaScript rendering is activated, the session also persists and restores window.localStorage and window.sessionStorage.

Using sessions can increase the success rate of your scraping jobs on anti-bot protected websites. When Anti Scraping Protection auto-activates, it configures the session with all of the bypass details, making subsequent same-session requests more likely to succeed.
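
As a minimal sketch of that combination (assuming the asp=true parameter enables Anti Scraping Protection, and using example.com as a placeholder target):

require "uri"
require "net/http"

# asp=true enables Anti Scraping Protection; paired with session=mysession1,
# any bypass details ASP establishes are stored on the session and reused
# by later requests that pass the same session name.
url = URI("https://api.scrapfly.io/scrape?session=mysession1&asp=true&key=__API_KEY__&url=https%3A%2F%2Fexample.com")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

response = https.request(Net::HTTP::Get.new(url))
puts response.read_body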

When inspecting the log from the web interface, you can see all the session details such as navigation history, stored cookies, referrer, and all the metadata (creation date, last used date, expiration date):

[Screenshot: session details on the monitoring logs page]

Sharing Policy

Sessions are not shared across Scrapfly projects and environments.

Sessions can be shared across scrape requests with JavaScript Rendering and without it, so you can use JavaScript Rendering to establish state cookies and then continue without it, speeding up your scraping jobs significantly.
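
For instance, a sketch of that pattern, assuming the render_js=true parameter enables JavaScript Rendering and using example.com as a placeholder target:

require "uri"
require "net/http"

# First request: render JavaScript so the site can establish its state
# (cookies, localStorage, sessionStorage) on the "mysession1" session.
url = URI("https://api.scrapfly.io/scrape?session=mysession1&render_js=true&key=__API_KEY__&url=https%3A%2F%2Fexample.com")
https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true
puts https.request(Net::HTTP::Get.new(url)).read_body

# Follow-up request: reuse the same session without JavaScript rendering.
# The state established above is restored, and the request runs much faster.
url = URI("https://api.scrapfly.io/scrape?session=mysession1&key=__API_KEY__&url=https%3A%2F%2Fexample.com%2Fdata")
https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true
puts https.request(Net::HTTP::Get.new(url)).read_body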

Make sure to give each session a unique name within your Scrapfly project/environment. We recommend generating a UUID4 to prevent accidental name collisions.
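
For example, in Ruby a UUID4 session name can be generated with the standard library:

require "securerandom"

# One UUID4 per logical browsing session avoids accidental name collisions.
session_name = SecureRandom.uuid # returns a random UUID4 string, unique on every call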

Eviction Policy

A session automatically expires after seven days. Each time a session is reused, the expiration date is reset to seven days.

A session sticks to a proxy by default (unless session_sticky_proxy is disabled). After 30 seconds of session inactivity, the same proxy IP is no longer guaranteed and a new one might be assigned. This depends entirely on the proxy pool used, as some pools, such as the residential ones, have short IP lifetimes.

Limitation

The session feature cannot be used while Cache is activated.

Usage

For example, this scrape request simulates a website setting cookies, with our session named mysession1:

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?session=mysession1&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies%2Fset%3Fmycookie%3D123")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body

Then, if we continue with more Scrapfly requests using the same session, the cookies persist automatically:

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?session=mysession1&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body
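
The scraped content of this second request should show mycookie with the value 123, confirming that the cookie set by the first request was carried over by the session.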

Disabling session_sticky_proxy is not recommended: the natural way of browsing with a session is to use a sticky proxy for all related requests. With session_sticky_proxy disabled, the proxy IP rotates on each request, making the connection traffic appear very unnatural. Use at your own risk!

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?session=mysession1&session_sticky_proxy=false&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies%2Fset%3Fmycookie%3D123")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body

Related Errors

All related errors are listed below. You can see the full description and an example error response in the Errors section.

  • ERR::SESSION::CONCURRENT_ACCESS - Concurrent access to the session was attempted. If your spider runs on a distributed architecture, check that the correlation ID is correctly configured (see the sketch below).
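
As a hypothetical sketch (the correlation_id parameter and the worker-group-1 value here are assumptions for illustration), distributed workers sharing a session would pass a common correlation identifier:

require "uri"
require "net/http"

# Hypothetical sketch: each worker in the distributed spider sends the same
# correlation_id alongside the shared session so Scrapfly can coordinate
# access instead of rejecting it with ERR::SESSION::CONCURRENT_ACCESS.
url = URI("https://api.scrapfly.io/scrape?session=mysession1&correlation_id=worker-group-1&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

response = https.request(Net::HTTP::Get.new(url))
puts response.read_body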
