Session


Scrapfly's session feature keeps a consistent navigation profile across multiple scrape requests. Requests with the same session key share the same cookies, referrer, navigation history, and proxy (unless session_sticky_proxy is disabled).

Sessions can be shared between scrape requests with and without JavaScript Rendering, so you can mix both and gain performance by skipping JavaScript rendering when it isn't necessary. When JavaScript rendering is activated, the session also persists and restores window.localStorage and window.sessionStorage data.

Using a session can increase the success rate of your scraping jobs on websites protected by anti-bot systems. When Anti Scraping Protection activates automatically, it configures the session with all the bypass details, making subsequent requests in the same session more likely to succeed.

When inspecting the log from the web interface, you can see all the session details, such as navigation history, stored cookies, referrer, and metadata (creation date, last used date, expiration date):

[Image: session details on the monitoring logs page of the Scrapfly web interface]

Sharing Policy

Sessions are not shared across Scrapfly projects, environments, or team members.

Sessions can be shared across scrape requests with and without JavaScript Rendering. For example, you can use JavaScript Rendering to establish state cookies and then continue without it, which speeds up your scraping jobs significantly.
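The mixed workflow described above can be sketched as two request URLs that share one session name. This is a minimal illustration, not the official client: the helper scrape_url is hypothetical, and the render_js query parameter name is an assumption that should be checked against the API reference. No request is sent; the snippet only builds the URLs.

```ruby
require "uri"

# Hypothetical helper: builds a scrape URL for a given session.
# ASSUMPTION: "render_js" is the query parameter that enables JavaScript Rendering.
def scrape_url(target, session:, render_js: false, key: "__API_KEY__")
  params = { "key" => key, "session" => session, "url" => target }
  params["render_js"] = "true" if render_js

  uri = URI("https://api.scrapfly.io/scrape")
  uri.query = URI.encode_www_form(params) # percent-encodes the target URL
  uri
end

# First request: render JavaScript so the site can establish its state cookies.
first_url = scrape_url("https://httpbin.dev/cookies/set?mycookie=123",
                       session: "mysession1", render_js: true)

# Follow-up request: same session, no JavaScript rendering.
follow_up_url = scrape_url("https://httpbin.dev/cookies", session: "mysession1")

puts first_url
puts follow_up_url
```

Building the query with URI.encode_www_form avoids hand-encoding the target URL, which is easy to get wrong when the target has its own query string.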

Give each session a name that is unique within your Scrapfly project and environment. We recommend generating a UUID4 to prevent accidental name overlap.
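As a small sketch of the UUID4 recommendation, Ruby's standard library can generate such a name directly. The variable name session_name is only illustrative.

```ruby
require "securerandom"

# SecureRandom.uuid returns a random (version 4) UUID string.
session_name = SecureRandom.uuid

puts session_name
```

The resulting string can then be passed as the session value in the scrape request.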

Eviction Policy

A session automatically expires after seven days. Each time a session is reused, the expiration countdown is reset back to seven days.

A session sticks to a proxy by default (unless session_sticky_proxy is disabled). After 30 seconds of session inactivity, the proxy IP is no longer guaranteed and a new one might be assigned. This depends entirely on the proxy pool in use, as some pools, such as residential ones, have short-lived IPs.
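The two timing rules above can be expressed as simple arithmetic. The sketch below is a client-side estimate only, with hypothetical helper names; the service decides the actual expiration and proxy assignment.

```ruby
SESSION_TTL = 7 * 24 * 60 * 60 # seven days, in seconds
STICKY_PROXY_IDLE_LIMIT = 30   # seconds of inactivity before the IP may change

# Each reuse resets the countdown, so expiry is always last use + seven days.
def session_expires_at(last_used_at)
  last_used_at + SESSION_TTL
end

# True while the sticky proxy IP should still be the same one.
def sticky_proxy_guaranteed?(last_used_at, now)
  now - last_used_at < STICKY_PROXY_IDLE_LIMIT
end

last_used = Time.utc(2023, 9, 15, 12, 44, 50)

puts session_expires_at(last_used)                        # 2023-09-22 12:44:50 UTC
puts sticky_proxy_guaranteed?(last_used, last_used + 10)  # true
puts sticky_proxy_guaranteed?(last_used, last_used + 45)  # false
```

A scraper that depends on a stable IP could use a check like this to decide whether to restart its navigation flow after a long pause.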

Limitation

The session feature cannot be used while Cache is activated.
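One way to catch this limitation early is a client-side check before the request is built. The sketch below is hypothetical: validate_scrape_params! is not part of any SDK, and the cache parameter name is an assumption.

```ruby
# Hypothetical guard: rejects parameter sets that combine a session with caching.
# ASSUMPTION: caching is enabled with a "cache" parameter set to "true".
def validate_scrape_params!(params)
  if params.key?("session") && params["cache"].to_s == "true"
    raise ArgumentError, "session cannot be combined with cache"
  end
  params
end

# Passes: session without cache.
validate_scrape_params!({ "session" => "mysession1", "url" => "https://httpbin.dev/cookies" })

# Raises ArgumentError: session and cache together.
begin
  validate_scrape_params!({ "session" => "mysession1", "cache" => "true" })
rescue ArgumentError => e
  puts "Rejected: #{e.message}"
end
```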

For example, the following scrape request simulates a website setting cookies, with the session named mysession1:

require "uri"
require "net/http"

# The target URL (httpbin.dev/cookies/set?mycookie=123) is percent-encoded in the "url" parameter
url = URI("https://api.scrapfly.io/scrape?session=mysession1&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies%2Fset%3Fmycookie%3D123")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

request = Net::HTTP::Get.new(url)

# Send the request and print the raw API response
response = https.request(request)
puts response.read_body

Then, if we send more Scrapfly requests with the same session, the cookies persist automatically:

require "uri"
require "net/http"

# Same session name as before, so the stored cookies are sent to httpbin.dev/cookies
url = URI("https://api.scrapfly.io/scrape?session=mysession1&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

request = Net::HTTP::Get.new(url)

# Send the request and print the raw API response
response = https.request(request)
puts response.read_body

Disabling session_sticky_proxy is not recommended, as the natural way of browsing with a session is to use one sticky proxy for all related requests. With session_sticky_proxy disabled, the proxy IP rotates on each request, making the connection traffic appear very unnatural. Use at your own risk!

require "uri"
require "net/http"

# session_sticky_proxy=false lets the proxy IP rotate between requests of this session
url = URI("https://api.scrapfly.io/scrape?session=mysession1&session_sticky_proxy=false&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies%2Fset%3Fmycookie%3D123")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

request = Net::HTTP::Get.new(url)

# Send the request and print the raw API response
response = https.request(request)
puts response.read_body

Example Of Response

...
"context": {
    ...
    "session": {
        "name": "mysession1",
        "state": "FREE",
        "lease": null,
        "correlation_id": "default",
        "identity": "929f7fb918367df788717d825a3d75391e148c76",
        "created_at": "2020-09-15 12:44:15 UTC",
        "cookie_jar": [
            {
                "comment": null,
                "domain": "httpbin.dev",
                "expires": null,
                "http_only": false,
                "max_age": null,
                "name": "mycookie",
                "path": "/cookies",
                "secure": false,
                "size": null,
                "value": "123",
                "version": 0
            }
        ],
        "last_used_at": "2023-09-15 12:44:50 UTC",
        "expire_at": "2023-09-22 12:44:50 UTC",
        "referer": "https://httpbin.dev/"
    }
    ...
}
...
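To show how the session block of such a response might be consumed, here is a minimal sketch that extracts the stored cookies. The helper session_cookies is hypothetical, and the sample body is a trimmed-down stand-in built from the fields shown above, not a complete API response.

```ruby
require "json"

# Trimmed-down stand-in for a response body, using only fields from the example above.
body = <<~JSON
  {
    "context": {
      "session": {
        "name": "mysession1",
        "cookie_jar": [
          { "name": "mycookie", "value": "123", "domain": "httpbin.dev", "path": "/cookies" }
        ],
        "expire_at": "2023-09-22 12:44:50 UTC"
      }
    }
  }
JSON

# Hypothetical helper: returns the session cookies as a name => value hash.
# Returns an empty hash when the response carries no session information.
def session_cookies(response_body)
  session = JSON.parse(response_body).dig("context", "session") || {}
  (session["cookie_jar"] || []).to_h { |cookie| [cookie["name"], cookie["value"]] }
end

cookies = session_cookies(body)
puts cookies.inspect
```

Using dig with a fallback keeps the helper from raising when the context or session keys are missing.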

Related Errors

All related errors are listed below. Full descriptions and example error responses are available in the Errors section.

ERR::SESSION::CONCURRENT_ACCESS - Concurrent access to the session was attempted. If your spider runs on a distributed architecture, the same session name is currently in use by another scrape.
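A client might handle this error by waiting and retrying. The sketch below is one possible approach under stated assumptions: with_session_retry is a hypothetical helper, it assumes the error code string appears somewhere in the response body, and the sample bodies are made-up stand-ins rather than real API output.

```ruby
CONCURRENT_ACCESS = "ERR::SESSION::CONCURRENT_ACCESS"

# Hypothetical helper: runs the block (which should return a response body string)
# and retries with exponential backoff while the body mentions the error code.
# ASSUMPTION: the error code appears verbatim in the response body.
def with_session_retry(attempts: 3, base_delay: 0.5)
  body = nil
  attempts.times do |i|
    body = yield
    return body unless body.include?(CONCURRENT_ACCESS)
    sleep(base_delay * (2**i)) if i < attempts - 1
  end
  body # still failing after all attempts; the caller decides what to do
end

# Stand-in for a real HTTP call: fails once, then succeeds.
responses = ['{"error": "ERR::SESSION::CONCURRENT_ACCESS"}', '{"result": "ok"}']
calls = 0
body = with_session_retry(base_delay: 0.01) do
  calls += 1
  responses.shift
end

puts "#{calls} calls, final body: #{body}"
```

Retrying only hides the symptom; giving each worker its own session name avoids the conflict in the first place.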

Integration

Session example with Python SDK