Session
Scrapfly's session feature allows for a consistent navigation profile to be kept across multiple scrape requests. In other words, requests with the same session key will share the same cookies, referrer, navigation history, and proxy (unless session_sticky_proxy is disabled).
Sessions can be shared between scrape requests with and without JavaScript Rendering, so you can mix both kinds of navigation and get better performance by skipping JavaScript rendering when it is not necessary.
If JavaScript rendering is activated, the session also persists and restores window.localStorage and window.sessionStorage.
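As a rough illustration of mixing rendered and non-rendered requests in one session, the sketch below builds two scrape URLs that share the same session name. The helper build_scrape_url is hypothetical, and the render_js parameter name is an assumption that is not stated on this page; check the API specification before relying on it.

```python
from urllib.parse import urlencode

SCRAPE_ENDPOINT = "https://api.scrapfly.io/scrape"


def build_scrape_url(api_key, target_url, session, render_js=False):
    """Build a scrape URL bound to a session (hypothetical helper)."""
    params = {"key": api_key, "url": target_url, "session": session}
    if render_js:
        # Assumption: JavaScript rendering is enabled with render_js=true
        params["render_js"] = "true"
    return f"{SCRAPE_ENDPOINT}?{urlencode(params)}"


# First request: render JavaScript so the page can establish its state
rendered_url = build_scrape_url(
    "__API_KEY__",
    "https://httpbin.dev/cookies/set?mycookie=123",
    "mysession1",
    render_js=True,
)

# Follow-up request: same session, no rendering
plain_url = build_scrape_url("__API_KEY__", "https://httpbin.dev/cookies", "mysession1")

print(rendered_url)
print(plain_url)
```

Both URLs can then be fetched with requests.get, in the same way as the examples in the Usage section.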
Using sessions can increase the success rate of your scraping jobs when scraping anti-bot protected websites. When Anti Scraping Protection activates automatically, it configures the session with all of the bypass details, making subsequent requests in the same session more likely to succeed.
When inspecting the log from the web interface, you can see all the session details, such as navigation history, stored cookies, referrer, and metadata (creation date, last used date, expiration date).
Sharing Policy
Sessions are not shared across Scrapfly projects and environments.
Sessions can be shared across scrape requests with and without JavaScript Rendering. For example, you can use JavaScript Rendering to establish state cookies and then continue without it, which speeds up your scraping jobs significantly.
Make sure each session has a unique name within your Scrapfly project/environment. We recommend generating a UUID4 to prevent accidental name overlap.
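A minimal sketch of generating such a name with Python's standard uuid module. The helper new_session_name is hypothetical, and using the hyphen-free hex form is a cautious choice made here, not a documented requirement.

```python
import uuid


def new_session_name():
    """Return a random UUID4 as a 32-character hexadecimal string."""
    return uuid.uuid4().hex


session = new_session_name()
print(session)
```

The resulting value can be passed as the session parameter of each related scrape request.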
Eviction Policy
A session automatically expires after seven days. Each time a session is reused, the expiration date is reset to seven days.
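The sliding expiration described above can be modeled on the client side as follows. This is only an illustrative sketch of the rule, not Scrapfly's implementation; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(days=7)


def expiration_after_use(last_used):
    """Each use pushes the expiration date to seven days after that use."""
    return last_used + SESSION_TTL


def is_expired(last_used, now):
    """True once seven days have passed without the session being reused."""
    return now >= expiration_after_use(last_used)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
reused = created + timedelta(days=5)  # reuse on day 5 resets the clock

print(expiration_after_use(created))
print(expiration_after_use(reused))
```

Tracking the last-used time locally like this can help a long-running job decide when it needs to establish a fresh session.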
A session sticks to a proxy by default (unless session_sticky_proxy is disabled). After 30 seconds of session inactivity, the same proxy IP is no longer guaranteed and a new one might be assigned. This depends entirely on the proxy pool in use, as some proxy pools, such as the residential ones, have short-lived IPs.
Limitation
The session feature cannot be used while Cache is activated.
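One way to catch this conflict before sending a request is a small client-side check, sketched below. The function check_session_params is hypothetical, and the cache parameter name is an assumption not stated on this page.

```python
def check_session_params(params):
    """Reject parameter sets that combine a session with caching."""
    uses_session = bool(params.get("session"))
    # Assumption: caching is enabled with cache=true
    uses_cache = str(params.get("cache", "")).lower() == "true"
    if uses_session and uses_cache:
        raise ValueError("session cannot be combined with cache")
    return params


ok_params = check_session_params({"session": "mysession1", "url": "https://httpbin.dev/cookies"})
print(ok_params)
```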
Usage
For example, this scrape request simulates a website setting cookies, with the session named mysession1:
import requests
# Scrape a page that sets a cookie; the cookie is stored in session "mysession1"
url = "https://api.scrapfly.io/scrape?session=mysession1&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies%2Fset%3Fmycookie%3D123"
response = requests.get(url)
data = response.json()
print(data)
print(data['result'])
Then, if we make more Scrapfly requests with the same session, the cookies persist automatically:
import requests
# Same session name, so the cookie set by the previous request is sent again
url = "https://api.scrapfly.io/scrape?session=mysession1&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies"
response = requests.get(url)
data = response.json()
print(data)
print(data['result'])
Disabling session_sticky_proxy is not recommended, because the natural way of browsing with a session is to use one sticky proxy for all related requests. With session_sticky_proxy disabled, the proxy IP rotates on each request, making the traffic appear very unnatural. Use at your own risk!
import requests
# session_sticky_proxy=false lets the proxy IP rotate between requests of this session
url = "https://api.scrapfly.io/scrape?session=mysession1&session_sticky_proxy=false&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies%2Fset%3Fmycookie%3D123"
response = requests.get(url)
data = response.json()
print(data)
print(data['result'])
Example Of Response
Related Errors
All related errors are listed below. You can find the full description and an example error response in the Errors section.
- ERR::SESSION::CONCURRENT_ACCESS - Concurrent access to the session was attempted. If your spider runs on a distributed architecture, check that the correlation id is correctly configured
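Within a single process, one way to avoid concurrent access is to serialize requests that share a session name. The sketch below uses one lock per session; the helpers session_lock and run_in_session are hypothetical. It does not cover spiders spread across several machines, which would need an external coordination mechanism.

```python
import threading

_registry_guard = threading.Lock()
_session_locks = {}


def session_lock(session):
    """Return the lock dedicated to a session name, creating it on first use."""
    with _registry_guard:
        return _session_locks.setdefault(session, threading.Lock())


def run_in_session(session, scrape):
    """Call scrape() while holding the session's lock, so calls never overlap."""
    with session_lock(session):
        return scrape()


# Usage: wrap each scrape call so threads sharing "mysession1" take turns
result = run_in_session("mysession1", lambda: "scraped")
print(result)
```

In practice the callable passed to run_in_session would perform the requests.get call for that session.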