Session

A session allows you to keep consistent navigation across a scrape. Sessions track visits, the referrer, and persisted state. If JavaScript rendering is activated, the session also persists and restores window.localStorage and window.sessionStorage. By default, a session sticks to the same proxy.
A session is useful to keep navigation behavior consistent and stay under the radar of bot detection. Anti Scraping Protection automatically activates and configures a session for you, so cookies persist after the challenge is solved.
When inspecting the log from the web interface, you can see all the session details: navigation history, stored cookies, referrer, and all the metadata (creation date, last used date, expiration date).
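For example, here is a minimal sketch (assuming the documented session and render_js query parameters; the session name "demo" is hypothetical) where two calls share a session, so state set by the first call is restored before the second one runs:

import requests
from urllib.parse import quote

API_KEY = "__API_KEY__"

def scrape(target_url):
    # Both calls share the session name "demo", so cookies and
    # window.localStorage / window.sessionStorage survive between them.
    api_url = (
        "https://api.scrapfly.io/scrape"
        f"?key={API_KEY}"
        f"&url={quote(target_url, safe='')}"
        "&session=demo"
        "&render_js=true"
    )
    return requests.get(api_url).json()

first = scrape("https://httpbin.dev/cookies/set?k=v")  # state is stored
second = scrape("https://httpbin.dev/cookies")         # state is restored
print(second["result"]["content"])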
Sharing Policy
Session sharing follows a few rules to stay consistent and avoid confusion.
Sessions are not shared across projects and environments; each project/environment is isolated from the others.
Sessions are shared whether you render JavaScript or not, so you can mix navigation modes and get better performance by skipping JavaScript rendering when it's not necessary, as shown in the sketch below.
Session names must be unique within a project/environment; using an existing name reuses that session.
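Here is a minimal sketch of mixing both modes (assuming the documented session and render_js parameters; the session name "shop" and the URLs are hypothetical): a login step with JavaScript rendering, followed by a cheaper non-rendered call that reuses the same cookies.

import requests
from urllib.parse import quote

API_KEY = "__API_KEY__"

def scrape(target_url, render_js):
    # Both calls reference the same session name "shop", so cookies
    # set by the rendered call are reused by the plain call.
    api_url = (
        "https://api.scrapfly.io/scrape"
        f"?key={API_KEY}"
        f"&url={quote(target_url, safe='')}"
        "&session=shop"
        f"&render_js={'true' if render_js else 'false'}"
    )
    return requests.get(api_url).json()

scrape("https://example.com/login", render_js=True)     # rendered, sets cookies
scrape("https://example.com/catalog", render_js=False)  # plain, reuses cookies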
Eviction Policy
A session automatically expires after seven days. Each time a session is reused, its expiration date is reset to seven days from the last use.
A session sticks to a proxy by default. After 30 seconds of session inactivity, the proxy IP is no longer guaranteed and a new one may be assigned. Not all proxy pools can guarantee a long IP lifetime; residential IPs, for example, have a short lifetime, and it's hard to predict how long you can hold one.
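A minimal sketch for monitoring eviction, assuming the context.session object shown in the response example below: last_used_at is refreshed on every reuse and expire_at always sits seven days after it.

import requests

url = (
    "https://api.scrapfly.io/scrape?key=__API_KEY__"
    "&url=https%3A%2F%2Fhttpbin.dev%2Fanything&session=test1"
)
session = requests.get(url).json()["context"]["session"]

print(session["last_used_at"])  # reset on every reuse
print(session["expire_at"])     # last_used_at + seven days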
Limitation
The session feature cannot be used while cache is activated.
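A minimal sketch of what not to do, assuming the cache feature is enabled with a cache query parameter: combining it with session should be rejected by the API.

import requests

# Session and cache are mutually exclusive, so this request is
# expected to fail (the `cache` parameter here is an assumption).
url = (
    "https://api.scrapfly.io/scrape?key=__API_KEY__"
    "&url=https%3A%2F%2Fhttpbin.dev%2Fanything"
    "&session=test1&cache=true"
)
response = requests.get(url)
print(response.status_code, response.text)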
Usage
Simulate a website setting cookies
import requests

# The target URL sets two cookies (k1=v1, k2=v2); they are stored in
# the session named "test1".
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies%2Fset%3Fk1%3Dv1%26k2%3Dv2&tags=player%2Cproject%3Adefault&session=test1&asp=true"
response = requests.get(url)
print(response.text)
# data = response.json()
# print(data['result']['content'])
# print(data['result']['status_code'])
Call the API again without setting cookies to display the cookies persisted by the session
import requests

# The target URL only reads cookies; the session "test1" restores
# k1 and k2 automatically.
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies&tags=player%2Cproject%3Adefault&session=test1&asp=true"
response = requests.get(url)
print(response.text)
# data = response.json()
# print(data['result']['content'])
# print(data['result']['status_code'])
With session_sticky_proxy=false, the proxy is not stuck to the session: proxies rotate as usual while you scrape. Use it with caution; most of the time you should not disable it, since most cookie hashes and bot detection are based on IP or location, so rotating proxies might not give good results.
import requests

# session_sticky_proxy=false keeps the session state (cookies, referer)
# but lets the proxy rotate on every request.
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything&session=test&session_sticky_proxy=false"
response = requests.get(url)
print(response.text)
# data = response.json()
# print(data['result']['content'])
# print(data['result']['status_code'])
Example Of Response
...
"context": {
    ...
    "session": {
        "name": "test",
        "state": "FREE",
        "lease": null,
        "correlation_id": "default",
        "identity": "929f7fb918367df788717d825a3d75391e148c76",
        "created_at": "2020-09-15 12:44:15 UTC",
        "cookie_jar": [
            {
                "comment": null,
                "domain": "httpbin.dev",
                "expires": null,
                "http_only": false,
                "max_age": null,
                "name": "k1",
                "path": "/cookies",
                "secure": false,
                "size": null,
                "value": "v1",
                "version": 0
            },
            {
                "comment": null,
                "domain": "httpbin.dev",
                "expires": null,
                "http_only": false,
                "max_age": null,
                "name": "k2",
                "path": "/cookies",
                "secure": false,
                "size": null,
                "value": "v2",
                "version": 0
            }
        ],
        "last_used_at": "2020-09-15 12:44:50 UTC",
        "expire_at": "2020-09-22 12:44:50 UTC",
        "referer": "https://www.amazon.fr/"
    }
    ...
}
...
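A minimal sketch for reading this metadata programmatically, iterating the cookie_jar array shown above:

import requests

url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies&session=test1&asp=true"
session = requests.get(url).json()["context"]["session"]

for cookie in session["cookie_jar"]:
    print(cookie["name"], "=", cookie["value"])  # k1 = v1, k2 = v2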
Related Errors
All related errors are listed below. You can see the full description and an example error response in the Errors section.
- ERR::SESSION::CONCURRENT_ACCESS - Concurrent access to the session was attempted. If your spider runs on a distributed architecture, check that the correlation ID is correctly configured.
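A minimal sketch of handling this error, assuming the error code appears in the response body as documented in the Errors section: wait and retry when another worker holds the session.

import time
import requests

url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything&session=test1"

for _ in range(3):
    response = requests.get(url)
    if "ERR::SESSION::CONCURRENT_ACCESS" not in response.text:
        break
    time.sleep(1)  # another worker holds the session; wait and retry

print(response.text)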