Session

[Screenshot: Session tab of the log inspection page in the web interface]

A session lets you keep consistent navigation across a scrape. Sessions track visits, the referrer, and persisted state. If JavaScript rendering is enabled, the session also persists and restores window.localStorage and window.sessionStorage. By default, a session sticks to the same proxy.

A session is useful for keeping consistent navigation behavior and staying under the bot-detection radar. Anti Scraping Protection automatically activates and configures a session for you, so cookies set while solving a challenge persist afterwards.
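As the curl examples below show, a session is attached to a request simply by passing a `session` name parameter. A minimal Python sketch of building such a request URL (the helper name is hypothetical; it only constructs the URL and does not perform the call):

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scrapfly.io/scrape"

def build_scrape_url(api_key: str, url: str, session: str, asp: bool = True) -> str:
    """Build a scrape request URL that attaches a named session (illustrative helper)."""
    params = {
        "key": api_key,
        "url": url,
        "session": session,  # reusing the same name on later requests reuses the session
        "asp": "true" if asp else "false",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"

request_url = build_scrape_url("__API_KEY__", "https://httpbin.dev/cookies", "test1")
```

Subsequent requests that pass the same `session` value share the same cookie jar, referrer chain, and (by default) proxy.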

When inspecting a log in the web interface, you can see all the session details, such as the navigation history, stored cookies, referrer, and metadata (creation date, last-used date, expiration date).

Sharing Policy

Session sharing follows a few rules to keep behavior consistent and avoid surprises.

Sessions are not shared across projects or environments; each project and environment is isolated from the others.

Sessions are shared whether or not you render JavaScript, so you can mix both modes and improve performance by skipping JavaScript rendering when it is not necessary.

Session names must be unique within a project/environment; if a session with the same name already exists, it is reused.
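The sharing rules above can be modeled as sessions keyed by the (project, environment, name) triple. This is an illustrative model, not the real implementation:

```python
# Sessions keyed by (project, env, name): the same name in a different
# project or environment is a different session, while the same triple
# reuses the existing one.
_sessions: dict = {}

def get_or_create_session(project: str, env: str, name: str) -> dict:
    key = (project, env, name)
    if key not in _sessions:
        _sessions[key] = {"name": name, "cookie_jar": []}
    return _sessions[key]

a = get_or_create_session("default", "live", "test1")
b = get_or_create_session("default", "live", "test1")  # same triple: reused
c = get_or_create_session("default", "test", "test1")  # different env: isolated
```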

Eviction Policy

A session automatically expires after seven days. Each time a session is reused, its expiration date is reset to seven days from that use.

A session sticks to a proxy by default. After 30 seconds of session inactivity, the proxy IP is no longer guaranteed and a new one may be assigned. Not all proxy pools can guarantee a long IP lifetime; residential IPs, for example, have short lifetimes, and it is hard to predict how long you can hold a given IP.
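The expiration rule can be sketched as: every use pushes `expire_at` to `last_used_at` plus seven days, matching the `last_used_at` and `expire_at` fields visible in the session details (a minimal model, not the real implementation):

```python
from datetime import datetime, timedelta

SESSION_TTL = timedelta(days=7)

def touch(session: dict, now: datetime) -> dict:
    """Record a session use: reset the expiration to seven days from now."""
    session["last_used_at"] = now
    session["expire_at"] = now + SESSION_TTL
    return session

s = touch({}, datetime(2020, 9, 15, 12, 44, 50))
# s["expire_at"] is 2020-09-22 12:44:50
```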

Limitation

The Session feature cannot be used while Cache is activated.

Usage

Simulate a website setting cookies:

curl -G \
--request "GET" \
--url "https://api.scrapfly.io/scrape" \
--data-urlencode "key=__API_KEY__" \
--data-urlencode "url=https://httpbin.dev/cookies/set?k1=v1&k2=v2" \
--data-urlencode "tags=player,project:default" \
--data-urlencode "session=test1" \
--data-urlencode "asp=true"

Recall the session without setting cookies and display the stored cookies:

curl -G \
--request "GET" \
--url "https://api.scrapfly.io/scrape" \
--data-urlencode "key=__API_KEY__" \
--data-urlencode "url=https://httpbin.dev/cookies" \
--data-urlencode "tags=player,project:default" \
--data-urlencode "session=test1" \
--data-urlencode "asp=true"

With session_sticky_proxy=false, the proxy is not stuck to the session: proxies rotate as usual while you scrape. Use this with caution; most of the time you should not need it, since most cookie- and bot-detection checks are tied to the IP or its location, so a rotating proxy may not give good results.

curl -G \
--request "GET" \
--url "https://api.scrapfly.io/scrape" \
--data-urlencode "key=__API_KEY__" \
--data-urlencode "url=https://httpbin.dev/anything" \
--data-urlencode "session=test" \
--data-urlencode "session_sticky_proxy=false"

Example Response

...
"context": {
    ...
    "session": {
        "name": "test",
        "state": "FREE",
        "lease": null,
        "correlation_id": "default",
        "identity": "929f7fb918367df788717d825a3d75391e148c76",
        "created_at": "2020-09-15 12:44:15 UTC",
        "cookie_jar": [
            {
                "comment": null,
                "domain": "httpbin.dev",
                "expires": null,
                "http_only": false,
                "max_age": null,
                "name": "k1",
                "path": "/cookies",
                "secure": false,
                "size": null,
                "value": "v1",
                "version": 0
            },
            {
                "comment": null,
                "domain": "httpbin.dev",
                "expires": null,
                "http_only": false,
                "max_age": null,
                "name": "k2",
                "path": "/cookies",
                "secure": false,
                "size": null,
                "value": "v2",
                "version": 0
            }
        ],
        "last_used_at": "2020-09-15 12:44:50 UTC",
        "expire_at": "2020-09-22 12:44:50 UTC",
        "referer": "https://www.amazon.fr/"
    }
    ...
}
...

All related errors are listed below. A full description and an example error response can be found in the Errors section.

  • ERR::SESSION::CONCURRENT_ACCESS - Concurrent access to the session was attempted. If your spider runs on a distributed architecture, check that the correlation ID is correctly configured.

Integration