Session

[Screenshots: overview page of the web interface; Session tab of log inspection]

A session lets you keep navigation consistent across scrapes. Sessions track visited pages and the referrer, and persist cookies. If JavaScript rendering is activated, the session also persists and restores window.localStorage and window.sessionStorage. By default, a session stays attached to the same proxy.

Sessions help keep navigation behavior consistent and stay under the radar of bot detection. Anti Scraping Protection automatically activates and configures a session for you, so that cookies persist after the challenge.

When inspecting the log from the web interface, you can see all the session details, such as navigation history, stored cookies, referrer, and all the metadata (creation date, last used date, expiration date).

Sharing Policy

Session sharing follows a few rules that keep behavior consistent and avoid misunderstandings.

Sessions are not shared across projects or environments; each one is isolated from the others.

Sessions are shared whether or not you render JavaScript, so you can mix both kinds of requests and get better performance by skipping JavaScript rendering when it is not necessary.

A session name must be unique within your project and environment; if a session with that name already exists, it is reused.
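
As an illustration of mixing rendered and non-rendered requests in one session, the sketch below builds two request URLs that share the session name "test". It only builds the URLs and sends nothing. The helper build_scrape_url is hypothetical, and the render_js parameter name is an assumption that does not appear on this page, so check it against the API reference before relying on it.

```python
from urllib.parse import urlencode

API_URL = "https://api.scrapfly.io/scrape"


def build_scrape_url(target_url, session, render_js=False, api_key="__API_KEY__"):
    """Build a scrape URL that reuses the given session name (hypothetical helper)."""
    params = {"key": api_key, "url": target_url, "session": session}
    if render_js:
        # Assumption: "render_js" is the parameter that turns on JavaScript rendering.
        params["render_js"] = "true"
    return f"{API_URL}?{urlencode(params)}"


# Both URLs name the same session, so cookies and referrer would carry over
# between the plain request and the rendered one.
plain = build_scrape_url("https://httpbin.org/anything", session="test")
rendered = build_scrape_url("https://httpbin.org/anything", session="test", render_js=True)
```

Keeping the cheap, non-rendered call as the default and enabling rendering only per request matches the performance advice above.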

Eviction Policy

A session automatically expires after seven days. Each time a session is reused, its expiration date is reset to seven days from that use.
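
The sliding expiration can be checked on the client side. The sketch below assumes the timestamp format shown in the example response ("YYYY-MM-DD HH:MM:SS UTC") and computes the expiration expected from a last-used date. The function names are hypothetical and not part of any SDK.

```python
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(days=7)  # expiration window described above
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S UTC"  # assumed from the example response


def parse_timestamp(value):
    """Parse a timestamp such as "2020-09-15 12:44:50 UTC" into an aware datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def expected_expiry(last_used_at):
    """Return the expiration implied by a last-used timestamp."""
    return parse_timestamp(last_used_at) + SESSION_TTL


def format_timestamp(moment):
    """Format a datetime back into the same textual form."""
    return moment.strftime(TIMESTAMP_FORMAT)


print(format_timestamp(expected_expiry("2020-09-15 12:44:50 UTC")))
# prints 2020-09-22 12:44:50 UTC
```

The printed value matches the last_used_at and expire_at pair in the example response on this page.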

Limitation

The Session feature cannot be used while Cache is activated.

Usage

curl -G \
--request "GET" \
--url "https://api.scrapfly.io/scrape" \
--data-urlencode "key=__API_KEY__" \
--data-urlencode "url=https://httpbin.org/anything" \
--data-urlencode "session=test"

With session_sticky_proxy=false, the proxy is not pinned to the session: proxies rotate as usual while you scrape. Use this option with caution; most of the time you should not need it. Because most cookie hashing and bot detection relies on IP or location, disabling the sticky proxy may give poor results.

curl -G \
--request "GET" \
--url "https://api.scrapfly.io/scrape" \
--data-urlencode "key=__API_KEY__" \
--data-urlencode "url=https://httpbin.org/anything" \
--data-urlencode "session=test" \
--data-urlencode "session_sticky_proxy=false"

Example Of Response

...
"context": {
    ...
    "session": {
        "name": "test",
        "state": "FREE",
        "lease": null,
        "correlation_id": "default",
        "identity": "929f7fb918367df788717d825a3d75391e148c76",
        "created_at": "2020-09-15 12:44:15 UTC",
        "cookie_jar": [
            {
                "name": "session-id",
                "value": "260-1721863-1512555",
                "expires": "2021-09-15 12:44:49 UTC",
                "path": "/",
                "comment": null,
                "domain": ".amazon.fr",
                "max_age": null,
                "secure": false,
                "http_only": false,
                "version": null,
                "size": 29
            },
            {
                "name": "i18n-prefs",
                "value": "EUR",
                "expires": "2021-09-15 12:44:49 UTC",
                "path": "/",
                "comment": null,
                "domain": ".amazon.fr",
                "max_age": null,
                "secure": false,
                "http_only": false,
                "version": null,
                "size": 13
            },
            {
                "name": "csm-hit",
                "value": "tb:s-QZYPV6S7SE33B4BZ9C4Q|1600173886804&t:1600173887212&adb:adblk_no",
                "expires": "2021-08-31 12:44:47 UTC",
                "path": "/",
                "comment": null,
                "domain": "www.amazon.fr",
                "max_age": null,
                "secure": false,
                "http_only": false,
                "version": null,
                "size": 75
            },
            {
                "name": "ubid-acbfr",
                "value": "257-3489518-3158810",
                "expires": "2021-09-15 12:44:49 UTC",
                "path": "/",
                "comment": null,
                "domain": ".amazon.fr",
                "max_age": null,
                "secure": false,
                "http_only": false,
                "version": null,
                "size": 29
            },
            {
                "name": "session-id-time",
                "value": "2082787201l",
                "expires": "2021-09-15 12:44:49 UTC",
                "path": "/",
                "comment": null,
                "domain": ".amazon.fr",
                "max_age": null,
                "secure": false,
                "http_only": false,
                "version": null,
                "size": 26
            },
            {
                "name": "session-token",
                "value": "\"Eb83GFBgcWB9a5j+7gEBHw8+vA9lxXwdxRWR6DYBWuX4pmLsPfwacoVoW2ChuDf6NiMHiPqZdxYfz9xb9VtLnTxYj7jZczNrNWbGi5PforjBW0TuWJ+iUsBl8k/C7NmcmPribBoO2CeP4AcYn5BTVPfbdfHHFfMKaSFLTT7egDSyITA2EyCWMn/rTvl/DgT0GRK+AI5DWRgCcZkEwjPGqMt9EFsE8lrOvOO9cs9gCLgqc5NnCGnTH4+WuOb4tERStnOSPN/Bdn0=\"",
                "expires": "2021-09-15 12:44:49 UTC",
                "path": "/",
                "comment": null,
                "domain": ".amazon.fr",
                "max_age": null,
                "secure": false,
                "http_only": false,
                "version": null,
                "size": 283
            },
            {
                "name": "ad-privacy",
                "value": "0",
                "expires": "2025-10-01 12:44:49 UTC",
                "path": "/",
                "comment": null,
                "domain": ".amazon-adsystem.com",
                "max_age": null,
                "secure": true,
                "http_only": true,
                "version": null,
                "size": 11
            }
        ],
        "last_used_at": "2020-09-15 12:44:50 UTC",
        "expire_at": "2020-09-22 12:44:50 UTC",
        "referer": "https://www.amazon.fr/"
    }
    ...
}
...
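
To show how the session block might be read once the response is decoded, the sketch below pulls out the main fields and groups cookie names by domain. It assumes that "context" sits at the top level of the decoded JSON body, which the elided example does not confirm. summarize_session is a hypothetical helper, and the sample is a trimmed copy of the example above.

```python
from collections import defaultdict


def summarize_session(response_body):
    """Return a compact view of the "session" object found under "context"."""
    # Assumption: "context" is a top-level key of the decoded response.
    session = response_body["context"]["session"]
    cookies_by_domain = defaultdict(list)
    for cookie in session.get("cookie_jar", []):
        cookies_by_domain[cookie["domain"]].append(cookie["name"])
    return {
        "name": session["name"],
        "correlation_id": session["correlation_id"],
        "expire_at": session["expire_at"],
        "referer": session["referer"],
        "cookies_by_domain": dict(cookies_by_domain),
    }


# Trimmed copy of the example response, keeping only the fields used here.
sample = {
    "context": {
        "session": {
            "name": "test",
            "correlation_id": "default",
            "cookie_jar": [
                {"name": "session-id", "domain": ".amazon.fr"},
                {"name": "i18n-prefs", "domain": ".amazon.fr"},
                {"name": "csm-hit", "domain": "www.amazon.fr"},
            ],
            "expire_at": "2020-09-22 12:44:50 UTC",
            "referer": "https://www.amazon.fr/",
        }
    }
}

summary = summarize_session(sample)
```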

Advanced Usage

[Figure: schema of distributed session explained]

By default, session attribution is managed by our system. You can still manage it yourself to optimize it.

In a real-world application, you will want to parallelize scrape calls, which is not compatible with the default session behavior. If each worker, process, or thread is isolated from the others (the behavior of one does not interfere with another), a distributed session will help.

To use a distributed session, set correlation_id to a unique identifier for the current worker. This could be the hostname for a worker machine, the pid for multiprocessing, or the thread_id for a threaded application.

Each worker's session is now isolated in its own context: other workers that use the same session name do not affect yours, and each one keeps a coherent navigation flow and referrer.
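
A minimal sketch of deriving a per-worker correlation_id and attaching it to a request follows. Combining hostname, pid, and thread id into one string is one possible choice, not a documented format, and any restriction on the characters allowed in correlation_id is unknown here. Both helper functions are hypothetical, and no request is sent.

```python
import os
import socket
import threading
from urllib.parse import urlencode

API_URL = "https://api.scrapfly.io/scrape"


def worker_correlation_id():
    """Build an identifier unique to this host, process, and thread (illustrative format)."""
    return f"{socket.gethostname()}-{os.getpid()}-{threading.get_ident()}"


def build_distributed_scrape_url(target_url, session, correlation_id, api_key="__API_KEY__"):
    """Build a scrape URL whose session is scoped to one worker through correlation_id."""
    params = {
        "key": api_key,
        "url": target_url,
        "session": session,
        "correlation_id": correlation_id,
    }
    return f"{API_URL}?{urlencode(params)}"


# Every worker uses the same session name but its own correlation_id.
url = build_distributed_scrape_url(
    "https://httpbin.org/anything", session="test", correlation_id=worker_correlation_id()
)
```

Computing the identifier once per worker and reusing it for every call keeps that worker on the same isolated session.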

By default, when sessions are used, proxies are attached to them. When a distributed session is created, a soft anti-affinity strategy is applied to proxy attachment: the proxy allocated to your session is not used by another session in your distributed pool. However, if no unused proxy fulfills the requirements, a proxy already used by another session may be attached. That is why the anti-affinity is "soft".

All related errors are listed below. You can find the full description and an example error response in the Errors section.
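
The idea of soft anti-affinity can be illustrated with a small allocation routine. This is a conceptual sketch only and not the service's actual implementation. The attach_proxy function, its data structures, and the fallback rule of reusing the least-shared proxy are all assumptions made for the example.

```python
from collections import Counter


def attach_proxy(session_id, candidates, attachments):
    """Attach a proxy to a session, preferring one that no other session uses.

    candidates: proxies that fulfill the session's requirements.
    attachments: dict mapping session id to its attached proxy (updated in place).
    """
    if not candidates:
        raise ValueError("no proxy fulfills the requirements")
    usage = Counter(attachments.values())
    unused = [proxy for proxy in candidates if usage[proxy] == 0]
    if unused:
        # Anti-affinity satisfied: this proxy is not shared with any session.
        proxy = unused[0]
    else:
        # "Soft" fallback (assumed rule): reuse the proxy shared by the fewest sessions.
        proxy = min(candidates, key=lambda candidate: usage[candidate])
    attachments[session_id] = proxy
    return proxy


pool = {}
first = attach_proxy("worker-a", ["proxy-1", "proxy-2"], pool)
second = attach_proxy("worker-b", ["proxy-1", "proxy-2"], pool)
third = attach_proxy("worker-c", ["proxy-1", "proxy-2"], pool)  # no unused proxy left
```

Here the first two sessions receive distinct proxies and the third falls back to sharing one, which is the behavior the word "soft" describes.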

Integration