Javascript Scenario

Scrapfly's js_scenario gives you full control of a headless web browser. A Javascript Scenario can issue browser commands such as clicking buttons, filling in forms, scrolling and executing custom javascript code. Currently these actions are supported:

click, fill, condition, wait, scroll, execute, wait_for_navigation, wait_for_selector

This feature requires Javascript Rendering to be enabled, and the target page must be an HTML document.

Javascript scenario execution details are available in the API response under result.browser_data.js_scenario, as well as on the monitoring dashboard:

[Image: Javascript Scenario view on the monitoring dashboard]

Usage

A Javascript scenario consists of one or more browser actions that are passed to Scrapfly as a base64 encoded JSON array. A typical scenario looks something like this:

[
    {"fill": {"selector": "#username", "value":"demo"}},
    {"fill": {"selector": "#password", "value":"demo"}},
    {"click": {"selector": "form input[type='submit']"}},
    {"wait_for_navigation": {"timeout": 5000}}
]

Each scenario step is a JSON object with a single key that names the action to be performed; the value of that key holds the details of the action.

Once you have designed your javascript scenario, use Scrapfly's online base64 encoding tool to convert it to a base64 encoded string that can be passed to the API for execution.
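The encoding step can also be done locally instead of with the online tool. Below is a minimal Python sketch, assuming the API accepts URL-safe base64 of the compact JSON array. The helper name encode_scenario is hypothetical and not part of any Scrapfly SDK.

```python
import base64
import json


def encode_scenario(scenario):
    """Serialize a scenario (a list of step dicts) to JSON, then base64."""
    # Compact separators keep the encoded string as short as possible.
    raw = json.dumps(scenario, separators=(",", ":")).encode("utf-8")
    # Assumption: URL-safe base64, so the value can sit in a query string.
    return base64.urlsafe_b64encode(raw).decode("ascii")


scenario = [
    {"fill": {"selector": "#username", "value": "demo"}},
    {"fill": {"selector": "#password", "value": "demo"}},
    {"click": {"selector": "form input[type='submit']"}},
    {"wait_for_navigation": {"timeout": 5000}},
]

encoded = encode_scenario(scenario)
print(encoded)
```

The resulting string is what would be passed as the js_scenario parameter.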

Note on Timeouts

The entire Javascript Scenario has an execution budget of 25 seconds. Scrapfly roughly estimates the maximum execution time of each scenario and rejects any scenario that is estimated to take more than 25 seconds.

For long-running javascript scenarios that require more than 25 seconds, see our guide on how timeouts work.
TL;DR: with retry=false the request times out after 90s by default, and you can customize the timeout with retry=false&timeout=120000

Full example with API Player

The best way to get familiar with Javascript Scenarios is to use the Scrapfly Web Player to design and test your scenario. However, here's an example to get you started. The scenario below logs in to web-scraping.dev/login by performing these steps:

Select the username input box and fill in the value user123
Select the password input box and fill in the value password
Select and click the login button
Wait up to 5 seconds for the navigation triggered by the button click

[
    {"fill": {"selector": "input[name=username]", "value":"user123"}},
    {"fill": {"selector": "input[name=password]", "value":"password"}},
    {"click": {"selector": "button[type='submit']"}},
    {"wait_for_navigation": {"timeout": 5000}}
]

Then, this scenario can be base64 encoded and passed to Scrapfly API for execution:

import requests

url = "https://api.scrapfly.io/scrape?render_js=true&js_scenario=W3siZmlsbCI6eyJzZWxlY3RvciI6ImlucHV0W25hbWU9dXNlcm5hbWVdIiwidmFsdWUiOiJ1c2VyMTIzIn19LHsiZmlsbCI6eyJzZWxlY3RvciI6ImlucHV0W25hbWU9cGFzc3dvcmRdIiwidmFsdWUiOiJwYXNzd29yZCJ9fSx7ImNsaWNrIjp7InNlbGVjdG9yIjoiYnV0dG9uW3R5cGU9J3N1Ym1pdCddIn19LHsid2FpdF9mb3JfbmF2aWdhdGlvbiI6eyJ0aW1lb3V0Ijo1MDAwfX1d&key=__API_KEY__&url=https%3A%2F%2Fweb-scraping.dev%2Flogin"
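# render_js=true is required; js_scenario carries the base64 encoded scenario
response = requests.get(url)
data = response.json()
# Scenario details are under data['result']['browser_data']['js_scenario']
print(data['result'])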


Example of response with scenario

...
"result": {
    ...,
    "browser_data": {
        "xhr_call": [...],
        "local_storage_data": {
            "csm-hit": "tb:s-5B0K136YR4QK89MQ8RG0|1596420691120&t:1596420692684&adb:adblk_no",
            "csm:adb": "adblk_no",
            "csm-bf": "[\"5B0K136YR4QK89MQ8RG0\"]",
            "a-font-class": "a-ember"
        },
        "session_storage_data": {
            "csm-hit": "tb:s-5B0K136YR4QK89MQ8RG0|1596420691120&t:1596420692684&adb:adblk_no",
            "csm:adb": "adblk_no",
            "csm-bf": "[\"5B0K136YR4QK89MQ8RG0\"]",
            "a-font-class": "a-ember"
        },
        "websockets": [],
        "javascript_evaluation_result": null,
        "js_scenario": {
            "duration": 4.92,
            "executed": 4,
            "steps": [
                {
                    "action": "fill",
                    "config": {
                        "selector": "input[name=username]",
                        "value": "user123"
                    },
                    "duration": 1.11,
                    "executed": true,
                    "result": null,
                    "success": true
                },
                {
                    "action": "fill",
                    "config": {
                        "selector": "input[name=password]",
                        "value": "password"
                    },
                    "duration": 0.47,
                    "executed": true,
                    "result": null,
                    "success": true
                },
                {
                    "action": "click",
                    "config": {
                        "ignore_if_not_visible": false,
                        "selector": "button[type='submit']"
                    },
                    "duration": 0.52,
                    "executed": true,
                    "result": null,
                    "success": true
                },
                {
                    "action": "wait_for_navigation",
                    "config": {
                        "expect_url": null,
                        "timeout": 5000
                    },
                    "duration": 1.81,
                    "executed": true,
                    "result": null,
                    "success": true
                }
            ]
        }
    },
    ...
}
...
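The per-step fields shown above (action, executed, success) make it easy to check a scenario run programmatically. The following is a small sketch that lists the steps that did not run or did not succeed. The function name summarize_scenario and the sample data are illustrative assumptions, not part of the API.

```python
def summarize_scenario(api_response):
    """Return (all_ok, failed), where failed lists (index, action) pairs."""
    report = api_response["result"]["browser_data"]["js_scenario"]
    failed = [
        (index, step["action"])
        for index, step in enumerate(report["steps"])
        # A step counts as failed if it was skipped or reported no success.
        if not (step["executed"] and step["success"])
    ]
    return not failed, failed


# Trimmed sample shaped like the response above (values are made up).
sample = {
    "result": {
        "browser_data": {
            "js_scenario": {
                "duration": 1.5,
                "executed": 2,
                "steps": [
                    {"action": "fill", "executed": True, "success": True},
                    {"action": "click", "executed": True, "success": False},
                ],
            }
        }
    }
}

all_ok, failed = summarize_scenario(sample)
print(all_ok, failed)
```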

Scenario Step Types

Currently, the scenario step types below are supported. Each step type has a different set of mandatory and optional parameters.

[MANDATORY] param_name:type
[OPTIONAL] param_name:type

Click

selector:string

ignore_if_not_visible:bool=false

timeout budget (ms): +2500

Click on a visible element. It's a native click which emits a trusted event - it's not simulated using javascript.

Internal Workflow

Waits for the element to be visible
Moves the viewport to the element (mouse and scroll as a human would)
Focuses the element
Left clicks

Parameters

selector:string Accepts a CSS selector or an XPath selector
ignore_if_not_visible:bool If true, skip the click when the element is not visible; if false, wait for the element to become visible
multiple:bool If multiple elements match, click on all matched elements

Usage

{"click": {"selector": ".cookie-gdpr-consent", "ignore_if_not_visible": true}}

{"click": {"selector": "submit.btn"}}

Fill

selector:string

value:string

timeout budget (ms): +${timeout} +500

Type a text value in the targeted element. The typing is not simulated using javascript - it's from a real keyboard input.

Internal Workflow

Waits for the element to be visible
Moves the viewport to the element (mouse and scroll as a human would)
Focuses the element
Types the text value into the input as a human would

Parameters

selector:string Any valid CSS or XPath selector
value:string Value to type in element
clear:boolean Clear the input field before writing

Usage

{"fill": {"selector": "#name", "value": "John Do"}}

Condition

A condition checks exactly one of the following:

status_code:int
selector:string
- selector_state:string=existing
- timeout:int=1000

action:string=continue

Parameters

selector:string Any valid CSS or XPath selector
selector_state:string Can be existing or not_existing
action:string Action to take when the condition is met: continue, exit_success or exit_failed

Play the scenario only if the condition is met. For example, it can stop processing the scenario if a non-200 status code is encountered.

Internal Workflow

Checks whether the response status code matches the required status code condition

Usage

{"condition": {"status_code": 200}}
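Because a condition must use exactly one of status_code or selector, it can help to check a step locally before encoding the scenario. Below is a minimal sketch that uses only the parameter names and values listed above. The helper validate_condition is hypothetical, and the API may apply stricter or different rules.

```python
def validate_condition(config):
    """Return a condition step if config follows the documented rules."""
    has_status = "status_code" in config
    has_selector = "selector" in config
    # Exactly one of the two condition kinds must be present.
    if has_status == has_selector:
        raise ValueError("use exactly one of 'status_code' or 'selector'")
    if has_selector:
        state = config.get("selector_state", "existing")
        if state not in ("existing", "not_existing"):
            raise ValueError(f"unknown selector_state: {state!r}")
    action = config.get("action", "continue")
    if action not in ("continue", "exit_success", "exit_failed"):
        raise ValueError(f"unknown action: {action!r}")
    return {"condition": config}


step = validate_condition(
    {"selector": "#login-form", "selector_state": "existing", "timeout": 2000}
)
print(step)
```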

Wait

timeout budget (ms): +${wait}

Pause the scenario to give the browser some time to load. Note that the pause time is part of the scenario budget.

Parameters

There are no parameters - pass the wait time in milliseconds directly

Usage

{"wait": 2000}

Scroll

element:string=body

selector:string=bottom

timeout budget (ms): +500

Scroll to the selector (or to the bottom of the page if no selector is provided). If the element parameter is a valid selector, the scrolling will be executed within the selected element. The scroll is not simulated using javascript - it's created with a real mouse input.

Internal Workflow

Waits for the selector to be visible (if set)
Waits for the element to be visible (if set) and binds scroll to the element
Scrolls the page as a human would

Parameters

element:string=body a valid css selector or xpath or "body"
selector:string a valid css selector or xpath or "bottom" as target to scroll
infinite:int=0 infinite scroll - number of scroll iterations
click_selector:string a valid css selector or xpath to click on after the scroll - like a "view more" button

Usage

{"scroll": {"selector": "bottom"}}

{"scroll": {"selector": "#pricing"}}

{"scroll": {"element": "#scrollable-list", "selector": "bottom", "infinite": 2}}

Execute

timeout:int=3000

timeout budget (ms): +${timeout}

Execute a javascript script and store its result, if one is returned.

Internal Workflow

The Javascript code is executed
If the javascript code returns a value - it's stored and available in API response result.browser_data.js_scenario.steps. Note that each "execute" step has a result entry.
Async/await functions are supported

Parameters

script:string Script to execute. It can return a serializable value
timeout:int Maximum time to wait once the script execution has started - expressed in milliseconds

Usage

{"execute": {"script": "document.querySelector(\"body\").style.backgroundColor = \"red\";"}}

{"execute": {"script": "return navigator.userAgent", "timeout": 1000}}
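Since each execute step carries its own result entry, the returned values can be collected from the steps list of the API response. Here is a short sketch, assuming the step fields shown in the response example earlier. The function name execute_results and the sample values are illustrative only.

```python
def execute_results(js_scenario):
    """Collect the result of every "execute" step, in scenario order."""
    return [
        step["result"]
        for step in js_scenario["steps"]
        if step["action"] == "execute"
    ]


# Illustrative data shaped like result.browser_data.js_scenario.
js_scenario = {
    "steps": [
        {"action": "click", "result": None},
        {"action": "execute", "result": "ExampleAgent/1.0"},
        {"action": "execute", "result": None},  # script returned nothing
    ]
}

results = execute_results(js_scenario)
print(results)
```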

Wait For Navigation

timeout:int=1000

timeout budget (ms): +${timeout} + 1500

Time to wait for a navigation (page change) to be detected. The given timeout plus 1500 ms (1.5s) is added to the scenario budget. This additional time represents the average duration of a standard page load (with assets, xhr, etc). For example, if you set a timeout of 1000, 2500 is counted.

Parameters

timeout:int Maximum time to wait for a navigation - expressed in milliseconds

Usage

{"wait_for_navigation": {}}

{"wait_for_navigation": {"timeout": 5000}}

Wait For Selector

selector:string=body

state:string=visible

timeout budget (ms): +${timeout}

Wait until the element is visible in the page (state=visible) or until it disappears (state=hidden). If the selector does not reach the desired state before the timeout, this step fails and the scenario is aborted. The timeout is added to the scenario budget.

Parameters

selector:string=body a valid css selector or xpath or "body"
state:string=visible state of the element in the page "visible" or "hidden"
timeout:int=5000 Timeout to wait before fail - expressed in milliseconds

Usage

{"wait_for_selector": {"selector": "#pricing"}}

{"wait_for_selector": {"selector": "#loading", "state": "hidden", "timeout": 10000}}
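The "timeout budget" values listed for each step type can be summed to get a rough idea of whether a scenario fits within the 25 second budget. The sketch below is only an approximation built from the figures in this document. Scrapfly's actual estimation may differ, the function name estimate_budget_ms is hypothetical, and steps without a documented budget (such as condition) are counted as 0 here.

```python
MAX_BUDGET_MS = 25_000  # overall scenario budget stated in this document


def estimate_budget_ms(scenario):
    """Sum the documented per-step timeout budgets, in milliseconds."""
    total = 0
    for step in scenario:
        (action, config), = step.items()  # each step has exactly one key
        if action == "click":
            total += 2500
        elif action == "fill":
            # Assumption: no timeout given means only the fixed 500 applies.
            total += config.get("timeout", 0) + 500
        elif action == "wait":
            total += config  # the wait time is passed directly
        elif action == "scroll":
            total += 500
        elif action == "execute":
            total += config.get("timeout", 3000)
        elif action == "wait_for_navigation":
            total += config.get("timeout", 1000) + 1500
        elif action == "wait_for_selector":
            total += config.get("timeout", 5000)
        # Any other action (e.g. "condition") is counted as 0 in this sketch.
    return total


login = [
    {"fill": {"selector": "input[name=username]", "value": "user123"}},
    {"fill": {"selector": "input[name=password]", "value": "password"}},
    {"click": {"selector": "button[type='submit']"}},
    {"wait_for_navigation": {"timeout": 5000}},
]

estimate = estimate_budget_ms(login)
print(estimate, estimate <= MAX_BUDGET_MS)  # 10000 True
```

For the login scenario this gives 500 + 500 + 2500 + (5000 + 1500) = 10000 ms, well under the 25 second limit.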
