Javascript Rendering
All features described below require JavaScript rendering to be enabled, either with render_js=true in the API or by ticking "Javascript rendering" in the API Player.
JavaScript rendering is only available with the GET method.
Introduction
TL;DR: A fully managed headless Chrome cluster, seamlessly integrated with our API and its features (sessions, cache, etc.).
Scrapfly provides a complete tool suite for scraping complex, dynamic websites. With the rise of the JavaScript ecosystem, more and more websites delegate rendering to the client side. This means the content is modified directly in the browser, so the version you get with JavaScript rendering can differ from the one you get without it. For such websites, a "regular" scrape often does not contain what you are looking for.
Enabling JavaScript rendering comes with several built-in features: XHR call recording, remote script execution, local storage collection, and screenshots.
Web scraping with JavaScript rendering is slower than without it. How much slower is mostly unpredictable: it depends on the rendering time and on the resources loaded. The more a website loads and executes, the slower the scrape call will be.
Fully Managed Browser
The headless browser is fully managed, from infrastructure to fingerprinting compliance. We ensure our browser is undetectable by anti-bot systems to achieve a higher scrape success rate.
Javascript Execution
JavaScript execution gives you the ability to interact with the upstream website like a real user would. You can play out scenarios: clicking buttons, filling in forms, etc. You can also collect data and get it back, and you can adjust the rendering time to make sure your script has finished before the content is collected.
API
To inject your own JavaScript with the Scrapfly API, you must base64-encode your script and pass it via the js parameter. In combination, you can control how long to wait for rendering to complete via rendering_wait, expressed in milliseconds (multiply seconds by 1000), e.g. 5000 to wait 5 s. The maximum rendering wait is 15 s (15000).
Example: retrieve article titles from Hacker News via injected JavaScript
return Array.from(document.querySelectorAll('td.title > a')).map((el) => el.textContent)
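The encoding step described above can be sketched in Node.js as follows. This is only an illustration under stated assumptions: buildScrapeUrl is a hypothetical helper, not part of any Scrapfly SDK; it assumes standard (not URL-safe) base64 is accepted once percent-encoded, and that rendering_wait takes an integer between 0 and 15000. Only the parameter names (key, url, render_js, js, rendering_wait) come from this page.

```javascript
// Hypothetical helper: builds a scrape URL that injects a script.
// Assumption: standard base64, percent-encoded by URLSearchParams, is accepted.
function buildScrapeUrl({ apiKey, targetUrl, script, renderingWaitMs }) {
  const MAX_RENDERING_WAIT_MS = 15000; // documented maximum: 15 s
  if (
    !Number.isInteger(renderingWaitMs) ||
    renderingWaitMs < 0 ||
    renderingWaitMs > MAX_RENDERING_WAIT_MS
  ) {
    throw new RangeError(
      `rendering_wait must be an integer between 0 and ${MAX_RENDERING_WAIT_MS} ms`
    );
  }
  const params = new URLSearchParams({
    key: apiKey,
    url: targetUrl,
    render_js: "true", // JavaScript rendering must be enabled
    js: Buffer.from(script, "utf8").toString("base64"),
    rendering_wait: String(renderingWaitMs),
  });
  return `https://api.scrapfly.io/scrape?${params}`;
}

const injectedScript =
  "return Array.from(document.querySelectorAll('td.title > a')).map((el) => el.textContent)";

const requestUrl = buildScrapeUrl({
  apiKey: "__API_KEY__", // placeholder, as in the examples on this page
  targetUrl: "https://news.ycombinator.com",
  script: injectedScript,
  renderingWaitMs: 5 * 1000, // seconds multiplied by 1000
});

console.log(requestUrl);
```

Building the query string with URLSearchParams keeps the target URL and the base64 characters (+, /, =) correctly percent-encoded, which is easy to get wrong when concatenating strings by hand.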
Interactive Example: Javascript Execution
curl -X GET "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fnews.ycombinator.com&render_js=true&js=Y21WMGRYSnVJRUZ5Y21GNUxtWnliMjBvWkc5amRXMWxiblF1Y1hWbGNubFRaV3hsWTNSdmNrRnNiQ2duZEdRdWRHbDBiR1VnUGlCaEp5a3BMbTFoY0Nnb1pXd3BJRDArSUdWc0xuUmxlSFJEYjI1MFpXNTBLUT09"
[
"US Travel firm $4.5m ransom negotiation open chat",
"Laws of UX",
"Briar Project",
"Pleroma: A Mastodon-compatible open and federated social networking server",
"Mastodon 3.2",
"A philosophical difference between Haskell and Lisp",
"Show HN: High performance X11 animated wallpapers",
"When I raised my B2B SaaS\u2019s prices",
"Illustrated Self-Guided Course On How To Use The Slide Rule",
"Facebook hate-speech boycott had little effect on revenue",
"SpaceX Crew Dragon Splashes Down in the Gulf of Mexico",
"Why Can't We All Just Get Along? Uncertain Biological Basis of Morality (2013)",
"\u201cZombie cicadas\u201d infected with mind-controlling fungus return to West Virginia",
"What is a Product Roadmap?",
"Brain-Gut Circuit Lets Microbiota Directly Affect the Sympathetic Nervous System",
"How to Run Turing Machines on Encrypted Data [pdf]",
"A collection of books, talks, and papers on security engineering",
"Rethinking the Science of Skin",
"GITenberg is an open source community for publishing ebooks in the public domain",
"How real are real numbers? (2004)",
"Beyond Bitswap",
"What I Learned About Failing from My 5 Year Indie Game Dev Project",
"The Architecture of the Medieval Page (2018)",
"I Still Use an Old PowerPC Mac in 2020",
"Microsoft to continue discussions on potential TikTok purchase in the US",
"Lord and Taylor, Oldest U.S. Department Store, Files Bankruptcy",
"\u03bcPlot v1.1 \u2013 now with log scales support",
"OCaml for the Skeptical: OCaml in a Nutshell (2006)",
"GPU Accelerated JavaScript",
"Show HN: Create beautiful landing pages by copy-paste",
"More"
]
Persistence
When you use the session feature, local storage and session storage data are persisted. You can also retrieve their content through the API.
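As a rough illustration, the persisted storage could be read from a parsed API response like this. The sketch assumes the response shape shown in the XHR requests section (a top-level result object containing browser_data with local_storage_data and session_storage_data); storageSnapshot is a hypothetical helper name, and the sample object is a trimmed copy of that example.

```javascript
// Hypothetical helper: extracts persisted storage from a parsed API response.
// Assumption: "result" is a top-level key of the response body.
function storageSnapshot(apiResponse) {
  const browserData = apiResponse?.result?.browser_data ?? {};
  return {
    local: browserData.local_storage_data ?? {},
    session: browserData.session_storage_data ?? {},
  };
}

// Trimmed sample based on the response example in the XHR requests section.
const storageSample = {
  result: {
    browser_data: {
      local_storage_data: { "csm:adb": "adblk_no", "a-font-class": "a-ember" },
      session_storage_data: { "csm:adb": "adblk_no" },
    },
  },
};

const snapshot = storageSnapshot(storageSample);
console.log(Object.keys(snapshot.local)); // keys found in local storage
console.log(snapshot.session["csm:adb"]); // a single session storage value
```

Falling back to empty objects means callers can iterate over the result without first checking whether JavaScript rendering was enabled for that scrape.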
XHR requests
Resources such as CSS, JPG, PNG, video, etc. are not tracked.
You can retrieve the emitted XHR calls with their associated URL, headers, body, and method. We do not attach the response. If you need the response content, you can call the XHR URL directly.
...
"result": {
...,
"browser_data": {
"xhr_call": [
{
"url": "https://aan.amazon.fr/cem",
"headers": {
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36",
"content-type": "application/json",
"accept": "*/*",
"referer": "https://images-eu.ssl-images-amazon.com/images/G/08/ape/sf/whitelisted/desktop/sf-1.50.628cb61._V408130105_.html"
},
"method": "POST",
"body": "{\"render_id\":\"4a7152f0-cb58-4de8-b152-f0cb58cde8a2\",\"event_type\":\"impression\",\"dimensions\":{\"subtype\":\"impression\",\"value\":1,\"template_name\":\"Dynamic eCommerce - universal\"}}"
},
{
"url": "https://www.amazon.fr/gp/customer-reviews/aj/private/reviewsGallery/get-image-gallery-assets",
"headers": {
"rtt": "0",
"accept": "text/html,*/*",
"x-requested-with": "XMLHttpRequest",
"downlink": "10",
"ect": "4g",
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36",
"content-type": "application/x-www-form-urlencoded",
"referer": "https://www.amazon.fr/gp/product/B008AVQXDO?pf_rd_r=APG7NKFQ8DTBPK2TEN8R&pf_rd_p=70373c30-7461-4a24-bb1f-f3fde4f2df3a",
"cookie": "session-id=261-7851197-2783504; i18n-prefs=EUR; ubid-acbfr=262-5387700-5547500; session-id-time=2082754801l; x-wl-uid=145H5Y5j+m7oe7NpElaItmpA5YWGFqUy34ZvPnc+Yd8m+UIZC49+YTzyieSn/K4Kfq162NF1AbZo=; session-token=aLl1Sgktrzq+wYbYCVAKoXJA+3aIAhtP36mNtxkpZORbiSqd3ur/uaU6W1aHycEtUy4LpAJrcV2YmGqNHYb4trXCj3Wt4Vxc5W/aCaww5HctUNsijeRB2Dxp/ca1gtYdEEpTJGBprLlnrFg85RsOkfiWb9nysakwy54GjF9aOjksmN0ip3XCgDbO9uIZ7/X8lgM7pTDy7tTVBJtRvK79S/k9PbfDxEjXULIpNE8iYBdTvm95Xevgmgr1nouA1frzwUFYYzhCg1k=; csm-hit=tb:s-5B0K136YR4QK89MQ8RG0|1596420691120&t:1596420692684&adb:adblk_no"
},
"method": "POST",
"body": null
},
{
"url": "https://www.amazon.fr/gp/customer-reviews/aj/private/reviewsGallery/get-application-resources-for-reviews-gallery",
"headers": {
"rtt": "0",
"accept": "*/*",
"x-requested-with": "XMLHttpRequest",
"downlink": "10",
"ect": "4g",
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36",
"content-type": "application/x-www-form-urlencoded",
"referer": "https://www.amazon.fr/gp/product/B008AVQXDO?pf_rd_r=APG7NKFQ8DTBPK2TEN8R&pf_rd_p=70373c30-7461-4a24-bb1f-f3fde4f2df3a",
"cookie": "session-id=261-7851197-2783504; i18n-prefs=EUR; ubid-acbfr=262-5387700-5547500; session-id-time=2082754801l; x-wl-uid=145H5Y5j+m7oe7NpElaItmpA5YWGFqUy34ZvPnc+Yd8m+UIZC49+YTzyieSn/K4Kfq162NF1AbZo=; session-token=aLl1Sgktrzq+wYbYCVAKoXJA+3aIAhtP36mNtxkpZORbiSqd3ur/uaU6W1aHycEtUy4LpAJrcV2YmGqNHYb4trXCj3Wt4Vxc5W/aCaww5HctUNsijeRB2Dxp/ca1gtYdEEpTJGBprLlnrFg85RsOkfiWb9nysakwy54GjF9aOjksmN0ip3XCgDbO9uIZ7/X8lgM7pTDy7tTVBJtRvK79S/k9PbfDxEjXULIpNE8iYBdTvm95Xevgmgr1nouA1frzwUFYYzhCg1k=; csm-hit=tb:s-5B0K136YR4QK89MQ8RG0|1596420691120&t:1596420692684&adb:adblk_no"
},
"method": "POST",
"body": "noCache=1596420693002"
},
{
"url": "https://www.amazon.fr/gp/cerberus/gv",
"headers": {
"rtt": "0",
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36",
"content-type": "application/x-www-form-urlencoded",
"accept": "*/*",
"cache-control": "no-cache",
"x-requested-with": "XMLHttpRequest",
"downlink": "10",
"ect": "4g",
"referer": "https://www.amazon.fr/gp/product/B008AVQXDO?pf_rd_r=APG7NKFQ8DTBPK2TEN8R&pf_rd_p=70373c30-7461-4a24-bb1f-f3fde4f2df3a",
"cookie": "session-id=261-7851197-2783504; i18n-prefs=EUR; ubid-acbfr=262-5387700-5547500; session-id-time=2082754801l; x-wl-uid=145H5Y5j+m7oe7NpElaItmpA5YWGFqUy34ZvPnc+Yd8m+UIZC49+YTzyieSn/K4Kfq162NF1AbZo=; session-token=aLl1Sgktrzq+wYbYCVAKoXJA+3aIAhtP36mNtxkpZORbiSqd3ur/uaU6W1aHycEtUy4LpAJrcV2YmGqNHYb4trXCj3Wt4Vxc5W/aCaww5HctUNsijeRB2Dxp/ca1gtYdEEpTJGBprLlnrFg85RsOkfiWb9nysakwy54GjF9aOjksmN0ip3XCgDbO9uIZ7/X8lgM7pTDy7tTVBJtRvK79S/k9PbfDxEjXULIpNE8iYBdTvm95Xevgmgr1nouA1frzwUFYYzhCg1k=; csm-hit=tb:s-5B0K136YR4QK89MQ8RG0|1596420691120&t:1596420692684&adb:adblk_no"
},
"method": "POST",
"body": "payload=%7B%22producerId%22%3A%22detail-page%22%2C%22asin%22%3A%22B008AVQXDO%22%2C%22asin_price%22%3A%229.49%22%2C%22asin_shipping_price%22%3A%220%22%2C%22asin_currency_code%22%3A%22EUR%22%2C%22device_type%22%3A%22WEB%22%2C%22display_code%22%3A%22Asin+is+not+eligible+because+it+has+a+retail+offer%22%2C%22substitute_count%22%3A%22-1%22%7D"
}
],
"local_storage_data": {
"csm-hit": "tb:s-5B0K136YR4QK89MQ8RG0|1596420691120&t:1596420692684&adb:adblk_no",
"csm:adb": "adblk_no",
"csm-bf": "[\"5B0K136YR4QK89MQ8RG0\"]",
"a-font-class": "a-ember"
},
"session_storage_data": {
"csm-hit": "tb:s-5B0K136YR4QK89MQ8RG0|1596420691120&t:1596420692684&adb:adblk_no",
"csm:adb": "adblk_no",
"csm-bf": "[\"5B0K136YR4QK89MQ8RG0\"]",
"a-font-class": "a-ember"
},
"javascript_evaluation_result": null
},
...
}
...
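Since responses are not attached, one way to get them is to replay a recorded call yourself. The sketch below turns each xhr_call entry into arguments for fetch(); it assumes result is a top-level key of the response body, and toFetchArgs and recordedXhrCalls are hypothetical helper names. The sample is trimmed from the example above, and the target site may still reject a replayed call (for example when cookies have expired).

```javascript
// Hypothetical helper: converts one recorded XHR entry into fetch() arguments.
function toFetchArgs(call) {
  const init = { method: call.method, headers: { ...call.headers } };
  // Recorded bodies can be null; only attach a body when one was captured.
  if (call.body != null) {
    init.body = call.body;
  }
  return [call.url, init];
}

// Hypothetical helper: returns the recorded calls, or an empty list.
function recordedXhrCalls(apiResponse) {
  return apiResponse?.result?.browser_data?.xhr_call ?? [];
}

// Two entries trimmed from the example response (most headers omitted).
const xhrSample = {
  result: {
    browser_data: {
      xhr_call: [
        {
          url: "https://www.amazon.fr/gp/customer-reviews/aj/private/reviewsGallery/get-image-gallery-assets",
          headers: { accept: "text/html,*/*", "x-requested-with": "XMLHttpRequest" },
          method: "POST",
          body: null,
        },
        {
          url: "https://www.amazon.fr/gp/customer-reviews/aj/private/reviewsGallery/get-application-resources-for-reviews-gallery",
          headers: { accept: "*/*", "content-type": "application/x-www-form-urlencoded" },
          method: "POST",
          body: "noCache=1596420693002",
        },
      ],
    },
  },
};

const fetchArgs = recordedXhrCalls(xhrSample).map(toFetchArgs);
for (const [url, init] of fetchArgs) {
  console.log(init.method, url, "body" in init ? "(with body)" : "(no body)");
}
// To actually replay a call: const response = await fetch(url, init);
```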
Pricing
Using JavaScript rendering costs 10 Scrape API calls against your quota. Keep in mind that JavaScript rendering is slow and uses a lot of data and resources; for maximum performance, avoid it when it is not required.
The API response contains the header X-Scrapfly-Api-Cost, which indicates the billed amount.
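A minimal sketch of reading that header from a fetch-style Headers object is shown below. The header name comes from this page; billedCost is a hypothetical helper, and the assumption that the value is a plain integer (such as 10) is not confirmed here.

```javascript
// Hypothetical helper: reads the billed cost from response headers.
// Assumption: the header value is a plain integer string such as "10".
function billedCost(headers) {
  const raw = headers.get("X-Scrapfly-Api-Cost"); // lookup is case-insensitive
  if (raw == null) {
    return null; // header absent
  }
  const cost = Number.parseInt(raw, 10);
  return Number.isNaN(cost) ? null : cost;
}

// Stand-in for response.headers from a real fetch() call.
const demoHeaders = new Headers({ "X-Scrapfly-Api-Cost": "10" });
console.log(billedCost(demoHeaders));
```

With a real request, the same helper would be called as billedCost(response.headers) after awaiting fetch().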