Screenshot
Scrapfly's screenshot feature captures screenshots of the scraped web page. Screenshots can cover the full page or focus on specific HTML elements targeted via CSS selectors.
The screenshots feature requires JavaScript rendering to be enabled
Captured screenshots are stored on Scrapfly's servers, and their URLs are available in the API response under the result.screenshots key as well as in the monitoring logs' Screenshot tab.
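For instance, once a scrape completes, the screenshot URLs can be read out of the JSON response; a minimal sketch, assuming a response shaped like the documented result.screenshots key (the names and URLs below are illustrative):

```ruby
require "json"

# Illustrative API response body, truncated to the relevant part.
body = <<~JSON
  {
    "result": {
      "screenshots": {
        "all": {"url": "https://api.scrapfly.io/example/all.jpg"},
        "reviews": {"url": "https://api.scrapfly.io/example/reviews.jpg"}
      }
    }
  }
JSON

# Each named screenshot carries its own download URL.
screenshots = JSON.parse(body).dig("result", "screenshots")
screenshots.each do |name, meta|
  puts "#{name}: #{meta["url"]}"
end
```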
The screenshots feature is a great way to understand what is happening during a scrape. It can be used to debug a scraper or to monitor a website for visual changes.
Note that Scrapfly's headless browsers are optimized for web scraping and do not render media files like images or videos, so these will not appear in screenshots. If you need images rendered in screenshots, you must enable the screenshot_flags=load_images flag; see the flags section for more information.
The screenshot feature runs alongside the scraper and does not affect or alter the scraped result.
A maximum of 10 different screenshots can be taken per scrape.
Usage
To use the screenshots feature, set the screenshots parameter with a screenshot name and the desired capture area:
- fullpage is a reserved value for capturing the full page.
- A CSS selector can be used to target specific HTML elements.
For example, to capture both the full page and the reviews section of this mock product page https://web-scraping.dev/product/1, we'd use two screenshot parameters:
- screenshots[all]=fullpage to capture the whole page under the name all (fullpage is a reserved value for capturing all page content).
- screenshots[reviews]=.reviews to capture the reviews section, targeted by a CSS selector, under the name reviews.
require "uri"
require "net/http"
url = URI("https://api.scrapfly.io/scrape?render_js=true&screenshots[all]=fullpage&screenshots[reviews]=%23reviews&key=__API_KEY__&url=https%3A%2F%2Fweb-scraping.dev%2Fproduct%2F1")
https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true
request = Net::HTTP::Get.new(url)
response = https.request(request)
puts response.read_body
https://api.scrapfly.io/scrape?render_js=true&screenshots%5Ball%5D=fullpage&screenshots%5Breviews%5D=%23reviews&key=&url=https%3A%2F%2Fweb-scraping.dev%2Fproduct%2F1
Example Of Response
Download Programmatically
The URL to download a screenshot is located in the response under the result.screenshots.${name}.url key.
However, this URL requires authentication: the key parameter must be added with your Scrapfly API key. For example, using curl:
curl "https://api.scrapfly.io/4d1b8e8f-3803-4aa6-88fa-39d5aa81b6b3/scrape/screenshot/db475202-a7c0-4d7b-9179-98089901fce3/main?key=" > screenshot.jpg
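The same download can be done programmatically by appending the key parameter to the screenshot URL before fetching it; a minimal sketch (the screenshot URL and helper function below are illustrative, not real identifiers):

```ruby
require "uri"
require "net/http"

# Append the key parameter required for authentication to a screenshot URL.
# (Illustrative helper, not part of the Scrapfly SDK.)
def authenticated_url(url, key)
  uri = URI(url)
  params = (uri.query ? URI.decode_www_form(uri.query) : []) << ["key", key]
  uri.query = URI.encode_www_form(params)
  uri
end

# Illustrative value taken from result.screenshots.main.url.
screenshot_url = "https://api.scrapfly.io/EXAMPLE/scrape/screenshot/EXAMPLE/main"
uri = authenticated_url(screenshot_url, "__API_KEY__")
puts uri

# Download and save the image (network call, shown for completeness):
# File.binwrite("screenshot.jpg", Net::HTTP.get(uri))
```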
Related Errors
All related errors are listed below. You can find full descriptions and examples of error responses on the Errors documentation page.
Options / Flags
You can enable additional options for the screenshot feature by adding the screenshot_flags
parameter to the scrape request. You can set multiple flags by separating them with a comma. Here is the list of supported flags:
- load_images: Load images. +3 API Credits are billed per 100kb of media downloaded.
- dark_mode: Enable dark mode display.
- block_banners: Block cookie banners and overlays that cover the screen.
- high_quality: No compression on the output image.
- print_media_format: Render the page in print mode.
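Flags are joined with commas into a single screenshot_flags value, which is then URL-encoded as part of the query string; a minimal sketch of composing such a query (the target URL is just an example):

```ruby
require "uri"

# Flags from the list above, comma-joined into one parameter value.
flags = ["load_images", "block_banners", "high_quality"]

query = URI.encode_www_form(
  "render_js" => "true",
  "screenshot_flags" => flags.join(","),  # commas become %2C once encoded
  "url" => "https://web-scraping.dev/product/1"
)
puts query
```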
Example
require "uri"
require "net/http"
url = URI("https://api.scrapfly.io/scrape?render_js=true&screenshots[all]=fullpage&screenshots[reviews]=%23reviews&screenshot_flags=load_images%2Cblock_banners%2Chigh_quality&key=__API_KEY__&url=https%3A%2F%2Fweb-scraping.dev%2Fproduct%2F1")
https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true
request = Net::HTTP::Get.new(url)
response = https.request(request)
puts response.read_body
https://api.scrapfly.io/scrape?render_js=true&screenshots%5Ball%5D=fullpage&screenshots%5Breviews%5D=%23reviews&screenshot_flags=load_images%2Cblock_banners%2Chigh_quality&key=&url=https%3A%2F%2Fweb-scraping.dev%2Fproduct%2F1
Limitations
The screenshots feature requires JavaScript rendering to be enabled in order to work. In some situations, screenshot capture can fail or be ignored:
- GET requests: Only GET requests are eligible for screenshots.
- JavaScript difficulties: On unusually JavaScript-heavy pages, capture can fail because JavaScript execution can interfere with the screenshot capture mechanism.
- Cache: If the cache feature is used and the cache is HIT instead of fetching the live page, screenshots are ignored.
Frequently Asked Questions
- Question: How long are screenshot download links valid for?
- Answer: Screenshot availability is determined by your log retention policy (starting at 1 week).
- Question: Why do some screenshots look different compared to a real browser?
- Answer: Scrapfly web browsers are optimized for web scraping, which can make visuals appear slightly different compared to real web browsers.
- Question: Can I limit API Credit consumption with cost_budget when the load_images flag is set?
- Answer: No. Bandwidth usage can't be predicted ahead of time for a complete page, so the budget limit has no effect on media download costs.