Puppeteer and Playwright are popular headless browser libraries for NodeJS, and one of their use cases is screenshot automation. In this guide, we'll explore how to take screenshots with Playwright and Puppeteer in NodeJS. We'll start with installation, then cover the core concepts and common options for customizing website screenshots. Let's get started!
To start, let's go over the installation process. Puppeteer and Playwright for NodeJS can be installed using the below npm command:
npm install puppeteer playwright
Next, install Playwright's browser binaries using the below command:
npx playwright install chromium # alternatively install `firefox` or `webkit`
To start, let's explore the basics. We can use the screenshot method to take Playwright and Puppeteer screenshots in NodeJS:
// Puppeteer
const puppeteer = require("puppeteer");
async function run() {
  // launch a browser and open a new page
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();
  // go to the target web page
  await page.goto("https://web-scraping.dev/products");
  // take page screenshot
  await page.screenshot({
    type: "png", // can also be "jpeg" or "webp" (recommended)
    path: "products.png", // save image data to a PNG file
  });
  // wait for the browser to shut down before exiting
  await browser.close();
}
run();
// Playwright
const { chromium } = require("playwright");
async function run() {
  // launch a new browser tab with empty context
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  const page = await context.newPage();
  // go to the target web page
  await page.goto("https://web-scraping.dev/products");
  // take page screenshot
  await page.screenshot({ path: "products.png" });
  await browser.close();
}
run();
In the above code, we start by launching a browser instance and navigating to the target page URL. Note that headless: false opens a visible browser window; set it to true to run the browser headless. Then, we take Playwright and Puppeteer screenshots using the same screenshot method.
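As a minimal sketch of how the screenshot options could be chosen from the output file name, the snippet below uses a hypothetical helper, screenshotOptionsFor, which is not part of Puppeteer. It assumes Puppeteer's png, jpeg and webp image types and only sets quality for the lossy formats. The capture wrapper is also illustrative and assumes puppeteer is installed:

```javascript
const path = require("node:path");

// Hypothetical helper (not a Puppeteer API): derive screenshot options
// from the output file extension.
function screenshotOptionsFor(filePath, quality = 80) {
  const types = { ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg", ".webp": "webp" };
  const ext = path.extname(filePath).toLowerCase();
  const type = types[ext];
  if (!type) {
    throw new Error(`Unsupported screenshot extension: ${ext || "(none)"}`);
  }
  const options = { path: filePath, type };
  // `quality` does not apply to PNG images, so only set it for lossy formats
  if (type !== "png") {
    options.quality = quality;
  }
  return options;
}

// Illustrative wrapper; puppeteer is loaded lazily, so the helper above
// can be used and tested without a browser installed.
async function capture(url, filePath) {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot(screenshotOptionsFor(filePath));
  } finally {
    await browser.close();
  }
}
```

For example, capture("https://web-scraping.dev/products", "products.webp") would save a WebP image, while a .png path would omit the quality setting.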
Waiting is crucial to ensure the content has fully loaded before we capture a screenshot. For this, we can use different waiting strategies before taking screenshots:
// Puppeteer
async function run() {
  // ...
  // go to the target web page
  await page.goto("https://web-scraping.dev/products", {
    // wait for a specific load state; set exactly one value:
    // "networkidle2": wait for the network to be idle
    // "domcontentloaded": wait for the DOM tree to load
    // "load": wait for all resources to load, including CSS and images (default)
    waitUntil: "load",
  });
  // ...
}
// Playwright
async function run() {
  // ...
  // go to the target web page
  await page.goto("https://web-scraping.dev/products", {
    // wait for a specific load state; set exactly one value:
    // "networkidle": wait for the network to be idle
    // "domcontentloaded": wait for the DOM tree to load
    // "load": wait for all resources to load, including CSS and images (default)
    waitUntil: "load",
  });
  // ...
}
Here, we use the waitUntil option to wait for a specific load state before proceeding with the rest of the program, which helps ensure images have loaded before taking Puppeteer and Playwright NodeJS screenshots.
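Since the two libraries name their network-idle states differently, a small lookup can make the correspondence explicit. This is only a sketch: toPlaywrightWaitUntil is a hypothetical helper, and mapping both of Puppeteer's idle variants to Playwright's single networkidle state is an approximation:

```javascript
// Hypothetical helper: translate a Puppeteer `waitUntil` value into the
// closest Playwright equivalent.
const PUPPETEER_TO_PLAYWRIGHT = {
  load: "load",
  domcontentloaded: "domcontentloaded",
  networkidle0: "networkidle", // no network connections for at least 500 ms
  networkidle2: "networkidle", // approximation: Playwright has no "at most 2 connections" variant
};

function toPlaywrightWaitUntil(value) {
  // own-property check so inherited keys like "toString" are rejected
  if (!Object.prototype.hasOwnProperty.call(PUPPETEER_TO_PLAYWRIGHT, value)) {
    throw new Error(`Unknown Puppeteer waitUntil value: ${value}`);
  }
  return PUPPETEER_TO_PLAYWRIGHT[value];
}
```

With this in place, a shared configuration value such as "networkidle2" could be passed to Puppeteer directly and to Playwright as toPlaywrightWaitUntil("networkidle2").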
Alternatively, we can wait for a specific CSS or XPath selector to be present:
// Puppeteer
await page.waitForSelector("div.products", { timeout: 10000 }); // CSS
await page.waitForSelector("xpath/" + "//div[@class='products']", {
  timeout: 10000,
}); // XPath
// Playwright
await page.waitForSelector("div.products", { timeout: 10000 }); // CSS
await page.waitForSelector("//div[@class='products']", {
  timeout: 10000,
}); // XPath
Finally, we can use fixed wait conditions:
// Puppeteer: wait for a fixed timeout
await new Promise((resolve) => setTimeout(resolve, 5000)); // 5 seconds
// Playwright: wait for a fixed timeout
await page.waitForTimeout(5000); // 5 seconds
Since recent Puppeteer versions don't provide a built-in method for fixed waits, we emulate one using a promise. As for Playwright, we use the built-in waitForTimeout method to wait for a fixed timeout.
Note that it's not recommended to use fixed waits when capturing screenshots in Node.js: they often add unnecessary latency, and they can still fire before the content is ready on a slow page.
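One alternative to a fixed wait is to poll a condition and stop as soon as it holds. The sketch below is library-agnostic: waitForCondition is a hypothetical helper rather than a Puppeteer or Playwright API, and the default timeout and interval values are arbitrary assumptions:

```javascript
// Poll an (optionally async) predicate until it returns a truthy value,
// or throw once the timeout has elapsed.
async function waitForCondition(predicate, { timeout = 10000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    const result = await predicate();
    if (result) {
      return result;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeout} ms`);
    }
    // short pause between checks instead of one long fixed sleep
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Usage sketch with an existing `page` object from either library:
// await waitForCondition(async () => (await page.$("div.products")) !== null);
```

The wait ends as soon as the element exists, so fast pages are not slowed down, while slow pages get up to the full timeout.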
One key configuration to consider when taking NodeJS web page screenshots is the browser window viewport. It represents the web browser resolution through width and height dimensions:
// Puppeteer
const browser = await puppeteer.launch({
  headless: false,
  args: ["--window-size=1920,1080"],
});
const page = await browser.newPage();
await page.setViewport({
  width: 1920,
  height: 1080,
});
// Playwright
const browser = await chromium.launch({
  headless: false,
});
const context = await browser.newContext({
  viewport: { width: 1920, height: 1080 },
});
const page = await context.newPage();
Here, we set 1080p resolution using width and height values. Manipulating the viewport enables emulating different devices. For instance, Playwright provides a wide range of device presets to emulate different web browsers and operating systems for further customization while taking a NodeJS screenshot:
const { chromium, devices } = require("playwright");
const browser = await chromium.launch({
  headless: false,
});
// pick a device preset from Playwright's device registry
const iphone14ProMax = devices["iPhone 14 Pro Max"];
const context = await browser.newContext({
  ...iphone14ProMax,
});
const page = await context.newPage();
Above, we emulate a mobile browser by selecting a device preset. Playwright will then automatically apply the selected device's User-Agent, viewport, and scale factor settings. For the full list of available device profiles, refer to the official device registry.
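Building on the viewport setting, the same page can be captured at several resolutions by creating one browser context per viewport. The following is a sketch under assumptions: the preset names and sizes, planCaptures, and captureAll are all hypothetical, and captureAll requires playwright to be installed:

```javascript
// Hypothetical viewport presets; names and sizes are illustrative only.
const VIEWPORTS = [
  { name: "desktop", width: 1920, height: 1080 },
  { name: "tablet", width: 768, height: 1024 },
  { name: "mobile", width: 390, height: 844 },
];

// Build one output path and one set of context options per viewport.
function planCaptures(prefix, viewports = VIEWPORTS) {
  return viewports.map(({ name, width, height }) => ({
    path: `${prefix}-${name}-${width}x${height}.png`,
    contextOptions: { viewport: { width, height } },
  }));
}

// Illustrative runner; playwright is loaded lazily so planCaptures can be
// used without a browser installed.
async function captureAll(url, prefix) {
  const { chromium } = require("playwright");
  const browser = await chromium.launch();
  try {
    for (const { path, contextOptions } of planCaptures(prefix)) {
      // a fresh context per viewport keeps each capture isolated
      const context = await browser.newContext(contextOptions);
      const page = await context.newPage();
      await page.goto(url);
      await page.screenshot({ path });
      await context.close();
    }
  } finally {
    await browser.close();
  }
}
```

Calling captureAll("https://web-scraping.dev/products", "products") would then write one PNG file per preset.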
When taking web page screenshots, we often need to capture a specific area, such as the full page or a single element, rather than only the visible viewport. This is where selection targeting comes in handy!
A common use case is taking full web page screenshots. Here's how to approach it in NodeJS:
// Puppeteer
const puppeteer = require("puppeteer");
async function scroll(page) {
  let prevHeight = -1;
  const maxScrolls = 100;
  let scrollCount = 0;
  while (scrollCount < maxScrolls) {
    // scroll to the bottom of the page
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)");
    // wait for newly loaded content to render
    await new Promise((resolve) => setTimeout(resolve, 2000));
    // calculate the new scroll height and stop once it no longer grows
    const newHeight = await page.evaluate("document.body.scrollHeight");
    if (newHeight === prevHeight) {
      break;
    }
    prevHeight = newHeight;
    scrollCount += 1;
  }
}
async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    args: ["--window-size=1920,1080"],
  });
  const page = await browser.newPage();
  await page.setViewport({
    width: 1920,
    height: 1080,
  });
  // go to the target web page
  await page.goto("https://web-scraping.dev/testimonials", {
    waitUntil: "load",
  });
  // scroll down to the end of the page
  await scroll(page);
  await page.screenshot({
    type: "png",
    path: "full-page-screenshot.png",
    fullPage: true,
    captureBeyondViewport: false, // prevent image flickering
  });
  await browser.close();
}
run();
// Playwright
const { chromium } = require("playwright");
async function scroll(page) {
  let prevHeight = -1;
  const maxScrolls = 100;
  let scrollCount = 0;
  while (scrollCount < maxScrolls) {
    // scroll to the bottom of the page
    await page.evaluate("window.scrollTo(0, document.body.scrollHeight)");
    // wait for newly loaded content to render
    await page.waitForTimeout(2000);
    // calculate the new scroll height and stop once it no longer grows
    const newHeight = await page.evaluate("document.body.scrollHeight");
    if (newHeight === prevHeight) {
      break;
    }
    prevHeight = newHeight;
    scrollCount += 1;
  }
}
async function run() {
  const browser = await chromium.launch({
    headless: false,
  });
  const context = await browser.newContext({});
  const page = await context.newPage();
  // go to the target web page (same infinite-scroll page as the Puppeteer example)
  await page.goto("https://web-scraping.dev/testimonials", {
    waitUntil: "load",
  });
  // scroll down to the end of the page
  await scroll(page);
  await page.screenshot({
    type: "png",
    path: "full-page-screenshot.png",
    fullPage: true,
  });
  await browser.close();
}
run();
Here, we take a Node.js full page screenshot of web-scraping.dev/testimonials, which uses infinite scrolling to fetch more data. The browser starts by navigating to the target web page and scrolling until the end of the page. Then, we use the fullPage option to capture the entire page rather than only the visible browser viewport.
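The scroll-until-the-height-stops-growing logic can also be written independently of the browser library by passing the page operations in as functions. This is a sketch only: scrollUntilStable and its parameter names are hypothetical, and the 2-second pause in the usage comment mirrors the value used in the examples above:

```javascript
// Scroll repeatedly until the page height stops changing or the scroll
// limit is reached. Page operations are injected so the loop works with
// either library (or with fakes in a test).
async function scrollUntilStable({ getHeight, scrollToBottom, pause, maxScrolls = 100 }) {
  let prevHeight = -1;
  let scrolls = 0;
  while (scrolls < maxScrolls) {
    await scrollToBottom();
    await pause(); // give newly requested content time to render
    const height = await getHeight();
    if (height === prevHeight) {
      break; // nothing new was loaded, so we reached the end
    }
    prevHeight = height;
    scrolls += 1;
  }
  return { scrolls, height: prevHeight };
}

// Usage sketch with an existing `page` object from either library:
// await scrollUntilStable({
//   getHeight: () => page.evaluate("document.body.scrollHeight"),
//   scrollToBottom: () => page.evaluate("window.scrollTo(0, document.body.scrollHeight)"),
//   pause: () => new Promise((resolve) => setTimeout(resolve, 2000)),
// });
```

Injecting the operations keeps a single copy of the loop for both libraries and lets it be exercised without launching a browser.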
For further NodeJS screenshot customization, we can capture a screenshot of a particular HTML element on the page by targeting it with a selector:
// Puppeteer
const puppeteer = require("puppeteer");
async function run() {
  // launch a new browser tab
  const browser = await puppeteer.launch({
    headless: false,
    args: ["--window-size=1920,1080"],
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });
  // request the web page and wait for the target element to load
  await page.goto("https://web-scraping.dev/product/3");
  await page.waitForSelector("div.row.product-data");
  // select the element and capture it
  const element = await page.$("div.row.product-data");
  await element.screenshot({
    type: "png",
    path: "element-screenshot.png",
  });
  await browser.close();
}
run();
// Playwright
const { chromium } = require("playwright");
async function run() {
  const browser = await chromium.launch({
    headless: false,
  });
  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
  });
  const page = await context.newPage();
  // request the web page and wait for the target element to load
  await page.goto("https://web-scraping.dev/product/3");
  await page.waitForSelector("div.row.product-data");
  // select the element and capture it
  const element = await page.$("div.row.product-data");
  await element.screenshot({
    path: "element-screenshot.png",
  });
  await browser.close();
}
run();
Here, we use Playwright and Puppeteer to take a screenshot of a specific element on the web page: we open the page, wait for the target element to appear, select it, and call the screenshot method on the selected element.
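When some surrounding context is wanted around the element, one option is to read the element's bounding box and pass an enlarged region through the clip option of page.screenshot. The sketch below is illustrative: padClip and screenshotWithPadding are hypothetical helpers, the 16-pixel default padding is arbitrary, and the approach assumes the element is visible in the current viewport:

```javascript
// Expand a bounding box by `padding` pixels on every side without letting
// the region start before the top-left corner (0, 0).
function padClip(box, padding = 16) {
  const x = Math.max(0, box.x - padding);
  const y = Math.max(0, box.y - padding);
  return {
    x,
    y,
    width: box.width + (box.x - x) + padding,
    height: box.height + (box.y - y) + padding,
  };
}

// Illustrative wrapper that takes an existing `page` object.
async function screenshotWithPadding(page, selector, path, padding = 16) {
  const element = await page.$(selector);
  if (!element) {
    throw new Error(`No element matches selector: ${selector}`);
  }
  // boundingBox() resolves to null when the element is not visible
  const box = await element.boundingBox();
  if (!box) {
    throw new Error(`Element is not visible: ${selector}`);
  }
  await page.screenshot({ path, clip: padClip(box, padding) });
}
```

For instance, screenshotWithPadding(page, "div.row.product-data", "element-padded.png") would capture the product block with a margin around it.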