Web Scraping With a Headless Browser: Puppeteer
An introduction to using Puppeteer in Node.js for scraping dynamic web pages and web apps: tips and tricks, best practices and an example project.
Browser dialog pop-ups, such as the confirmation dialog raised when clearing the cart on the web-scraping.dev cart page, can be handled in Puppeteer through the dialog event. A handler registered with page.on("dialog", handler) can check the dialog message and then press "Yes" (accept) or "No" (dismiss):
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    // set up a dialog event handler before anything can raise a dialog
    page.on('dialog', async (dialog) => {
      console.log(dialog.message());
      if (dialog.message().includes('clear your cart')) {
        console.log(`clicking "Yes" to ${dialog.message()}`);
        await dialog.accept(); // press 'Yes'
      } else {
        await dialog.dismiss(); // press 'No'
      }
    });
    // add something to cart
    await page.goto('https://web-scraping.dev/product/1');
    await page.click('.add-to-cart');
    // try clearing cart which raises a dialog that says "are you sure you want to clear your cart?"
    await page.goto('https://web-scraping.dev/cart');
    await page.waitForSelector('.cart-full .cart-item');
    await page.click('.cart-full .cart-clear');
    // check the cart by counting the remaining cart item titles
    const cartItems = await page.$$('.cart-item .cart-title');
    console.log(`items in cart: ${cartItems.length}`); // should print 0 if no items in cart
  } finally {
    // close the browser even if a step above throws
    await browser.close();
  }
}

run().catch(console.error);
In the example above, we attach a dialog handler to our page object. The handler checks whether the dialog message contains the text "clear your cart" and, if so, clicks "Yes" to clear the cart. Otherwise, it clicks "No" to cancel the dialog.
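When a scraper has to deal with more than one kind of dialog, the accept/dismiss decision can be pulled out into a small rule-based handler. The sketch below is only one possible way to do this: makeDialogHandler and fakeDialog are hypothetical helpers, not part of Puppeteer, and the second dialog message is made up for illustration. The handler uses only the dialog.message(), dialog.accept() and dialog.dismiss() calls from the example above, so it can be tried with a stand-in object without launching a browser.

```javascript
// Hypothetical helper (not part of Puppeteer): builds a 'dialog' event
// handler from an ordered list of rules of the form
// { contains: 'text to look for', action: 'accept' | 'dismiss' }.
function makeDialogHandler(rules, log = []) {
  return async function handleDialog(dialog) {
    const message = dialog.message();
    // the first rule whose text appears in the message decides the action
    const rule = rules.find((r) => message.includes(r.contains));
    // dialogs that match no rule are dismissed (press 'No')
    const action = rule ? rule.action : 'dismiss';
    log.push({ message, action });
    if (action === 'accept') {
      await dialog.accept();
    } else {
      await dialog.dismiss();
    }
    return action;
  };
}

// Stand-in for a Puppeteer dialog object so the sketch runs without a browser.
// It records which method the handler called.
function fakeDialog(message) {
  const calls = [];
  return {
    calls,
    message: () => message,
    accept: async () => { calls.push('accept'); },
    dismiss: async () => { calls.push('dismiss'); },
  };
}

async function demo() {
  const log = [];
  const handler = makeDialogHandler(
    [{ contains: 'clear your cart', action: 'accept' }],
    log
  );
  const cartDialog = fakeDialog('Are you sure you want to clear your cart?');
  const otherDialog = fakeDialog('Leave this page?'); // made-up message
  await handler(cartDialog);
  await handler(otherDialog);
  return { cart: cartDialog.calls, other: otherDialog.calls, log };
}

demo().then((result) => console.log(result));
```

With a real page, the same handler would be registered as page.on('dialog', makeDialogHandler([{ contains: 'clear your cart', action: 'accept' }])). Dismissing unmatched dialogs by default mirrors the else branch of the original example.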