When scraping with Puppeteer, we might encounter modal popups: overlays rendered with JavaScript that hide the page content on load and display a message.
The most common example of a modal popup is the cookie consent popup, and there are multiple ways to handle popups in Puppeteer, such as clicking the button that dismisses the popup or removing the popup elements from the DOM.
For example, let's take a look at the web-scraping.dev/login page, which shows a cookie pop-up on page load:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://web-scraping.dev/login');

  // Option #1 - use page.click() to click on the button
  try {
    // wait up to 2 seconds for the button; throws if it never appears
    await page.waitForSelector('#cookie-ok', { timeout: 2000 });
    await page.click('#cookie-ok');
  } catch (error) {
    console.log('no cookie popup');
  }

  // Option #2 - delete the popup HTML
  // remove the pop-up itself
  const cookieModal = await page.$('#cookieModal');
  if (cookieModal) {
    await page.evaluate((el) => el.remove(), cookieModal);
  }
  // remove the grey backdrop which covers the screen
  const modalBackdrop = await page.$('.modal-backdrop');
  if (modalBackdrop) {
    await page.evaluate((el) => el.remove(), modalBackdrop);
  }

  await browser.close();
})();
Above, we explored two ways to handle modal pop-ups: clicking the button that dismisses them and removing them from the DOM outright.
Generally, the first approach is more reliable, as a real button click can have functionality attached to it, such as setting a cookie so the pop-up doesn't appear again.
When the pop-up is a login requirement or an advertisement, the second approach is better suited.
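The two approaches can also be combined into one fallback routine: try the button first, and remove the elements only if the button never shows up. The sketch below is one possible way to do that, not an official Puppeteer feature. The `dismissPopup` function, its option names, and its return values are hypothetical names introduced for illustration; it relies only on the `page.waitForSelector()`, `page.click()`, `page.$()` and `page.evaluate()` calls already used above.

```javascript
// Hypothetical helper (not part of Puppeteer): click the dismiss button if it
// appears, otherwise fall back to removing the popup elements from the DOM.
async function dismissPopup(page, options = {}) {
  const { buttonSelector, removeSelectors = [], timeout = 2000 } = options;

  if (buttonSelector) {
    try {
      // waitForSelector rejects if the button does not appear within `timeout` ms
      await page.waitForSelector(buttonSelector, { timeout });
      await page.click(buttonSelector);
      return 'clicked';
    } catch (error) {
      // the button never appeared or the click failed: fall through to removal
    }
  }

  let removed = 0;
  for (const selector of removeSelectors) {
    // page.$ resolves to null when nothing matches the selector
    const handle = await page.$(selector);
    if (handle) {
      await page.evaluate((el) => el.remove(), handle);
      removed += 1;
    }
  }
  return removed > 0 ? 'removed' : 'none';
}

// Usage inside the script above, right after page.goto():
// const outcome = await dismissPopup(page, {
//   buttonSelector: '#cookie-ok',
//   removeSelectors: ['#cookieModal', '.modal-backdrop'],
// });
// console.log(outcome);
```

Returning a short status string makes it easy to log which path was taken on each page. For login walls or advertisements, the button selector can simply be omitted so that only the removal step runs.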
This knowledgebase is provided by Scrapfly, a web scraping API that lets you scrape any website without getting blocked and implements dozens of other web scraping conveniences.