When web scraping, we might want to collect page screenshots or peek into what our headless browsers are seeing for debugging. In Puppeteer, a screenshot can be taken using the screenshot() method of page or element objects:
const puppeteer = require('puppeteer');

async function run() {
  // usual browser startup:
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://httpbin.dev/html");
  // wait for the selector to appear on the page
  await page.waitForSelector("p");
  await page.screenshot({
    "type": "png", // can also be "jpeg" or "webp"
    "path": "screenshot.png", // where to save it
    "fullPage": true, // captures the whole scrollable page if true
  });
  // alternatively we can capture just a specific element:
  const element = await page.$("p");
  await element.screenshot({"path": "just-the-paragraph.png", "type": "png"});
  // wait for the browser to shut down before the function returns
  await browser.close();
}

run();
⚠ Note that when scraping dynamic web pages, a screenshot can be captured before the page has fully loaded. For more, see How to wait for a page to load in Puppeteer?
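One way to reduce the risk of capturing a half-loaded page is to wrap the wait and the capture in a single helper. The sketch below is only an illustration: the function name screenshotWhenReady, its parameter names, and the 10-second default timeout are assumptions made for this example, not part of Puppeteer. It relies only on the page.waitForSelector() and page.screenshot() methods.

```javascript
// Hypothetical helper (not part of Puppeteer): wait until `selector` is present,
// then capture a screenshot. `page` is expected to be a Puppeteer Page, or any
// object exposing compatible waitForSelector() and screenshot() methods.
async function screenshotWhenReady(page, selector, options = {}, timeoutMs = 10000) {
  // waitForSelector rejects if the element does not appear within timeoutMs
  await page.waitForSelector(selector, { timeout: timeoutMs });
  // assumed defaults for this sketch; caller-supplied options override them
  return page.screenshot({ type: "png", fullPage: true, ...options });
}
```

Inside a script like the one above, it could be called as await screenshotWhenReady(page, "p", { path: "screenshot.png" }). Passing the page in as an argument keeps the helper independent of how the browser was launched.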