How to Scrape With Headless Firefox
Discover how to use headless Firefox with Selenium, Playwright, and Puppeteer for web scraping, including practical examples for each library.
When scraping dynamic web pages with Puppeteer and Node.js, we need to wait for the page to finish loading before we retrieve the page source. Puppeteer's waitForSelector method waits for a specific element to appear on the page. Once that element is present, we can treat the content we need as loaded and grab the page source:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto("https://httpbin.dev/");
    // wait for the selector to appear on the page; in this case we wait for the "Auth" drop-down
    // (waitForSelector throws a TimeoutError if it does not appear within 5 seconds):
    await page.waitForSelector('#operations-tag-Auth', {timeout: 5_000});
    console.log(await page.content());
  } finally {
    // close the browser even if navigation or waiting fails
    await browser.close();
  }
}

run().catch(console.error);
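To make the idea of "wait for an element, but give up after a timeout" concrete, here is a minimal polling sketch in plain Node.js. It is only a conceptual illustration, not how Puppeteer implements waitForSelector. The waitFor helper, the fakePage object and the 300 ms delay are hypothetical and exist only for this example: the helper re-checks a condition every interval milliseconds and rejects once timeout milliseconds have passed.

```javascript
// Hypothetical helper (not part of Puppeteer): polls `predicate` until it
// returns a truthy value, or rejects after `timeout` milliseconds.
function waitFor(predicate, { timeout = 5_000, interval = 100 } = {}) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = () => {
      let result;
      try {
        result = predicate();
      } catch (err) {
        reject(err); // a throwing predicate ends the wait immediately
        return;
      }
      if (result) {
        resolve(result); // condition met: stop polling
      } else if (Date.now() - start >= timeout) {
        reject(new Error(`waitFor: timed out after ${timeout} ms`));
      } else {
        setTimeout(check, interval); // try again later
      }
    };
    check();
  });
}

// Hypothetical "page": the Auth element shows up 300 ms after loading starts.
const fakePage = { elements: new Set() };
setTimeout(() => fakePage.elements.add('#operations-tag-Auth'), 300);

async function demo() {
  const found = await waitFor(
    () => fakePage.elements.has('#operations-tag-Auth'),
    { timeout: 1_000 }
  );
  console.log('element found:', found);

  try {
    // This element never appears, so the wait gives up after 200 ms.
    await waitFor(() => fakePage.elements.has('#missing'), { timeout: 200 });
  } catch (err) {
    console.log(err.message);
  }
}

demo();
```

The timeout is what keeps a scraper from hanging forever on a page that never renders the expected element, which is the same reason the Puppeteer example above passes {timeout: 5_000}.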