To test our Puppeteer web scrapers, we might want to use local files instead of public websites. Just like real web browsers, Puppeteer can load local files using the file:// URL scheme:
```javascript
const puppeteer = require('puppeteer');
const path = require('path');
const { pathToFileURL } = require('url');

async function run() {
  // usual browser startup:
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  try {
    // absolute paths need three slashes after "file:" (the host part stays empty):
    // await page.goto('file:///home/user/projects/test.html'); // linux
    // await page.goto('file:///C:/Users/projects/test.html'); // windows

    // or resolve a path relative to the script: below loads test.html from the
    // same directory as this script; pathToFileURL() builds a valid URL on every OS
    await page.goto(pathToFileURL(path.join(__dirname, 'test.html')).href);
    console.log(await page.content());
  } finally {
    // always close the browser, even if navigation fails
    await browser.close();
  }
}

run().catch(console.error);
```
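The slash count in file:// URLs is easy to get wrong, so here is a small sketch that uses only Node's built-in `url` and `path` modules (no browser needed). It shows why `file://home/...` silently points at the wrong place and how `pathToFileURL()` builds a correct URL. The `toFileUrl()` helper and the `/tmp/fixtures` folder are hypothetical names chosen for illustration, and the expected output for the last line assumes Linux or macOS.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper (not part of Puppeteer): resolve `relativePath`
// against `baseDir` (defaults to this script's folder) and return a
// file:// URL string that can be handed to page.goto().
function toFileUrl(relativePath, baseDir = __dirname) {
  return pathToFileURL(path.resolve(baseDir, relativePath)).href;
}

// With only two slashes, "home" is parsed as the URL host, not as a folder:
const broken = new URL('file://home/user/projects/test.html');
console.log(broken.host);     // 'home'
console.log(broken.pathname); // '/user/projects/test.html'

// With three slashes the host is empty and the full path survives:
const fixed = new URL('file:///home/user/projects/test.html');
console.log(fixed.host);      // ''
console.log(fixed.pathname);  // '/home/user/projects/test.html'

// pathToFileURL() also percent-encodes characters such as spaces
// (the expected value below assumes a Linux/macOS machine):
console.log(toFileUrl('my page.html', '/tmp/fixtures'));
// 'file:///tmp/fixtures/my%20page.html'
```

The resulting string can be passed straight to `page.goto()`. Building the URL with `pathToFileURL()` instead of a template string is a deliberate choice here: it avoids hand-escaping spaces or special characters and handles Windows drive letters for you.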
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.