To test our Puppeteer web scrapers, we might want to use local files instead of public websites. Just like real web browsers, Puppeteer can load local files using the file:// URL protocol:
const puppeteer = require('puppeteer');
const path = require('path');
const { pathToFileURL } = require('url');

async function run() {
  // usual browser startup:
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // we can use absolute paths; note the three slashes (file:// plus the path's leading /)
  await page.goto("file:///home/user/projects/test.html"); // linux
  // await page.goto("file:///C:/Users/projects/test.html"); // windows
  // or we can build the URL from a path relative to this script:
  // below will select test.html that is in the same directory as the script
  await page.goto(pathToFileURL(path.join(__dirname, 'test.html')).href);
  console.log(await page.content());
  // wait for the browser process to shut down before exiting
  await browser.close();
}

run();
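When testing scrapers against local files, it can help to generate the HTML fixture and its file:// URL in one step. Below is a minimal sketch that uses only Node.js built-in modules. The helper name `writeFixture`, the file name and the HTML snippet are hypothetical, chosen for illustration only. It relies on `pathToFileURL` from Node's `url` module, which produces a well-formed file:// URL and percent-encodes characters such as spaces.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper (not part of Puppeteer): writes an HTML fixture into a
// fresh temporary directory and returns a file:// URL pointing at it.
function writeFixture(name, html) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fixture-'));
  const filePath = path.join(dir, name);
  fs.writeFileSync(filePath, html, 'utf8');
  // pathToFileURL handles the platform-specific path format and
  // percent-encodes special characters (e.g. the space in the file name)
  return pathToFileURL(filePath).href;
}

const fixtureUrl = writeFixture(
  'product page.html',
  '<h1 id="title">Test product</h1>'
);
console.log(fixtureUrl);
```

The returned URL can then be passed to `page.goto(fixtureUrl)` in a scraper test, in the same way as the hand-written file:// URLs above.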
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.