When scraping dynamic web pages with Puppeteer and Node.js, we need to wait for the page content to load before retrieving the page source. Puppeteer's waitForSelector method waits for a specific element to appear on the page. Choosing an element that is rendered by JavaScript gives a good signal that the dynamic content has loaded, after which we can grab the page source:
const puppeteer = require('puppeteer');

async function run() {
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto("https://httpbin.dev/");
        // wait for the selector to appear on the page; here we wait for the "Auth" drop-down:
        await page.waitForSelector('#operations-tag-Auth', {timeout: 5_000});
        console.log(await page.content());
    } finally {
        // always close the browser, even if the wait above times out
        await browser.close();
    }
}

run();
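If the element never appears, waitForSelector rejects once the timeout elapses, so a scraper usually wants to handle that case explicitly. Below is a minimal sketch of one way to do that. The helper name getHtmlWhenReady and its parameters are hypothetical (they are not part of Puppeteer), and the check on err.name assumes Puppeteer's TimeoutError reports its name as 'TimeoutError':

```javascript
// Hypothetical helper, not a Puppeteer API: load `url`, wait for `selector`,
// and return the rendered HTML. Returns null if the selector does not appear
// within `timeoutMs` milliseconds.
async function getHtmlWhenReady(page, url, selector, timeoutMs = 5000) {
    await page.goto(url);
    try {
        await page.waitForSelector(selector, { timeout: timeoutMs });
    } catch (err) {
        // Assumption: a timed-out wait rejects with an error named "TimeoutError"
        if (err.name === 'TimeoutError') {
            return null;
        }
        // Anything else (navigation failure, closed page, ...) is re-thrown
        throw err;
    }
    return page.content();
}

// Example usage with a real browser (requires the puppeteer package):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   const html = await getHtmlWhenReady(page, 'https://httpbin.dev/', '#operations-tag-Auth');
//   await browser.close();
```

Returning null on a timeout lets the caller decide whether to retry, skip the page, or fall back to whatever HTML has loaded so far, while other errors still surface normally.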