When web scraping, we often need to save the connection state, such as browser cookies, and resume it later. In Puppeteer, we can save and load cookies using the page.cookies() and page.setCookie() methods:
const puppeteer = require('puppeteer');
const fs = require('fs').promises;

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // get some cookies:
  await page.goto("https://httpbin.dev/cookies/set/mycookie/myvalue");

  // then we can save them as a JSON file:
  const cookies = await page.cookies();
  await fs.writeFile('cookies.json', JSON.stringify(cookies));

  // then later, we load the cookies from the file
  // (a separate variable name avoids redeclaring `cookies` in the same scope):
  const savedCookies = JSON.parse(await fs.readFile('./cookies.json', 'utf8'));
  await page.setCookie(...savedCookies);

  // confirm the cookies are sent with the next request:
  await page.goto("https://httpbin.dev/cookies");
  console.log(await page.content());

  // wait for the browser process to shut down before exiting:
  await browser.close();
}

run();
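Saved cookies can expire between scraping sessions, so it may help to filter them when loading. Below is a minimal sketch of that idea, not part of the Puppeteer API: the helper names dropExpired, saveCookies and loadCookies are hypothetical, and the sketch assumes cookie objects in the shape returned by page.cookies(), where expires is a Unix timestamp in seconds and -1 marks a session cookie.

```javascript
const fs = require('fs').promises;

// Hypothetical helper: keep only cookies that are still valid at `nowSeconds`.
// Assumes `expires` is in seconds since the Unix epoch, with -1 (or a missing
// value) meaning a session cookie that has no fixed expiry.
function dropExpired(cookies, nowSeconds = Date.now() / 1000) {
  return cookies.filter(
    (cookie) =>
      cookie.expires === undefined ||
      cookie.expires === -1 ||
      cookie.expires > nowSeconds
  );
}

// Hypothetical helper: write cookies to a JSON file.
async function saveCookies(cookies, filePath) {
  await fs.writeFile(filePath, JSON.stringify(cookies, null, 2));
}

// Hypothetical helper: read cookies back, skipping expired ones.
// Returns an empty array when no cookie file has been saved yet.
async function loadCookies(filePath) {
  try {
    const raw = await fs.readFile(filePath, 'utf8');
    return dropExpired(JSON.parse(raw));
  } catch (err) {
    if (err.code === 'ENOENT') return [];
    throw err;
  }
}
```

In the example above, these helpers could stand in for the direct file calls: saveCookies(await page.cookies(), 'cookies.json') to save, and page.setCookie(...(await loadCookies('cookies.json'))) to restore.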
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.