What is a Headless Browser? Top 5 Headless Browser Tools
A quick overview of emerging browser automation technology: what exactly are these tools and how are they used in web scraping?
To download files with Puppeteer we can either use the browser's fetch feature, which downloads the file into a JavaScript variable, or find and click the download button, which downloads the file to the browser's save directory:
// start puppeteer
const puppeteer = require('puppeteer');

// await is only valid inside an async function in CommonJS scripts
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // go to url
  await page.goto("https://httpbin.dev/");
  // download file to a javascript variable:
  const csvFile = await page.evaluate(() => {
    // find the url:
    const url = document.querySelector('.download-button').getAttribute('href');
    // download it using javascript fetch:
    return fetch(url, {
      method: 'GET',
      credentials: 'include'
    }).then(r => r.text());
  });
  await browser.close();
})();
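The fetch approach returns the response as text, which suits CSV files but would corrupt binary content such as PDFs or images. The following is a minimal sketch of one way to handle that, assuming the same `.download-button` link: the bytes are base64-encoded inside the page, returned as a string, and decoded to a file in Node.js. The helper name `saveFileFromPage` and its parameters are hypothetical, not part of Puppeteer.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not a Puppeteer API): fetches the file linked by
// `selector` inside the page and writes it to `outPath`.
// Returns the number of bytes written.
async function saveFileFromPage(page, selector, outPath) {
  // This callback runs in the browser, so it can only return serializable
  // values; the binary body is therefore passed back as a base64 string.
  const base64 = await page.evaluate(async (sel) => {
    const url = document.querySelector(sel).getAttribute('href');
    const response = await fetch(url, { method: 'GET', credentials: 'include' });
    const bytes = new Uint8Array(await response.arrayBuffer());
    let binary = '';
    for (const byte of bytes) binary += String.fromCharCode(byte);
    return btoa(binary);
  }, selector);

  // Back in Node.js: decode the base64 string and save it to disk.
  const data = Buffer.from(base64, 'base64');
  fs.mkdirSync(path.dirname(outPath), { recursive: true });
  fs.writeFileSync(outPath, data);
  return data.length;
}

// Example usage with an already opened Puppeteer page:
//   await saveFileFromPage(page, '.download-button', './downloads/file.bin');
```

Base64 adds roughly a third to the transferred size, so for very large files clicking the download button is likely the better fit.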
Alternatively, we can click the download button using the page.click() command:
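// start puppeteer
const puppeteer = require('puppeteer');
const path = require('path');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // set default download directory through a Chrome DevTools Protocol session
  // (page._client is a private API that is not stable across Puppeteer versions):
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: path.resolve('./downloads'),
  });
  // go to url
  await page.goto("https://httpbin.dev/");
  // click on download link
  await page.click('.download-button');
  // keep the browser open until the download has finished
})();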
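Clicking the button only starts the download, so the script should not close the browser until the file is fully written. A rough sketch of one way to wait is shown below: it polls the download directory until a file appears that no longer carries Chrome's temporary `.crdownload` suffix. The helper name `waitForDownload` and its timeout values are assumptions for illustration, and the sketch assumes the directory is empty before the click.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not a Puppeteer API): resolves with the path of the
// first finished file found in `dir`, or throws once `timeoutMs` has passed.
async function waitForDownload(dir, timeoutMs = 30000, pollMs = 250) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // Chrome names in-progress downloads "<file>.crdownload"; skip those.
    const finished = fs.existsSync(dir)
      ? fs.readdirSync(dir).filter((name) => !name.endsWith('.crdownload'))
      : [];
    if (finished.length > 0) return path.join(dir, finished[0]);
    // Sleep briefly before checking the directory again.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`No completed download in ${dir} after ${timeoutMs} ms`);
}

// Example usage after the click:
//   await page.click('.download-button');
//   const file = await waitForDownload(path.resolve('./downloads'));
//   await browser.close();
```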
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.