Puppeteer stealth (the puppeteer-extra-plugin-stealth package) is a popular plugin for the Puppeteer browser automation framework. It patches the Puppeteer runtime so that the browser is less likely to be flagged by anti-scraping detection techniques.
With puppeteer-stealth, scrapers have a better chance of bypassing Cloudflare, DataDome and other popular anti-scraping services.
puppeteer-stealth can be installed using npm or Yarn:
$ npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
# or
$ yarn add puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
Then the StealthPlugin object needs to be attached to enable the extension:
// Note: import puppeteer-extra rather than puppeteer
const puppeteer = require('puppeteer-extra')
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())
// test run - check scrapfly.io browser fingerprint page
puppeteer.launch({ headless: true }).then(async browser => {
  console.log('Running tests..')
  const page = await browser.newPage()
  await page.goto('https://scrapfly.io/web-scraping-tools/browser-fingerprint')
  // page.waitForTimeout() was removed in recent Puppeteer releases, so wait with a plain timer
  await new Promise(resolve => setTimeout(resolve, 5000))
  await page.screenshot({ path: 'testresult.png', fullPage: true })
  await browser.close()
  console.log(`All done, check the screenshot. ✨`)
})
Note that puppeteer-stealth features many patches for different detection techniques, and these can be customized and extended.
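As a rough illustration of that customization, the sketch below switches off individual evasions before the plugin is attached. It assumes the plugin instance exposes an enabledEvasions Set, as described in the plugin's README. The buildStealth helper is hypothetical and exists only for this example, and 'user-agent-override' is just one evasion name picked for the demonstration.

```javascript
// Hypothetical helper (not part of puppeteer-extra): creates a stealth plugin
// instance and removes the named evasions from it before it is attached.
// Assumes the instance exposes an `enabledEvasions` Set of evasion names.
function buildStealth(StealthPlugin, disabled = []) {
  const stealth = StealthPlugin()
  for (const name of disabled) {
    // Set.delete() is a no-op for names that are not currently enabled
    stealth.enabledEvasions.delete(name)
  }
  return stealth
}

// Usage (assumes puppeteer-extra and puppeteer-extra-plugin-stealth are installed):
//   const puppeteer = require('puppeteer-extra')
//   const StealthPlugin = require('puppeteer-extra-plugin-stealth')
//   puppeteer.use(buildStealth(StealthPlugin, ['user-agent-override']))
```

Disabling an evasion this way can be useful when a patch conflicts with other settings, for example when the scraper already sets its own user agent string.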
Alternatively, the Scrapfly API bypasses anti-scraping protections automatically through its anti scraping protection bypass feature.