How to block resources in Puppeteer?

To speed up Puppeteer web scrapers we can block media and other non-essential requests using the request interception feature:

// we can block by resrouce type like fonts, images etc.
const blockResourceType = [
  'beacon',
  'csp_report',
  'font',
  'image',
  'imageset',
  'media',
  'object',
  'texttrack',
];
// we can also block by domains, like google-analytics etc.
const blockResourceName = [
  'adition',
  'adzerk',
  'analytics',
  'cdn.api.twitter',
  'clicksor',
  'clicktale',
  'doubleclick',
  'exelator',
  'facebook',
  'fontawesome',
  'google',
  'google-analytics',
  'googletagmanager',
  'mixpanel',
  'optimizely',
  'quantserve',
  'sharethrough',
  'tiqcdn',
  'zedo',
];
const puppeteer = require('puppeteer');
const browser = await puppeteer.launch();
const page = await browser.newPage();
// we need to enable interception feature
await page.setRequestInterception(true);
// then we can add a call back which inspects every
// outgoing request browser makes and decides whether to allow it
page.on('request', request => {
  const requestUrl = request._url.split('?')[0];
  if (
    (request.resourceType() in blockedResourceType) ||
    blockResourceName.some(resource => requestUrl.includes(resource))
  ) {
    request.abort();
  } else {
    request.continue();
  }
});
}
Question tagged: Puppeteer

Related Posts

How to Scrape With Headless Firefox

Discover how to use headless Firefox with Selenium, Playwright, and Puppeteer for web scraping, including practical examples for each library.

How to Use Chrome Extensions with Playwright, Puppeteer and Selenium

In this article, we'll explore different useful Chrome extensions for web scraping. We'll also explain how to install Chrome extensions with various headless browser libraries, such as Selenium, Playwright and Puppeteer.

How to Web Scrape with Puppeteer and NodeJS in 2024

Introduction to using Puppeteer in Nodejs for web scraping dynamic web pages and web apps. Tips and tricks, best practices and example project.