How to capture background requests and responses in Puppeteer?

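When web scraping with Puppeteer and Node.js, background requests and responses can be captured with the page.on() method, which registers callbacks for the request and response events. Request interception (page.setRequestInterception(true)) is only required if you also want to block or modify requests: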

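const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // capture background requests:
  // (interception is only needed to block or modify requests)
  await page.setRequestInterception(true);
  page.on('request', request => {
    // background requests are typically XHR or fetch() calls
    if (['xhr', 'fetch'].includes(request.resourceType())) {
      console.log('request:', request.method(), request.url());
    }
    // every intercepted request must be continued or aborted, otherwise it hangs;
    // call request.abort() instead to block it (its response will then never arrive)
    request.continue();
  });
  // capture background responses:
  page.on('response', response => {
    // resourceType() is defined on the request, not on the response
    if (['xhr', 'fetch'].includes(response.request().resourceType())) {
      console.log('response:', response.status(), response.url());
    }
  });
  // load a page so that requests are actually made (example URL, replace with your target)
  await page.goto('https://example.com/', { waitUntil: 'networkidle2' });
  await browser.close();
}

run().catch(console.error);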
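The handler above only logs each response. To keep the data itself, the body can be read with response.json(). The sketch below assumes the site's background endpoints return JSON and label it with an application/json content type; collectJsonResponses is a hypothetical helper name, and it relies only on page.on('response'), response.request().resourceType(), response.headers(), response.url() and response.json().

```javascript
// Hypothetical helper (not part of Puppeteer): attaches a 'response' listener to a
// page and collects the parsed JSON bodies of XHR/fetch responses.
// Assumption: the interesting endpoints send a JSON content type.
function collectJsonResponses(page) {
  const captured = [];
  const pending = [];
  page.on('response', response => {
    const type = response.request().resourceType();
    if (type !== 'xhr' && type !== 'fetch') return;
    // Puppeteer exposes header names in lowercase
    const contentType = response.headers()['content-type'] || '';
    if (!contentType.includes('application/json')) return;
    // response.json() is asynchronous, so keep each promise to await later
    pending.push(
      response.json()
        .then(data => captured.push({ url: response.url(), data }))
        .catch(() => {}) // skip bodies that cannot be read or parsed
    );
  });
  // returns a function that waits for all body reads seen so far
  return async () => {
    await Promise.all(pending);
    return captured;
  };
}
```

Usage inside an async function: call `const getJson = collectJsonResponses(page);` before `page.goto(...)`, then `const results = await getJson();` once the page has settled. Responses that arrive after getJson() is called are not awaited.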

Background requests often carry the page's dynamic data, such as the JSON returned by the site's internal APIs. Blocking unneeded requests can also reduce the bandwidth used by the scraper; for more on that, see How to block resources in Puppeteer?
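For the bandwidth side, here is a minimal sketch of blocking by resource type. It assumes images, fonts, media and stylesheets are not needed for the data being scraped, which will not hold for every site; blockHeavyResources and BLOCKED_TYPES are hypothetical names, and the sketch uses only setRequestInterception(), request.resourceType(), request.abort() and request.continue().

```javascript
// Assumption: these resource types are not needed for scraping; adjust per target.
const BLOCKED_TYPES = new Set(['image', 'font', 'media', 'stylesheet']);

// Hypothetical helper: enables request interception on a Puppeteer page and
// aborts requests whose resource type is in `blocked`. Returns live counters.
async function blockHeavyResources(page, blocked = BLOCKED_TYPES) {
  const stats = { blocked: 0, allowed: 0 };
  await page.setRequestInterception(true);
  page.on('request', request => {
    // each intercepted request is handled exactly once: aborted or continued
    if (blocked.has(request.resourceType())) {
      stats.blocked += 1;
      request.abort();
    } else {
      stats.allowed += 1;
      request.continue();
    }
  });
  return stats;
}
```

Call `await blockHeavyResources(page)` before `page.goto(...)`. Use it instead of, not alongside, another request handler that also calls continue() or abort(): Puppeteer reports an error when the same intercepted request is handled twice.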
