How to capture background requests and responses in Playwright?

When web scraping with Playwright and Python, we can capture background requests and responses by using the page.on() method to attach callbacks to the request and response events:

from playwright.sync_api import sync_playwright

def intercept_request(request):
    # page.on() handlers are observers: they can read a request but not change it
    # (to modify or block requests, use page.route() instead)
    if "secret" in request.url:
        print("secret request headers:", request.headers)
    # we can also inspect the data sent by POST requests
    if request.method == "POST":
        print("POST request body:", request.post_data)

def intercept_response(response):
    # we can extract details from background requests
    if response.request.resource_type in ("xhr", "fetch"):
        # responses set cookies through the set-cookie header
        print(response.url, response.headers.get("set-cookie"))

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    # enable intercepting for this page
    page.on("request", intercept_request)
    page.on("response", intercept_response)

    page.goto("https://google.com/")
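
Note that handlers attached with page.on() can only observe traffic. To actually change a request's headers or POST data, Playwright provides page.route(). The following is a minimal sketch of that approach; the names patched_headers, handle_route and main, as well as the token value and the "secret" URL check, are illustrative assumptions rather than part of any real site's API:

```python
def patched_headers(url: str, headers: dict) -> dict:
    """Return a copy of the headers, adding a token for "secret" URLs."""
    patched = dict(headers)  # copy so the caller's dict is left untouched
    if "secret" in url:
        patched["x-secret-token"] = "123"  # hypothetical header and value
    return patched


def handle_route(route):
    """Route handler: forward every request, optionally patched."""
    request = route.request
    headers = patched_headers(request.url, request.headers)
    if request.method == "POST":
        # replace the POST body as well as the headers
        route.continue_(headers=headers, post_data="patched")
    else:
        route.continue_(headers=headers)


def main():
    # imported here so the helpers above can be used without a browser
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        # "**/*" matches every request made by this page
        page.route("**/*", handle_route)
        page.goto("https://google.com/")
        browser.close()
```

Calling main() launches the browser and routes every request through handle_route. Every routed request must be resolved with continue_(), fulfill() or abort(), otherwise it will hang.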

These background requests often contain important dynamic data. Blocking some requests can also reduce the bandwidth used by the scraper. For more on that, see How to block resources in Playwright and Python?
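
To illustrate how dynamic data could be collected from background responses, here is a minimal sketch that stores the JSON bodies of XHR and fetch responses. The names captured, is_background_json, collect_json and main are hypothetical, and the content-type check is an assumption about how the target site labels its data:

```python
captured = []  # collected background payloads


def is_background_json(resource_type: str, content_type: str) -> bool:
    """True for XHR/fetch responses that declare a JSON body."""
    return (
        resource_type in ("xhr", "fetch")
        and "application/json" in content_type.lower()
    )


def collect_json(response):
    """Response handler: store the URL and parsed JSON of background calls."""
    content_type = response.headers.get("content-type", "")
    if not is_background_json(response.request.resource_type, content_type):
        return
    try:
        captured.append({"url": response.url, "data": response.json()})
    except Exception:
        # the body may be unavailable or may not be valid JSON
        pass


def main(url: str = "https://google.com/"):
    # imported here so the helpers above can be used without a browser
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", collect_json)
        page.goto(url)
        # give background requests a chance to finish
        page.wait_for_load_state("networkidle")
        browser.close()
    return captured
```

Calling main() returns the list of captured payloads, which may well be empty for pages that load no JSON in the background.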

Provided by Scrapfly

This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.