How to block resources in Playwright and Python?

To speed up Playwright web scrapers we can block media and other non-essential requests using the request interception feature:

from playwright.sync_api import sync_playwright

# block pages by resource type. e.g. image, stylesheet
BLOCK_RESOURCE_TYPES = [
  'beacon',
  'csp_report',
  'font',
  'image',
  'imageset',
  'media',
  'object',
  'texttrack',
#  we can even block stylsheets and scripts though it's not recommended:
# 'stylesheet',
# 'script',  
# 'xhr',
]


# we can also block popular 3rd party resources like tracking and advertisements.
BLOCK_RESOURCE_NAMES = [
  'adzerk',
  'analytics',
  'cdn.api.twitter',
  'doubleclick',
  'exelator',
  'facebook',
  'fontawesome',
  'google',
  'google-analytics',
  'googletagmanager',
]

def intercept_route(route):
    """intercept all requests and abort blocked ones"""
    if route.request.resource_type in BLOCK_RESOURCE_TYPES:
        print(f'blocking background resource {route.request} blocked type "{route.request.resource_type}"')
        return route.abort()
    if any(key in route.request.url for key in BLOCK_RESOURCE_NAMES):
        print(f"blocking background resource {route.request} blocked name {route.request.url}")
        return route.abort()
    return route.continue_()

with sync_playwright() as pw:
    browser = pw.chromium.launch(
        headless=False, 
        # tip: you can enable devtools so we can see total resource usage (bottom left corner)
        devtools=True, 
    )
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    # enable intercepting for this page, **/* stands for all requests
    page.route("**/*", intercept_route)
    page.goto("http://some-webpage.com/")

Resource blocking can significantly reduce bandwidth usage - often by 2-10 times! Take note though that blocking functional resources like stylesheets, scripts and xhr could affect the web scraping process.

Provided by Scrapfly

This knowledgebase is provided by Scrapfly — a web scraping API that allows you to scrape any website without getting blocked and implements a dozens of other web scraping conveniences. Check us out 👇