When web scraping, we might want to collect page screenshots or peek into what our headless browser is seeing for debugging. In Playwright, a screenshot can be taken using the screenshot() method of either a page or an element (locator) object:
from pathlib import Path
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    # create a browser context with a fixed viewport size
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto('https://httpbin.dev/html')
    image_bytes = page.screenshot(
        full_page=True,  # this will try to scroll to capture the full page
        path='screenshot.png',  # this will save the screenshot directly to a file
        clip={"x": 0, "y": 0, "width": 100, "height": 100},  # this will clip the screenshot to a specific region
    )
    # or we can save the returned bytes manually
    Path("screenshot.png").write_bytes(image_bytes)
    # we can also take a screenshot of a single element
    element = page.locator('p')
    # saved under a separate name so it does not overwrite the page screenshot
    image_bytes = element.screenshot(path='element.png')
⚠ Note that when scraping dynamic web pages, a screenshot can be captured before the page has fully loaded. For more, see How to wait for page to load in Playwright?
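One way to reduce the risk of capturing a half-loaded page is to wait for a known element to appear before taking the screenshot. The sketch below is only an illustration under a few assumptions: the helper name screenshot_when_ready, the 'p' selector, the 10 second timeout, and the output file name are all hypothetical choices rather than part of the original example. It relies on Playwright's page.wait_for_selector() and page.screenshot() methods.

```python
def screenshot_when_ready(page, url, selector, path, timeout_ms=10_000):
    """Open url, wait until selector is present, then save a full-page screenshot.

    Hypothetical helper: `page` is expected to be a Playwright sync Page.
    """
    page.goto(url)
    # block until the element shows up (raises a timeout error otherwise)
    page.wait_for_selector(selector, timeout=timeout_ms)
    # returns the image bytes and also writes them to `path`
    return page.screenshot(path=path, full_page=True)


def main():
    # imported here so the helper above can be reused without launching a browser
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        # assumption: a <p> element appearing means the content we want has rendered
        screenshot_when_ready(page, "https://httpbin.dev/html", "p", "ready.png")
        browser.close()
```

Calling main() runs the whole flow. On real dynamic pages the selector should point at an element that only appears once the content of interest has loaded.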