Playwright Examples for Web Scraping and Automation
Learn Playwright with Python and JavaScript examples for automating browsers like Chromium, WebKit, and Firefox.
When web scraping, we might want to collect page screenshots or peek into what our headless browsers are seeing for debugging. In Playwright, a screenshot can be taken using the screenshot() method of a page or of an element locator:
from pathlib import Path
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    # create a browser context with a fixed viewport size
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto('https://httpbin.dev/html')
    image_bytes = page.screenshot(
        full_page=True,  # capture the full scrollable page rather than just the viewport
        path='screenshot.png',  # optional: save the screenshot directly to a file
        # optional: clip the capture to a specific region; remove this to keep the whole page
        clip={"x": 0, "y": 0, "width": 100, "height": 100},
    )
    # or we can save the returned bytes manually
    Path("screenshot.png").write_bytes(image_bytes)
    # we can also take a screenshot of a single element
    element = page.locator('p')
    # saved under a separate name so it does not overwrite the page screenshot
    image_bytes = element.screenshot(path='element.png')
⚠ Note that when scraping dynamic web pages, a screenshot can be captured before the page has fully loaded. For more, see How to wait for page to load in Playwright?
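One way to avoid capturing a half-rendered page is to wait for a known element before taking the screenshot. The following is a minimal sketch, not a definitive approach: the helper name screenshot_when_ready, the function capture_demo, the "p" selector and the 10 second timeout are all illustrative assumptions, and the right element to wait for depends on the page being scraped.

```python
def screenshot_when_ready(page, selector: str, path: str, timeout_ms: float = 10_000) -> bytes:
    """Wait until `selector` is visible, then capture a full-page screenshot.

    `page` is expected to be a Playwright sync API Page object.
    This helper is hypothetical and not part of Playwright itself.
    """
    # block until the element is attached and visible (raises on timeout)
    page.wait_for_selector(selector, state="visible", timeout=timeout_ms)
    # only then capture the page; the bytes are returned and also saved to `path`
    return page.screenshot(path=path, full_page=True)


def capture_demo() -> None:
    # imported here so the helper above can be reused without launching a browser
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://httpbin.dev/html")
        # assumption: the <p> element marks the content we care about
        screenshot_when_ready(page, "p", "screenshot.png")
        browser.close()
```

Calling capture_demo() runs the helper against the same httpbin.dev page used above. On a real dynamic page, the selector would usually be an element that only appears once the content of interest has been rendered.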
This knowledgebase is provided by Scrapfly data APIs.