What is a Headless Browser? Top 5 Headless Browser Tools
Quick overview of new emerging tech of browser automation - what exactly are these tools and how are they used in web scraping?
Playwright supports one of the most popular ways to parse HTML content in web scraping - XPath selectors. To use XPath in Playwright we can use page.locator()
method and prefix our selector with xpath=
or //
. For example:
from playwright.sync_api import sync_playwright
with sync_playwright() as pw:
browser = pw.chromium.launch(headless=False)
context = browser.new_context(viewport={"width": 1920, "height": 1080})
page = context.new_page()
page.goto("https://google.com/")
h2_element = page.locator("//h2")
# or
h2_element = page.locator("xpath=//h2")
⚠ It's possible that this command will try to find elements before the page has fully loaded if it's a dynamic javascript page. For more see How to wait for page to load in Playwright?
Also see: How to find elements by CSS selectors in Playwright?
This knowledgebase is provided by Scrapfly — a web scraping API that allows you to scrape any website without getting blocked and implements a dozens of other web scraping conveniences. Check us out 👇