The most popular way to parse HTML content in web scraping is with CSS selectors, which are also the default way to locate elements in Playwright. To find elements using CSS selectors, we can use the page.locator() method. For example:
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    # Launch a visible (non-headless) Chromium browser
    browser = pw.chromium.launch(headless=False)
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto("https://google.com/")
    # Create a locator for <h2> elements that have the class "some-class"
    h2_element = page.locator("h2.some-class")
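Once a locator exists, it can be used to read data from the matched elements. The following is a minimal sketch rather than part of the original example: the helper name `collect_texts`, the inline HTML, and the `h2.title` selector are hypothetical and chosen only for illustration. The page is filled with `page.set_content()` so the demo does not depend on any real website.

```python
def collect_texts(page, selector):
    """Return the text of every element matching a CSS selector."""
    # page.locator() is lazy: the DOM is only queried when an action
    # such as all_text_contents() is called on the locator.
    return page.locator(selector).all_text_contents()


def demo():
    # Imported here so collect_texts() can be reused without launching a browser.
    from playwright.sync_api import sync_playwright

    # Hypothetical markup used only for this illustration.
    html = """
    <h2 class="title">First</h2>
    <h2 class="title">Second</h2>
    <h2>Untitled</h2>
    """
    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        page.set_content(html)
        # Text of the <h2> elements that carry the "title" class
        print(collect_texts(page, "h2.title"))
        # Number of all <h2> elements, with or without a class
        print(page.locator("h2").count())
        browser.close()
```

Calling `demo()` runs the sketch against a real Chromium instance. The same helper should work with any CSS selector, for example an attribute selector such as `a[href]`.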
⚠ On dynamic JavaScript pages, these commands may try to find elements before the page has fully loaded. For more, see How to wait for page to load in Playwright?
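One way to guard against this is to wait for the selector to appear before reading from it. The sketch below assumes a hypothetical helper `wait_and_collect`, a 5 second default timeout, and inline HTML that adds the element half a second after load. None of these come from the original example. It relies on `Locator.wait_for()`, which raises Playwright's timeout error if the element never appears.

```python
def wait_and_collect(page, selector, timeout_ms=5000):
    """Wait until a CSS selector matches, then return the matched texts."""
    locator = page.locator(selector)
    # .first avoids a strict-mode error when several elements match;
    # state="attached" waits for the element to exist in the DOM.
    locator.first.wait_for(state="attached", timeout=timeout_ms)
    return locator.all_text_contents()


def demo():
    # Imported here so wait_and_collect() can be reused without launching a browser.
    from playwright.sync_api import sync_playwright

    # Hypothetical page that inserts its heading 500 ms after loading.
    html = """
    <div id="root"></div>
    <script>
      setTimeout(() => {
        document.querySelector("#root").innerHTML =
          '<h2 class="some-class">Loaded late</h2>';
      }, 500);
    </script>
    """
    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        page = browser.new_page()
        page.set_content(html)
        # Blocks until the late heading is attached, then prints its text
        print(wait_and_collect(page, "h2.some-class"))
        browser.close()
```

Calling `demo()` runs the sketch in a real browser. Waiting on the specific element, rather than sleeping for a fixed time, keeps the script as fast as the page allows.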
Also see: How to find elements by XPath selectors in Playwright?
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.