When scraping dynamic web pages with Playwright and Python, we need to wait for the page to fully load before retrieving the page source for HTML parsing. Let's explore multiple load-waiting methods to ensure a full web page load!
To make Playwright wait for the page to load, we can use Playwright's wait_for_selector method. It ensures the page has fully loaded by waiting for a specific element to appear on the web page:
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    # go to url
    page.goto("https://web-scraping.dev/products")
    # wait for element to appear on the page:
    page.wait_for_selector("div.products")
    # get HTML
    print(page.content())
Above, we start by creating a new browser context, navigate to the target web page, and wait for the element matching the CSS selector div.products to be visible. Finally, we print the page HTML once it's fully loaded.
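wait_for_selector also accepts a timeout (in milliseconds) and a state argument that control how long and for what element state to wait. Below is a minimal sketch of how these options could be combined with error handling; the 10-second timeout is an arbitrary example value:
from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeoutError

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://web-scraping.dev/products")
    try:
        # wait up to 10 seconds for the products container to become visible
        page.wait_for_selector("div.products", state="visible", timeout=10_000)
    except PlaywrightTimeoutError:
        # the element never appeared - handle the failure instead of crashing
        print("products did not load in time")
    print(page.content())
    browser.close()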
The second waiting method is wait_for_timeout. Unlike the previous method, this approach doesn't locate elements on the document. Instead, it instructs the browser to wait for a fixed amount of time:
page.goto("https://web-scraping.dev/products")
page.wait_for_timeout(5000)
Here, we use wait_for_timeout to explicitly wait for 5 seconds before executing the remaining script actions.
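For completeness, here's a minimal self-contained sketch of how that fixed wait could look in a full script. Note that fixed waits trade reliability for simplicity, as the page may need more or less time than the chosen value:
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://web-scraping.dev/products")
    # pause the script for a fixed 5 seconds (5000 ms) regardless of page state
    page.wait_for_timeout(5000)
    print(page.content())
    browser.close()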
The last method is wait_for_load_state, which waits for the page to reach a given load state, such as the DOM being parsed or the network going idle.
Here's how to use wait_for_load_state to let Playwright wait for the page to load through its different states:
page.goto("https://web-scraping.dev/products")
# wait until the HTML document has been parsed
page.wait_for_load_state("domcontentloaded")
# wait until there have been no network connections for at least 500 ms
page.wait_for_load_state("networkidle")
# wait for the load event (the default state)
page.wait_for_load_state("load")
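These states can also be passed directly to goto through its wait_until parameter, which avoids a separate call. A minimal sketch, assuming the same target page:
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    page = browser.new_page()
    # navigate and only continue once the network has been idle for ~500 ms
    page.goto("https://web-scraping.dev/products", wait_until="networkidle")
    print(page.content())
    browser.close()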
For further details on web scraping with Playwright, refer to our dedicated guide.
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences. Check us out 👇