How to Scrape With Headless Firefox
Discover how to use headless Firefox with Selenium, Playwright, and Puppeteer for web scraping, including practical examples for each library.
When scraping dynamic web pages with Playwright and Python, we need to wait for the page to finish loading before retrieving the page source. Playwright's wait_for_selector()
method waits for a specific element to appear on the page, which signals that the content we care about has rendered. Once it appears, we can grab the page source:
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    # launch a visible (headed) Chromium window; pass headless=True to hide it
    browser = pw.chromium.launch(headless=False)
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    # go to url
    page.goto("https://twitch.tv/directory/game/Art")
    # wait for element to appear on the page:
    page.wait_for_selector("div[data-target=directory-first-item]")
    # get HTML
    print(page.content())
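If the element never appears, wait_for_selector() raises a timeout error (the default timeout is 30 seconds). The sketch below is one possible way to wrap the same steps in a reusable helper with an explicit timeout. The names fetch_rendered_html, main, ready_selector, and timeout_ms are hypothetical and chosen for illustration; the URL and selector are the ones used above.

```python
def fetch_rendered_html(page, url, ready_selector, timeout_ms=30_000):
    """Open url, wait until ready_selector appears, then return the page HTML.

    `page` is expected to behave like a Playwright sync Page object.
    The helper name and parameters are illustrative, not part of Playwright.
    """
    page.goto(url)
    # wait_for_selector takes its timeout in milliseconds
    page.wait_for_selector(ready_selector, timeout=timeout_ms)
    return page.content()


def main():
    # Imported here so the helper above does not need Playwright at import time.
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch(headless=True)
        try:
            page = browser.new_page(viewport={"width": 1920, "height": 1080})
            html = fetch_rendered_html(
                page,
                "https://twitch.tv/directory/game/Art",
                "div[data-target=directory-first-item]",
                timeout_ms=10_000,
            )
            print(html)
        except PlaywrightTimeoutError:
            # The element did not show up within timeout_ms
            print("Timed out waiting for the page content to render")
        finally:
            browser.close()
```

Calling main() runs the scrape against a real browser. Keeping the navigation and waiting logic in a small function that only depends on the page object makes it easy to reuse across URLs, assuming each target page has some element that reliably marks it as rendered.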