How to Scrape With Headless Firefox
Discover how to use headless Firefox with Selenium, Playwright, and Puppeteer for web scraping, including practical examples for each library.
When scraping dynamic web pages with Selenium, we need to wait for the page to finish loading before retrieving the page source. With Selenium's WebDriverWait class, we can wait for a specific element to appear in the DOM, which signals that the content we need has rendered, and then grab the page source:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://httpbin.dev/")
_timeout = 10 # ⚠ don't forget to set a reasonable timeout
WebDriverWait(driver, _timeout).until(
    expected_conditions.presence_of_element_located(
        # we can wait by any selector type, like element id:
        (By.ID, "operations-tag-Auth")
        # or by class name (note: no leading dot for By.CLASS_NAME)
        # (By.CLASS_NAME, "price")
        # or by xpath
        # (By.XPATH, "//h1[@class='price']")
        # or by CSS selector
        # (By.CSS_SELECTOR, "h1.price")
    )
)
print(driver.page_source)
driver.quit()  # close the browser once the page source has been retrieved