Scraping modern websites comes with numerous challenges, one of them being infinite scroll pages. A common solution is to use Selenium to scroll to the bottom of the page.
To scroll down a web page with Selenium, we can use the popular window.scrollTo(x, y) JavaScript method. This simple scroll operation moves the browser window controlled by the Selenium driver to the specified coordinates.
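As a minimal sketch of a single scroll operation, the helpers below wrap window.scrollTo in Selenium's execute_script call. The function names scroll_to and scroll_to_bottom_once are hypothetical and chosen only for illustration. The driver parameter is assumed to be any Selenium WebDriver instance, such as webdriver.Chrome().

```python
def scroll_to(driver, x, y):
    """Scroll the page to the absolute coordinates (x, y), in CSS pixels."""
    # execute_script forwards extra positional arguments to the script,
    # where they are available as arguments[0], arguments[1], ...
    driver.execute_script("window.scrollTo(arguments[0], arguments[1]);", x, y)


def scroll_to_bottom_once(driver):
    """Jump to the current bottom of the page and return the page height."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    return driver.execute_script("return document.body.scrollHeight")
```

On an infinite scroll page, a single call to scroll_to_bottom_once only reaches the bottom of the content loaded so far, which is why the scroll has to be repeated.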
Infinite scroll pages require multiple vertical scroll operations, so we repeat the scroll until the end of the page is reached, meaning that the page height no longer changes. Let's have a look at an example by scraping web-scraping.dev/testimonials:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
# initiate a new Chrome webdriver
driver = webdriver.Chrome()
# navigate to the target web page
driver.get("https://web-scraping.dev/testimonials/")
prev_height = -1
max_scrolls = 100
scroll_count = 0
while scroll_count < max_scrolls:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # give some time for new results to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == prev_height:
        break  # the height did not change, so the page end is reached
    prev_height = new_height
    scroll_count += 1
# Collect all loaded data
elements = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "testimonial")))
results = []
for element in elements:
    text = element.find_element(By.CLASS_NAME, "text").get_attribute('innerHTML')
    results.append(text)
print(f"scraped: {len(results)} results!")
driver.quit()
In the above code snippet, we scroll the infinite scroll page with the Selenium webdriver by tracking two values: the previous and the current page height. Each iteration executes the window.scrollTo method to scroll to the bottom of the page, records the new height, and compares it with the previous one. The loop stops once the height no longer changes or the max_scrolls limit is reached.
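The fixed one-second sleep can either waste time or stop too early on slow connections. As a hedged variation rather than the method used above, the sketch below polls the page height after each scroll and gives up only when it has not grown within a timeout. The function name scroll_until_stable and the timeout and poll parameters are assumptions introduced for illustration. The driver is assumed to be any Selenium WebDriver instance.

```python
import time


def scroll_until_stable(driver, max_scrolls=100, timeout=5.0, poll=0.25):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls that loaded new content.
    """
    height_js = "return document.body.scrollHeight"
    height = driver.execute_script(height_js)
    loaded = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        deadline = time.monotonic() + timeout
        new_height = driver.execute_script(height_js)
        # poll until the page grows or the timeout elapses
        while new_height <= height and time.monotonic() < deadline:
            time.sleep(poll)
            new_height = driver.execute_script(height_js)
        if new_height <= height:
            break  # no growth within the timeout: assume the page end is reached
        height = new_height
        loaded += 1
    return loaded
```

With this approach, a fast page costs only as long as each batch takes to load, while timeout bounds how long the loop waits before it treats the page as fully loaded.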
For further details on using Selenium for web scraping, refer to our dedicated guide.
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.