CSS selectors are one of the most popular ways to parse HTML pages when web scraping. To find elements by CSS selector in Selenium, we can use the driver.find_element() method, which returns the first matching element, and the driver.find_elements() method, which returns a list of all matching elements:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://httpbin.dev/html")
# find_element() returns the first match and raises NoSuchElementException if nothing matches
element = driver.find_element(By.CSS_SELECTOR, 'p')
# then we can get the element's text
print(element.text)
# prints: "Availing himself of the mild, summer-cool weather that now reigned in these latitudes..."
# we can also get tag name and attributes:
print(element.tag_name)
print(element.get_attribute("class"))
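# find_elements() returns a list of all matches (empty if nothing matches), so we iterate over it
for element in driver.find_elements(By.CSS_SELECTOR, 'p'):
    print(element.text)
# quit() closes the browser and ends the WebDriver session
driver.quit()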
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.