Intro to Web Scraping using Selenium Grid
In this guide, you will learn how to install and configure Selenium Grid with Docker and how to use it for web scraping at scale.
Selenium is a popular browser automation library that is often used for web scraping. To control a browser, however, Selenium needs a browser-specific executable called a driver. For example, to control Firefox, Selenium needs geckodriver to be installed. Without it, a WebDriverException is raised:
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.
This error can also mean that geckodriver is installed but Selenium can't find it. To fix this, add the directory that contains geckodriver to the PATH environment variable:
$ export PATH=$PATH:/location/where/geckodriver/is/
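Before launching Selenium, it can help to confirm that the driver is actually discoverable. The following is a minimal sketch using only the Python standard library; the function name find_driver and its extra_dirs parameter are hypothetical and not part of Selenium:

```python
import os
import shutil


def find_driver(name="geckodriver", extra_dirs=()):
    """Return the full path to a driver executable, or None if it is not found.

    find_driver and extra_dirs are illustrative names, not a Selenium API.
    """
    # Combine the directories from PATH with any extra ones the caller supplies,
    # skipping empty entries so the current directory is not searched by accident.
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    search_dirs = [d for d in [*path_dirs, *extra_dirs] if d]
    # shutil.which performs the same kind of lookup the shell does for a command.
    return shutil.which(name, path=os.pathsep.join(search_dirs))


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("geckodriver not found: add its directory to PATH")
    else:
        print(f"geckodriver found at {location}")
```

If the function returns None, the PATH change above has most likely not taken effect in the current shell session.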
Alternatively, we can specify the driver location directly in the Selenium initialization code:
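from selenium import webdriver
from selenium.webdriver.firefox.service import Service
# Selenium 4 takes the driver path through a Service object (the old executable_path argument of webdriver.Firefox was removed)
driver = webdriver.Firefox(service=Service(executable_path=r'your\path\geckodriver.exe'))
driver.get('https://scrapfly.io/')
driver.quit()  # close the browser and stop the driver process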