Knowledge Base

Quick answers to common web scraping questions 161 answers

17 answers
Q

How to find HTML elements by text value with BeautifulSoup

To find HTML elements by text value using BeautifulSoup and Python, regular expression patterns can be passed to the text parameter of the find functions (named string in current BeautifulSoup releases). H...
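A minimal sketch of text matching, assuming a made-up HTML snippet and the `string` argument (the current name of the older `text` parameter). The markup and variable names are illustrative only.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = """
<div>
  <p>Price: $10</p>
  <p>Shipping: free</p>
  <a href="/buy">Buy now</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A compiled regex matches tags whose text contains the pattern
price_tags = soup.find_all("p", string=re.compile(r"Price"))
price_texts = [tag.get_text() for tag in price_tags]

# A plain string must match the tag's whole text exactly
buy_link = soup.find("a", string="Buy now")
```

A regex is usually the safer choice when the surrounding text may vary, since a plain string only matches the complete text of the tag.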

Q

How to find sibling HTML nodes using BeautifulSoup and Python?

To find sibling HTML element nodes using BeautifulSoup, the find_next_sibling() method or the CSS selector ~ can be used. Here's how to do it in Python.

data-parsing beautifulsoup css-selectors
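A short sketch of both sibling approaches, assuming a small invented document. `find_next_sibling()` returns one match, `find_next_siblings()` returns all of them, and `select()` accepts the `~` combinator.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = """
<div>
  <h2>Title</h2>
  <p>First</p>
  <p>Second</p>
  <span>Other</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h2")

# First following sibling that is a <p>
next_p = heading.find_next_sibling("p")

# Every following sibling that is a <p>
following_p = heading.find_next_siblings("p")

# The same selection written as a CSS general sibling selector
css_p = soup.select("h2 ~ p")
```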
Q

How to get page source in Selenium?

To get the full web page source in Selenium, the driver.page_source property can be used. Here's how to do it in Python and Selenium.

python headless-browser selenium
Q

How to save and load cookies in Selenium?

To save and load the cookies of a Selenium browser, we can use the driver.get_cookies() and driver.add_cookie() methods. Here's how to use them.

python headless-browser selenium
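One possible way to persist cookies is to dump them to JSON. In this sketch `save_cookies` and `load_cookies` are hypothetical helper names, and `driver` is assumed to be a Selenium WebDriver that has already navigated to the domain the cookies belong to, since Selenium rejects cookies for other domains.

```python
import json


def save_cookies(driver, path):
    """Write the browser's current cookies to a JSON file."""
    with open(path, "w") as f:
        json.dump(driver.get_cookies(), f)


def load_cookies(driver, path):
    """Add cookies from a JSON file to the browser and return how many were added."""
    with open(path) as f:
        cookies = json.load(f)
    # add_cookie() takes a single cookie dict, so add them one at a time
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

Typical usage would be `save_cookies(driver, "cookies.json")` at the end of one session and `load_cookies(driver, "cookies.json")` after `driver.get(...)` in the next.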
Q

How to select values between two nodes in BeautifulSoup and Python?

To select HTML elements located between two other HTML elements using BeautifulSoup, the find_next_sibling() method can be used. Here's how to do it.

data-parsing beautifulsoup
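A sketch of one way to collect the nodes between two elements, assuming both are siblings under the same parent. It walks the siblings after the start element with `find_next_siblings()` and stops at the end element. The markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = """
<div>
  <h2>Section A</h2>
  <p>one</p>
  <p>two</p>
  <h2>Section B</h2>
  <p>three</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
start = soup.find("h2", string="Section A")
end = soup.find("h2", string="Section B")

between = []
# True matches every tag, so whitespace text nodes are skipped
for node in start.find_next_siblings(True):
    if node is end:  # identity check: stop at the closing boundary
        break
    between.append(node.get_text())
```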
Q

How to take a screenshot with Selenium?

To take a web page screenshot using Selenium, the driver.save_screenshot() method can be used, or element.screenshot() for a specific element. Here's how ...

python headless-browser selenium
Q

How to wait for page to load in Selenium?

To wait for a specific HTML element to load in Selenium, a WebDriverWait() object can be used with the presence_of_element_located expected condition. Here's how t...

headless-browser selenium data-parsing
Q

Can I use XPath selectors in BeautifulSoup?

BeautifulSoup for Python doesn't support XPath selectors, but there are popular alternatives that fill this niche. Here are some.

data-parsing beautifulsoup xpath
Q

How to find all links using BeautifulSoup and Python?

To find all links in an HTML page using BeautifulSoup and Python, the find_all() method can be used. Here's how to do it.

crawling data-parsing beautifulsoup
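A minimal sketch of link extraction, assuming an invented snippet and a hypothetical base URL. Passing `href=True` skips anchors without an `href`, and `urljoin` resolves relative links.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical markup and base URL used only for this illustration
html = """
<a href="/about">About</a>
<a href="https://example.com/blog">Blog</a>
<a>No href here</a>
"""
base_url = "https://example.com/"
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute
links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
```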
Q

How to find HTML elements by attribute using BeautifulSoup?

To find an HTML node by a specific attribute value in BeautifulSoup, an attribute match parameter (attrs or a keyword argument) can be passed to the find() methods. Here's how.

data-parsing beautifulsoup css-selectors
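A brief sketch of attribute matching, assuming made-up markup. The `attrs` dictionary is convenient for attribute names such as `data-id` that are not valid Python keyword names. A CSS attribute selector is an equivalent alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = """
<div data-id="42">Product</div>
<div data-id="7">Other</div>
<input type="text" name="q">
"""
soup = BeautifulSoup(html, "html.parser")

# attrs= handles attribute names containing dashes
product = soup.find("div", attrs={"data-id": "42"})

# Simple attribute names can be passed as keyword arguments
text_inputs = soup.find_all("input", type="text")

# The same kind of match written as a CSS attribute selector
other = soup.select_one('div[data-id="7"]')
```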
Q

How to find HTML elements by class?

To find HTML nodes by class name, CSS selectors or XPath can be used. For that, the .class CSS selector can be used, or XPath's contains() function applied to the @class attribute.

data-parsing css-selectors xpath
Q

How to find HTML element by class with BeautifulSoup?

To find an HTML node by class name using BeautifulSoup, the class_ match parameter can be passed to the find() methods. Here's how to do it.

data-parsing beautifulsoup css-selectors
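A small sketch of class matching, assuming invented markup. The keyword is spelled `class_` because `class` is reserved in Python. It matches a tag if any one of its classes equals the value, while a CSS selector can require several classes at once.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = """
<p class="price sale">$10</p>
<p class="price">$12</p>
<p class="note">Limited stock</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches any <p> that has "price" among its classes
by_kwarg = [p.get_text() for p in soup.find_all("p", class_="price")]

# CSS selector requiring both classes on the same tag
by_css = [p.get_text() for p in soup.select("p.price.sale")]
```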
Q

How to scrape tables with BeautifulSoup?

To scrape HTML tables using BeautifulSoup and Python, the find_all() method can be used to walk the rows and cells of a table. Here's how to do it.

data-parsing beautifulsoup
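One common table parsing pattern, sketched under the assumption of a simple table whose first row holds the headers and which has no merged cells. The markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.90</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# The first row provides the column names
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

# Every remaining row becomes a dict keyed by column name
records = [
    dict(zip(headers, (td.get_text(strip=True) for td in row.find_all("td"))))
    for row in rows[1:]
]
```

Tables with `rowspan` or `colspan` cells need extra handling that this sketch does not attempt.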
Q

What are some BeautifulSoup alternatives in Python?

BeautifulSoup is a popular HTML parsing library for Python. Its most popular alternatives are lxml, parsel and html5lib. Here's how they differ from bs4.

python data-parsing beautifulsoup
Q

What's the difference between Web Scraping and Crawling?

Web Scraping and Web Crawling are similar but not quite the same. Crawling is a form of web scraping and here are some major differences.

crawling
Q

How to block resources in Puppeteer?

Blocking non-critical resources in Puppeteer can drastically speed up the program. Here's how to do it in Puppeteer and Node.js.

puppeteer
Q

How to download a file with Puppeteer?

To download a file using Puppeteer and Node.js, we can either simulate a click on the download button or use an HTTP client. Here's how to do it.

puppeteer
