How to scrape tables with BeautifulSoup?

by scrapecrow Oct 24, 2022

HTML tables are common across web pages, where they present data in rows and columns, much like a data frame. In this guide, we'll explain how to scrape an HTML table using BeautifulSoup as the parsing library, through a real-life example. Let's get started!

Setup

Before we start, let's ensure the required libraries are installed. First, let's install the BeautifulSoup package, along with requests and the lxml parser used in this guide, using the pip terminal command:

pip install beautifulsoup4 requests lxml

As for the HTTP client, we'll be using the requests Python library. However, it can be replaced with any other client, such as httpx.

Retrieve Table Data

To start, let's have a look at our target table. We'll be using the product page at web-scraping.dev/product/1:

(Image: scrape table target example on web-scraping.dev)

We'll request the above page to retrieve the table data available in the HTML:

from bs4 import BeautifulSoup
import requests 

response = requests.get("https://web-scraping.dev/product/1")
html = response.text

# Create the soup object
soup = BeautifulSoup(html, "lxml")

Above, we start by requesting the target web page to retrieve the HTML containing the tables. Then, we use BeautifulSoup to create a parser object.
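The request step above assumes that the page loads successfully and that lxml is installed. A slightly more defensive version might look like the sketch below. The helper names fetch_html and make_soup are hypothetical, and the 10-second timeout is an arbitrary assumption rather than a recommended value:

```python
import requests
from bs4 import BeautifulSoup, FeatureNotFound


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text


def make_soup(html: str) -> BeautifulSoup:
    """Parse HTML with lxml, falling back to Python's built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python itself
        return BeautifulSoup(html, "html.parser")
```

With these helpers, make_soup(fetch_html("https://web-scraping.dev/product/1")) would produce a soup object equivalent to the one created above.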

Parse HTML Tables

BeautifulSoup can select HTML elements either with its find and find_all methods or with CSS selectors. Here, we'll use find_all to target the table by its class, and then iterate over its rows:

from bs4 import BeautifulSoup
import requests 

response = requests.get("https://web-scraping.dev/product/1")
html = response.text

soup = BeautifulSoup(html, "lxml")

# First, select the desired table element (the 2nd one on the page)
table = soup.find_all('table', {'class': 'table-product'})[1]

headers = []
rows = []
for i, row in enumerate(table.find_all('tr')):
    if i == 0:
        headers = [el.text.strip() for el in row.find_all('th')]
    else:
        rows.append([el.text.strip() for el in row.find_all('td')])

Above, we first use the find_all method to find all table elements with the table-product class and select the second one on the page. Then, we find each table row and iterate through them, extracting their text contents. The i == 0 condition handles the header row: since it's the first row in our table, its th cells become the headers, while every other row's td cells are stored as data.
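The same extraction can also be sketched with CSS selectors through BeautifulSoup's select and select_one methods. The HTML below is a small hypothetical sample standing in for the real page, and it uses the built-in html.parser so that it runs without lxml. On the live page, the second matching table would be soup.select("table.table-product")[1]:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
html = """
<table class="table-product">
  <tr><th>Version</th><th>Package Weight</th></tr>
  <tr><td>Pack 1</td><td>1,00 kg</td></tr>
  <tr><td>Pack 2</td><td>2,11 kg</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first element matching the CSS selector
table = soup.select_one("table.table-product")

# Header cells are <th> elements, data cells are <td> elements
headers = [th.get_text(strip=True) for th in table.select("tr th")]
rows = [
    [td.get_text(strip=True) for td in tr.select("td")]
    for tr in table.select("tr")
    if tr.select("td")  # skip the header row, which has no <td> cells
]

print(headers)
print(rows)
```

Filtering on the presence of td cells, rather than on the row index, avoids assuming that the header is always the first row. The rest of this guide continues with the find_all version shown earlier.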

Here is what the results should look like:

print(headers)
# ['Version', 'Package Weight', 'Package Dimension', 'Variants', 'Delivery Type']
for row in rows:
    print(row)
# ['Pack 1', '1,00 kg', '100x230 cm', '6 available', '1 Day shipping']
# ['Pack 2', '2,11 kg', '200x460 cm', '6 available', '1 Day shipping']
# ['Pack 3', '3,22 kg', '300x690 cm', '6 available', '1 Day shipping']
# ['Pack 4', '4,33 kg', '400x920 cm', '6 available', '1 Day shipping']
# ['Pack 5', '5,44 kg', '500x1150 cm', '6 available', '1 Day shipping']
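Separate header and row lists are often easier to work with once combined. As a minimal sketch, assuming the headers and rows shown above (only the first two rows are repeated here), each row can be paired with the headers using zip and then written out as CSV. The records and buffer names are illustrative only:

```python
import csv
import io

headers = ["Version", "Package Weight", "Package Dimension", "Variants", "Delivery Type"]
rows = [
    ["Pack 1", "1,00 kg", "100x230 cm", "6 available", "1 Day shipping"],
    ["Pack 2", "2,11 kg", "200x460 cm", "6 available", "1 Day shipping"],
]

# Pair each cell with its column header to get one dict per table row
records = [dict(zip(headers, row)) for row in rows]
print(records[0]["Package Weight"])

# Write the records as CSV into an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=headers)
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```

Note that the csv module quotes values containing commas, such as the weights here, so they stay in a single column. To save to disk instead, the buffer could be swapped for a file opened with newline="".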

For more details on parsing with BeautifulSoup, refer to our dedicated guide.

How to Parse Web Data with Python and Beautifulsoup

Beautifulsoup is one of the most popular libraries in web scraping. In this tutorial, we'll take a hands-on overview of how to use it and what it is good for, and explore a real-life web scraping example.
