How to scrape HTML table to Excel Spreadsheet (.xlsx)?

To save an HTML table to an Excel spreadsheet we can use Python with BeautifulSoup4 and xlsxwriter + HTTP client like requests.

$ pip install bs4 xlsxwriter requests

Then, we can scrape the web page, find table data using bs4 and write it to .xlsx file using `xlsxwriter``:

from bs4 import BeautifulSoup
import requests 
import xlsxwriter

# 1. Retrieve HTML and create BeautifulSoup object
response = requests.get("https://www.w3schools.com/html/html_tables.asp")
soup = BeautifulSoup(response.text)
# 2. Find the table and extract headers and rows:
table = soup.find('table', {"id": "customers"})
header = []
rows = []
for i, row in enumerate(table.find_all('tr')):
    if i == 0:
        header = [el.text.strip() for el in row.find_all('th')]
    else:
        rows.append([el.text.strip() for el in row.find_all('td')])
# 3. save to it a XLSX file:
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write_row(0, 0, header)
for i, row in enumerate(rows):
    worksheet.write_row(i+1, 0, row)
workbook.close()

BeautifulSoup is a very powerful HTML parser giving us full control when it comes to parsing HTML tables. Unlike many automated scripts we can direct it to extract HTML table values from any table structure.

Question tagged: Data Parsing, Python

Related Posts

Intro to Web Scraping Images with Python

In this guide, we’ll explore how to scrape images from websites using different methods. We'll also cover the most common image scraping challenges and how to overcome them. By the end of this article, you will be an image scraping master!

Ultimate XPath Cheatsheet for HTML Parsing in Web Scraping

Ultimate companion for HTML parsing using XPath selectors. This cheatsheet contains all syntax explanations with interactive examples.

Ultimate CSS Selector Cheatsheet for HTML Parsing

Ultimate companion for HTML parsing using CSS selectors. This cheatsheet contains all syntax explanations with interactive examples.