How to select values between two nodes in BeautifulSoup and Python?

by scrapecrow Oct 26, 2022

When web scraping, sometimes the values we want are located between two known HTML elements. To extract them we can use find_all() and find_next_siblings() methods:

import bs4
soup = bs4.BeautifulSoup("""
<h2>heading 1</h2>
<p>paragraph 1</p>
<p>paragraph 2</p>
<h2>heading 2</h2>
<p>paragraph 3</p>
<p>paragraph 4</p>
""")

blocks = {}
for heading in soup.find_all("h2"):  # find separators, in this case h2 nodes
    values = []
    for sibling in heading.find_next_siblings():
        if sibling.name == "h2":  # iterate through siblings until separator is encoutnered
            break
        values.append(sibling.text)
    blocks[heading.text] = values

print(blocks)
{
  'heading 1': ['paragraph 1', 'paragraph 2'], 
  'heading 2': ['paragraph 3', 'paragraph 4']
}

Related Articles

How to Parse Web Data with Python and Beautifulsoup

Beautifulsoup is one the most popular libraries in web scraping. In this tutorial, we'll take a hand-on overview of how to use it, what is it good for and explore a real -life web scraping example.

BEAUTIFULSOUP
DATA-PARSING
PYTHON
How to Parse Web Data with Python and Beautifulsoup

Guide to List Crawling: Everything You Need to Know

In-depth look at list crawling - how to extract valuable data from list-formatted content like tables, listicles and paginated pages.

CRAWLING
BEAUTIFULSOUP
PYTHON
Guide to List Crawling: Everything You Need to Know

Ultimate Guide to JSON Parsing in Python

Learn JSON parsing in Python with this ultimate guide. Explore basic and advanced techniques using json, and tools like ijson and nested-lookup

DATA-PARSING
PYTHON
Ultimate Guide to JSON Parsing in Python

Guide to Parsel - the Best HTML Parsing in Python

Learn to extract data from websites with Parsel, a Python library for HTML parsing using CSS selectors and XPath.

DATA-PARSING
PARSEL
Guide to Parsel - the Best HTML Parsing in Python

JSONL vs JSON

Learn the differences between JSON and JSONLines, their use cases, and efficiency. Why JSONLines excels in web scraping and real-time processing

DATA-PARSING
JSONL vs JSON

Web Scraping and HTML Parsing with Jsoup and Java

Learn how to harness the power of jsoup, a lightweight and efficient Java library for web scraping and HTML parsing.

DATA-PARSING
JAVA
Web Scraping and HTML Parsing with Jsoup and Java