Using Python and BeautifulSoup, we can find any HTML element by partial or exact element name, either with the find() / find_all() methods (passing a list of names or a regular expression) or with CSS selectors via the select() method:
import re
import bs4
soup = bs4.BeautifulSoup("""
<a>link</a>
<h1>heading 1</h1>
<h2>heading 2</h2>
<p>paragraph</p>
""")
# Using find() and find_all() methods:
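# match an exact list of tag names
soup.find_all(["h1", "h2", "h3"])
# or a regular expression; BeautifulSoup applies it with re.search(), so anchor it with ^ and $ for an exact match
soup.find_all(re.compile(r"^h\d$"))  # matches "h" followed by a single digit
# both return: [<h1>heading 1</h1>, <h2>heading 2</h2>]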
# Using CSS selectors with select():
soup.select("h1, h2, h3")
# or, equivalently, with the :is() pseudo-class
soup.select(":is(h1, h2, h3)")
# both return: [<h1>heading 1</h1>, <h2>heading 2</h2>]
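The examples above only use find_all(). As a rough sketch of the other two cases mentioned in the intro: find() returns just the first match (or None), and a partial name match can also be written as a function filter, which BeautifulSoup calls with each tag and keeps the tag when it returns True. The extra <header> element below is a hypothetical addition to the sample HTML, included only to show that a "starts with h" filter also catches non-heading tags.

```python
import re

import bs4

html = """
<a>link</a>
<h1>heading 1</h1>
<h2>heading 2</h2>
<header>page header</header>
<p>paragraph</p>
"""
# the <header> tag is a hypothetical addition to illustrate partial matching
soup = bs4.BeautifulSoup(html, "html.parser")

# find() stops at the first matching element and returns it (or None if nothing matches)
first_heading = soup.find(re.compile(r"^h\d$"))
print(first_heading)

# partial name match with a function filter: keep every tag whose name starts with "h"
# (this catches <h1>, <h2> and also <header>)
h_tags = soup.find_all(lambda tag: tag.name.startswith("h"))
print([tag.name for tag in h_tags])
```

A function filter is the most flexible option, but for plain heading lookups the anchored regular expression or a CSS selector list is usually clearer.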