Web Scraping with Python and BeautifulSoup
BeautifulSoup is one of the most popular libraries for web scraping. In this tutorial, we'll take a hands-on look at how to use it and what it is good for, and explore a real-life web scraping example.
To find elements that lack a specific attribute (like class) with Python and BeautifulSoup, we can use the find or find_all methods, or CSS selectors:
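import re
import bs4
# name the parser explicitly so results do not depend on which parsers are installed
soup = bs4.BeautifulSoup("""
<a class="ignore">bad link</a>
<a>good link</a>
""", "html.parser")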
soup.find_all("a", class_=None)
# [<a>good link</a>]
# or using a lambda function (the value is None when the tag has no class, so guard against it):
soup.find_all("a", class_=lambda value: value is None or "ignore" not in value)
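# note: a regular expression is only tested against class values that exist,
# so this selects the opposite set (links that do have a class); it needs the re module:
soup.find_all("a", class_=re.compile(""))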
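The CSS selector route mentioned above is not shown in the snippet, so here is a minimal sketch of how it might look. It assumes a BeautifulSoup version whose select method is backed by Soup Sieve (4.7 or later) and reuses the same two-link markup; the variable names are only illustrative.

```python
import bs4

html = """
<a class="ignore">bad link</a>
<a>good link</a>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# a:not([class]) keeps only <a> tags that carry no class attribute at all
no_class_links = soup.select("a:not([class])")

# a:not(.ignore) keeps <a> tags that lack the "ignore" class specifically,
# whether or not they have other classes
not_ignored_links = soup.select("a:not(.ignore)")

# the same "no class attribute" check written as a tag-level function filter
no_class_links_fn = soup.find_all(
    lambda tag: tag.name == "a" and not tag.has_attr("class")
)

print(no_class_links)
```

The selector form tends to read more clearly when several conditions are combined, while the function filter is handy when the condition cannot be expressed in CSS.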