Using Python and BeautifulSoup, we can find elements without a specific attribute (like class) using the find or find_all methods or CSS selectors:
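import re
import bs4

soup = bs4.BeautifulSoup("""
<a class="ignore">bad link</a>
<a>good link</a>
""", "html.parser")

# class_=None matches only elements that have no class attribute:
soup.find_all("a", class_=None)
# [<a>good link</a>]

# or using a lambda function; value is None when the attribute is missing:
soup.find_all("a", class_=lambda value: value is None or "ignore" not in value)

# or using a regular expression; a pattern is only tested against elements that
# have the attribute, so this keeps links whose class is not exactly "ignore"
# and skips links that have no class at all:
soup.find_all("a", class_=re.compile(r"^(?!ignore$)"))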
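The CSS selector route mentioned above can be sketched with the select method and the :not() pseudo-class. This is a minimal example under the same sample HTML, not the only way to do it; the helper name is_link_without_class is made up for illustration.

```python
import bs4

html = """
<a class="ignore">bad link</a>
<a>good link</a>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# :not([class]) keeps <a> elements that have no class attribute at all
no_class_links = soup.select("a:not([class])")

# :not(.ignore) keeps <a> elements that lack the "ignore" class,
# whether or not they carry other classes
not_ignored_links = soup.select("a:not(.ignore)")

# Hypothetical helper: the same "no class attribute" check as a tag-level
# filter function passed to find_all
def is_link_without_class(tag):
    return tag.name == "a" and not tag.has_attr("class")

filtered_links = soup.find_all(is_link_without_class)

print([link.get_text() for link in no_class_links])
# ['good link']
```

The selector form is handy when the rest of the scraper already uses select, while the function filter allows arbitrary Python logic per tag.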
This knowledgebase is provided by Scrapfly data APIs.