How to find all links using BeautifulSoup and Python?

BeautifulSoup is a popular HTML parsing library used in web scraping with Python. To find all links on a page, we can use either the find_all() method or CSS selectors with the select() method:

import bs4

soup = bs4.BeautifulSoup("""
<a href="/pricing">Pricing</a>
<a href="https://example.com/blog">Blog</a>
<a href="https://twitter.com/@company">Twitter</a>
""", "html.parser")  # name the parser explicitly to avoid a bs4 warning

# find_all("a") returns every <a> element; .get("href") reads its href attribute
links = [node.get("href") for node in soup.find_all("a")]
print(links)
# will print:
# ['/pricing', 'https://example.com/blog', 'https://twitter.com/@company']

# or with CSS selectors:
links = [node.get("href") for node in soup.select("a")]
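
One caveat with both approaches: an <a> tag that has no href attribute yields None from node.get('href'). A minimal sketch, assuming the same bs4 setup as above, of matching only anchors that carry the attribute (the sample HTML is made up for illustration):

```python
import bs4

html = """
<a href="/pricing">Pricing</a>
<a name="top">Anchor without href</a>
<a href="https://example.com/blog">Blog</a>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# the CSS attribute selector "a[href]" matches only <a> elements that have an href
links_with_href = [node["href"] for node in soup.select("a[href]")]

# find_all() can express the same filter with href=True
links_find_all = [node["href"] for node in soup.find_all("a", href=True)]
```

Either form avoids None entries in the result, which would otherwise need to be filtered out before joining or comparing URLs.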

Note that bs4 returns href values exactly as they appear in the HTML. Links can be:

Relative to the current website like /pricing
Absolute like https://example.com/blog
Absolute outbound like https://twitter.com/@company
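
These three cases can be told apart programmatically. A rough sketch using urllib.parse.urlparse from the standard library; the classify_link helper and its labels are hypothetical names for illustration, and comparing hostnames exactly is a simplifying assumption (it treats subdomains as outbound):

```python
from urllib.parse import urlparse


def classify_link(href, site_host="example.com"):
    """Label an href as 'relative', 'absolute', or 'outbound' (hypothetical helper)."""
    host = urlparse(href).hostname
    if host is None:
        # no host part; note that mailto: or javascript: hrefs also land here,
        # which this sketch does not handle separately
        return "relative"
    if host == site_host:
        return "absolute"
    return "outbound"


hrefs = ["/pricing", "https://example.com/blog", "https://twitter.com/@company"]
labels = [classify_link(href) for href in hrefs]
```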

We can convert all relative URLs to absolute ones using the urllib.parse.urljoin function:

from urllib.parse import urljoin

base_url = "https://example.com"
links = [urljoin(base_url, link) for link in links]
print(links)
# will print:
# ['https://example.com/pricing', 'https://example.com/blog', 'https://twitter.com/@company']

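urljoin resolves a relative link against whatever URL it is given, so links that do not start with / are best resolved against the URL of the page they were scraped from rather than the site root. A small sketch; the page URL and the relative paths are assumed examples:

```python
from urllib.parse import urljoin

# assumed URL of the page the links were scraped from
page_url = "https://example.com/blog/2023/post.html"

# a root-relative link replaces the whole path
root_relative = urljoin(page_url, "/pricing")
# a bare filename resolves against the page's directory
sibling = urljoin(page_url, "other-post.html")
# ".." climbs one directory up from the page's directory
parent = urljoin(page_url, "../archive.html")
# an absolute URL is returned unchanged
absolute = urljoin(page_url, "https://twitter.com/@company")
```
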
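We can also filter out outbound URLs if we want to restrict our scraper to a particular website. For this, the tldextract library (https://pypi.org/project/tldextract/) can be used to find each link's registered domain, that is, the domain name together with its public suffix (such as example.com):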

import tldextract

allowed_domain = "example.com"
for link in links:
    # registered_domain is the domain plus its public suffix, e.g. "example.com"
    domain = tldextract.extract(link).registered_domain
    if domain == allowed_domain:
        print(link)
# will print:
# https://example.com/pricing
# https://example.com/blog
# notice the twitter url is missing
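
If installing tldextract is not an option, a cruder check with the standard library can stand in. This is a swapped-in technique, not what the snippet above does: it compares hostnames using urlparse and knows nothing about public suffixes such as co.uk. The is_internal helper is a hypothetical name for illustration:

```python
from urllib.parse import urlparse


def is_internal(url, allowed_domain="example.com"):
    """Return True if the URL's host is allowed_domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == allowed_domain or host.endswith("." + allowed_domain)


links = [
    "https://example.com/pricing",
    "https://example.com/blog",
    "https://twitter.com/@company",
]
internal_links = [link for link in links if is_internal(link)]
```

The leading dot in the endswith() check matters: without it, a host like notexample.com would be accepted as internal.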

Provided by Scrapfly

This knowledgebase is provided by Scrapfly data APIs.
