🚀 We are hiring! See open positions

How to turn HTML to text in Python?

by scrapecrow Oct 31, 2022

When web scraping, we might need to represent scrape HTML data as plain text. For this we can use BeautifulSoup's get_text() method which extracts all visible HTML text and most importantly ignores invisible details such as <script> elements:

from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<body>
    <article>
    <h1>Article title</h1>
    <p>first paragraph and a <a>link</a></p>
    <script>var invisible="javascript variable";</script>
    </article>
</body>
""")
# if possible it's best to restrict html to a specific element
element = soup.find('article')
text = element.get_text()
print(text)
"""
Article title
first paragraph and a link
"""

Related Articles

How to Parse Web Data with Python and Beautifulsoup

Beautifulsoup is one the most popular libraries in web scraping. In this tutorial, we'll take a hand-on overview of how to use it, what is it good for and explore a real -life web scraping example.

BEAUTIFULSOUP
DATA-PARSING
PYTHON
How to Parse Web Data with Python and Beautifulsoup

How to Scrape Imovelweb.com

Scrape Imovelweb with Python - extract listings and details, handle pagination and JSON-LD, and use Scrapfly for anti-bot reliability.

PYTHON
SCRAPEGUIDE
BEAUTIFULSOUP
REQUESTS
SCRAPFLY
How to Scrape Imovelweb.com

How to Scrape AutoScout24

Learn how to scrape AutoScout24 for car listings, prices, specifications, and detailed vehicle information using Python. Complete guide with code examples and anti-blocking techniques.

PYTHON
SCRAPEGUIDE
BEAUTIFULSOUP
REQUESTS
How to Scrape AutoScout24

How to Scrape Allegro.pl

Learn how to scrape Allegro.pl for product listings and individual product details using Python with requests and BeautifulSoup4

PYTHON
SCRAPEGUIDE
BEAUTIFULSOUP
REQUESTS
How to Scrape Allegro.pl

How to Scrape Ticketmaster

Learn how to scrape Ticketmaster for event data including concerts, venues, dates, and ticket information using Python. Complete guide with code examples and anti-blocking techniques.

PYTHON
SCRAPEGUIDE
BEAUTIFULSOUP
REQUESTS
How to Scrape Ticketmaster

How to Scrape Mouser.com

Learn how to scrape Mouser.com electronic component data including prices, specifications, and inventory using Python. Complete guide with code examples and anti-blocking techniques.

PYTHON
SCRAPEGUIDE
BEAUTIFULSOUP
REQUESTS
How to Scrape Mouser.com