How to turn HTML to text in Python?

When web scraping, we might need to represent scrape HTML data as plain text. For this we can use BeautifulSoup's get_text() method which extracts all visible HTML text and most importantly ignores invisible details such as <script> elements:

from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<body>
    <article>
    <h1>Article title</h1>
    <p>first paragraph and a <a>link</a></p>
    <script>var invisible="javascript variable";</script>
    </article>
</body>
""")
# if possible it's best to restrict html to a specific element
element = soup.find('article')
text = element.get_text()
print(text)
"""
Article title
first paragraph and a link
"""

Provided by Scrapfly

This knowledgebase is provided by Scrapfly — a web scraping API that allows you to scrape any website without getting blocked and implements a dozens of other web scraping conveniences. Check us out 👇