When web scraping, we often need to convert scraped HTML into plain text. For this we can use BeautifulSoup's get_text() method, which extracts the text of an element and all of its descendants and, most importantly, skips the contents of invisible elements such as <script>:
from bs4 import BeautifulSoup
soup = BeautifulSoup("""
<body>
<article>
<h1>Article title</h1>
<p>first paragraph and a <a>link</a></p>
<script>var invisible="javascript variable";</script>
</article>
</body>
""", "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
# if possible it's best to restrict html to a specific element
element = soup.find('article')
text = element.get_text()
print(text)
"""
Article title
first paragraph and a link
"""
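As the output above shows, get_text() keeps the whitespace between tags, so the result often needs cleanup. The sketch below shows one way to get a single normalized line using the separator and strip arguments of get_text(). The visible_text helper and its tag_name parameter are hypothetical names introduced for illustration, and the explicit removal of <script> and <style> tags is a defensive assumption for setups where get_text() might still return their contents.

```python
from bs4 import BeautifulSoup

HTML = """
<body>
<article>
<h1>Article title</h1>
<p>first paragraph and a <a>link</a></p>
<script>var invisible="javascript variable";</script>
</article>
</body>
"""


def visible_text(html: str, tag_name: str = "article") -> str:
    """Hypothetical helper: whitespace-normalized text of the first matching tag."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(tag_name)
    if element is None:
        # Fall back to the whole document when the tag is not present
        element = soup
    # Remove <script> and <style> explicitly so the result does not depend on
    # how a given bs4 version treats their contents
    for hidden in element.find_all(["script", "style"]):
        hidden.decompose()
    # strip=True trims every text fragment and skips the empty ones;
    # separator=" " joins the remaining fragments with single spaces
    return element.get_text(separator=" ", strip=True)


text = visible_text(HTML)
print(text)
# Article title first paragraph and a link
```

Using separator="\n" instead would place every text fragment on its own line, which also splits "link" from the rest of its paragraph, so a space separator is usually the safer choice for inline markup.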