Web Scraping Emails using Python
In this tutorial we'll take a look at email scraping: how to crawl pages and extract email addresses using Python, and what some common challenges are.
When scraping, we might notice that some page elements are visible in the web browser but not in our scraper. This is called dynamic JavaScript data: it is created by JavaScript when the page loads. If our scraper is not running a full browser to execute JavaScript, it will never see the dynamically rendered elements.
There are many ways to scrape dynamic data, such as running a real web browser:
See our introductory tutorial on scraping using web browsers and automation toolkits like Puppeteer, Selenium and Playwright.
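As a rough illustration of the browser approach, the sketch below loads a page in headless Chromium through Playwright's synchronous API and then searches the rendered HTML for email addresses. It assumes the `playwright` package and its Chromium build are installed. The helper names `render_html` and `extract_emails` and the simplified email pattern are illustrative choices for this sketch, not a complete email validator.

```python
import re

# Simplified email pattern (an assumption for this sketch); real-world
# addresses can be more varied than this matches.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def render_html(url):
    """Load `url` in headless Chromium and return the HTML after JavaScript has run."""
    # Imported here so the rest of the module works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait until network activity settles so dynamic elements have a chance to render.
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


def extract_emails(html):
    """Return the unique email-like strings found in `html`, sorted alphabetically."""
    return sorted(set(EMAIL_RE.findall(html)))
```

With these helpers, `extract_emails(render_html(url))` would return the addresses visible after the page has finished rendering, at the cost of starting a full browser for every page.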
Alternatively, dynamic data is sometimes already present in the HTML document, just in a different location than where we see it in the browser. Most commonly, the data is hidden in <script> elements as JavaScript variables and then unpacked into the HTML on page load.
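To make the hidden data case concrete, here is a minimal sketch that uses only the Python standard library. It collects the text of every <script> element and searches it for email addresses. The sample HTML, the `contactData` variable and the helper names are made up for illustration, and the email pattern is deliberately simplified.

```python
import re
from html.parser import HTMLParser

# Simplified email pattern (an assumption for this sketch).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as raw text; append to the current script.
        if self._in_script:
            self.scripts[-1] += data


def find_hidden_emails(html):
    """Return emails found inside <script> elements, in order of first appearance."""
    collector = ScriptCollector()
    collector.feed(html)
    emails = []
    for script in collector.scripts:
        for email in EMAIL_RE.findall(script):
            if email not in emails:
                emails.append(email)
    return emails


# Hypothetical page: the visible <div> is empty until JavaScript fills it in.
SAMPLE_HTML = """
<html><body>
<div id="contact"></div>
<script>
  var contactData = {"name": "Support", "email": "support@example.com"};
  document.getElementById("contact").innerText = contactData.email;
</script>
</body></html>
"""

hidden_emails = find_hidden_emails(SAMPLE_HTML)
print(hidden_emails)
```

Because the address sits in the raw HTML, no browser is needed here. The trade-off is that the scraper depends on how the page happens to embed its data, which can change without notice.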
For more, see this introductory article, which covers how to find hidden web data and common hidden web data scenarios.