Dynamic Page Scraping
Dynamic pages differ from classic static HTML pages in that the page is rendered client-side, by the web browser. This is a problem for web scraping because web scrapers are usually not web browsers.
Dynamic Page Example
An example of a dynamic page is web-scraping.dev/testimonials, where more testimonials are loaded as the user scrolls down the page.
Some other indicators that a page is dynamic:
- Contains interactive charts or graphs.
- Loads data on demand, noticeable when loading animations or spinners are used.
- Page does not load with JavaScript disabled.
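The last indicator can also be checked from a script rather than by toggling browser settings. Below is a minimal sketch, assuming you already know a piece of text that is visible in the browser: it downloads the raw HTML (no JavaScript is executed) and reports whether that text is missing. The function names `visible_text`, `is_rendered_client_side` and `fetch_raw_html` are illustrative, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the text of an HTML document with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def is_rendered_client_side(raw_html: str, expected_text: str) -> bool:
    """True if text seen in the browser is absent from the raw HTML."""
    return expected_text.lower() not in visible_text(raw_html).lower()


def fetch_raw_html(url: str) -> str:
    """Download a page the way a plain HTTP client sees it (no JavaScript)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `is_rendered_client_side(fetch_raw_html(url), "some text you saw in the browser")` returning `True` suggests the text is added by JavaScript after the initial page load. This is only a heuristic: text that differs in casing is handled, but text split across several elements may be missed.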
So, how can these elements be scraped?
What are the options?
In practice, web scrapers have two very different approaches:
Use Real Web Browsers
It is possible to run a real web browser like Chrome or Firefox for scraping, which can perform browser actions such as clicking buttons.
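As a rough illustration of this approach, here is a sketch that uses Playwright (one of several browser automation libraries, chosen here as an assumption and not named by this page) to open a page, scroll until no more content loads, and return the rendered HTML. The helper names and the scroll limits are hypothetical defaults.

```python
def scroll_until_stable(page, max_rounds=10, pause_ms=1000):
    """Scroll down until the page height stops growing.

    `page` is expected to behave like a Playwright sync `Page`.
    Returns the last observed page height in pixels.
    """
    previous_height = 0
    for _ in range(max_rounds):
        height = page.evaluate("document.body.scrollHeight")
        if height == previous_height:
            break  # nothing new was loaded by the last scroll
        previous_height = height
        page.mouse.wheel(0, height)      # scroll down by one full page height
        page.wait_for_timeout(pause_ms)  # give background requests time to finish
    return previous_height


def scrape_with_browser(url):
    """Render a page in headless Chromium and return its final HTML."""
    # Imported lazily so the helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        scroll_until_stable(page)
        html = page.content()
        browser.close()
    return html
```

Calling `scrape_with_browser("https://web-scraping.dev/testimonials")` requires `pip install playwright` followed by `playwright install chromium`. A fixed pause after each scroll is the simplest option; waiting for a specific element to appear is usually more reliable.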
Reverse Engineer and Replicate
We can use browser developer tools to inspect exactly how the web client works and replicate that behavior in our scraper.
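To make the idea concrete, here is a minimal sketch of replicating a background request once it has been found in the developer tools' Network tab. The endpoint, the query parameters and the set of headers are placeholders: the real values have to be copied from whatever request the browser actually makes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_background_request(api_url, params, referer):
    """Build a GET request that resembles one made by the page's JavaScript.

    The headers below are common examples only; copy the real ones from
    the request shown in the browser developer tools.
    """
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
        "Referer": referer,
    }
    return Request(f"{api_url}?{urlencode(params)}", headers=headers)


def fetch_json(request):
    """Send the request and decode a JSON response body."""
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage (example.com stands in for the real site):
# request = build_background_request(
#     "https://example.com/api/items", {"page": 2}, "https://example.com/items"
# )
# data = fetch_json(request)
```

Keeping request construction separate from sending makes it easy to compare the generated URL and headers against the browser's request before any traffic is sent.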
Running a whole web browser is expensive, but reverse engineering can be difficult, time-consuming, and brittle. So, which one to choose?
In the next few sections, we'll take a look at both approaches and how to determine which one to choose as well as some awesome tricks that can make dynamic page scraping a breeze.
Next up - Hidden API Scraping
We'll start by taking a look at the most common dynamic data source - hidden APIs that fetch dynamic data on demand through background requests - and how we can scrape them.