Dynamic Page Scraping

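Dynamic pages are very different from classic static HTML pages because the rendering is done client-side, by the web browser running JavaScript. This becomes an issue in web scraping because web scrapers are usually not web browsers: a plain HTTP client receives only the initial HTML and never executes the scripts that fill in the rest of the page.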

Dynamic Page Example

An example of a dynamic page is web-scraping.dev/testimonials, where more testimonials are loaded as the user scrolls down the page.

Some other indicators that a page is dynamic:

  • Contains interactive charts or graphs.
  • Loads data on demand, noticeable when loading animations or spinners are used.
  • Page does not load with JavaScript disabled.
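One quick way to check the last indicator without touching browser settings is to look at the raw HTML the server returns and see whether text that is visible in the browser is actually in it. The following is a minimal sketch using only Python's standard library; the helper names `visible_text` and `looks_dynamic` and the two sample pages are made up for illustration, and the check is a heuristic rather than a definitive test.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text that sits outside <script> and <style> tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(raw_html: str) -> str:
    """Return the page text a non-JavaScript client would see."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Collapse runs of whitespace so substring checks are reliable.
    return " ".join(" ".join(parser.chunks).split())


def looks_dynamic(raw_html: str, expected_text: str) -> bool:
    """True when text seen in the browser is missing from the raw HTML."""
    return expected_text.lower() not in visible_text(raw_html).lower()


# Hypothetical sample pages: one rendered server-side, one client-side.
static_page = "<html><body><h1>Reviews</h1><p>Great product!</p></body></html>"
dynamic_page = (
    '<html><body><div id="app"></div>'
    '<script>render(["Great product!"])</script></body></html>'
)

print(looks_dynamic(static_page, "Great product!"))
print(looks_dynamic(dynamic_page, "Great product!"))
```

In real use, `raw_html` would be the response body from an ordinary HTTP client, and `expected_text` a snippet copied from the page as it appears in the browser. Text found only inside a script tag counts as missing here, since a scraper would still need to extract it from the JavaScript.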

So, how can these elements be scraped?

What are the options?

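In practice, web scrapers have two very different approaches:

  • Run a real web browser and scrape the page after it has rendered.
  • Reverse engineer how the page loads its data and replicate that in the scraper.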

Running a whole web browser is expensive, but reverse engineering can be difficult, time-consuming, and brittle. So, which one to choose?

In the next few sections, we'll take a look at both approaches and how to determine which one to choose as well as some awesome tricks that can make dynamic page scraping a breeze.

Next up - Hidden API Scraping

We'll start by taking a look at the most common dynamic data source: hidden APIs that fetch dynamic data on demand through background requests, and how we can scrape them.
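As a rough preview of the idea, the sketch below replays a paginated background request directly instead of rendering the page. Everything specific in it is an assumption made for illustration: the endpoint URL, the `page` query parameter, and the `items` key in the JSON response are hypothetical, and a real site's hidden API has to be discovered by inspecting its network traffic.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint, for illustration only.
API_URL = "https://example.com/api/testimonials"


def fetch_json(url: str) -> dict:
    """Request a URL the way the page's own background request might."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def scrape_all(
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield items page by page until the API returns an empty page."""
    for page in range(1, max_pages + 1):
        payload = fetch(f"{API_URL}?{urlencode({'page': page})}")
        # Assumed response shape: {"items": [...]}.
        items = payload.get("items", [])
        if not items:
            break
        yield from items
```

The `fetch` argument is injectable so the pagination logic can be exercised with a stub before pointing it at a live endpoint, and `max_pages` caps the loop in case the API never returns an empty page.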

