Hidden API Scraping

On dynamic pages, the most common pattern is for JavaScript to fetch data from an API with background requests and then render it on the page.

Our web-scraping.dev/testimonials example does exactly that!

When the user scrolls to the bottom of the page, a JavaScript trigger fires and makes a background request to a hidden API. The received results are then used to update the visible page HTML.
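The background request described above can usually be replicated outside the browser. The following is a minimal sketch using only the Python standard library. The endpoint path, the `page` query parameter, and the header values are assumptions for illustration, and the helper names `build_api_request` and `fetch_page` are hypothetical. The real values should be copied from the devtools network inspector.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint and parameter name: verify both in the devtools network inspector.
API_URL = "https://web-scraping.dev/api/testimonials"
PAGE_URL = "https://web-scraping.dev/testimonials"


def build_api_request(page: int) -> Request:
    """Build a request resembling the one the page's javascript sends on scroll."""
    url = f"{API_URL}?{urlencode({'page': page})}"
    headers = {
        # Hidden APIs often check that the call originates from their own page.
        "Referer": PAGE_URL,
        # Placeholder value; a real browser sends a longer User-Agent string.
        "User-Agent": "Mozilla/5.0",
    }
    return Request(url, headers=headers)


def fetch_page(page: int, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urlopen(build_api_request(page), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_page(1)` would then return whatever the API sends for the first batch of results. If the server rejects the request, the usual cause is a header or token that the browser sends and this sketch does not.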

How to Scrape Hidden APIs

For this example, see our beginner-friendly, in-depth blog post, which starts with an introduction to the devtools network inspector and walks through the /testimonials example page.

Why scrape hidden APIs?

Hidden APIs are great for web scraping because they give direct access to the website's public data. This brings several benefits:

  • Speed - No need to render the page, execute javascript, and wait for the page to load.
  • Stability - No need to worry about the website's layout changes.
  • Consistency - The data is always in the same format.

There are also drawbacks to keep in mind:

  • Complexity - It can be difficult to reverse engineer hidden APIs that are obfuscated or protected.
  • Change - Hidden APIs are not documented and can change at any time.

Example Exercises

By far the most common dynamic data fetching pattern is "endless paging", which is encountered in many web scraping projects. Take a look at this example exercise on scraping endless pagination below 👇

Scrapeground Exercise: Endless Paging Scraping

See this in-depth tutorial on Scrapfly Scrapeground for a more detailed example of this scraper. It includes examples in different languages and libraries as well as more scenarios.
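The endless paging pattern itself reduces to a simple loop: request page 1, 2, 3 and so on until the API has nothing more to return. Below is a minimal sketch of that loop. The function name `scrape_endless_paging` is hypothetical, and `fake_fetch` is a stand-in for a real hidden API call so the example runs without network access. The sketch also assumes that an empty page marks the end of the data; a real API may instead signal this with an error status or a flag in the response.

```python
from typing import Callable, List


def scrape_endless_paging(
    fetch_items: Callable[[int], List[str]],
    max_pages: int = 100,
) -> List[str]:
    """Collect items page by page until a page comes back empty."""
    results: List[str] = []
    for page in range(1, max_pages + 1):
        items = fetch_items(page)
        if not items:  # assumption: an empty page means there is no more data
            break
        results.extend(items)
    return results


# Stand-in for a real hidden API: three pages of data, then nothing.
FAKE_API = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}


def fake_fetch(page: int) -> List[str]:
    """Return the items for a page, or an empty list past the last page."""
    return FAKE_API.get(page, [])


all_items = scrape_endless_paging(fake_fetch)
print(all_items)
```

The `max_pages` cap is a safety limit so that a misbehaving API which never returns an empty page cannot keep the loop running forever.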

Next up - Headless Browsers

Reverse engineering hidden APIs can be a difficult process, so why not use a real web browser for web scraping instead? Let's take a look at that next.


Summary