How to parse dynamic CSS classes when web scraping?

Dynamic class names are increasingly common on the modern web, and they can be a tough challenge in web scraping. Let's take a look at an example of dynamic classes and how we can parse it:

<div class="pdd fg-black">
    <h2>Product Details</h2>
    <div class="fqv b1">
        <div class="fz g1">Price</div>
        <div class="g2 cvx">22.55</div>
    </div>
</div>

Usually, we'd see human-readable class names that we can rely on in CSS selectors. In this case, however, the class names look nonsensical, which means they are most likely dynamically generated. Dynamic classes can change at any moment, which would break our scraper.
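To make the failure mode concrete, here is a minimal sketch using only Python's standard library. The "after redeploy" markup and its new class names (k9 zrt, qa m4) are invented for illustration, and the snippet is parsed as XML only because it happens to be well-formed; real pages usually need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Markup as served today (copied from the example above).
PAGE_TODAY = """
<div class="pdd fg-black">
    <h2>Product Details</h2>
    <div class="fqv b1">
        <div class="fz g1">Price</div>
        <div class="g2 cvx">22.55</div>
    </div>
</div>
""".strip()

# Hypothetical markup after a redeploy: same structure, new generated class names.
PAGE_AFTER_REDEPLOY = PAGE_TODAY.replace("g2 cvx", "k9 zrt").replace("fz g1", "qa m4")


def price_by_class(markup):
    """Look up the price through its generated class attribute."""
    root = ET.fromstring(markup)
    node = root.find(".//div[@class='g2 cvx']")
    return node.text if node is not None else None


print(price_by_class(PAGE_TODAY))           # the selector matches
print(price_by_class(PAGE_AFTER_REDEPLOY))  # the selector silently finds nothing
```

The class-based lookup works only as long as the generated names stay the same; once they change, it returns nothing even though the price is still on the page.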

The most reliable way to deal with this issue is text-based XPath parsing: instead of matching class names, we find HTML elements by their visible text and by their position relative to one another. In the example above, the price can be selected with an XPath expression such as //div[text()="Price"]/following-sibling::div[1].

In this example, we select the element whose text is Price and then take its first following sibling, which holds the price value. With this approach, even if the class names change, our parser will continue to extract the data successfully!
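The same idea can be sketched in Python. This assumes the third-party lxml package is installed; the helper name price_by_label and the renamed class zz9 are made up for illustration, and the XPath is one possible way to express the label-then-sibling lookup rather than the only one.

```python
from lxml import html

PAGE = """
<div class="pdd fg-black">
    <h2>Product Details</h2>
    <div class="fqv b1">
        <div class="fz g1">Price</div>
        <div class="g2 cvx">22.55</div>
    </div>
</div>
"""


def price_by_label(markup, label="Price"):
    """Find the div whose text equals `label`, then read its first following sibling div."""
    tree = html.fromstring(markup)
    values = tree.xpath(
        "//div[normalize-space()=$label]/following-sibling::div[1]/text()",
        label=label,  # lxml passes keyword arguments as XPath variables
    )
    return values[0].strip() if values else None


print(price_by_label(PAGE))
# Renaming a class does not affect the lookup, since no class name is used.
print(price_by_label(PAGE.replace("g2 cvx", "zz9")))
```

Because the expression depends only on the visible label and the document structure, it keeps working when class names are regenerated. It will still break if the label text or the element order changes, so it is worth checking for a missing result.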

