To select elements by class using XPath, we can match the @class attribute using the contains() function or the = operator.
For example, to select <a class="link"></a> we could use the //a[@class="link"] or //a[contains(@class, "link")] selectors.
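Note that contains() is a substring test, so it also matches partial class names. For example, disabled-link would be matched by our contains(@class, "link") selector. The = operator has the opposite limitation: it matches only when the whole attribute value is exactly link, so an element with several classes is missed.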
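The difference between the two selectors can be sketched in Python. This is a minimal illustration that assumes the lxml package is installed; the sample markup and the variable names are made up for the example.

```python
from lxml import html

# Hypothetical sample markup, not taken from a real page.
SAMPLE = """
<div>
  <a class="link" href="/a">exact</a>
  <a class="link active" href="/b">multiple classes</a>
  <a class="disabled-link" href="/c">partial name</a>
</div>
"""

tree = html.fromstring(SAMPLE)

# The = operator compares the whole attribute value,
# so only class="link" matches.
exact = tree.xpath('//a[@class="link"]/@href')

# contains() is a substring test, so "link active" and
# "disabled-link" match as well.
loose = tree.xpath('//a[contains(@class, "link")]/@href')

print(exact)
print(loose)
```

Here the exact selector skips the element with two classes, while the contains() selector also picks up disabled-link.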
To match a single class exactly, we can use the contains(concat(" ", normalize-space(@class), " "), " match ") pattern, where match is the class name.
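As a rough sketch of this pattern in Python, assuming the lxml package is installed: the helper name class_xpath and the sample markup are hypothetical, chosen only for illustration.

```python
from lxml import html

# Hypothetical sample markup with irregular whitespace in one class attribute.
SAMPLE = """
<div>
  <a class="link" href="/a">exact</a>
  <a class="  active   link " href="/b">multiple classes</a>
  <a class="disabled-link" href="/c">partial name</a>
</div>
"""


def class_xpath(tag: str, name: str) -> str:
    """Build an XPath that matches `name` as a whole class token."""
    # normalize-space() trims and collapses whitespace; padding both sides
    # with spaces lets " name " match only complete class names.
    return (
        f'//{tag}[contains(concat(" ", normalize-space(@class), " "), '
        f'" {name} ")]'
    )


tree = html.fromstring(SAMPLE)
matched = tree.xpath(class_xpath("a", "link") + "/@href")
print(matched)
```

The element with class disabled-link is skipped because " link " (with surrounding spaces) does not occur in " disabled-link ".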
Tip: If you're using Python's parsel package, there's an equivalent shortcut, has-class(). For example, //a[has-class("link")].