How to Parse XML
In this article, we'll explain XML parsing. We'll start by defining XML files and their format, then show how to navigate them for data extraction.
When web scraping, the most common way to navigate HTML data is to find elements by class name. For that, we can use CSS or XPath selectors:
`.class` notation, which finds any node whose class attribute contains the full class name. For example, `.some-class` will match:
`<a class="some-class"></a>`
`<a class="first some-class third"></a>`
`[class*="<partial>"]` notation, which finds any node whose class attribute contains the given string anywhere in its value, including as part of a longer class name. For example, `[class*="some-class"]` will match:
`<a class="some-class"></a>`
`<a class="first some-class third"></a>`
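The difference between the two notations can be sketched in plain Python. This is only an illustration of the matching rules, not a CSS engine: the `ClassCollector` class, the two `match_*` helpers and the third sample element (`some-class-wide`) are hypothetical names and data added for the example, and only the standard library `html.parser` module is used.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects (tag, class attribute) pairs for every start tag."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        class_value = dict(attrs).get("class") or ""
        self.elements.append((tag, class_value))


def match_class_token(elements, name):
    """Emulates `.name`: the class list must contain `name` as a whole token."""
    return [cls for _, cls in elements if name in cls.split()]


def match_class_substring(elements, text):
    """Emulates `[class*="text"]`: the raw attribute must contain `text`."""
    return [cls for _, cls in elements if text in cls]


# The first two elements come from the examples above;
# the third is an assumed extra case to show where the notations differ.
HTML = """
<a class="some-class"></a>
<a class="first some-class third"></a>
<a class="some-class-wide"></a>
"""

collector = ClassCollector()
collector.feed(HTML)

token_matches = match_class_token(collector.elements, "some-class")
substring_matches = match_class_substring(collector.elements, "some-class")

print(token_matches)      # whole-token matches only
print(substring_matches)  # also includes "some-class-wide"
```

In a real scraper, a selector library would evaluate the CSS expressions directly; the sketch only shows why `.some-class` skips `some-class-wide` while `[class*="some-class"]` does not.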
Alternatively, XPath selectors can be used with similar expressions:
`//*[@class="link"]` will find any element whose class attribute is exactly equal to "link". An element with `class="nav link"` will not match.
`//*[contains(@class, "link")]` will find any element whose class attribute contains the string "link".
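The two XPath expressions can be compared with a short sketch using Python's standard library `xml.etree.ElementTree`. The sample markup and variable names are assumptions made for the example. ElementTree implements only a subset of XPath: it handles the `[@class='link']` equality predicate, but it has no `contains()` function, so the second expression is emulated here with a plain Python substring check. A full XPath 1.0 engine such as lxml can evaluate `contains()` directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, well-formed so the XML parser accepts it
DOC = """
<div>
  <a class="link">exact</a>
  <a class="nav link">token</a>
  <a class="linked">substring</a>
  <a class="button">other</a>
</div>
"""

root = ET.fromstring(DOC)

# Equivalent of //*[@class="link"]: the attribute must equal "link" exactly
exact = [el.text for el in root.findall(".//*[@class='link']")]

# Emulation of //*[contains(@class, "link")]: substring test on the attribute
contains = [el.text for el in root.iter() if "link" in el.get("class", "")]

print(exact)     # ['exact']
print(contains)  # ['exact', 'token', 'substring']
```

Note that the substring check also picks up `class="linked"`, which is the usual trade-off of `contains()`: it is more forgiving than an exact match but can return unrelated classes that share the same text.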
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.