The XPath syntax allows interaction with web elements' attributes, such as class, id, href, and others, through the @ XPath expression. This enables us to select any element in the web page DOM based on its attribute values with XPath selectors.
To select elements based on their attribute values, we can use either of two XPath expressions:
- Select the desired element by its exact attribute value, such as [@attribute='value']
- Select by partial attribute value using XPath contains, such as [contains(@attribute, 'value')]
Let's go over practical examples of applying the above XPath queries to extract data from both HTML and XML documents.
Using XPath @ expression
The @ XPath expression selects a web element by its attribute:
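As a minimal sketch, assuming the third-party lxml library is installed and using a made-up HTML snippet (the markup, ids, and URLs are illustrative only), the @ expression could be applied like this:

```python
from lxml import html  # third-party dependency (assumption): pip install lxml

# Hypothetical HTML document used only for illustration
HTML_DOC = """
<html>
  <body>
    <a class="link" id="home" href="https://example.com/">Home</a>
    <a class="link" id="about" href="https://example.com/about">About</a>
    <a class="button" id="signup">Sign up</a>
  </body>
</html>
"""

tree = html.fromstring(HTML_DOC)

# Select every element that has an href attribute, whatever its value
elements_with_href = tree.xpath("//*[@href]")

# Select the href attribute values themselves
href_values = tree.xpath("//a/@href")

# Select an element by an exact attribute value
about_links = tree.xpath("//a[@id='about']")

print(len(elements_with_href))
print(href_values)
print(about_links[0].text)
```

The third link has no href attribute, so it is skipped by both //*[@href] and //a/@href.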
Here, we use the XPath query to select all elements that have the href attribute.
Using XPath contains expression
The contains XPath function enables selecting a particular web element based on its partial text match:
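One way this could look, again assuming lxml and a hypothetical set of links (the URLs are placeholders, not taken from a real page):

```python
from lxml import html  # third-party dependency (assumption): pip install lxml

# Hypothetical HTML document used only for illustration
HTML_DOC = """
<html>
  <body>
    <a href="https://example.com/products/1">Product 1</a>
    <a href="https://example.com/products/2">Product 2</a>
    <a href="https://example.com/blog/hello">Blog post</a>
  </body>
</html>
"""

tree = html.fromstring(HTML_DOC)

# Keep only links whose href contains the substring "/products/",
# then return the href attribute values
product_hrefs = tree.xpath("//a[contains(@href, '/products/')]/@href")

print(product_hrefs)
```

Note that contains() performs a case-sensitive substring match in XPath 1.0, so '/Products/' would not match the links in this snippet.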
Above, we select all href attributes by using a partial text search. For further details on XPath selectors, refer to our dedicated guide.
Parsing HTML with Xpath
Introduction to xpath in the context of web-scraping. How to extract data from HTML documents using xpath, best practices and available tools.
XML
Note that the same XPath expressions can also be applied to XML documents. Here's an example of parsing an XML document:
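The following is a sketch under assumptions: it relies on lxml's etree module, and the catalog document, its element names, and its attribute values are invented for illustration.

```python
from lxml import etree  # third-party dependency (assumption): pip install lxml

# Hypothetical XML document used only for illustration
XML_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="bk101" category="programming"><title>XML Basics</title></book>
  <book id="bk102" category="web-programming"><title>Scraping 101</title></book>
  <book id="bk103" category="fiction"><title>A Short Story</title></book>
</catalog>
"""

root = etree.fromstring(XML_DOC)

# Select the id attribute value of every <book> element
book_ids = root.xpath("//book/@id")

# Select the title of the book whose category is exactly "fiction"
fiction_titles = root.xpath("//book[@category='fiction']/title/text()")

print(book_ids)
print(fiction_titles)
```

The document is passed as bytes because it carries an XML encoding declaration, which lxml does not accept on an already-decoded string.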
We can also use the XPath contains function to select elements by a partial attribute value:
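A possible sketch, assuming lxml and the same kind of made-up catalog document (element names and category values are hypothetical):

```python
from lxml import etree  # third-party dependency (assumption): pip install lxml

# Hypothetical XML document used only for illustration
XML_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="bk101" category="programming"><title>XML Basics</title></book>
  <book id="bk102" category="web-programming"><title>Scraping 101</title></book>
  <book id="bk103" category="fiction"><title>A Short Story</title></book>
</catalog>
"""

root = etree.fromstring(XML_DOC)

# Match books whose category attribute contains "programming" anywhere,
# so both "programming" and "web-programming" qualify
programming_titles = root.xpath(
    "//book[contains(@category, 'programming')]/title/text()"
)

print(programming_titles)
```

An exact match such as [@category='programming'] would return only the first book here, which is the practical difference between the two expressions.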
For more on parsing XML with XPath and CSS selectors, refer to our dedicated guide.
How to Parse XML
In this article, we'll explain XML parsing. We'll start by defining XML files and their format, then cover how to navigate them for data extraction.