How to use XPath selectors in Python?

The most popular package that implements XPath selectors in Python is lxml. We can use the xpath() method to find all matching values:

from lxml import etree

tree = etree.fromstring("""
<div>
    <a>link 1</a>
    <a>link 2</a>
</div>
""")
for result in tree.xpath("//a"):
    print(result.text)
"link 1"
"link 2"

However, in web scraping the recommended way is to use the parsel package. It's based on lxml and providers a more consistent behavior when working with HTML content:

from parsel import Selector

selector = Selector("""
<div>
    <a>link 1</a>
    <a>link 2</a>
</div>
""")

selector.xpath("//a").getall()
['<a>link 1</a>', '<a>link 2</a>']
Question tagged: XPath, Python, Data Parsing

Related Posts

Ultimate XPath Cheatsheet for HTML Parsing in Web Scraping

Ultimate companion for HTML parsing using XPath selectors. This cheatsheet contains all syntax explanations with interactive examples.

Web Scraping With Ruby

Introduction to web scraping with Ruby. How to handle http connections, parse html files for data, best practices, tips and an example project.

Parsing HTML with Xpath

Introduction to xpath in the context of web-scraping. How to extract data from HTML documents using xpath, best practices and available tools.