XPath vs CSS selectors: what's the difference?

XPath and CSS selectors are both small path languages used to select elements from HTML when web scraping, and they serve the same purpose. So, what's the difference?

Generally, CSS selectors are briefer and more popular, since they use the same language that applies styles in web development. XPath selectors are more powerful, though a bit more complex.

In particular, XPath offers three advantages over CSS selectors:

  • Traverse up the HTML tree, i.e. select element parent nodes.
  • Find elements by text value.
  • Use a richer set of built-in functions and, depending on the XPath engine, extensions such as regular expression matching and custom functions.

When web scraping, it's best to mix both and take advantage of each tool's strengths. For example, let's look at this sample page and see where CSS and XPath selectors each fit best:

<div class="product">
  <div class="price">
    <div data-price="22.84">$22.84</div>
  </div>
  <div>
    <div>Company Name inc.</div>
    <div>
      <div>website: <a href="http://example.com">example.com</a></div>
    </div>
  </div>
</div>

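To extract the price we can use a simple CSS selector. Note that the data-price attribute sits on the div inside .price, so the selector has to step into it, and that ::attr() is an extension offered by scraping libraries such as parsel and Scrapy rather than standard CSS:

.product>.price>div::attr(data-price)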

However, CSS selectors cannot find elements by their text value or navigate up the HTML tree. So, to select "Company Name inc." we'd have better luck with XPath:

//div[contains(text(),'website:')]/../../div[1]/text()

In the example above, we find the div element that contains the text "website:", step up to its grandparent, and then select that grandparent's first div child, which holds the name of the company!

To summarize XPath vs CSS selectors: both are great tools for parsing HTML. CSS selectors are briefer and easier to use, while XPath is more powerful but more verbose and complex. Luckily, most programming languages support both, so when web scraping it's best to mix the two!
