# Ultimate XPath Cheatsheet for HTML Parsing in Web Scraping

 by [Bernardas Alisauskas](https://scrapfly.io/blog/author/bernardas) Apr 18, 2026 11 min read [\#data-parsing](https://scrapfly.io/blog/tag/data-parsing) [\#xpath](https://scrapfly.io/blog/tag/xpath) 


XPath is the most powerful HTML parsing engine used in web scraping. It's available in [Scrapy](https://pypi.org/project/Scrapy/) and popular HTML parsing libraries like [parsel](https://pypi.org/project/parsel/).

XPath comes in versions 1.0, 2.0, and 3.0. Web scraping toolsets most commonly support XPath 1.0 and 2.0, and most functionality can be found in 1.0. In this cheatsheet we'll take a look at the XPath 1.0 and 2.0 features that are relevant to HTML parsing.
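As a quick orientation, here's a minimal sketch of evaluating an XPath selector in Python using `lxml` (the engine that backs both Scrapy and parsel); the markup is a made-up example:

```python
from lxml import html

# parse an illustrative HTML fragment and run an XPath query against it
doc = html.fromstring('<div><a href="/blog">blog</a><a href="/docs">docs</a></div>')
links = doc.xpath("//a/@href")  # all href attributes anywhere in the document
```
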

[Parsing HTML with XPath](https://scrapfly.io/blog/posts/parsing-html-with-xpath) — an introduction to XPath in the context of web scraping: how to extract data from HTML documents using XPath, best practices, and available tools.

This XPath cheatsheet contains all XPath features used in HTML parsing. Clicking the explanation text takes you to a real-life interactive example with more details. Note that XPath selectors can differ between implementations, so unique non-standard features are marked as such.

## Key Takeaways

Master XPath selectors for web scraping by learning navigation axes, string functions, position functions, and logical operators to precisely extract data from HTML documents using the most powerful HTML parsing engine available.

- XPath is the most powerful HTML parsing engine for web scraping, available in Scrapy, parsel, and other popular libraries
- XPath 1.0 and 2.0 are the most commonly supported versions in web scraping toolsets
- Navigation axes like //, .., ancestor::, following:: allow precise element selection beyond simple CSS selectors
- String functions like contains(), starts-with(), ends-with(), and matches() enable sophisticated text-based filtering
- Position functions like position(), last(), and count() provide powerful ways to select elements by their location
- Logical operators and, or, and not() allow complex conditional selection rules
- Attribute selection with @attribute syntax provides direct access to HTML attributes
- Advanced features include namespace handling, conditional expressions, and mathematical operations

## Cheatsheet

| Selector | Explanation |
|---|---|
|  | **Axes and Navigation** |
| element | [selects node by name: `div`, `a`](#by-element-name) |
| \* | [selects any node (wildcard)](#element-name-wildcard) |
| . or self:: | [selects current node (useful in matching)](#element-self) |
| \[n\] | [select n-th node (starts at 1)](#nth-element) |
| / | [selects direct child](#direct-child-element) |
| // | [selects any descendant](#any-descendant-element) |
| ./ or .// | [explicit relativity](#explicit-relativity) |
| .. | [selects element parent](#element-parent) |
| ancestor:: | [selects element's ancestors (parent, grandparent)](#element-ancestors) |
| preceding:: | [selects all preceding nodes (above)](#preceding-nodes) |
| following:: | [selects all following nodes (below)](#following-nodes) |
| preceding-sibling:: | [selects preceding siblings (above)](#preceding-siblings) |
| following-sibling:: | [selects following siblings (below)](#following-siblings) |
|  | **Logical Operators** |
| \| | [union logic - joins multiple selectors in order as they appear](#union-logic) |
| `or` | [chain multiple optional predicates](#or-logic) |
| `and` | [chain multiple predicates](#and-logic) |
|  | **Attribute Matching** |
| @attribute | [selects attribute by name](#attribute-by-name) |
| text() | [retrieve element's text](#element-text) |
| \[\] | [element predicate (rules to match nodes)](#element-predicate) |
| \[\]\[\] | [multiple element predicates](#element-predicate) |
| \[node\[\]\] | [nested element predicates](#element-predicate) |
| \[a=b\] | [exact match](#element-predicate) |
| \[a&gt;b\], \[a&lt;b\] | [number values can be matched for greater/less](#element-predicate) |
|  | **Functions** |
| name() | [return current node's name](#get-element-name) |
| not(A) | [reverses A](#not-function) |
| number(A) | [cast value to a number](#number-function) |
| contains(A, B) | [check whether A contains B](#contains-function) |
| matches(value, pattern) or re:test(value, pattern) | [check a value against a regular expression pattern xpath2](#matches-function) |
| tokenize(value, pattern) | [split string by regex pattern xpath2](#tokenize-function) |
| lower-case(A) | [turns value lowercase xpath2](#lower-case-function) |
| starts-with(A, B) | [check whether A starts with B](#starts-with-function) |
| ends-with(A, B) | [check whether A ends with B xpath2](#ends-with-function) |
| concat(\*args) | [join multiple values to a single string](#concat-function) |
| substring(str, start, len) | [split string into multiple elements](#substring-function) |
| substring-before(str, separator) | [split string and take the beginning value](#substring-before-function) |
| substring-after(str, separator) | [split string and take the end value](#substring-after-function) |
| normalize-space(A) | [normalize space character for A](#normalize-space) |
| count(A) | [count matched elements](#count-function) |
| position() | [node's position within surrounding siblings](#position-function) |
| last() | [context size: `[last()]` selects last node](#last-function) |
| string-length(A) | [returns string length](#string-length-function) |
|  | **Common Patterns** |
| //a/@href | [all links on the page](#all-page-links) |
| //img/@src | [all images on the page](#all-image-links) |
| //text() | [all text under node](#all-text) |
| //child::\*/text() | [all direct children text](#all-direct-children-text) |
| \[contains(concat(' ',normalize-space(@class),' '),' myclass ')\] | [space separated class check (like css' `.class`)](#space-separated-class-check) |
| \[preceding::div\[.="One"\] and following::div\[.="Two"\]\] | [node between two nodes](#node-between-two-nodes) |
|  | **Non-standard Functions** |
| re:test(x, expr, flags) | [like matches() but used in scrapy, lxml and parsel](#matches-function) |

## by Element Name

```html
<a href="https://twitter.com/@scrapfly_dev">Twitter</a>
<a href="https://www.linkedin.com/company/scrapfly/">LinkedIn</a>
<a href="https://scrapfly.io/blog">blog</a>
```

The simplest selector is the element name. It selects all nodes with the given name.

## Element Name Wildcard

```html
<div>
  <a href="https://twitter.com/@scrapfly_dev">Twitter</a>
  <b href="https://www.linkedin.com/company/scrapfly/">LinkedIn</b>
  <c href="https://scrapfly.io/blog">blog</c>
</div>
```

When the element name is `*`, it matches any node. This can be further refined using the `self::` axis and attribute matching.

## Element Self

```html
<div>
  <a href="https://twitter.com/@scrapfly_dev">Twitter</a>
  <a href="https://www.linkedin.com/company/scrapfly/">LinkedIn</a>
  <a href="https://scrapfly.io/blog">blog</a>
</div>
```

The `self::` axis or `.` syntax can be used to access the current context. This is useful for advanced element matching in `[]` predicates.

## Nth Element

```html
<div>
  <a href="https://twitter.com/@scrapfly_dev">Twitter</a>
  <a href="https://www.linkedin.com/company/scrapfly/">LinkedIn</a>
  <a href="https://scrapfly.io/blog">blog</a>
</div>
```

XPath supports element indexing: `a[1]` selects the first `a` element, `a[2]` the second, and so on. Note that XPath indexing starts at 1, not 0.
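A minimal sketch of 1-based indexing with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring("<div><a>Twitter</a><a>LinkedIn</a><a>blog</a></div>")
first = doc.xpath("//a[1]/text()")   # XPath indexing starts at 1
second = doc.xpath("//a[2]/text()")
```
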

## Direct Child Element

```html
<div>
  <span>
    <a href="https://twitter.com/@scrapfly_dev">Twitter</a>
  </span>
</div>
```

The `/` separator selects direct children of the current node. `a` selects all `a` elements, while `div/a` selects only `a` elements that are direct children of a `div` node.

## Any Descendant Element

```html
<div>
  <span>
    <a>select</a>
    <span>
      <a>select</a>
    </span>
  </span>
  <a>select</a>
</div>
```

The `//` separator selects any descendant under the current context. This is one of the most useful selectors in XPath, as it lets you select elements recursively at any depth.
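The difference between direct-child and any-descendant selection can be sketched with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring(
    "<div><span><a>one</a><span><a>two</a></span></span><a>three</a></div>")
descendants = doc.xpath("//a/text()")  # matches at any depth
children = doc.xpath("a/text()")       # only direct children of the context node
```
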

## Explicit Relativity

`./` and `.//` are explicit versions of `/` and `//`. They are useful when you want to be explicit about the relativity of the selector. This is mostly relevant for engines that allow chaining multiple selectors, like `parsel`:

```python
from parsel import Selector

sel = Selector(text=html)  # html is the page source string
product = sel.xpath('//div[@class="product"]')
product_features = product.xpath('.//div[@class="features"]')
#                                 ^^^
# without the leading . it would select all "features" divs in the whole document
```



## Element Parent

&lt;div&gt; &lt;a href="important-link"&gt; &lt;article&gt;some link&lt;/article&gt; &lt;/a&gt; &lt;/div&gt; The `..` or `parent` selector can be used to navigate up the HTML node tree or match by parent values.

## Element Ancestors

```html
<div>
  <article>
    <a>important link1</a>
    <a>important link2</a>
    <span>
      <button>subscribe</button>
    </span>
  </article>
  <article>
    ignore
  </article>
</div>
```

The `ancestor::` selector is similar to `..` but selects ancestors at any depth. Note that axis selectors (`::`) can be combined with the `*` wildcard to select any ancestor.

## Preceding Nodes

```html
<div>
  <div>
    <a href="important">select</a>
  </div>
  <a href="important2">select</a>
  <separator></separator>
  <a href="ignore">ignore</a>
</div>
```

The `preceding::` selector selects all nodes that appear before the current node in the document (above it).

## Following Nodes

&lt;div&gt; &lt;a href="ignore"&gt;ignore&lt;/a&gt; &lt;separator&gt;&lt;/separator&gt; &lt;div&gt; &lt;a href="important"&gt;select&lt;/a&gt; &lt;/div&gt; &lt;a href="important2"&gt;select&lt;/a&gt; &lt;/div&gt; The `following` selector selects all nodes that appear after the current node. This is useful for selecting all nodes below the current node.

## Preceding Siblings

```html
<div>
  <div>
    <a href="ignore">ignore</a>
  </div>
  <a href="important">select</a>
  <separator></separator>
  <a href="ignore">ignore</a>
</div>
```

The `preceding-sibling::` selector selects all nodes that appear before the current node and share the same parent.

## Following Siblings

&lt;div&gt; &lt;a href="ignore"&gt;ignore&lt;/a&gt; &lt;separator&gt;&lt;/separator&gt; &lt;div&gt; &lt;a href="ignore"&gt;ignore&lt;/a&gt; &lt;/div&gt; &lt;a href="important"&gt;select&lt;/a&gt; The `following-sibling` selector selects all nodes that appear after the current node and share the same parent. This is useful for selecting all sibling nodes below the current node.

## Union Logic

```html
<a href="https://twitter.com/@scrapfly_dev">Twitter</a>
<b href="https://www.linkedin.com/company/scrapfly/">LinkedIn</b>
<c href="https://scrapfly.io/blog">blog</c>
```

XPath supports multiple selectors in a single query. The `|` operator is used to join multiple selectors in the order they appear.

## Or Logic

```html
<a href="https://twitter.com/@scrapfly_dev">Twitter</a>
<a href="https://www.linkedin.com/company/scrapfly/">LinkedIn</a>
<a href="https://scrapfly.io/blog">blog</a>
```

XPath supports logic for multiple matchers using the `or` keyword, which can be used to join multiple match rules.

## And Logic

```html
<a href="https://twitter.com/@scrapfly_dev">Twitter</a>
<a href="https://twitter.com/">Powered by Twitter</a>
```

XPath supports logic for multiple matchers using the `and` keyword, which can be used to join multiple match rules. Alternatively, multiple predicates can be chained in `[]` as `[contains(@href, "twitter")][contains(@href, "@")]`.
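A sketch of `and` logic with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring(
    '<div><a href="https://twitter.com/@scrapfly_dev">Twitter</a>'
    '<a href="https://twitter.com/">Powered by Twitter</a></div>')
# both conditions must hold: a twitter link that contains a handle
handle = doc.xpath('//a[contains(@href, "twitter") and contains(@href, "@")]/text()')
```
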

## Attribute by Name

```html
<a href="https://twitter.com/@scrapfly_dev">Twitter</a>
<a href="https://www.linkedin.com/company/scrapfly/">LinkedIn</a>
<a href="https://scrapfly.io/blog">blog</a>
```

Using the `@attribute` syntax, any HTML attribute can be selected.

## Element Text

```html
<a href="https://twitter.com/@scrapfly_dev">Twitter</a>
<a href="https://www.linkedin.com/company/scrapfly/">LinkedIn</a>
<a href="https://scrapfly.io/blog">blog</a>
```

The element's text value can be selected using the `text()` function. Most commonly, `//text()` is used to select all text values under a node; this is the recommended approach in web scraping as it's more robust to HTML changes:

```html
<div>
  <a>
    <span>Price</span>
    <div>14.88</div>
    USD
  </a>
</div>
```

## Element Predicate

&lt;a class="social"&gt;Mastodon&lt;/a&gt; &lt;a class="link"&gt;Blog&lt;/a&gt; &lt;a class="ad"&gt;Read my book&lt;/a&gt; All node selectors can have match predicates that add selection rules. The predicates can be nested and chained to narrow down the selection.

Element predicates support basic arithmetic and comparison operators like `>` and `<` for greater/less than, `=` for exact match, and `!=` for not equal:

```html
<!-- select all "cheap products" -->
<a data-price=1>addon product 1</a>
<a data-price=10>cheap product 1</a>
<a data-price=200>expensive product</a>
<a data-price=15>cheap product 2</a>
```

## Get Element Name

```html
<div>
  <a>ignore</a>
  <h1>heading 1</h1>
  <h2>heading 2</h2>
  <h3>heading 3</h3>
  <a>ignore</a>
</div>
```

The `name()` function retrieves the element's node name for use in predicate matchers. For example, above we select all nodes whose name starts with `h`.

## Not Function

```html
<div>
  <a>select</a>
  <a class="foo">ignore</a>
  <a>select</a>
</div>
```

The `not()` function negates any value as a boolean. It's useful for excluding certain elements from the selection.

## Number Function

```html
<div>
  <div class="product" data-price="$22.00">
    <span>Product A</span>
  </div>
  <div data-price="$88.99">
    <span>Product B</span>
  </div>
  <div data-price="$15.00">
    <span>Product C</span>
  </div>
</div>
```

The `number()` function casts numeric strings to numbers. It's mostly used in combination with string processing functions like `substring`.

## Contains Function

```html
<a href="https://twitter.com/@scrapfly_dev">social: Twitter</a>
<a href="https://x.com/@scrapfly_dev">social: X.com</a>
<a href="https://scrapfly.io/blog">newsletter: scrapfly.io/blog</a>
```

The `contains(A, B)` function checks whether B is part of A. Note that `contains()` is case-sensitive.
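A sketch of attribute matching with `contains()` in `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring(
    '<div><a href="https://twitter.com/@scrapfly_dev">social: Twitter</a>'
    '<a href="https://scrapfly.io/blog">newsletter</a></div>')
social = doc.xpath('//a[contains(@href, "twitter")]/@href')
```
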

## Matches Function

```html
<a>www.select.com</a>
<a>www.select-also.com</a>
<a>ignore.com</a>
<a>www.ignore.net</a>
```

The `matches()` function is a powerful regular expression function that checks whether a string matches a pattern. It takes 3 arguments: the string, the pattern, and optional flags, where the flags can be:

- i: Case-insensitive matching
- s: Dot matches all (affects the dot . metacharacter to match newlines as well)
- m: Multi-line matching (affects how ^ and $ behave)
- x: Extended (ignores whitespace within the pattern)

Note that tools like `scrapy`, `parsel`, and `lxml` don't fully support XPath 2.0, but they implement this function as `re:test` (via the EXSLT extensions).
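A sketch of `re:test` via `lxml`'s EXSLT support (markup is illustrative):

```python
from lxml import html

doc = html.fromstring(
    "<div><a>www.select.com</a><a>www.select-also.com</a>"
    "<a>ignore.com</a><a>www.ignore.net</a></div>")
# lxml exposes EXSLT regular expressions under the "re" namespace
ns = {"re": "http://exslt.org/regular-expressions"}
matched = doc.xpath(r'//a[re:test(text(), "^www\..*\.com$")]/text()', namespaces=ns)
```
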

## Tokenize Function

```html
<!-- select paragraphs with more than 3 words -->
<p>one two</p>
<p>one two three four five six seven</p>
<p>one</p>
```

The `tokenize()` function splits a string by a given pattern. It's useful for counting words or splitting complex string values.

## Lower-Case Function

```html
<a href="https://twitter.com/@scrapfly_dev">Social: Twitter</a>
<a href="https://x.com/@scrapfly_dev">SOCIAL: X.com</a>
<a href="https://scrapfly.io/blog">newsletter: scrapfly.io/blog</a>
```

The `lower-case(A)` function turns any string value to lowercase. It's useful for case-insensitive matching in combination with [contains](#contains-function).
Note that `lower-case` is only available in **XPath 2.0**.

## Starts-With Function

```html
<div>
  <p>
    Follow us on:
    <a href="https://twitter.com/@scrapfly_dev">social: Twitter</a>
    <a href="https://x.com/@scrapfly_dev">social: X.com</a>
    <a href="https://scrapfly.io/blog">newsletter: scrapfly.io/blog</a>
  </p>
</div>
```

`starts-with(a, b)` checks whether `a` starts with `b`. Like `contains`, it's a string matching function, but it matches only the beginning of the string.

## Ends-With Function

```html
<div>
  <p>
    Follow us on:
    <a href="https://twitter.com/@scrapfly_dev">News on Twitter</a>
    <a href="https://twitter.com/@scrapfly_dev">Support on Twitter</a>
    <a href="https://twitter.com/@scrapfly_dev">News on X.com</a>
  </p>
</div>
```

`ends-with(a, b)` checks whether `a` ends with `b`. Like `contains`, it's a string matching function, but it matches only the end of the string.

## Concat Function

```html
<div>
  <a href="https://twitter.com/@scrapfly_dev">Twitter</a>
  <a href="https://www.linkedin.com/company/scrapfly/">LinkedIn</a>
</div>
```

The `concat(a, b, c, ...)` utility function joins multiple values into a single string.

## Substring Function

```html
<div>
  <a>+99 12345678 Call Us</a>
  <a>+87 87654321 Text Us</a>
</div>
```

The `substring()` function slices a string by index. It takes 3 arguments: the string, the slice start (1-indexed), and the slice length.
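A sketch of `substring()` with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring("<div><a>+99 12345678 Call Us</a></div>")
# substring(string, start, length) — start is 1-indexed
phone = doc.xpath("substring(//a/text(), 5, 8)")
```
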

## Substring-Before Function

```html
<div>
  <a href="https://twitter.com/@scrapfly_dev">News on Twitter</a>
  <a href="https://twitter.com/@scrapfly_dev">Support on Twitter</a>
  <a href="https://x.com/@scrapfly_dev">Industry Insights on X.com</a>
</div>
```

The `substring-before()` function takes 2 arguments: the string and the separator. It returns the part of the string before the separator.

## Substring-After Function

```html
<div>
  <a href="https://twitter.com/@scrapfly_dev">News on Twitter</a>
  <a href="https://twitter.com/@scrapfly_dev">Support on Linkedin</a>
  <a href="https://x.com/@scrapfly_dev">Industry Insights on X.com</a>
</div>
```

The `substring-after()` function takes 2 arguments: the string and the separator. It returns the part of the string after the separator.
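Both splitting functions can be sketched with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring("<div><a>News on Twitter</a></div>")
topic = doc.xpath('substring-before(//a/text(), " on ")')
platform = doc.xpath('substring-after(//a/text(), " on ")')
```
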

## Normalize-Space

```html
<div>
  <a class=" foo bar ">select</a>
</div>
```

The `normalize-space()` function trims and collapses whitespace in a string. It's vital when matching values with spaces, as HTML found online is often inconsistent in its use of space characters.
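A sketch of `normalize-space()` with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring('<div><a class=" foo   bar ">select</a></div>')
# trims leading/trailing whitespace and collapses internal runs to one space
cleaned = doc.xpath("normalize-space(//a/@class)")
```
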

## Count Function

```html
<div>
  <a href="https://twitter.com/@scrapfly_dev">social: Twitter</a>
  <a href="https://x.com/@scrapfly_dev">social: X.com</a>
  <a href="https://scrapfly.io/blog">newsletter: scrapfly.io/blog</a>
</div>
```

The `count()` function returns the number of matched elements. It's useful for checking whether a selector matches any elements.

## Position Function

```html
<div>
  <a>link 1</a>
  <a>link 2</a>
  <a>link 3</a>
  <a>link 4</a>
  <a>link 5</a>
  <a>link 6</a>
</div>
```

The `position()` function returns the node's position within its surrounding siblings. Note that `[position()=1]` is equivalent to `[1]`, but `position()` can also be used in more complex predicate expressions.

## Last Function

```html
<div>
  <a>link 1</a>
  <a>link 2</a>
  <a>link 3</a>
</div>
```

The `last()` function returns the context size. `[last()]` selects the last node.
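`last()` and `position()` can be sketched together with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring("<div><a>link 1</a><a>link 2</a><a>link 3</a></div>")
last = doc.xpath("//a[last()]/text()")                  # just the last link
all_but_last = doc.xpath("//a[position() < last()]/text()")
```
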

## String-Length Function

```html
<article>
  <p>Important paragraph with a lot of data</p>
  <p>ad</p>
  <p>Another important paragraph with a lot of data</p>
</article>
```

`string-length(A)` returns the length of a string. It's useful for checking whether a string is empty or not.

## Space Separated Class Check

```html
<div>
  <a class="foo">select</a>
  <a class="afoo">ignore</a>
  <a class="bar foo">select</a>
  <a class="bar foo gaz">select</a>
  <a class="bar afoo gaz">ignore</a>
</div>
```

Using `contains(@class, "foo")` is not an ideal way to match for class presence, as it will match any class value that contains `foo`, like `xfooy`.

The complex selector above is equivalent to CSS selectors `.class` syntax which checks whether the whole space-separated class value is present.
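The whole-class check can be sketched with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring(
    '<div><a class="foo">select</a><a class="afoo">ignore</a>'
    '<a class="bar foo">select</a></div>')
# pad the class value with spaces so only whole space-separated tokens match
selector = "//a[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]/text()"
matched = doc.xpath(selector)
```
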

## Node Between Two Nodes

```html
<div>
  <h2>Section 1</h2>
  <p>ignore</p>
  <h2>Section 2</h2>
  <p>select</p>
  <p>select 2</p>
  <h2>Section 3</h2>
  <p>ignore</p>
</div>
```

The `preceding` and `following` selectors can be used to select nodes between two known elements. Alternatively, `preceding-sibling` and `following-sibling` can be used as stricter variants.

There are many other creative ways to select elements between two known elements. For example, checking the first preceding sibling explicitly:

```html
<div>
  <h2>Section 1</h2>
  <p>ignore</p>
  <h2>Section 2</h2>
  <p>select</p>
  <p>select 2</p>
  <h2>Section 3</h2>
  <p>ignore</p>
</div>
```

Or using `count()` to count preceding siblings when the text value is not reliable:

```html
<div>
  <h2>Section 1</h2>
  <p>ignore</p>
  <h2>Section 2</h2>
  <p>select</p>
  <p>select 2</p>
  <h2>Section 3</h2>
  <p>ignore</p>
</div>
```

## All Page Links

```html
<div>
  <a href="https://web-scraping.dev/product/1">product 1</a>
  <p>
    <a href="https://web-scraping.dev/product/2">product in paragraph</a>
  </p>
</div>
```

To extract all links on the page, the recursive `//a` selector can be used together with the `@href` attribute selector.

Note that extracted links are often relative (like `/product/1`) and need to be resolved to absolute URLs using URL joining tools like Python's `urllib.parse.urljoin()` function.
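A sketch of resolving a relative link with the standard library (URLs are illustrative):

```python
from urllib.parse import urljoin

page_url = "https://web-scraping.dev/products"  # the URL the HTML was scraped from
absolute = urljoin(page_url, "/product/1")      # resolve a relative href
```
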

## All Image Links

```html
<div>
  <h2>Images</h2>
  <img src="https://web-scraping.dev/assets/products/orange-chocolate-box-small-1.webp"/>
  <img src="https://web-scraping.dev/assets/products/orange-chocolate-box-small-2.webp"/>
  <p>
    One more:
    <img src="https://web-scraping.dev/assets/products/orange-chocolate-box-small-3.webp"/>
  </p>
</div>
```

To extract all images on the page, the recursive `//img` selector can be used together with the `@src` attribute selector.

Note that extracted links are often relative (like `/product/image-1.webp`) and need to be resolved to absolute URLs using URL joining tools like Python's `urllib.parse.urljoin()` function.

## All Text

```html
<article>
  <h2>Should you buy Product</h2>
  <p>
    This is a paragraph about <a>product 1</a>
  </p>
  <ul>
    <li>feature 1</li>
    <li>feature 2</li>
    <li><b>bonus</b> feature 3</li>
  </ul>
</article>
```

The recursive `//text()` selector will select all text values anywhere under the current node. Note that this often returns empty values as well, so the output needs to be cleaned up manually.

## All Direct Children Text

```html
<article>
  Product features
  <a>feature 1</a>, <i>feature 2</i>, <b>feature 3</b>.
  <div>
    <p>avoids descendants</p>
  </div>
</article>
```

Selecting only the text of direct children can be done by chaining the `child` axis with the `*` wildcard for the node name. Unlike `/text()`, this will *not* select the current node's own text, only the text of its children.
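This can be sketched with `lxml` (markup is illustrative):

```python
from lxml import html

doc = html.fromstring(
    "<article>Product features <a>feature 1</a>, <i>feature 2</i>."
    "<div><p>avoids descendants</p></div></article>")
# text of direct child elements only; skips the article's own text and the
# paragraph nested inside the div
child_text = doc.xpath("//article/child::*/text()")
```
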



## FAQ

**Should I use XPath or CSS selectors for web scraping?**

It depends on your needs. XPath is more powerful for complex queries like selecting parent elements, matching by text content, and backward traversal, while [CSS selectors](https://scrapfly.io/blog/posts/css-selector-cheatsheet) are more concise and faster in browsers. Many scrapers use both depending on the task.









 
