Ultimate CSS Selector Cheatsheet for HTML Parsing


CSS selectors are a powerful HTML query language that browsers use to determine which HTML elements to style.
They are also incredibly useful for HTML parsing when web scraping or processing HTML data, as the same queries can be used to select values.

In web scraping, CSS selectors are an easy and powerful way to parse HTML data, and many web scraping libraries support them. This article is a carefully curated CSS selector cheatsheet for web scraping, though it applies to any HTML parsing task.

Parsing HTML with CSS Selectors

If you're new to CSS selectors, see our complete beginner-friendly introduction to CSS selectors for HTML parsing.


This CSS selector cheatsheet contains all selector features used in HTML parsing.
Clicking on the explanation text will take you to a real-life interactive example with more details. Note that CSS selector support varies between implementations, so non-standard features are marked as such.

Cheatsheet

Selector Explanation
Navigation
> selects direct child
(space) selects any descendant
~ selects following sibling
+ selects direct following sibling
, separator for joined selectors
Attribute Matching
. selects by class
# selects by id
[] attribute selector
[attr] select elements that have attribute present (even if it's empty)
[attr=value] match exact attribute value
[attr=value i] i suffix turns any attribute match case insensitive*
[attr*=value] match containing attribute value
[attr|=value] match exact value or value followed by a "-suffix"
[attr^=value] match attributes that start with value
[attr$=value] match attributes that end with value
[attr~=value] match attributes that contain a word
Element Matching
:not() reverses selection
:has() select if element has a matching descendant
:is() apply multiple selectors
:first-child select if it's the first element in the group
:last-child select if it's the last element in the group
:nth-child() select if it's the Nth element, supports even, odd
:nth-last-child() like nth-child but reversed
:first-of-type select if it's the first element of that type in the group.
:last-of-type select if it's the last element of that type in the group.
:nth-of-type() select if it's the Nth element of that type in the group.
:only-of-type select if it's the only element of that type in the group.
Non-standard Functions
::attr(name) select attribute value. Available in scrapy, parsel, Scrapfly SDK
::text select text value. Available in scrapy, parsel, Scrapfly SDK

* limited availability

What CSS selectors cannot do:

  • Select preceding siblings.
  • Select parent or ancestor elements (other than filtering elements by their contents with :has()).
  • Select array slices.
  • Select by text value.
  • Select by element count.
  • Select by element depth.

These features are, however, available in the XPath selector engine.

Direct Child

<div> <p > Follow us on <a href="https://x.com/@scrapfly_dev">X!</a> <skip>ignore</skip> </p> </div>

The > direct child selector selects only direct children of the parent element. Here, the a element is selected as it's a direct child of p, which is itself a direct child of div. Note that this selector can be fragile: HTML tree depth changes easily, and any change breaks the selector. For example, if the a element is wrapped in a span, the selector no longer matches.

Any Descendant

<div> <p > Follow us on <a href="https://x.com/@scrapfly_dev">X!</a> <skip>ignore</skip> </p> </div>

A space selects any descendant, no matter how many layers deep. Here, the a element is selected as it's a descendant of div.
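The difference between the direct child and the descendant combinator can be sketched in Python. Python's standard library has no CSS selector engine, so this sketch emulates the two selectors with xml.etree.ElementTree instead of running real CSS. The selectors div > p > a and div a, the helper names, and the span-wrapped variant of the markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ORIGINAL = '<div> <p> Follow us on <a href="https://x.com/@scrapfly_dev">X!</a> <skip>ignore</skip> </p> </div>'
# Hypothetical variant of the same markup where the link got wrapped in a <span>
WRAPPED = '<div> <p> Follow us on <span><a href="https://x.com/@scrapfly_dev">X!</a></span> </p> </div>'


def direct_child_links(html):
    """Emulates `div > p > a`: every step must be a direct child."""
    div = ET.fromstring(html)
    return [a.text for a in div.findall("./p/a")]


def descendant_links(html):
    """Emulates `div a`: the link may sit at any depth below the div."""
    div = ET.fromstring(html)
    return [a.text for a in div.iter("a")]


print(direct_child_links(ORIGINAL))  # ['X!']
print(direct_child_links(WRAPPED))   # []
print(descendant_links(WRAPPED))     # ['X!']
```

The descendant version keeps working after the markup gains an extra wrapper, which is why it tends to be the safer default in scrapers.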

Any Following Sibling

<article> <p>ignore</p> <p class="ad">ignore</p> <p>select</p> <p>select</p> </article>

The ~ selects any following sibling (general sibling), no matter how many other siblings sit in between. Here, the last two p elements are selected as they are following siblings of .ad.

Direct Following Sibling

<article> <p>ignore</p> <p class="ad">ignore</p> <p>select</p> <p>ignore</p> </article>

The + selects only the adjacent following sibling (i.e. it has to come right after). Here, only the first p element after .ad is selected, as it's the direct following sibling of .ad.
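As a rough sketch of how the two sibling combinators differ, the snippet below emulates .ad ~ p and .ad + p on the markup above with the standard library's xml.etree.ElementTree (it is not a real CSS engine, and both selectors are assumed for illustration). The select/ignore labels in the markup describe the + case, so the ~ emulation also returns the p labelled ignore.

```python
import xml.etree.ElementTree as ET

HTML = '<article> <p>ignore</p> <p class="ad">ignore</p> <p>select</p> <p>ignore</p> </article>'

article = ET.fromstring(HTML)
siblings = list(article)  # all children of <article>, in document order

# Position of the first element carrying the "ad" class
ad_index = next(i for i, el in enumerate(siblings) if "ad" in el.get("class", "").split())

# `.ad ~ p`: every <p> that comes after .ad under the same parent
general = [el.text for el in siblings[ad_index + 1:] if el.tag == "p"]

# `.ad + p`: only the element immediately after .ad, and only if it is a <p>
adjacent = [el.text for el in siblings[ad_index + 1:ad_index + 2] if el.tag == "p"]

print(general)   # ['select', 'ignore']
print(adjacent)  # ['select']
```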

Joining Selectors

<div> <article> <p>select paragraph</p> <div> <div>ignore</div> <p>select nested paragraph</p> </div> <span>select span</span> <a>select link</a> <div>ignore</div> </article> </div>

Selectors can be joined with , to select multiple elements. Here, the p, span and a elements are selected. Note that the result order usually follows the structure of the HTML tree.
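The note about result order can be illustrated with a small sketch. It assumes the joined selector is written as a, span, p (deliberately not in document order) and emulates it with the standard library's xml.etree.ElementTree rather than a real CSS engine.

```python
import xml.etree.ElementTree as ET

HTML = (
    "<div> <article> <p>select paragraph</p> <div> <div>ignore</div> "
    "<p>select nested paragraph</p> </div> <span>select span</span> "
    "<a>select link</a> <div>ignore</div> </article> </div>"
)

root = ET.fromstring(HTML)
wanted_tags = {"a", "span", "p"}  # emulates the joined selector `a, span, p`

# iter() walks the tree in document order, so results follow the markup,
# not the order in which the tags are listed in the selector
matches = [el.text for el in root.iter() if el.tag in wanted_tags]
print(matches)
# ['select paragraph', 'select nested paragraph', 'select span', 'select link']
```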

by Class

<div> <div class="product">select</div> <div class="sold product">select</div> <div class="sold product new">select</div> <div class="product-2">ignore</div> </div>

The . selector can be used to restrict the selection to elements that contain the class value in the class attribute. Here, the div elements with product in the class attribute are selected.
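Class matching works on whole class names, which is why product-2 is ignored above. A minimal sketch of that rule, assuming the selector .product and using the standard library's xml.etree.ElementTree in place of a CSS engine (the has_class helper is hypothetical):

```python
import xml.etree.ElementTree as ET

HTML = (
    '<div> <div class="product">select</div> <div class="sold product">select</div> '
    '<div class="sold product new">select</div> <div class="product-2">ignore</div> </div>'
)


def has_class(element, name):
    """True when `name` is one of the whitespace-separated class tokens."""
    return name in element.get("class", "").split()


root = ET.fromstring(HTML)
matches = [el.get("class") for el in root.iter("div") if has_class(el, "product")]
print(matches)  # ['product', 'sold product', 'sold product new']
```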

by ID

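<div> <div id="product">select</div> <div id="sold product">ignore</div> <div id="sold product new">ignore</div> <div id="product-2">ignore</div> </div>

The # selector restricts the selection to elements whose id attribute equals the value exactly. Here, only the div with id="product" is selected: unlike class, id is not treated as a space-separated list, so "sold product" does not match.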

by Attribute

<div> <a href="#">enabled link</a> <a>disabled link</a> <a href="">enabled link</a> </div>

Square brackets ([]) can be used to match elements by attribute. For example, [href] matches any element that has the href attribute (even if it's empty).
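The "even if it's empty" detail is easy to miss, so here is a short sketch of it. It assumes the selector a[href] and emulates the presence check with the standard library's xml.etree.ElementTree rather than a CSS engine.

```python
import xml.etree.ElementTree as ET

HTML = '<div> <a href="#">enabled link</a> <a>disabled link</a> <a href="">enabled link</a> </div>'

root = ET.fromstring(HTML)

# Emulates `a[href]`: the attribute only has to exist, its value may be empty
with_href = [a for a in root.iter("a") if "href" in a.attrib]
print([a.get("href") for a in with_href])  # ['#', '']
```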

by Attribute Value

<div> <span data-item="product">select</span> <div data-item="product">select</div> <span data-item="product-new">ignore</span> </div>

Attributes can be matched exactly using the [attr=value] syntax. Note that this is case-sensitive.

by Case Insensitive Attribute Value

<div> <span data-item="PRODUCT">select</span> <div data-item="Product">select</div> <div data-item="product">select</div> <span data-item="product-new">ignore</span> </div>

Any attribute matcher can be made case-insensitive by adding the i suffix. Here, the first three span and div elements are selected as they match the data-item attribute value case-insensitively, while product-new is still ignored because the match is exact.

by Partial Attribute Value

<div> <a href="social-link.com">select</a> <a href="social-link2.com">select</a> <a href="ignore">ignore</a> </div>

The *= operator matches when the attribute contains the supplied value anywhere in the value string.

by Attribute Value Ignoring Minus Suffix

<div> <a class="important-link">select</a> <a class="important-url">select</a> <a class="important">select</a> <a class="foo important-item">doesn't begin exactly</a> <a class="important item">contains more than just match</a> <a class="importantitem">doesn't match</a> </div>

The |= selector is unique: it matches only when the attribute value equals the supplied value exactly or starts with it immediately followed by a hyphen (a -suffix).

by Attribute Value Starting With

<div> <a class="dataname">select</a> <a class="data-age">select</a> <a class="data extra">select</a> <a class="foo data">ignore</a> </div>

The ^= selector matches when attribute value starts with the supplied value exactly.

by Attribute Value Ending With

<div> <a class="name-data">select</a> <a class="age data">select</a> <a class="data">select</a> <a class="data foo">ignore</a> </div>

The $= selector matches when attribute value ends with the supplied value exactly.

by Attribute Containing Word

<div> <a class="data">select</a> <a class="foo data">select</a> <a class="foo data bar">select</a> <a class="datafoo">ignore</a> <a class="data-bar">ignore</a> </div>

The ~= selector matches when the attribute value contains the supplied value as a word. A word is a string of characters delimited by whitespace.
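The attribute operators above boil down to simple string checks. The function below is a sketch of those rules in plain Python, not the implementation of any particular library; the name attr_matches and its parameters are hypothetical. It treats the i suffix as a lower-casing step, which only approximates the ASCII case-insensitivity CSS defines.

```python
def attr_matches(actual, operator, expected, ignore_case=False):
    """Sketch of the CSS attribute operators as plain string checks."""
    if ignore_case:
        # The `i` flag; str.lower() approximates ASCII case-insensitivity
        actual, expected = actual.lower(), expected.lower()
    if operator == "=":
        return actual == expected
    if operator == "|=":
        return actual == expected or actual.startswith(expected + "-")
    if not expected:
        # The remaining operators never match an empty value
        return False
    if operator == "*=":
        return expected in actual
    if operator == "^=":
        return actual.startswith(expected)
    if operator == "$=":
        return actual.endswith(expected)
    if operator == "~=":
        return expected in actual.split()
    raise ValueError(f"unknown operator: {operator}")


classes = ["important-link", "important", "foo important-item", "important item", "importantitem"]
print([c for c in classes if attr_matches(c, "|=", "important")])
# ['important-link', 'important']

words = ["data", "foo data", "datafoo", "data-bar"]
print([c for c in words if attr_matches(c, "~=", "data")])
# ['data', 'foo data']

print(attr_matches("PRODUCT", "=", "product", ignore_case=True))  # True
```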

Reversing Matchers Using Not

<div> <a class="foo">select</a> <a class="ignore">ignore</a> <a class="bar">select</a> <a class="data">select</a> <a class="ignore">ignore</a> </div>

The :not() pseudo selector follows a node selector and reverses any matcher such as .class, #id or attribute matchers like [attribute=ignore].
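A minimal sketch of the negation, assuming the selector a:not(.ignore) and emulating it with the standard library's xml.etree.ElementTree rather than a CSS engine:

```python
import xml.etree.ElementTree as ET

HTML = (
    '<div> <a class="foo">select</a> <a class="ignore">ignore</a> <a class="bar">select</a> '
    '<a class="data">select</a> <a class="ignore">ignore</a> </div>'
)

root = ET.fromstring(HTML)

# Emulates `a:not(.ignore)`: keep the <a> elements the inner matcher rejects
kept = [a.get("class") for a in root.iter("a") if "ignore" not in a.get("class", "").split()]
print(kept)  # ['foo', 'bar', 'data']
```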

First Child

<div> <div class="products"> <a>select</a> <a>ignore</a> </div> <div class="products"> <a>select</a> <a>ignore</a> </div> <a>ignore</a> </div>

The :first-child pseudo selector will select only the elements that are first children in their group of all siblings. In other words, first element in the group.

Last Child

<div> <div class="products"> <a>ignore</a> <a>select</a> </div> <div class="products"> <a>ignore</a> <a>select</a> </div> <a>ignore</a> </div>

The :last-child pseudo selector will select only the elements that are last children in their group of all siblings. In other words, last element in the group.

Nth Child

<div> <div class="products"> <a>ignore</a> <a>select</a> <a>ignore</a> </div> <div class="products"> <div>ignore</div> <a>select</a> <a>ignore</a> </div> <a>ignore</a> </div>

The :nth-child pseudo selector will select only the elements that are Nth children in their group of all siblings. In other words, Nth element in the group. It also supports special values like even and odd - try them!

Nth Last Child

<div> <div class="products"> <a>ignore</a> <a>ignore</a> <a>select</a> <a>ignore</a> </div> <div class="products"> <div>ignore</div> <a>ignore</a> <a>select</a> <a>ignore</a> </div> <a>ignore</a> </div>

The :nth-last-child pseudo selector is just the :nth-child selector counted from the end. In the example above we're selecting the 2nd to last element in the group.
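Both positional selectors count every sibling, whatever its tag, and only then check the element name. The sketch below emulates a:nth-child(2) and a:nth-last-child(2) on the two snippets above using the standard library's xml.etree.ElementTree. The selectors and the nth_child helper are assumptions for illustration, and the position test is passed as a function so that odd or even can be expressed too.

```python
import xml.etree.ElementTree as ET

NTH_CHILD_HTML = (
    '<div> <div class="products"> <a>ignore</a> <a>select</a> <a>ignore</a> </div> '
    '<div class="products"> <div>ignore</div> <a>select</a> <a>ignore</a> </div> <a>ignore</a> </div>'
)
NTH_LAST_CHILD_HTML = (
    '<div> <div class="products"> <a>ignore</a> <a>ignore</a> <a>select</a> <a>ignore</a> </div> '
    '<div class="products"> <div>ignore</div> <a>ignore</a> <a>select</a> <a>ignore</a> </div> '
    '<a>ignore</a> </div>'
)


def nth_child(root, tag, wanted, from_end=False):
    """Elements named `tag` whose 1-based position among ALL siblings satisfies `wanted`."""
    found = []
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            position = len(children) - index if from_end else index + 1
            if child.tag == tag and wanted(position):
                found.append(child)
    return found


first_doc = ET.fromstring(NTH_CHILD_HTML)
second_doc = ET.fromstring(NTH_LAST_CHILD_HTML)

# a:nth-child(2)
print([a.text for a in nth_child(first_doc, "a", lambda n: n == 2)])  # ['select', 'select']
# a:nth-last-child(2)
print([a.text for a in nth_child(second_doc, "a", lambda n: n == 2, from_end=True)])  # ['select', 'select']
# a:nth-child(odd)
print(len(nth_child(first_doc, "a", lambda n: n % 2 == 1)))  # 4
```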

First Of Type

<div> <div class="products"> <a>select</a> <a>ignore</a> </div> <div class="products"> <div>ignore</div> <a>select</a> <a>ignore</a> </div> <a>ignore</a> </div>

The :first-of-type pseudo selector will select the first element of given type. It's similar to :first-child but instead of considering all siblings, it considers only siblings of the same node type.

Last Of Type

<div> <div class="products"> <a>ignore</a> <a>select</a> </div> <div class="products"> <div>ignore</div> <a>ignore</a> <a>select</a> </div> <a>ignore</a> </div>

The :last-of-type pseudo selector will select the last element of given type. It's similar to :last-child but instead of considering all siblings, it considers only siblings of the same node type.

Nth Of Type

<div> <div class="products"> <a>ignore</a> <a>select</a> <a>ignore</a> </div> <div class="products"> <div>ignore</div> <a>ignore</a> <a>select</a> <a>ignore</a> </div> <a>ignore</a> </div>

The :nth-of-type pseudo selector will select elements of given type that are Nth element in their group. It's similar to :first-of-type and :last-of-type just more flexible as index can be specified. It also supports special values like even and odd - try them!
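The difference from :nth-child shows up in the second group above, where a div comes first. This sketch emulates a:nth-of-type(2) and a:nth-child(2) side by side with the standard library's xml.etree.ElementTree. Both selectors and helper names are assumptions for illustration, and only plain integer indexes are handled.

```python
import xml.etree.ElementTree as ET

HTML = (
    '<div> <div class="products"> <a>ignore</a> <a>select</a> <a>ignore</a> </div> '
    '<div class="products"> <div>ignore</div> <a>ignore</a> <a>select</a> <a>ignore</a> </div> '
    '<a>ignore</a> </div>'
)


def nth_of_type(root, tag, n):
    """Emulates `tag:nth-of-type(n)`: count only siblings that share the tag."""
    found = []
    for parent in root.iter():
        same_type = [child for child in parent if child.tag == tag]
        if len(same_type) >= n:
            found.append(same_type[n - 1])
    return found


def nth_child(root, tag, n):
    """Emulates `tag:nth-child(n)`: count every sibling, then check the tag."""
    found = []
    for parent in root.iter():
        children = list(parent)
        if len(children) >= n and children[n - 1].tag == tag:
            found.append(children[n - 1])
    return found


root = ET.fromstring(HTML)
print([a.text for a in nth_of_type(root, "a", 2)])  # ['select', 'select']
print([a.text for a in nth_child(root, "a", 2)])    # ['select', 'ignore']
```

With n set to 1 the first helper behaves like :first-of-type.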

Only of Type

<div> <div class="products"> <a>ignore</a> <a>ignore</a> <a>ignore</a> </div> <div class="products"> <span>ignore</span> <a>select</a> <span>ignore</span> </div> <a>ignore</a> </div>

The :only-of-type pseudo selector will select elements of given type that are the only element of said type in their group.

Has Descendant

<article> <div> <a class="product">select</a> <a>select</a> </div> <div> <div class="wrapper"> <a class="product">select</a> <a>select</a> </div> </div> <div> <a class="advertisement">ignore</a> <div>ignore</div> </div> </article>

The :has() pseudo selector is a way of selecting an element based on the existence of a certain child or descendant. Here, the div elements that have a child with the product class are selected. Note that using the any descendant selector (space) can cause a lot of duplicate results, as every ancestor of the match is selected too, so using the direct child selector (>) is recommended. Try removing the > to see the difference.
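The duplicate-results warning can be made concrete with a short sketch. It assumes the selectors div:has(> .product) and div:has(.product) and emulates them with the standard library's xml.etree.ElementTree rather than a CSS engine; is_product is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

HTML = (
    '<article> <div> <a class="product">select</a> <a>select</a> </div> '
    '<div> <div class="wrapper"> <a class="product">select</a> <a>select</a> </div> </div> '
    '<div> <a class="advertisement">ignore</a> <div>ignore</div> </div> </article>'
)


def is_product(element):
    """True when the element carries the "product" class."""
    return "product" in element.get("class", "").split()


article = ET.fromstring(HTML)
divs = list(article.iter("div"))

# Emulates `div:has(> .product)`: a direct child must match
has_child = [d for d in divs if any(is_product(child) for child in d)]

# Emulates `div:has(.product)`: any descendant may match
has_descendant = [d for d in divs if any(is_product(el) for el in d.iter() if el is not d)]

print(len(has_child))       # 2
print(len(has_descendant))  # 3
```

The extra result in the descendant version is the outer div around .wrapper, whose content overlaps with the .wrapper match itself.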

Is Matcher

<article> <div class="product">select</div> <span class="product foo">select</span> <p class="product">ignore</p> </article>

The :is() pseudo selector is a way of selecting elements that match any of the supplied selectors. Here, the div and span elements are selected as they match the :is() selector. This pseudo selector can be very powerful when combined with :not - try to exclude .foo from the selection.
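As a sketch of the suggested exercise, the snippet below emulates :is(div, span).product and then the same selector with :not(.foo) appended. Both selectors are assumptions about how the example is written, and the emulation uses the standard library's xml.etree.ElementTree rather than a CSS engine.

```python
import xml.etree.ElementTree as ET

HTML = (
    '<article> <div class="product">select</div> <span class="product foo">select</span> '
    '<p class="product">ignore</p> </article>'
)


def classes(element):
    """Class attribute split into its whitespace-separated tokens."""
    return element.get("class", "").split()


article = ET.fromstring(HTML)

# Emulates `:is(div, span).product`: the tag may be either div or span
is_match = [
    el.tag for el in article.iter()
    if el.tag in {"div", "span"} and "product" in classes(el)
]

# Emulates `:is(div, span).product:not(.foo)`
without_foo = [
    el.tag for el in article.iter()
    if el.tag in {"div", "span"} and "product" in classes(el) and "foo" not in classes(el)
]

print(is_match)     # ['div', 'span']
print(without_foo)  # ['div']
```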

Getting Attribute Value

<article> <a href="some url1">select</a> <a href="some url2">select</a> <span href="some url2">ignore</span> </article>

The ::attr() is a non-standard pseudo selector used in tools like scrapy, parsel and Scrapfly SDK to select the value of an element's attribute instead of the element itself.

Getting Element Text

<article> <a href="some url1">select<div>select-nested</div></a> <a href="some url2">select</a> <span href="some url2">ignore</span> </article>

The ::text is a non-standard pseudo selector used in tools like scrapy, parsel and Scrapfly SDK to select element text directly.
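Because ::attr() and ::text are extensions rather than standard CSS, other parsers expose the same data through their own APIs. The sketch below shows the equivalent lookups with the standard library's xml.etree.ElementTree on a well-formed copy of the markup above. It does not call scrapy or parsel, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

HTML = (
    '<article> <a href="some url1">select<div>select-nested</div></a> '
    '<a href="some url2">select</a> <span href="some url2">ignore</span> </article>'
)

article = ET.fromstring(HTML)
links = list(article.iter("a"))

# Counterpart of `a::attr(href)`: the attribute value instead of the element
hrefs = [a.get("href") for a in links]

# Text placed directly at the start of each <a>, nested elements excluded
own_text = [a.text for a in links]

# All text below each <a>, nested elements included
all_text = ["".join(a.itertext()) for a in links]

print(hrefs)     # ['some url1', 'some url2']
print(own_text)  # ['select', 'select']
print(all_text)  # ['selectselect-nested', 'select']
```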
