CSS selectors are a powerful HTML querying language that browsers use to decide which HTML elements to style.
They are also incredibly useful for HTML parsing when web scraping or processing HTML data, because the same queries can be used to select values as well.
In web scraping, CSS selectors are an easy and powerful way to parse HTML data, and many web scraping libraries support them. This article is a carefully curated CSS selector cheatsheet for web scraping, though it applies to any HTML parsing task.
This CSS selector cheatsheet contains all selector features used in HTML parsing.
Note that CSS selector support differs between implementations, so non-standard features are marked as such.
These features are, however, available in the XPath selector engine.
The > direct child selector selects only the direct children of the parent element. Here, the a element is selected because it is a direct child of p, which in turn is a direct child of div. Note that this selector can be brittle: the depth of the HTML tree can easily change and break it. For example, if the a element is wrapped in a span, the selector no longer matches.
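The brittleness described above is easy to reproduce. The sketch below assumes BeautifulSoup (`bs4`), whose `select()` method delegates to the Soup Sieve CSS engine, and uses hypothetical markup in which the second link is wrapped in an extra span:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link is wrapped in an extra <span>.
html = """
<div>
  <p><a href="/twitter">Twitter</a></p>
  <p><span><a href="/github">GitHub</a></span></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div > p > a" only matches <a> elements that are direct children of a <p>,
# which must itself be a direct child of a <div>. The wrapped link is skipped.
direct = [a["href"] for a in soup.select("div > p > a")]
print(direct)  # ['/twitter']
```

Only one link is returned, even though both links look identical on the rendered page.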
The space (descendant) selector selects any descendant, no matter how many levels deep. Here, the a element is selected because it is a descendant of div.
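Here is a minimal sketch of the descendant selector, again assuming BeautifulSoup with Soup Sieve and hypothetical markup. The descendant selector also matches the wrapped link that the direct child selector misses, and it ignores links outside the div:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link wrapped in a <span>, one link outside the <div>.
html = """
<div>
  <p><a href="/twitter">Twitter</a></p>
  <p><span><a href="/github">GitHub</a></span></p>
</div>
<a href="/home">Home</a>
"""

soup = BeautifulSoup(html, "html.parser")

# "div a" matches every <a> anywhere inside a <div>, at any depth.
links = [a["href"] for a in soup.select("div a")]
print(links)  # ['/twitter', '/github']
```

This tolerance to extra wrapper elements is why the descendant selector is usually the safer default for scraping.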
The :nth-child pseudo selector selects only elements that are the Nth child among all of their siblings, regardless of element type. It also supports the special values even and odd.
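The sketch below shows :nth-child with a numeric index and with the odd and even keywords. It assumes BeautifulSoup with Soup Sieve and a hypothetical list. The `texts()` helper is hypothetical and exists only for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical list markup used only for illustration.
html = """
<ul>
  <li>one</li>
  <li>two</li>
  <li>three</li>
  <li>four</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

def texts(selector):
    """Hypothetical helper: return the text of every element matching selector."""
    return [el.get_text() for el in soup.select(selector)]

second = texts("li:nth-child(2)")   # only the 2nd child
odd = texts("li:nth-child(odd)")    # children 1, 3, ...
even = texts("li:nth-child(even)")  # children 2, 4, ...
print(second, odd, even)  # ['two'] ['one', 'three'] ['two', 'four']
```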
The :nth-of-type pseudo selector selects elements of a given type that are the Nth element of that type among their siblings. It is similar to :first-of-type and :last-of-type, just more flexible, because any index can be specified. It also supports the special values even and odd.
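The difference between :nth-child and :nth-of-type shows up when siblings have mixed types. In this sketch (BeautifulSoup with Soup Sieve, hypothetical markup), a heading precedes the paragraphs, so the two selectors pick different elements for the same index:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an <h2> sibling comes before the <p> elements.
html = """
<div>
  <h2>Products</h2>
  <p>first</p>
  <p>second</p>
  <p>third</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# :nth-child counts all siblings, so the <h2> takes position 1.
by_child = [p.get_text() for p in soup.select("p:nth-child(2)")]
# :nth-of-type counts only <p> siblings.
by_type = [p.get_text() for p in soup.select("p:nth-of-type(2)")]
odd_type = [p.get_text() for p in soup.select("p:nth-of-type(odd)")]
print(by_child, by_type, odd_type)  # ['first'] ['second'] ['first', 'third']
```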
The :has() pseudo selector selects a parent element based on the existence of a certain child. Here, the div elements that have a child with the product class are selected. Note that using the descendant selector (space) inside :has() can produce many duplicate results, because every ancestor div of a matching element also matches, so the direct child selector (>) is recommended.
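The duplicate-result problem can be reproduced with nested wrappers. The sketch below assumes BeautifulSoup with Soup Sieve, which supports relative selectors such as `> .product` inside :has(). The markup and ids are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: product cards nested inside a page-level <div>.
html = """
<div id="page">
  <div id="card1"><span class="product">Shoes</span></div>
  <div id="card2"><span class="product">Hat</span></div>
  <div id="ad"><span>Ad</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Direct child: only divs whose own child has the product class.
direct_parents = [d["id"] for d in soup.select("div:has(> .product)")]
# Descendant: the outer #page wrapper matches too.
any_ancestors = [d["id"] for d in soup.select("div:has(.product)")]
print(direct_parents)  # ['card1', 'card2']
print(any_ancestors)   # ['page', 'card1', 'card2']
```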
The :is() pseudo selector selects elements that match any of the supplied selectors. Here, the div and span elements are selected because they match the :is() selector. This pseudo selector becomes very powerful when combined with :not(), for example to exclude .foo elements from the selection.
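Here is a minimal sketch of :is() on its own and combined with :not() to drop .foo elements. It assumes BeautifulSoup with Soup Sieve, which implements both pseudo selectors, and hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mixing several element types.
html = """
<div class="foo">div foo</div>
<div>div</div>
<span>span</span>
<p>paragraph</p>
"""

soup = BeautifulSoup(html, "html.parser")

# :is(div, span) matches any element that is a <div> or a <span>.
matched = [el.get_text() for el in soup.select(":is(div, span)")]
# Adding :not(.foo) removes elements that carry the foo class.
filtered = [el.get_text() for el in soup.select(":is(div, span):not(.foo)")]
print(matched)   # ['div foo', 'div', 'span']
print(filtered)  # ['div', 'span']
```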