How to use CSS selectors in NodeJS when web scraping?

To parse web scraped content in NodeJS using CSS selectors we recommend the Cheerio library:

const cheerio = require('cheerio');

const $ = cheerio.load(`

    <h1>Page title</h1>
<p>some paragraph</p>
<a href="http://scrapfly.io/blog">some link</a>

`);

$('h1').text();
"Page title"
$('a').attribute("href");
"http://scrapfly.io/blog"

Another popular library is Osmosis which supports HTML parsing through both CSS and XPath selectors:

const osmosis = require("osmosis");

const html = `
<a class="link" href="http://scrapfly.io/">link 1</a>
<a class="link" href="http://scrapfly.blog/">link 2</a>
`
osmosis
    .parse(html)
    .find('a.link') 
    .log(console.log);
Question tagged: NodeJS, Data Parsing, Css Selectors

Related Posts

How to Scrape Sitemaps to Discover Scraping Targets

Usually to find scrape targets we look at site search or category pages but there's a better way - sitemaps! In this tutorial, we'll be taking a look at how to find and scrape sitemaps for target locations.

Web Scraping With Node-Unblocker

Tutorial on using Node-Unblocker - a nodejs library - to avoid blocking while web scraping and using it to optimize web scraping stacks.

Web Scraping With NodeJS and Javascript

In this article we'll take a look at scraping using Javascript through NodeJS. We'll cover common web scraping libraries, frequently encountered challenges and wrap everything up by scraping etsy.com