How to Scrape With Headless Firefox
Discover how to use headless Firefox with Selenium, Playwright, and Puppeteer for web scraping, including practical examples for each library.
To parse scraped web content in NodeJS using CSS selectors, we recommend the Cheerio library:
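const cheerio = require('cheerio');
// load an HTML string into a queryable Cheerio document
const $ = cheerio.load(`
<h1>Page title</h1>
<p>some paragraph</p>
<a href="http://scrapfly.io/blog">some link</a>
`);
// .text() returns the text content of the matched elements
console.log($('h1').text()); // → Page title
// .attr() reads an attribute value from the first matched element
console.log($('a').attr('href')); // → http://scrapfly.io/blog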
Another popular library is Osmosis, which supports HTML parsing with both CSS and XPath selectors:
const osmosis = require('osmosis');
const html = `
<a class="link" href="http://scrapfly.io/">link 1</a>
<a class="link" href="http://scrapfly.blog/">link 2</a>
`;
// parse an HTML string (instead of fetching a URL) and select links by CSS selector
osmosis
  .parse(html)
  .find('a.link')
  .log(console.log);