Playwright Examples for Web Scraping and Automation
Learn Playwright with Python and JavaScript examples for automating browsers like Chromium, WebKit, and Firefox.
To parse scraped web content in Node.js using CSS selectors, we recommend the Cheerio library:
const cheerio = require('cheerio');
const $ = cheerio.load(`
<h1>Page title</h1>
<p>some paragraph</p>
<a href="http://scrapfly.io/blog">some link</a>
`);
// text content of the first matching element
$('h1').text();
// "Page title"
// attribute values are read with .attr()
$('a').attr("href");
// "http://scrapfly.io/blog"
Another popular library is Osmosis, which supports HTML parsing through both CSS and XPath selectors:
const osmosis = require("osmosis");
const html = `
<a class="link" href="http://scrapfly.io/">link 1</a>
<a class="link" href="http://scrapfly.blog/">link 2</a>
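`;
osmosis
.parse(html)
// select every <a> element with the "link" class
.find('a.link')
// store each link's href attribute under the "url" key
.set('url', '@href')
// .data() receives the extracted object for each match
.data(console.log)
// .log() receives Osmosis' own log messages
.log(console.log);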