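Using Cheerio for NodeJS, we can find HTML elements by their text using the :contains() pseudo selector, which matches any element whose text contains the given value, whether that value is the full text or only part of it: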
const cheerio = require('cheerio');
const $ = cheerio.load(`
<a>ignore</a>
<a href="http://example.com">link</a>
<a>ignore</a>
`);
console.log(
$('a:contains("link")').text()
);
"link"
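This selector is case sensitive, so it can silently miss elements when a page capitalizes its text inconsistently, which makes it risky for web scraping. Instead, it's advisable to select the candidate elements and filter them by their lowercased text: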
const cheerio = require('cheerio');
const $ = cheerio.load(`
<a>ignore</a>
<a href="http://example.com">Link</a>
<a>ignore</a>
`);
console.log(
$('a').filter(
(i, element) => { return $(element).text().toLowerCase().includes("link")}
).text()
);
"Link"
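The filtering approach above can be extended to handle stray whitespace and to tell partial matches from exact ones. The following is a minimal sketch, not part of Cheerio itself: normalizeText and matchesText are hypothetical helper names introduced here for illustration, and they operate on plain strings so they can be used with any text extracted by Cheerio.

```javascript
// Hypothetical helper (not a Cheerio API): collapse runs of whitespace,
// trim the ends and lowercase, so comparisons ignore case and layout.
function normalizeText(text) {
  return text.replace(/\s+/g, ' ').trim().toLowerCase();
}

// Hypothetical helper: returns true when `text` matches `query`
// after both have been normalized.
//   exact: false -> substring match (similar in spirit to :contains())
//   exact: true  -> the whole text must equal the query
function matchesText(text, query, { exact = false } = {}) {
  const haystack = normalizeText(text);
  const needle = normalizeText(query);
  return exact ? haystack === needle : haystack.includes(needle);
}

console.log(matchesText('  Link\n', 'link'));                  // true
console.log(matchesText('Unlinked', 'link'));                  // true
console.log(matchesText('Unlinked', 'link', { exact: true })); // false
console.log(matchesText('Link', 'link', { exact: true }));     // true
```

With Cheerio, such a predicate could be passed to the same filter call as before, for example $('a').filter((i, element) => matchesText($(element).text(), 'link', { exact: true })), which would keep only the anchors whose entire text is "link" regardless of case.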