Using Cheerio for Node.js, we can find HTML elements by their text value using the :contains() pseudo-selector, which matches any element whose text contains the given string:
const cheerio = require('cheerio');
const $ = cheerio.load(`
<a>ignore</a>
<a href="http://example.com">link</a>
<a>ignore</a>
`);
console.log(
$('a:contains("link")').text()
);
"link"
This selector is case-sensitive, so in web scraping it can silently miss elements whose text differs only in capitalization. A more robust approach is to select the elements first and then filter them by their lowercased text:
const cheerio = require('cheerio');
const $ = cheerio.load(`
<a>ignore</a>
<a href="http://example.com">Link</a>
<a>ignore</a>
`);
console.log(
$('a').filter(
(i, element) => { return $(element).text().toLowerCase().includes("link")}
).text()
);
"Link"
This knowledge base is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.