To parse web-scraped content in NodeJS using CSS selectors, we recommend the Cheerio library:
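const cheerio = require('cheerio');
// load the HTML document to get a queryable $ function
const $ = cheerio.load(`
<h1>Page title</h1>
<p>some paragraph</p>
<a href="http://scrapfly.io/blog">some link</a>
`);
// get the text content of the <h1> element
$('h1').text();
// "Page title"
// get the href attribute of the <a> element (the method is .attr(), not .attribute())
$('a').attr("href");
// "http://scrapfly.io/blog"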
Another popular library is Osmosis, which supports HTML parsing through both CSS and XPath selectors:
const osmosis = require("osmosis");
const html = `
<a class="link" href="http://scrapfly.io/">link 1</a>
<a class="link" href="http://scrapfly.blog/">link 2</a>
`;
osmosis
  .parse(html)
  .find('a.link')
  .log(console.log);
This knowledgebase is provided by Scrapfly, a web scraping API that allows you to scrape any website without getting blocked and implements dozens of other web scraping conveniences.