How to use XPath selectors in NodeJS when web scraping?

CSS selectors are far more common in the NodeJS and JavaScript ecosystems, but web scraping sometimes calls for the more powerful features of XPath selectors.
There are a few options available for XPath selectors. The most popular one in web scraping is the osmosis library:

const osmosis = require("osmosis");

const html = `
<a href="http://scrapfly.io/">link 1</a>
<a href="http://scrapfly.blog/">link 2</a>
`;
osmosis
    .parse(html)
    .find('//a/@href')
    .log(console.log);

Another alternative is the xpath package combined with the @xmldom/xmldom DOM parser:

import xpath from 'xpath';
import { DOMParser } from '@xmldom/xmldom';

const tree = new DOMParser().parseFromString(`
<body>
    <h1>Page title</h1>
    <p>some paragraph</p>
    <a href="http://scrapfly.io/blog">some link</a>
</body>
`, 'text/xml'); // pass the MIME type explicitly so the string is parsed as XML

console.log({
    // we can extract the text of a node, which returns a `Text` object:
    title: xpath.select('//h1/text()', tree)[0].data,
    // or a specific attribute value, which returns an `Attr` object:
    url: xpath.select('//a/@href', tree)[0].value,
});
Question tagged: NodeJS, XPath, Data Parsing
