How to use XPath selectors in NodeJS when web scraping?

by Bernardas Alisauskas May 29, 2023 1 min read

#data-parsing #nodejs #xpath

CSS selectors are far more widely used in the NodeJS and JavaScript ecosystems, but web scraping sometimes calls for the more powerful features of XPath selectors, such as selecting attribute values and text nodes directly. There are a few options available for XPath selectors. The most popular one in web scraping is the osmosis library:

javascript

const osmosis = require("osmosis");

// sample HTML document with two links
const html = `
<a href="http://scrapfly.io/">link 1</a>
<a href="http://scrapfly.blog/">link 2</a>
`;
osmosis
    .parse(html)
    // select the href attribute of every <a> element
    .find('//a/@href')
    .log(console.log);

Another alternative is the xpath library, which runs XPath queries against a document parsed with the @xmldom/xmldom library:

javascript

import xpath from 'xpath';
import { DOMParser } from '@xmldom/xmldom';

// The <div> wrapper is an assumption: XML parsing needs a single root element,
// and recent xmldom versions expect an explicit MIME type.
const tree = new DOMParser().parseFromString(`
<div>
    <h1>Page title</h1>
    <p>some paragraph</p>
    <a href="http://scrapfly.io/blog">some link</a>
</div>
`, 'text/xml');

console.log({
    // we can extract the text of a node, which returns a `Text` object:
    title: xpath.select('//h1/text()', tree)[0].data,
    // or a specific attribute value, which returns an `Attr` object:
    url: xpath.select('//a/@href', tree)[0].value,
});
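
Since text queries return `Text` objects and attribute queries return `Attr` objects, each result type is read through a different property. The sketch below shows one way to normalize query results into plain strings. The `nodeToString` and `toStrings` helpers are hypothetical names introduced here for illustration and are not part of the xpath or xmldom APIs; they rely only on the standard DOM `nodeType`, `value`, `data` and `textContent` properties.

```javascript
// Hypothetical helpers (not part of xpath or xmldom): turn XPath query
// results into plain strings regardless of the node type returned.
const ATTRIBUTE_NODE = 2; // standard DOM nodeType for Attr
const TEXT_NODE = 3;      // standard DOM nodeType for Text

function nodeToString(node) {
    // missing results become empty strings
    if (node == null) return '';
    // expressions such as string(...) or count(...) already yield primitives
    if (typeof node !== 'object') return String(node);
    switch (node.nodeType) {
        case ATTRIBUTE_NODE:
            return node.value;
        case TEXT_NODE:
            return node.data;
        default:
            // elements and other nodes: fall back to their text content
            return node.textContent ?? '';
    }
}

function toStrings(result) {
    // accept either a list of nodes or a single value
    const nodes = Array.isArray(result) ? result : [result];
    return nodes.map(nodeToString);
}
```

With the `tree` document from the example above, `toStrings(xpath.select('//a/@href', tree))` should yield the link URLs as strings, so the calling code no longer needs to know whether a query matched attributes or text nodes.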
