XPath selectors are one of the most popular ways to parse HTML pages when web scraping. In Node.js and Puppeteer, XPath selectors can be used through the page.$x() method, which is available in Puppeteer versions before v22:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://httpbin.dev/html");

  // page.$x always returns every match as an array (empty if nothing matched):
  const elements = await page.$x("//p");
  // element handles only expose details through evaluate(), which runs in the page
  // for text:
  const firstText = await elements[0].evaluate(element => element.textContent);
  console.log(firstText);

  // for other attributes:
  await page.goto("https://httpbin.dev/links/10/1");
  const linkElements = await page.$x("//a");
  // .href is the resolved absolute URL; use getAttribute('href') for the raw value
  const firstLink = await linkElements[0].evaluate(element => element.href);
  console.log(firstLink);

  // close() returns a promise, so await it before the script exits
  await browser.close();
}

run().catch(console.error);
⚠ On dynamic JavaScript pages, the XPath query may run before the target elements have been rendered, in which case it returns an empty array. For more, see How to wait for a page to load in Puppeteer?
Also see: How to find elements by CSS selector in Puppeteer?