Web Scraping With a Headless Browser: Puppeteer
Introduction to using Puppeteer in Node.js for web scraping dynamic web pages and web apps, with tips, tricks, best practices and an example project.
To capture background requests and responses in Puppeteer we can use the page.on() method to intercept every request and response. Here's how.
To load local files in Puppeteer the file:// URL protocol can be used. Here's how to do it.
To save and load cookies in Puppeteer the page.setCookie() and page.cookies() methods can be used. Here's how to do it.
To find HTML elements using CSS selectors in Puppeteer the $ and $eval methods can be used. Here's how to use them.
To find elements by XPath using Puppeteer the $x() method can be used. Here's how to use it.
To retrieve the page source in Puppeteer the page.content() method can be used. Here's how to use it.
To take a screenshot of a page or HTML element using Puppeteer the page.screenshot() method can be used. Here's how, and what options are available.
To wait for a page to load in Puppeteer the best approach is to wait for a specific element to appear using page.waitForSelector() method. Here's how to do it.
Blocking non-critical resources in Puppeteer can drastically speed up the program. Here's how to do it in Puppeteer and Node.js.
To download a file using Puppeteer and Node.js we can either simulate a click on the download button or use an HTTP client. Here's how to do it.
Introduction to using web automation tools such as Puppeteer, Playwright, Selenium and ScrapFly to render dynamic websites for web scraping.