How to get page source in Puppeteer?

When web scraping, we often want to retrieve the full page source (the full HTML of the web page) so that we can parse it for data using tools like Cheerio. In Puppeteer, the page source is available through the page.content() method:

const puppeteer = require('puppeteer');

async function run() {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
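    // Wait for the full "load" event (the default) before reading the HTML:
    await page.goto("https://httpbin.dev/html");
    // OR the faster option that doesn't wait for images to load:
    // await page.goto("https://httpbin.dev/html", { waitUntil: "domcontentloaded" });

    // page.content() takes no options and returns the page's current HTML
    const source = await page.content();

    console.log(source);
    await browser.close();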
}

run();

⚠ If the page builds its content dynamically with JavaScript, page.content() may return the source before the page has fully loaded. For more, see How to wait for a page to load in Puppeteer?
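
One way to reduce that risk is to wait for an element that only appears once the dynamic content has rendered, and only then read the source. The following is a minimal sketch, not a definitive solution: the helper name getSourceWhenReady, the "h1" selector used in the usage note, and the 10 second timeout are all assumptions made for illustration and should be adapted to the target page.

```javascript
// Hypothetical helper (not part of Puppeteer): waits for a selector,
// then returns the page's current HTML.
// `page` is expected to be a Puppeteer Page object.
async function getSourceWhenReady(page, selector, timeoutMs = 10000) {
    // waitForSelector rejects with a timeout error if the element
    // does not appear within timeoutMs milliseconds.
    await page.waitForSelector(selector, { timeout: timeoutMs });

    // Once the element exists, read the rendered HTML.
    return page.content();
}
```

Inside the run() function above, the direct page.content() call could be swapped for `const source = await getSourceWhenReady(page, "h1");`, assuming the page contains an h1 element once it has finished rendering.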

