How to scroll to the bottom of the page with Playwright?

by scrapecrow Jun 30, 2023

When web scraping, it's common to encounter infinite scroll pages. These web pages require scrolling to the end of the page to load more content.

In this guide, we'll explore how to scroll to the bottom of the page with Playwright using three distinct approaches for both Python and NodeJS clients.

Using JavaScript

To scroll to the bottom of the page with Playwright, we can use the window.scrollTo(x, y) JavaScript function. Passing the document's scroll height as the y value scrolls vertically until the very bottom of the page is reached.

Here's how to use Playwright to scrape an infinite scroll page. We'll scrape web-scraping.dev/testimonials, which loads more data as the page is scrolled:

Python

NodeJS

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    # navigate to the website
    page.goto("https://web-scraping.dev/testimonials/")

    # scroll to the bottom:
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # Execute JavaScript to scroll to the bottom of the page
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # Wait for new content to load (change this value as needed)
        page.wait_for_timeout(1000) # wait for 1000 milliseconds
        # Check whether the scroll height changed; if it did, more content was loaded
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == _prev_height:
            break
        _prev_height = new_height
        _scroll_count += 1
        
    # Now we can collect all loaded data on the document:
    results = []
    for element in page.locator(".testimonial").element_handles():
        text = element.query_selector(".text").inner_html()
        results.append(text)
    print(f"scraped: {len(results)} results!")

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  const page = await context.newPage();
  // navigate to the website
  await page.goto('https://web-scraping.dev/testimonials/');

  // Scroll to the bottom:
  let prevHeight = -1;
  const maxScrolls = 100;
  let scrollCount = 0;
  
  while (scrollCount < maxScrolls) {
    // Execute JavaScript to scroll to the bottom of the page
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Wait for new content to load (change this value as needed)
    await page.waitForTimeout(1000); // wait for 1000 milliseconds
    // Check whether the scroll height changed; if it did, more content was loaded
    const newHeight = await page.evaluate(() => document.body.scrollHeight);
    if (newHeight === prevHeight) {
      break;
    }
    prevHeight = newHeight;
    scrollCount++;
  }
  
  // Now we can collect all loaded data on the document:
  const results = await page.$$eval('.testimonial .text', elements =>
    elements.map(element => element.innerHTML)
  );
  console.log(`scraped: ${results.length} results!`);
  console.log(results);

  await browser.close();
})();

Above, we're scraping an endless paging example from web-scraping.dev.
We start a while loop and keep scrolling to the bottom until the page's scroll height stops changing.
Then, once the bottom is reached, we can start parsing the content.

The loop is controlled by three variables:

_prev_height: page height recorded after the previous scroll, used for comparison
_max_scrolls: maximum number of scrolls to perform
_scroll_count: current number of scrolls performed

On each iteration, the loop executes the window.scrollTo JavaScript method, waits for new content and reads the page height again. It stops when the height no longer changes or when the maximum number of scrolls is reached. Finally, the loaded testimonials are parsed once scrolling finishes.
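The scrolling loop can also be wrapped in a reusable helper. The following is a minimal sketch rather than part of the original snippet: the function name scroll_to_bottom and its max_scrolls and wait_ms parameters are hypothetical, and it only assumes a Playwright sync page object (anything exposing evaluate and wait_for_timeout).

```python
def scroll_to_bottom(page, max_scrolls=100, wait_ms=1000):
    """Scroll until the page height stops changing or max_scrolls is reached.

    `page` is assumed to be a Playwright sync API Page. The helper name and
    its parameters are illustrative, not part of Playwright.
    Returns how many scroll iterations ran before the height stopped changing.
    """
    prev_height = -1
    scroll_count = 0
    while scroll_count < max_scrolls:
        # Jump to the current bottom of the document
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # Give the page time to load more content
        page.wait_for_timeout(wait_ms)
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == prev_height:
            # Height did not change, so nothing new was loaded
            break
        prev_height = new_height
        scroll_count += 1
    return scroll_count
```

Inside the with sync_playwright() block, this would be called as scroll_to_bottom(page) right after page.goto(...), before collecting the results.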

Using Keyboard

In the previous snippet, we used JavaScript evaluation to emulate scrolling. Since Playwright also provides a Keyboard API, we can use it to simulate vertical scrolling instead:

Python

NodeJS

# ....
    while _scroll_count < _max_scrolls:
        # Scroll to the bottom of the page using keyboard
        # press() sends both key down and key up, so the key is not left held
        page.keyboard.press('End')
        # ....

// ....
  while (scrollCount < maxScrolls) {
    // Scroll to the bottom of the page using keyboard
    await page.keyboard.press('End');
    // ....
  }

Above, we use the keyboard API through page.keyboard to send the End key, which jumps to the bottom of the page on each iteration.

Using Mouse

Another natural way to handle infinite scroll pages is to scroll with the mouse wheel. For this, we can use Playwright's mouse API:

Python

NodeJS

# ....
    while _scroll_count < _max_scrolls:
        # Scroll to the bottom of the page using mouse wheel
        page.mouse.wheel(0, 15000)
        # ....

// ....
  while (scrollCount < maxScrolls) {
    // Scroll to the bottom of the page using mouse wheel
    await page.mouse.wheel(0, 15000);
    // ....
  }

Above, we use the mouse class to dispatch a mouse wheel event that scrolls vertically by 15000 pixels on each iteration.
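Since the three approaches differ only in the scroll step, one way to compare them is to make that step pluggable. The sketch below is an illustration under assumptions: the scroll_step helper, its method names and the wheel_delta parameter are hypothetical, while page.evaluate, page.keyboard.press and page.mouse.wheel are Playwright page methods.

```python
def scroll_step(page, method="js", wheel_delta=15000):
    """Perform one scroll action on a Playwright sync Page.

    `method` and `wheel_delta` are illustrative names, not Playwright options.
    """
    if method == "js":
        # Jump to the bottom using JavaScript evaluation
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    elif method == "keyboard":
        # Press and release the End key
        page.keyboard.press("End")
    elif method == "mouse":
        # Dispatch a wheel event scrolling down by wheel_delta pixels
        page.mouse.wheel(0, wheel_delta)
    else:
        raise ValueError(f"unknown scroll method: {method!r}")
```

In the scrolling loop, a call such as scroll_step(page, "mouse") would take the place of the scroll line, with the wait and the height check left unchanged.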

For further details on web scraping with Playwright, refer to our dedicated guide.

Web Scraping with Playwright and Python

Playwright is the new, big browser automation toolkit. Can it be used for web scraping? In this introductory article, we'll take a look at how we can use Playwright and Python to scrape dynamic websites.