When web scraping, it's common to encounter infinite scroll pages. These web pages require scrolling to the end of the page to load more content.
In this guide, we'll explore how to scroll to the bottom of the page with Playwright using three distinct approaches for both Python and NodeJS clients.
To make Playwright scroll to the bottom, we can use the window.scrollTo(x, y) JavaScript function. Calling it repeatedly scrolls the page vertically until the very bottom is reached.
Here's how to use Playwright to scroll through infinite scroll pages. We'll scrape web-scraping.dev/testimonials, which loads more data as the page is scrolled:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    # navigate to the website
    page.goto("https://web-scraping.dev/testimonials/")
    # scroll to the bottom:
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # Execute JavaScript to scroll to the bottom of the page
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # Wait for new content to load (change this value as needed)
        page.wait_for_timeout(1000)  # wait for 1000 milliseconds
        # Check whether the scroll height changed - means more pages are there
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == _prev_height:
            break
        _prev_height = new_height
        _scroll_count += 1
    # Now we can collect all loaded data on the document:
    results = []
    for element in page.locator(".testimonial").element_handles():
        text = element.query_selector(".text").inner_html()
        results.append(text)
    print(f"scraped: {len(results)} results!")
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  const page = await context.newPage();
  // navigate to the website
  await page.goto('https://web-scraping.dev/testimonials/');
  // Scroll to the bottom:
  let prevHeight = -1;
  const maxScrolls = 100;
  let scrollCount = 0;
  while (scrollCount < maxScrolls) {
    // Execute JavaScript to scroll to the bottom of the page
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Wait for new content to load (change this value as needed)
    await page.waitForTimeout(1000); // wait for 1000 milliseconds
    // Check whether the scroll height changed - means more pages are there
    const newHeight = await page.evaluate(() => document.body.scrollHeight);
    if (newHeight === prevHeight) {
      break;
    }
    prevHeight = newHeight;
    scrollCount++;
  }
  // Now we can collect all loaded data on the document:
  const results = await page.$$eval('.testimonial .text', elements =>
    elements.map(element => element.innerHTML)
  );
  console.log(`scraped: ${results.length} results!`);
  console.log(results);
  await browser.close();
})();
Above, we're scraping an endless paging example from web-scraping.dev. We define three variables:
_prev_height: page height before scrolling, used for comparison
_max_scrolls: maximum number of scrolls to perform
_scroll_count: current number of scrolls performed
Then, we start a while loop that keeps executing the window.scrollTo JavaScript method to scroll down until the page height stops changing. Finally, once the bottom is reached, the fully loaded page is parsed.
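The loop described above can also be wrapped into a reusable helper. The following is a minimal sketch rather than part of the original snippet: the function name scroll_to_bottom and its parameters are hypothetical, and page is assumed to behave like a Playwright sync Page (only evaluate and wait_for_timeout are used).

```python
def scroll_to_bottom(page, max_scrolls=100, pause_ms=1000):
    """Scroll until the document height stops growing or max_scrolls is hit.

    `page` is assumed to be a Playwright sync Page (hypothetical helper).
    Returns the number of scrolls after which the height had changed.
    """
    prev_height = -1
    scroll_count = 0
    while scroll_count < max_scrolls:
        # Jump to the current bottom of the document
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # Give lazy-loaded content time to arrive (tune per site)
        page.wait_for_timeout(pause_ms)
        # An unchanged height means nothing new was loaded
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == prev_height:
            break
        prev_height = new_height
        scroll_count += 1
    return scroll_count
```

With a real browser, it could be called right after page.goto(...), for example scroll_to_bottom(page, max_scrolls=20), before collecting the .testimonial elements.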
In the previous snippet, we used JavaScript evaluation to emulate the scroll action. Since Playwright provides a Keyboard API, we can use it to simulate vertical scrolling instead:
# ....
while _scroll_count < _max_scrolls:
    # Scroll to the bottom of the page using keyboard
    page.keyboard.down('End')
    # ....

// ....
while (scrollCount < maxScrolls) {
  // Scroll to the bottom of the page using keyboard
  await page.keyboard.down('End');
  // ....
}
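Above, we use the down method of the keyboard class to send a keydown event for the End key, which jumps to the bottom of the page on each iteration. Note that keyboard.down does not release the key; keyboard.press('End') sends both the keydown and the keyup event.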
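Since the snippet above elides the rest of the loop, here is a fuller sketch of the keyboard variant. It is an illustration under assumptions: page is taken to be a Playwright sync Page, the function name scroll_with_keyboard and its parameters are hypothetical, and keyboard.press is used in place of keyboard.down so the key is released after each press.

```python
def scroll_with_keyboard(page, max_scrolls=100, pause_ms=1000):
    """Press End repeatedly until the document height stops growing.

    `page` is assumed to be a Playwright sync Page (hypothetical helper).
    Returns the number of key presses after which the height had changed.
    """
    prev_height = -1
    scroll_count = 0
    while scroll_count < max_scrolls:
        # press() sends keydown followed by keyup for the End key
        page.keyboard.press("End")
        # Give lazy-loaded content time to arrive (tune per site)
        page.wait_for_timeout(pause_ms)
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == prev_height:
            break
        prev_height = new_height
        scroll_count += 1
    return scroll_count
```

The stop condition is the same height comparison as in the JavaScript approach; only the scroll action differs.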
Another way to handle infinite scroll pages is with the mouse wheel. For this, we can use Playwright's mouse API:
# ....
while _scroll_count < _max_scrolls:
    # Scroll to the bottom of the page using mouse wheel
    page.mouse.wheel(0, 15000)
    # ....

// ....
while (scrollCount < maxScrolls) {
  // Scroll to the bottom of the page using mouse wheel
  await page.mouse.wheel(0, 15000);
  // ....
}
Above, we use the mouse class to scroll vertically by dispatching a mouse wheel event with a vertical delta of 15000 pixels on each iteration.
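All three approaches share the same loop and differ only in the scroll action, so they can be made interchangeable. The sketch below is one possible arrangement, not a definitive implementation: SCROLL_ACTIONS, scroll_until_stable, and the method names "js", "keyboard", and "mouse" are hypothetical, and page is assumed to be a Playwright sync Page.

```python
# Each action takes a Playwright-like sync Page and performs one scroll step.
# The dictionary keys are illustrative names, not part of Playwright.
SCROLL_ACTIONS = {
    "js": lambda page: page.evaluate(
        "window.scrollTo(0, document.body.scrollHeight)"
    ),
    "keyboard": lambda page: page.keyboard.press("End"),
    "mouse": lambda page: page.mouse.wheel(0, 15000),
}


def scroll_until_stable(page, method="js", max_scrolls=100, pause_ms=1000):
    """Run the chosen scroll action until the document height stops growing.

    Returns the number of scrolls after which the height had changed.
    """
    scroll = SCROLL_ACTIONS[method]
    prev_height = -1
    for count in range(max_scrolls):
        scroll(page)
        # Give lazy-loaded content time to arrive (tune per site)
        page.wait_for_timeout(pause_ms)
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == prev_height:
            return count
        prev_height = new_height
    return max_scrolls
```

Switching strategies then only requires changing the method argument, for example scroll_until_stable(page, method="mouse").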
For further details on web scraping with Playwright, refer to our dedicated guide.
This knowledgebase is provided by Scrapfly data APIs.