How to Scrape Google Maps
We'll take a look at how to find businesses through Google Maps' search system and how to scrape their details using Selenium, Playwright, or ScrapFly's JavaScript rendering feature - all of that in Python.
When web scraping with Playwright we can encounter pages that require scrolling to the bottom to load more content. This is a common pattern for infinite scrolling pages.
To scroll the browser, the JavaScript function window.scrollTo(x, y) can be used, which scrolls the page to the specified coordinates. So, if we need to scroll to the bottom of the page, we can run it in a while loop and keep scrolling until the bottom is reached.
Let's take a look at an example by scraping web-scraping.dev/testimonials:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto('https://web-scraping.dev/testimonials/')

    # scroll to the bottom:
    _prev_height = -1
    _max_scrolls = 100
    _scroll_count = 0
    while _scroll_count < _max_scrolls:
        # execute JavaScript to scroll to the bottom of the page
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # wait for new content to load (adjust this delay as needed)
        page.wait_for_timeout(1000)
        # if the scroll height stopped changing, we've reached the bottom
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == _prev_height:
            break
        _prev_height = new_height
        _scroll_count += 1

    # now we can collect all loaded data:
    results = []
    for element in page.locator('.testimonial').element_handles():
        text = element.query_selector('.text').inner_html()
        results.append(text)
    print(f"scraped: {len(results)} results!")
Above, we're scraping an endless paging example from web-scraping.dev. We start a while loop and keep scrolling to the bottom until the page's scroll height stops changing. Then, once the bottom is reached, we can start parsing the loaded content.
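The scroll loop above can also be factored into a small, browser-agnostic helper. This is just a sketch - the function name and parameters below are our own, not part of Playwright's API:

```python
def scroll_until_stable(get_height, do_scroll, wait, max_scrolls=100):
    """Scroll repeatedly until the page height stops growing or
    max_scrolls is reached. Returns the number of scrolls performed."""
    prev_height = -1
    scrolls = 0
    while scrolls < max_scrolls:
        do_scroll()                  # e.g. window.scrollTo(0, document.body.scrollHeight)
        wait()                       # give new content time to load
        new_height = get_height()    # e.g. document.body.scrollHeight
        if new_height == prev_height:
            break                    # height stopped changing: bottom reached
        prev_height = new_height
        scrolls += 1
    return scrolls
```

With Playwright you would pass lambdas wrapping page.evaluate and page.wait_for_timeout; the same helper works with Selenium by wrapping driver.execute_script instead.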