How to Scrape Google Maps

In this web scraping tutorial, we'll be scraping Google Maps (google.com/maps) - a map service that is also a major directory for business details and reviews.

Google Maps is a complex piece of web software, so we'll be using browser automation toolkits like Selenium, Playwright and ScrapFly to render the pages for us in Python. We'll cover all three of these options, so feel free to follow along with the one you're most familiar with - let's jump in!

Why Scrape Google Maps?

Google Maps contains a vast amount of business profile data like addresses, ratings, phone numbers and website addresses. Not only is it a powerful data directory for business intelligence and market analysis, it can also be used for lead generation as it contains business contact details.

For more on scraping use cases see our extensive web scraping use case article

Setup

In this tutorial, we'll mostly be using the JavaScript execution feature of browser automation libraries like Selenium and Playwright, as well as ScrapFly's JavaScript Rendering feature, to retrieve fully rendered HTML pages.

Scraping Dynamic Websites Using Web Browsers

If you're unfamiliar with web scraping using browsers, see our full introduction tutorial which covers the Playwright, Puppeteer, Selenium and ScrapFly browser automation toolkits.


So, which browser automation library is best for scraping Google Maps?
We only need the rendering and JavaScript execution capabilities, so whichever one you're most comfortable with will do! That being said, ScrapFly not only provides a browser automation context but also several powerful features that'll help us get around web scraper blocking, such as anti-scraping protection bypass and residential proxies.

To parse those pages we'll be using parsel - a community package that supports HTML parsing via CSS or XPath selectors.

Parsing HTML with CSS Selectors

We'll use really simple CSS selectors in this tutorial; however, if you're completely unfamiliar with CSS selectors, see our full introduction article.

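For a quick look at how parsel works, here's a minimal sketch of the pattern we'll be using throughout this tutorial:

from parsel import Selector

# wrap an HTML string and query it with CSS selectors:
selector = Selector(text="<html><body><h1>Hello!</h1></body></html>")
print(selector.css("h1::text").get())  # Hello!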

All of these packages can be installed through the pip terminal tool:

$ pip install scrapfly-sdk parsel 
# for selenium
$ pip install selenium
# for playwright
$ pip install playwright
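
Note that Playwright also needs to download the browser binaries it controls, which is done through its own install command:

$ playwright install chromium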

The most powerful feature of these tools is JavaScript execution. It allows us to send the browser any JavaScript code snippet, which will be executed in the context of the current page. Let's take a quick look at how we can use it in each of our tools:

JavaScript Execution in ScrapFly
from scrapfly import ScrapflyClient, ScrapeConfig

scrapfly = ScrapflyClient(key="YOUR_SCRAPFLY_KEY", max_concurrency=2)

script = 'return document.querySelector("h1").innerHTML'  # get first header text
result = scrapfly.scrape(
    ScrapeConfig(
        url="https://httpbin.org/html",
        # enable javascript rendering for this request and execute a script:
        render_js=True,
        js=script,
    )
)
# results are located in the browser_data field:
title = result.scrape_result['browser_data']['javascript_evaluation_result']
print(title)
JavaScript Execution in Selenium
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://httpbin.org/html")
script = 'return document.querySelector("h1").innerHTML'  # get first header text
title = driver.execute_script(script)
print(title)
JavaScript Execution in Playwright
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://httpbin.org/html")
    script = 'return document.querySelector("h1").innerHTML'  # get first header text
    title = page.evaluate("() => {" + script + "}")
    print(title)
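
Note that for Playwright we wrap the snippet in () => {...} before passing it to page.evaluate() - evaluate expects a JavaScript expression or function, while our snippet uses a bare return statement, which is only valid inside a function body.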

JavaScript execution is the key feature of all browser automation tools, so let's take a look at how we can use it to scrape Google Maps!

Finding Places

To find places on Google Maps we'll take advantage of its search system. Maps, being a Google product, has a great search system that can understand natural language queries.

For example, if we want to find the page for the Louvre Museum in Paris we can use an accurate, human-like search query: Louvre Museum in Paris. It can be submitted directly through the URL endpoint /maps/search/louvre+museum+in+paris:

[screenshot: Google Maps instant search - narrow queries show a single result]

On the other hand, a less precise search query like /maps/search/mcdonalds+in+paris will return multiple results to choose from:

[screenshot: Google Maps search - broad queries show multiple results]

We can see that this powerful endpoint can either take us directly to the place page or show us multiple results. To start, let's take a look at the latter - how to scrape multiple search results.
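
Since the whole query lives in the URL path, generating a search URL is just a matter of string formatting. Here's a minimal helper sketch (make_search_url is our own illustrative name; hl=en forces English results, as covered in the FAQ below):

def make_search_url(query):
    """convert a free-form query to a Google Maps search URL"""
    return f"https://www.google.com/maps/search/{query.replace(' ', '+')}/?hl=en"

print(make_search_url("mcdonalds in paris"))
# https://www.google.com/maps/search/mcdonalds+in+paris/?hl=en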


Now that we know how to generate the search URL, all we need to do is open it up, wait for it to load and extract the result links. Since we'll be working with highly dynamic web pages, we need a helper function that ensures the content has loaded:

// wait until at least n elements matching the selector are present in the DOM
function waitCss(selector, n=1, require=false, timeout=5000) {
  console.log(selector, n, require, timeout);
  var start = Date.now();
  while (Date.now() - start < timeout){
    if (document.querySelectorAll(selector).length >= n){
      return document.querySelectorAll(selector);
    }
  }
  if (require){
      throw new Error(`selector "${selector}" timed out in ${Date.now() - start} ms`);
  } else {
      return document.querySelectorAll(selector);
  }
}

With this utility function, we'll be able to make sure our scraping script waits for the content to load before starting to parse it.

Let's put it to use in our search scraper:

Using ScrapFly
from scrapfly import ScrapflyClient, ScrapeConfig

scrapfly = ScrapflyClient(key="YOUR_SCRAPFLY_KEY", max_concurrency=2)
script = """
function waitCss(selector, n=1, require=false, timeout=5000) {
  console.log(selector, n, require, timeout);
  var start = Date.now();
  while (Date.now() - start < timeout){
    if (document.querySelectorAll(selector).length >= n){
      return document.querySelectorAll(selector);
    }
  }
  if (require){
      throw new Error(`selector "${selector}" timed out in ${Date.now() - start} ms`);
  } else {
      return document.querySelectorAll(selector);
  }
}

// wait for up to 10 result links to load and return their href attributes:
var results = waitCss("div[role*=article]>a", 10, false);
return Array.from(results).map((el) => el.getAttribute("href"));
"""
def search(query):
    url = f"https://www.google.com/maps/search/{query.replace(' ', '+')}/?hl=en"
    result = scrapfly.scrape(
        ScrapeConfig(
            url=url,
            render_js=True,
            js=script,
            country="US",
        )
    )
    urls = result.scrape_result['browser_data']['javascript_evaluation_result']
    # a single match redirects straight to the place page, leaving no result links,
    # so we fall back to the search URL itself:
    return urls or [url]
print(search("louvre museum in paris"))
print(search("mcdonalds in paris"))
Using Selenium
from selenium import webdriver


script = """
function waitCss(selector, n=1, require=false, timeout=5000) {
  console.log(selector, n, require, timeout);
  var start = Date.now();
  while (Date.now() - start < timeout){
    if (document.querySelectorAll(selector).length >= n){
      return document.querySelectorAll(selector);
    }
  }
  if (require){
      throw new Error(`selector "${selector}" timed out in ${Date.now() - start} ms`);
  } else {
      return document.querySelectorAll(selector);
  }
}

var results = waitCss("div[role*=article]>a", 10, false);
return Array.from(results).map((el) => el.getAttribute("href"))
"""

driver = webdriver.Chrome()


def search(query):
    url = f"https://www.google.com/maps/search/{query.replace(' ', '+')}/?hl=en"
    driver.get(url)
    urls = driver.execute_script(script)
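    # a single match redirects straight to the place page, leaving no result links,
    # so we fall back to the search URL itself: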
    return urls or [url]


print(f'single search: {search("louvre museum in paris")}')
print(f'multi search: {search("mcdonalds in paris")}')

Using Playwright
from playwright.sync_api import sync_playwright

script = """
function waitCss(selector, n=1, require=false, timeout=5000) {
  console.log(selector, n, require, timeout);
  var start = Date.now();
  while (Date.now() - start < timeout){
    if (document.querySelectorAll(selector).length >= n){
      return document.querySelectorAll(selector);
    }
  }
  if (require){
      throw new Error(`selector "${selector}" timed out in ${Date.now() - start} ms`);
  } else {
      return document.querySelectorAll(selector);
  }
}

var results = waitCss("div[role*=article]>a", 10, false);
return Array.from(results).map((el) => el.getAttribute("href"))
"""


def search(query, page):
    url = f"https://www.google.com/maps/search/{query.replace(' ', '+')}/?hl=en"
    page.goto(url)
    urls = page.evaluate("() => {" + script + "}")
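    # a single match redirects straight to the place page, leaving no result links,
    # so we fall back to the search URL itself: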
    return urls or [url]


with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    print(f'single search: {search("louvre museum in paris", page=page)}')
    print(f'multi search: {search("mcdonalds in paris", page=page)}')

Results
single search: ['https://www.google.com/maps/search/louvre+museum+in+paris/?hl=en']
multi search: [
    "https://www.google.com/maps/place/McDonald's/data=!4m7!3m6!1s0x47e66fea26bafdc7:0x21ea7aaf1fb2b3e3!8m2!3d48.8729997!4d2.2991604!16s%2Fg%2F1hd_88rdh!19sChIJx_26Jupv5kcR47OyH6966iE?authuser=0&hl=en&rclk=1",
    ...
]

Now that we can successfully find places on Google Maps, let's take a look at how we can scrape their data.

Scraping Places

To scrape place data we'll be using the same approach of rendering JavaScript content through browser automation. We'll take the place URLs we discovered previously and scrape the overview data of each business.

[screenshot: Google Maps place profile - loads of valuable data about the business]

To parse the rendered HTML data we'll be using parsel with a few simple CSS selectors:

def parse_place(selector):
    """parse Google Maps place"""
    
    def aria_with_label(label):
        """gets aria element as is"""
        return selector.css(f"*[aria-label*='{label}']::attr(aria-label)")

    def aria_no_label(label):
        """gets aria element as text with label stripped off"""
        text = aria_with_label(label).get("")
        return text.split(label, 1)[1].strip()

    result = {
        "name": "".join(selector.css("h1 ::text").getall()).strip(),
        "category": selector.css("button[jsaction='pane.rating.category']::text").get(),
        # most of the data can be extracted through accessibility labels:
        "address": aria_no_label("Address: "),
        "website": aria_no_label("Website: "),
        "phone": aria_no_label("Phone: "),
        "review_count": aria_with_label(" reviews").get(),
        "work_hours": aria_with_label("Monday, ").get().split(". Hide")[0],
        # to extract star numbers from the text we can use a regex pattern for digits: r"\d+"
        "stars": aria_with_label(" stars").re(r"\d+.*\d+")[0],
        "5_stars": aria_with_label("5 stars").re(r"(\d+) review")[0],
        "4_stars": aria_with_label("4 stars").re(r"(\d+) review")[0],
        "3_stars": aria_with_label("3 stars").re(r"(\d+) review")[0],
        "2_stars": aria_with_label("2 stars").re(r"(\d+) review")[0],
        "1_stars": aria_with_label("1 stars").re(r"(\d+) review")[0],
    }
    return result

Since Google Maps uses complex HTML structures and CSS styles, parsing it can be very difficult. Fortunately, Google Maps also implements extensive accessibility features which we can take advantage of!
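
To illustrate the trick, here's a tiny standalone sketch (using a made-up HTML snippet) of the aria-label pattern our parser relies on:

from parsel import Selector

# a minimal, made-up example of the aria-label pattern Google Maps uses:
html = '<button aria-label="Address: Rue de Rivoli, 75001 Paris, France"></button>'
sel = Selector(text=html)
label = sel.css("*[aria-label*='Address: ']::attr(aria-label)").get("")
print(label.split("Address: ", 1)[1].strip())  # Rue de Rivoli, 75001 Paris, France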

Let's add this to our scraper:

Using ScrapFly
import json
from scrapfly import ScrapflyClient, ScrapeConfig

urls = ["https://goo.gl/maps/Zqzfq43hrRPmWGVB7"]

def parse_place(selector):
    """parse Google Maps place"""
    
    def aria_with_label(label):
        """gets aria element as is"""
        return selector.css(f"*[aria-label*='{label}']::attr(aria-label)")

    def aria_no_label(label):
        """gets aria element as text with label stripped off"""
        text = aria_with_label(label).get("")
        return text.split(label, 1)[1].strip()

    result = {
        "name": "".join(selector.css("h1 ::text").getall()).strip(),
        "category": selector.css("button[jsaction='pane.rating.category']::text").get(),
        # most of the data can be extracted through accessibility labels:
        "address": aria_no_label("Address: "),
        "website": aria_no_label("Website: "),
        "phone": aria_no_label("Phone: "),
        "review_count": aria_with_label(" reviews").get(),
        # to extract star numbers from the text we can use a regex pattern for digits: r"\d+"
        "stars": aria_with_label(" stars").re(r"\d+.*\d+")[0],
        "5_stars": aria_with_label("5 stars").re(r"(\d+) review")[0],
        "4_stars": aria_with_label("4 stars").re(r"(\d+) review")[0],
        "3_stars": aria_with_label("3 stars").re(r"(\d+) review")[0],
        "2_stars": aria_with_label("2 stars").re(r"(\d+) review")[0],
        "1_stars": aria_with_label("1 stars").re(r"(\d+) review")[0],
    }
    return result



scrapfly = ScrapflyClient(key="YOUR_SCRAPFLY_KEY", max_concurrency=2)
places = []
for url in urls:
    result = scrapfly.scrape(
        ScrapeConfig(
            url=url,
            render_js=True,
            # wait for the place details to render before returning the HTML:
            wait_for_selector="button[jsaction='pane.rating.category']",
            country="US",
        )
    )
    places.append(parse_place(result.selector))
print(json.dumps(places, indent=2, ensure_ascii=False))
Using Playwright
import json

from parsel import Selector
from playwright.sync_api import sync_playwright


def parse_place(selector):
    """parse Google Maps place"""

    def aria_with_label(label):
        """gets aria element as is"""
        return selector.css(f"*[aria-label*='{label}']::attr(aria-label)")

    def aria_no_label(label):
        """gets aria element as text with label stripped off"""
        text = aria_with_label(label).get("")
        return text.split(label, 1)[1].strip()

    result = {
        "name": "".join(selector.css("h1 ::text").getall()).strip(),
        "category": selector.css("button[jsaction='pane.rating.category']::text").get(),
        # most of the data can be extracted through accessibility labels:
        "address": aria_no_label("Address: "),
        "website": aria_no_label("Website: "),
        "phone": aria_no_label("Phone: "),
        "review_count": aria_with_label(" reviews").get(),
        # to extract star numbers from the text we can use a regex pattern for digits: r"\d+"
        "stars": aria_with_label(" stars").re(r"\d+.*\d+")[0],
        "5_stars": aria_with_label("5 stars").re(r"(\d+) review")[0],
        "4_stars": aria_with_label("4 stars").re(r"(\d+) review")[0],
        "3_stars": aria_with_label("3 stars").re(r"(\d+) review")[0],
        "2_stars": aria_with_label("2 stars").re(r"(\d+) review")[0],
        "1_stars": aria_with_label("1 stars").re(r"(\d+) review")[0],
    }
    return result


urls = ["https://goo.gl/maps/Zqzfq43hrRPmWGVB7?hl=en"]
places = []
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    for url in urls:
        page.goto(url)
        page.wait_for_selector("button[jsaction='pane.rating.category']")
        places.append(parse_place(Selector(text=page.content())))

print(json.dumps(places, indent=2, ensure_ascii=False))

Results
[
  {
    "name": "Louvre Museum",
    "category": "Art museum",
    "address": "Rue de Rivoli, 75001 Paris, France",
    "website": "louvre.fr",
    "phone": "+33 1 40 20 50 50",
    "review_count": "240,040 reviews",
    "stars": "4.7",
    "5_stars": "513",
    "4_stars": "211",
    "3_stars": "561",
    "2_stars": "984",
    "1_stars": "771"
  }
]
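
Note that the extracted fields are raw strings. Here's a small illustrative sketch for turning a field like review_count into a number (parse_review_count is our own helper name, not part of any library):

import re

def parse_review_count(text):
    """e.g. '240,040 reviews' -> 240040"""
    return int(re.sub(r"[^\d]", "", text.split(" ")[0]))

print(parse_review_count("240,040 reviews"))  # 240040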

We can see that with just a few lines of clever code we can scrape business data from Google Maps. We focused on a few visible details, but there's more information available in the HTML body like reviews, news articles and various classification tags.
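
To tie both steps together, here's a rough end-to-end sketch reusing the Playwright versions of the search() and parse_place() functions defined above:

import json

from parsel import Selector
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    places = []
    # find place URLs first, then scrape each place's overview data:
    for url in search("mcdonalds in paris", page=page)[:3]:
        page.goto(url)
        page.wait_for_selector("button[jsaction='pane.rating.category']")
        places.append(parse_place(Selector(text=page.content())))
    print(json.dumps(places, indent=2, ensure_ascii=False))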

FAQ

To wrap this guide up let's take a look at some frequently asked questions about web scraping Google Maps:

Is scraping Google Maps legal?

Yes. Google Maps data is publicly available, and we're not extracting anything private. Scraping Google Maps at slow, respectful rates falls under the ethical scraping definition. That being said, attention should be paid to GDPR compliance in the EU when scraping personal data such as details attached to reviews (images or names). For more, see our Is Web Scraping Legal? article.

How to change Google Maps display language?

To change the language of the content displayed on Google Maps we can use the URL parameter hl (short for "host language"). For example, for English we would add ?hl=en to the end of our URL, e.g. google.com/maps/search/big+ben+london+uk/?hl=en

Summary

In this tutorial, we built a Google Maps scraper that can be run with Selenium, Playwright or ScrapFly's SDK. We did this by launching a browser instance and controlling it via Python code to find businesses through Google Maps search and scrape details like the website, phone number and meta information of each business.

For this, we used Python with a few community packages included in the scrapfly-sdk, and to prevent blocking we used ScrapFly's API which smartly configures every web scraper connection to avoid detection. For more on ScrapFly, see our documentation and try it out for free!
