How to Scrape Zoopla Real Estate Property Data in Python

In this web scraping tutorial, we'll be taking a look at how to scrape Zoopla - a popular UK real estate property marketplace.

We'll be scraping real estate data such as pricing info, addresses, photos and phone numbers displayed on Zoopla's property pages.

To scrape Zoopla properties we'll be using the hidden web data scraping method, as this website is powered by Next.js. We'll also take a look at how to find real estate properties using Zoopla's search and sitemap systems to collect all of the available property data.

Finally, we'll also cover property tracking by continuously scraping for newly listed properties - giving us an edge in real estate market awareness. We'll be using Python with a few community libraries - Let's dive in!

Hands on Python Web Scraping Tutorial and Example Project

If you're new to web scraping with Python we recommend checking out our full introduction tutorial to web scraping with Python and common best practices.

Why Scrape Zoopla.com?

Zoopla.com is one of the biggest real estate websites in the United Kingdom, making it one of the biggest public UK real estate datasets out there. It contains fields like real estate prices, listing locations, sale dates and general property information.

This is valuable information for market analytics, the study of the housing industry, and a general competitor overview.

For more on scraping use cases see our extensive write-up Scraping Use Cases

How to Scrape Real Estate Property Data using Python

For more real estate scrape guides see our hub article which covers scraping of Zillow, Realtor.com, Redfin.com, Idealista and other popular platforms.

Available Data Fields

We can scrape data from Zoopla for several popular real estate data fields and targets:

  • Properties for sale
  • Properties for rent
  • Real estate agent info

In this guide, we'll focus on scraping real estate property (rent and sale) data for popular data fields like:

  • Prices
  • Photos
  • Agent contact details
  • Property features
  • Property metadata and performance

For more, see the example scraper dataset for all fields we'll be scraping in this guide:

Example Scraper Output
{
"id": 63422316,
"title": "3 bed maisonette for sale",
"description": "Internal:<br><br>Entrance - Access is made via its own rear entrance door from the private outdoor terrace, opening to the first floor hall.<br><br>Hall - With solid wood flooring, the carpeted staircase leading up to the second floor landing and doors to the lounge, the kitchen, bedroom three and the bathroom.<br><br>Lounge - Offering generous space for furniture with a double glazed window with wooden shutters, solid wood flooring and a feature period fireplace with a decorative surround, mantelpiece and hearth.<br><br>Kitchen/Diner - Fitted with a range of modern black wall and base units with complementing wood worktops over, two double glazed windows and tiled flooring and splashbacks. Inset one and a half stainless steel sink basin with a drainer and mixer tap and an integrated fridge-freezer and an electric oven with a countertop gas hob and overhead extractor hood, with space for further appliances and for a dining table and chairs.<br><br>Bedroom Three - Can be used as a double size bedroom or as an additional reception room or study, with a double glazed window with wooden shutters, solid wood flooring and a closed corner fireplace.<br><br>Bathroom - Modern white suite comprising a push-button WC, a wash hand basin and a panelled bath with an overhead shower and glass screen. 
Obscure double glazed window, tiled flooring and partly tiled walls.<br><br>Second Floor Landing - With a Velux window, carpeted flooring, a storage cupboard, eaves storage and doors to bedrooms one and two.<br><br>Bedroom One - Large bedroom providing ample space for furniture, with a double glazed window, carpeted flooring, a closed fireplace and a storage cupboard.<br><br>Bedroom Two - Double size bedroom with a double glazed window with wooden shutters, carpeted flooring and a storage cupboard.<br><br>External:<br><br>The property benefits from a spacious and low-maintenance private terrace to the rear first floor.<br><br>Additional information:<br><br>Council Tax Band: B<br><br>Local Authority: Sutton<br><br>Lease: 125 years from 25 March 1988<br><br>Annual Ground Rent: £50 per<br><br>This information is to be confirmed by the vendor's solicitor.<br><br>Early viewing is highly recommended due to the property being realistically priced.<br><br>Disclaimer:<br><br>These particulars, whilst believed to be accurate are set out as a general guideline and do not constitute any part of an offer or contract. Intending Purchasers should not rely on them as statements of representation of fact, but must satisfy themselves by inspection or otherwise as to their accuracy. Please note that we have not tested any apparatus, equipment, fixtures, fittings or services including gas central heating and so cannot verify they are in working order or fit for their purpose. Furthermore, Solicitors should confirm moveable items described in the sales particulars and, in fact, included in the sale since circumstances do change during the marketing or negotiations. Although we try to ensure accuracy, if measurements are used in this listing, they may be approximate. Therefore if intending Purchasers need accurate measurements to order carpeting or to ensure existing furniture will fit, they should take such measurements themselves. 
Photographs are reproduced general information and it must not be inferred that any item is included for sale with the property.<br><strong>Tenure<br></strong><br><br>To be confirmed by the Vendor’s Solicitors<br><strong>Possession<br></strong><br><br>Vacant possession upon completion<br><strong>Viewing<br></strong><br><br>Viewing strictly by appointment through The Express Estate Agency",
"url": "/for-sale/details/63422316/",
"price": "£220,000",
"type": "maisonette",
"date": "2022-12-08T02:28:04",
"category": "residential",
"section": "for-sale",
"features": [
  "*Guide Price £220,000 - £240,000*",
  "**cash buyers only**",
  "Duplex Maisonette With Private Entrance",
  "Vacant with No Onward Chain",
  "Three Double Size Bedrooms",
  "Spacious Lounge with Feature Fireplace",
  "Modern Kitchen/Diner with Appliances",
  "Modern Bathroom Suite",
  "Private Rear Terrace &amp; Entrance Door",
  "Ideal Rental Investment"
],
"floor_plan": { "filename": null, "caption": null },
"nearby": [
  { "title": "Malmesbury Primary School", "distance": 0.3 },
  "etc. (reduced for blog)"
],
"coordinates": {
  "lat": 51.383345,
  "lng": -0.190118
},
"photos": [
  {
    "filename": "5f2cbcd9866478e716b32aa9af78e59a2c3645ce.jpg",
    "caption": null
  },
  "etc. (reduced for blog)"
],
"details": {
  "__typename": "ListingAnalyticsTaxonomy",
  "location": "Sutton",
  "regionName": "London",
  "section": "for-sale",
  "acorn": 36,
  "acornType": 36,
  "areaName": "Sutton",
  "bedsMax": 3,
  "bedsMin": 3,
  "branchId": 15566,
  "branchLogoUrl": "https://st.zoocdn.com/zoopla_static_agent_logo_(654815).png",
  "branchName": "Express Estate Agency",
  "brandName": "Express Estate Agency",
  "chainFree": true,
  "companyId": 18887,
  "countryCode": "gb",
  "countyAreaName": "London",
  "currencyCode": "GBP",
  "displayAddress": "Rosehill, Sutton SM1",
  "furnishedState": "",
  "groupId": null,
  "hasEpc": true,
  "hasFloorplan": true,
  "incode": "3HE",
  "isRetirementHome": false,
  "isSharedOwnership": false,
  "listingCondition": "pre-owned",
  "listingId": 63422316,
  "listingsCategory": "residential",
  "listingStatus": "for_sale",
  "memberType": "agent",
  "numBaths": 1,
  "numBeds": 3,
  "numImages": 14,
  "numRecepts": 1,
  "outcode": "SM1",
  "postalArea": "SM",
  "postTownName": "Sutton",
  "priceActual": 220000,
  "price": 220000,
  "priceMax": 220000,
  "priceMin": 220000,
  "priceQualifier": "guide_price",
  "propertyHighlight": "",
  "propertyType": "maisonette",
  "sizeSqFeet": "",
  "tenure": "leasehold",
  "zindex": 255069
},
"agency": {
  "__typename": "AgentBranch",
  "branchDetailsUri": "/find-agents/branch/express-estate-agency-manchester-15566/",
  "branchId": "15566",
  "branchResultsUri": "/for-sale/branch/express-estate-agency-manchester-15566/",
  "logoUrl": "https://st.zoocdn.com/zoopla_static_agent_logo_(654815).png",
  "phone": "0333 016 5458",
  "name": "Express Estate Agency",
  "memberType": "agent",
  "address": "St George's House, 56 Peter Street, Manchester",
  "postcode": "M2 3NQ"
}
}

Setup

In this tutorial, we'll be using Python with three community packages:

  • httpx - HTTP client library which will let us communicate with Zoopla.com's servers
  • parsel - HTML parsing library which will help us to parse our web scraped HTML data using CSS selectors or XPath.
  • jmespath - JSON parsing library. Allows writing XPath-like rules for JSON.

These packages can be easily installed via the pip install command:

$ pip install httpx parsel jmespath

Alternatively, feel free to swap httpx out with any other HTTP client package such as requests, as we'll only need basic HTTP functions which are almost interchangeable between libraries. As for parsel, another great alternative is the beautifulsoup package.

Scraping Property Data

Let's start by taking a look at how to scrape property data from a single listing page.

Zoopla is using Next.js for rendering its pages and it generates hidden web data for each property page. Instead of parsing HTML using traditional tools like beautifulsoup we'll extract this hidden JSON data and parse it using jmespath.

How to Scrape Hidden Web Data

Introduction to scraping hidden web data - what is it and best ways to parse it in Python

Let's start by picking a random property listing as our test target. To find the hidden data let's take a look at the page source:

Page source highlight of a Zoopla property page

We can see that a big real estate dataset is located in a <script id="__NEXT_DATA__"> element. Let's scrape it:

import asyncio
import json
from typing import List, Optional

from httpx import AsyncClient, Response
from parsel import Selector

session = AsyncClient(
    headers={
        # use same headers as a popular web browser (Chrome on macOS in this case)
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Language": "en-US,en;q=0.9",
    }
)

def extract_next_data(response: Response) -> Optional[dict]:
    """find hidden Next.js data in the response HTML body"""
    selector = Selector(text=response.text)
    data = selector.css("script#__NEXT_DATA__::text").get()
    if not data:
        print(f"page {response.url} is not a property listing page")
        return None
    data = json.loads(data)
    return data["props"]["pageProps"]


async def scrape_properties(urls: List[str]) -> List[dict]:
    """scrape Zoopla property pages for their hidden listing datasets"""
    to_scrape = [session.get(url) for url in urls]
    properties = []
    for response in asyncio.as_completed(to_scrape):
        data = extract_next_data(await response)
        if data:  # skip pages that contain no hidden data
            properties.append(data["listingDetails"])
    return properties
Run Code

if __name__ == "__main__":
    urls = [
        "https://www.zoopla.co.uk/for-sale/details/63422872/",
        "https://www.zoopla.co.uk/for-sale/details/63422422/",
        "https://www.zoopla.co.uk/for-sale/details/63422316/",
        "https://www.zoopla.co.uk/for-sale/details/63422320/",
        "https://www.zoopla.co.uk/for-sale/details/63422282/",
        "https://www.zoopla.co.uk/for-sale/details/63422152/",
        "https://www.zoopla.co.uk/for-sale/details/63422228/",
        "https://www.zoopla.co.uk/for-sale/details/63422243/",
        "https://www.zoopla.co.uk/for-sale/details/63422274/",
        "https://www.zoopla.co.uk/for-sale/details/63422422/",
    ]
    asyncio.run(scrape_properties(urls))

Above, we wrote a small web scraper for Zoopla properties. Let's take a look at the key points of what we're doing here.

First, we establish an httpx session with browser-like default headers to avoid blocking.
Then, to extract hidden web data we load the HTML as a parsel.Selector object and use a CSS selector to select a <script> element with id=__NEXT_DATA__.
This script contains property data in JSON format so we load it up as a Python dictionary and return the results.
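
To see the hidden-data lookup in isolation, here is a minimal offline sketch that runs against a tiny hand-written page instead of a live Zoopla response. It swaps parsel for Python's built-in html.parser so it runs without extra packages; the sample HTML and the extract_page_props helper name are illustrative assumptions, not part of the scraper above.

```python
import json
from html.parser import HTMLParser
from typing import Optional

# hand-written stand-in for a Next.js page; real pages carry far more data
SAMPLE_HTML = """<html><body>
<div id="__next">rendered page</div>
<script id="__NEXT_DATA__" type="application/json">{"props": {"pageProps": {"listingDetails": {"listingId": 1, "title": "demo listing"}}}}</script>
</body></html>"""


class NextDataParser(HTMLParser):
    """Collects the text of the <script id="__NEXT_DATA__"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.payload = None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.payload = (self.payload or "") + data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def extract_page_props(html: str) -> Optional[dict]:
    """Return props.pageProps from the hidden JSON, or None when it is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if parser.payload is None:
        return None
    return json.loads(parser.payload)["props"]["pageProps"]


print(extract_page_props(SAMPLE_HTML)["listingDetails"])
```

Returning None for pages without the script element lets the caller skip non-listing pages instead of crashing on a missing key.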

This basic Python Zoopla scraper gets us the whole available property dataset but it's full of useless data fields. Let's clean it up by parsing out the most important fields using JMESPath.

Parsing JSON

Let's reduce this dataset to the most important fields using the JMESPath query language, which lets us write query paths for JSON much like XPath or CSS selectors do for HTML.

Quick Intro to Parsing JSON with JMESPath in Python

If you haven't heard of JMESPath before, take a look at our quick introduction article which covers JMESPath use with Python

Let's update our scraper with property parsing:

import asyncio
import json
from typing import List, Optional, TypedDict, Dict

import jmespath
from httpx import AsyncClient, Response
from parsel import Selector

session = AsyncClient(
    headers={
        # use same headers as a popular web browser (Chrome on macOS in this case)
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Language": "en-US,en;q=0.9",
    }
)


class PropertyResult(TypedDict):
    """Type hint for scraped property data just so we can visualize it better"""
    id: str
    title: str
    description: str
    url: str
    price: str
    photos: List[dict]
    agency: Dict[str, str]
    features: List[str]
    ...  # and much more


def parse_property(response: Response) -> Optional[PropertyResult]:
    data = _extract_next_data(response)
    if not data:
        return
    result = jmespath.search(
        """root.{
        id: listingId,
        title: title,
        description: detailedDescription,
        url: listingUris.detail,
        price: pricing.label,
        type: propertyType,
        date: publishedOn,
        category: category,
        section: section,
        features: features.bullets,
        floor_plan: floorPlan.image.{filename:filename, caption: caption}, 
        nearby: pointsOfInterest[].{title: title, distance: distanceMiles},
        coordinates: location.coordinates.{lat:latitude, lng: longitude},
        photos: propertyImage[].{filename: filename, caption: caption},
        details: analyticsTaxonomy,
        agency: branch
    }""", {"root": data["listingDetails"]})
    return result


def _extract_next_data(response: Response) -> Optional[dict]:
    """find NEXT_DATA in given response HTML body"""
    selector = Selector(text=response.text)
    data = selector.css("script#__NEXT_DATA__::text").get()
    if not data:
        print(f"page {response.url} is not a property listing page")
        return None
    data = json.loads(data)
    return data["props"]["pageProps"]

async def scrape_properties(urls: List[str]):
    """Scrape Zoopla's property pages for property data"""
    to_scrape = [session.get(url) for url in urls]
    properties = []
    for response in asyncio.as_completed(to_scrape):
        properties.append(parse_property(await response))
    return properties
Example Output
{
  "id": "63422152",
  "title": "3 bed flat for sale",
  "description": "Featuring a fabulous private roof terrace and secure underground parking, this superb 3 bedroom apartment provides stylish lateral living space within a popular riverside development.<br><br>The River Thames is moments away and lots of on-site amenities can be found throughout the development, including convenience store, cafe and dentist. The shops, cafes and restaurants of Putney town centre and Wandsworth Town are all within easy reach.<br><br>Please use the reference CHPK4316397 when contacting Foxtons.",
  "url": "/for-sale/details/63422152/",
  "price": "£900,000",
  "type": "flat",
  "date": "2022-12-08T01:15:06",
  "category": "residential",
  "section": "for-sale",
  "features": [
    "Secure entrance and lift to 4th floor",
    "Sun-filled reception room with door to terrace",
    "Sleek open-plan kitchen with integrated appliances",
    "Main bedroom with private balcony and en suite",
    "2 additional good-sized bedrooms",
    "Smart main bathroom",
    "Private roof terrace with space for dining, relaxing and storage",
    "Extra wide underground parking space"
  ],
  "floor_plan": {
    "filename": null,
    "caption": null
  },
  "nearby": [
    {
      "title": "Wandsworth Riverside Quarter Pier",
      "distance": 0.1
    },
    {
      "title": "The Roche School",
      "distance": 0.2
    },
    {
      "title": "St Joseph's RC Primary School",
      "distance": 0.2
    },
    {
      "title": "Wandsworth Town",
      "distance": 0.4
    }
  ],
  "coordinates": {
    "lat": 51.461368,
    "lng": -0.197922
  },
  "photos": [
    {
      "filename": "8b8f71df67b294ad3d114708603af4146096b7dc.jpg",
      "caption": null
    },
    {
      "filename": "002e46a10979e1a6b9c691f2a7ba221ba6aca215.jpg",
      "caption": null
    },
    {
      "filename": "3c2aef9eafa4e81aca16c814fff9fdbbb6d7ea14.jpg",
      "caption": null
    },
    {
      "filename": "ffa96fada68131ee28f2ce8d6f127303af60b136.jpg",
      "caption": null
    },
    {
      "filename": "e8ec10ce0aad02214f9bdda1290459e4e461bcc5.jpg",
      "caption": null
    },
    {
      "filename": "cf1e876b476195faab1e29cca417d7fd8fa3cd49.jpg",
      "caption": null
    },
    {
      "filename": "7bbb8e63ff8483ae3af0709d8d40de8b3bfeda26.jpg",
      "caption": null
    },
    {
      "filename": "4438e37536d2e82ef430cdf21f16ab4da809d0bc.jpg",
      "caption": null
    },
    {
      "filename": "6bf71a57c2400c2c4375894dad7c5361afa5dd98.jpg",
      "caption": null
    },
    {
      "filename": "13762722024be732299a94c94929ed8025e77bb1.jpg",
      "caption": null
    },
    {
      "filename": "06e89441cfd613b9fd186ceeb4b2b9ab320e4295.jpg",
      "caption": null
    },
    {
      "filename": "5a3a7393c90974e02a3cbfaa1809814685d26bb8.jpg",
      "caption": null
    },
    {
      "filename": "6c04fb22d82715221f6dfc53ff682d4b361cd246.jpg",
      "caption": null
    },
    {
      "filename": "1cf7e428cf5b2ac9f12b8ec09886e4804c85c584.jpg",
      "caption": null
    },
    {
      "filename": "68ebc853bce8112c41df4dc6b92b042cd682f9e1.jpg",
      "caption": null
    },
    {
      "filename": "5218cd282da9ea0213d78d639ef07fca68b7aba1.jpg",
      "caption": null
    }
  ],
  "details": {
    "__typename": "ListingAnalyticsTaxonomy",
    "location": "London",
    "regionName": "London",
    "section": "for-sale",
    "acorn": 15,
    "acornType": 15,
    "areaName": "London",
    "bedsMax": 3,
    "bedsMin": 3,
    "branchId": 2888,
    "branchLogoUrl": "https://st.zoocdn.com/zoopla_static_agent_logo_(592983).png",
    "branchName": "Foxtons - Putney",
    "brandName": "Foxtons",
    "chainFree": false,
    "companyId": 1370,
    "countryCode": "gb",
    "countyAreaName": "London",
    "currencyCode": "GBP",
    "displayAddress": "Knightley Walk, Wandsworth, London SW18",
    "furnishedState": "",
    "groupId": 267,
    "hasEpc": false,
    "hasFloorplan": true,
    "incode": "1HB",
    "isRetirementHome": false,
    "isSharedOwnership": false,
    "listingCondition": "pre-owned",
    "listingId": 63422152,
    "listingsCategory": "residential",
    "listingStatus": "for_sale",
    "memberType": "agent",
    "numBaths": 2,
    "numBeds": 3,
    "numImages": 16,
    "numRecepts": 1,
    "outcode": "SW18",
    "postalArea": "SW",
    "postTownName": "London",
    "priceActual": 900000,
    "price": 900000,
    "priceMax": 900000,
    "priceMin": 900000,
    "priceQualifier": "",
    "propertyHighlight": "",
    "propertyType": "flat",
    "sizeSqFeet": "1022",
    "tenure": null,
    "zindex": 628988
  },
  "agency": {
    "__typename": "AgentBranch",
    "branchDetailsUri": "/find-agents/branch/foxtons-putney-london-2888/",
    "branchId": "2888",
    "branchResultsUri": "/for-sale/branch/foxtons-putney-london-2888/",
    "logoUrl": "https://st.zoocdn.com/zoopla_static_agent_logo_(592983).png",
    "phone": "020 3542 2189",
    "name": "Foxtons - Putney",
    "memberType": "agent",
    "address": "175 Putney High Street, London",
    "postcode": "SW15 1RT"
  }
}

Here, we've updated our Python Zoopla web scraper with JSON parsing logic by defining a JMESPath query that keeps only the fields we need.

Finding Properties

To find property listings on Zoopla we have two options: scrape sitemaps to find all listed properties or use Zoopla's search system to scrape listings by location.

To scrape Zoopla's search, let's first take a look at how it works. If we submit a search request like "Islington, London" we can see that we are redirected to a URL which contains the search results:

https://www.zoopla.co.uk/for-sale/property/london/islington/?q=Islington%2C%20London&search_source=home

Though, how do we create this URL from a search query? Let's take a look at what the web page does when we submit our search:

Demonstration of how to use Chrome developer tools to find Zoopla's search API

We can see that Zoopla uses a background API that redirects our query to the search results page:

https://www.zoopla.co.uk/search/?view_type=list&section=for-sale&q=Islington%2C%20London&geo_autocomplete_identifier=&search_source=home
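
The query string of this endpoint can be assembled from a plain search phrase. Below is a small sketch, assuming the parameters stay exactly as observed in the URL above; build_search_url is a hypothetical helper name used only for illustration.

```python
from urllib.parse import quote, urlencode


def build_search_url(query: str, section: str = "for-sale") -> str:
    """Assemble the search URL observed in the browser's network tab."""
    params = {
        "view_type": "list",
        "section": section,  # "for-sale" or "to-rent"
        "q": query,
        "geo_autocomplete_identifier": "",
        "search_source": "home",
    }
    # quote_via=quote encodes spaces as %20 (the default would use "+")
    return "https://www.zoopla.co.uk/search/?" + urlencode(params, quote_via=quote)


print(build_search_url("Islington, London"))
```

Encoding the query this way keeps commas and spaces in location names from breaking the URL.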

Let's replicate this in our scraper:

import math
from typing import Literal
from urllib.parse import quote

...

async def find_properties(query: str, query_type: Literal["for-sale", "to-rent"] = "for-sale"):
    """scrape Zoopla's search system and collect all results of a query"""
    # scrape first results page to start (the query is URL-encoded to handle spaces and commas):
    first_page = await session.get(
        url=f"https://www.zoopla.co.uk/search/?view_type=list&section={query_type}&q={quote(query)}&geo_autocomplete_identifier=&search_source=home&sort=newest_listings",
        follow_redirects=True,
    )
    # extract next.js data and the listings of the first page
    data = extract_next_data(first_page)["initialProps"]["searchResults"]
    listings = data["listings"]["regular"]
    if not listings:  # nothing found - this also avoids dividing by zero below
        return listings
    # then extract total pages
    total_results = data["pagination"]["totalResults"]
    total_pages = math.ceil(total_results / len(listings))
    if total_pages > data["pagination"]["pageNumberMax"]:
        total_pages = data["pagination"]["pageNumberMax"]

    # scrape remaining pages concurrently:
    print(f"total {total_results} results, {total_pages} pages")
    other_pages = [session.get(url=str(first_page.url) + f"&pn={page}") for page in range(2, total_pages + 1)]
    for response in asyncio.as_completed(other_pages):
        data = extract_next_data(await response)["initialProps"]["searchResults"]
        listings.extend(data["listings"]["regular"])
    return listings
Run Code
if __name__ == "__main__":
    results = asyncio.run(find_properties("Islington, London"))
    print(results)

To explain our Zoopla crawler above - we start by sending our query to Zoopla's search API which redirects us to the first results page. Then, we are using the same parsing technique we used in property page parsing and extracting hidden web data. This data contains the total result count which we later use to scrape other pages of the results pagination.
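
The pagination arithmetic described above can be sketched on its own, without any network requests. The helper names count_pages and page_urls are hypothetical, and the numbers in the example are made up; the logic assumes, like the scraper, that the first page is full and that later pages are addressed with a pn parameter.

```python
import math
from typing import List


def count_pages(total_results: int, per_page: int, page_number_max: int) -> int:
    """Number of result pages to scrape, capped at the maximum page number."""
    if total_results <= 0 or per_page <= 0:
        return 0  # empty result set: nothing to paginate
    return min(math.ceil(total_results / per_page), page_number_max)


def page_urls(first_page_url: str, total_pages: int) -> List[str]:
    """URLs of pages 2..N; the first page has already been scraped."""
    return [f"{first_page_url}&pn={page}" for page in range(2, total_pages + 1)]


# made-up numbers: 101 results at 25 per page need 5 pages
print(count_pages(101, 25, 40))
print(page_urls("https://example.com/search/?q=demo", 3))
```

Guarding against an empty first page avoids a division by zero when a query returns no listings.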

We could further enrich the property results by scraping each property page with the property scraper we wrote earlier.

Next, let's take a look at a different discovery approach which can be used to find all properties on Zoopla - the sitemaps.

Scraping Sitemaps

Sitemaps are collections of files that contain URLs to various web pages - be it property listings, blog articles or individual pages.

For our Zoopla scraper in Python, to find all properties using the sitemap collection, we first must find the sitemap itself. For that, we can check the /robots.txt endpoint which contains various instructions for web scrapers:

Sitemap: https://www.zoopla.co.uk/xmlsitemap/sitemap/index.xml.gz

This central sitemap acts as a hub for all other sitemaps that are categorized by a topic:

<sitemap>
  <loc>https://www.zoopla.co.uk/xmlsitemap/sitemap/for_sale_details_001.xml.gz</loc>
  <lastmod>2022-12-08T09:25:08+00:00</lastmod>
</sitemap>
<sitemap>
  <loc>https://www.zoopla.co.uk/xmlsitemap/sitemap/to_rent_details_001.xml.gz</loc>
  <lastmod>2022-12-08T09:25:08+00:00</lastmod>
</sitemap>
<sitemap>
  <loc>https://www.zoopla.co.uk/xmlsitemap/sitemap/for_sale_flats_001.xml.gz</loc>
  <lastmod>2022-12-08T09:25:08+00:00</lastmod>
</sitemap>
...

🧙‍♂️ each sitemap is limited to 50_000 URLs - that's why they are split into several parts.

For example, to scrape all properties for rent we could find all URLs by scraping the to_rent_ sitemaps.
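
As a rough sketch of that selection step, the snippet below filters a sitemap index by file name prefix. The sample index is hand-written from the entries shown above and is assumed to use the standard sitemaps.org namespace; select_sitemaps is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# standard sitemap namespace (an assumption about Zoopla's index file)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# hand-written sample index based on the entries shown above
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.zoopla.co.uk/xmlsitemap/sitemap/for_sale_details_001.xml.gz</loc></sitemap>
  <sitemap><loc>https://www.zoopla.co.uk/xmlsitemap/sitemap/to_rent_details_001.xml.gz</loc></sitemap>
  <sitemap><loc>https://www.zoopla.co.uk/xmlsitemap/sitemap/for_sale_flats_001.xml.gz</loc></sitemap>
</sitemapindex>"""


def select_sitemaps(index_xml: str, prefix: str) -> List[str]:
    """Return sitemap URLs whose file name starts with the given prefix."""
    root = ET.fromstring(index_xml)
    locs = [el.text.strip() for el in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if el.text]
    # compare against the last path segment, e.g. "to_rent_details_001.xml.gz"
    return [loc for loc in locs if loc.rsplit("/", 1)[-1].startswith(prefix)]


print(select_sitemaps(SAMPLE_INDEX, "to_rent_"))
```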

Let's take a quick look at how to scrape sitemap files using Python:

import asyncio
from httpx import AsyncClient
from parsel import Selector

session = AsyncClient()

async def scrape_feed(url):
    """scrape a sitemap file and return all URLs listed in it"""
    resp = await session.get(url)
    selector = Selector(text=resp.text)
    # every sitemap entry stores its URL in a <loc> element
    return selector.xpath("//loc/text()").getall()

# example run:
asyncio.run(
    scrape_feed("https://www.zoopla.co.uk/xmlsitemap/sitemap/to_rent_details_001.xml.gz")
)

🧙‍♂️ sitemap files are sometimes gzip-compressed. Use the gzip.decompress() function to decompress the response body before passing it to the Selector.

Since sitemaps are XML files we can parse them with the same tools we use to parse HTML. In the example above we retrieve the sitemap page and extract URLs using parsel and XPath selectors.
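
For sitemap files that arrive still compressed, the body has to be decompressed before parsing. Here is a minimal offline sketch using only the standard library; the sample sitemap and its URLs are made-up placeholders, and parse_sitemap_bytes is a hypothetical helper name.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import List


def parse_sitemap_bytes(body: bytes) -> List[str]:
    """Return all <loc> values from a sitemap body that may be gzip-compressed."""
    # gzip streams always start with the magic bytes 1f 8b
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    root = ET.fromstring(body)
    urls = []
    for element in root.iter():
        # tags look like "{namespace}loc" when the document declares a namespace
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


# made-up sample sitemap for illustration
RAW_SITEMAP = b"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.zoopla.co.uk/to-rent/details/1/</loc></url>
  <url><loc>https://www.zoopla.co.uk/to-rent/details/2/</loc></url>
</urlset>"""

compressed = gzip.compress(RAW_SITEMAP)
print(parse_sitemap_bytes(compressed))
```

Checking the magic bytes means the same function works whether or not the HTTP client has already decompressed the response.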

Tracking New Listings

We know how to find property listings, so now we can also track Zoopla for new property listings either by scraping the search or sitemaps.

If we want to keep our whole listing dataset up to date we can track the new_home_details_x sitemaps found in the Zoopla sitemap index we covered earlier.

Though, these sitemaps are being updated only once per day - what if we want to know about new listings ASAP? For that, we can scrape search queries and sort them by "Most Recent" which is exactly how we coded our search scraper to behave.
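
The tracking step itself comes down to remembering which listings we have already seen. Below is a minimal sketch, assuming each search result is a dictionary with a listingId key (an assumption about the search dataset); find_new_listings is a hypothetical helper, and the polls are simulated with hand-written data.

```python
from typing import Dict, Iterable, List, Set


def find_new_listings(listings: Iterable[Dict], seen_ids: Set[str]) -> List[Dict]:
    """Return listings not seen before and record their ids in seen_ids."""
    fresh = []
    for listing in listings:
        listing_id = str(listing["listingId"])
        if listing_id not in seen_ids:
            seen_ids.add(listing_id)
            fresh.append(listing)
    return fresh


seen: Set[str] = set()
# simulated results of two consecutive search scrapes
first_poll = [{"listingId": 1}, {"listingId": 2}]
second_poll = [{"listingId": 3}, {"listingId": 2}]

print(find_new_listings(first_poll, seen))   # everything is new on the first run
print(find_new_listings(second_poll, seen))  # only listing 3 is new
```

In a real tracker the seen ids would be persisted between runs, for example in a file or database, and the search scraper would be called on a schedule.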

Avoiding Blocking with ScrapFly

Web scraping Zoopla is very straightforward. However, when scaling up our scraper beyond a few property scrapes we might start to run into scraper blocking and captchas.

To scale up, let's take advantage of the ScrapFly API, which offers several powerful features that can help us scale up our web scrapers and avoid Zoopla's blocking.

For this, we'll be using the scrapfly-sdk Python package and the Anti Scraping Protection Bypass feature. To start, let's install scrapfly-sdk using pip:

$ pip install scrapfly-sdk

To take advantage of ScrapFly's API in our Zoopla web scraper all we need to do is replace our httpx session code with scrapfly-sdk client requests:

import httpx

response = httpx.get("some zoopla.co.uk url")
# in ScrapFly SDK becomes
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient("YOUR SCRAPFLY KEY")
result = client.scrape(ScrapeConfig(
    # some zoopla URL
    "https://www.zoopla.co.uk/for-sale/details/63412743/",
    # we can select specific proxy country
    country="GB",
    # and enable anti scraping protection bypass:
    asp=True,
))

For more on how to scrape Zoopla.com using ScrapFly, see the Full Scraper Code section.

FAQ

To wrap this guide up, let's take a look at some frequently asked questions regarding how to scrape data from Zoopla:

Is it legal to scrape Zoopla.com?

Yes. Zoopla's data is publicly available - scraping Zoopla at slow, respectful rates would fall under the ethical scraping definition.
That being said, be aware of GDPR compliance in the EU when storing personal data such as agents' names and phone numbers. For more, see our Is Web Scraping Legal? article.

Is there a Zoopla API?

Yes, though it's private and is limited to a specific set of data fields (e.g. contains no agent contact details). Fortunately, as covered in this article, we can scrape Zoopla using Python.

How to crawl Zoopla.com?

To web crawl Zoopla we can adapt the scraping techniques covered in this article. Particularly, the recommended/similar properties data field can be used to develop crawling logic. That being said, with Zoopla's extensive sitemap system crawling is unnecessary as we can scrape all properties directly.

Summary

In this guide, we wrote a Zoopla scraper for real estate property data using nothing but Python with a few community packages: httpx, parsel and jmespath.

To scrape property data we used parsel to extract data hidden in an HTML script element. We then cleaned it up and parsed the most important fields using JMESPath parsing language.

To find properties to scrape we also explored how to scrape Zoopla's sitemap and search systems. We've also covered how search scraping can be used to track when new properties are being listed.

Finally, to avoid being blocked we used ScrapFly's API which smartly configures every web scraper connection to avoid being blocked. For more about ScrapFly, see our documentation and try it out for FREE!

Full Scraper Code

Here's the full Zoopla web scraper code with ScrapFly integration:

💙 This code should only be used as a reference. To scrape Zoopla's real estate data at scale we recommend expanding this scraper with logging, failure management and other production details.

import asyncio
import json
import math
from typing import List, Literal, Optional, TypedDict

import jmespath
from scrapfly import ScrapeConfig, ScrapeApiResponse, ScrapflyClient

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY", max_concurrency=2)


class PropertyResult(TypedDict):
    id: str
    title: str
    description: str
    url: str
    price: str
    photos: List[dict]
    ...  # and much more


def parse_property(result: ScrapeApiResponse) -> Optional[PropertyResult]:
    data = extract_next_data(result)
    if not data:
        return
    result = jmespath.search(
        """listingDetails.{
        id: listingId,
        title: title,
        description: detailedDescription,
        url: listingUris.detail,
        price: pricing.label,
        type: propertyType,
        date: publishedOn,
        category: category,
        section: section,
        features: features.bullets,
        floor_plan: floorPlan.image.{filename:filename, caption: caption}, 
        nearby: pointsOfInterest[].{title: title, distance: distanceMiles},
        coordinates: location.coordinates.{lat:latitude, lng: longitude},
        photos: propertyImage[].{filename: filename, caption: caption},
        details: analyticsTaxonomy,
        agency: branch
    }""", data)
    return result


def extract_next_data(result: ScrapeApiResponse) -> Optional[dict]:
    """find hidden next.js data in page scrape result"""
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    if not data:
        print(f"page {result.context['url']} is not a property listing page")
        return None
    data = json.loads(data)
    return data["props"]["pageProps"]


async def scrape_properties(urls: List[str]) -> List[PropertyResult]:
    """scrape Zoopla property pages and parse results"""
    to_scrape = [ScrapeConfig(url=url, asp=True, country="GB") for url in urls]
    properties = []
    async for result in scrapfly.concurrent_scrape(to_scrape):
        properties.append(parse_property(result))
    return properties


async def find_properties(query: str, query_type: Literal["for-sale", "to-rent"] = "for-sale"):
    """scrape Zoopla's search system and find all search results"""
    # scrape first results page to start (sorted by newest listings, as in the search section):
    first_page = await scrapfly.async_scrape(
        ScrapeConfig(
            url=f"https://www.zoopla.co.uk/search/?view_type=list&section={query_type}&q={query}&geo_autocomplete_identifier=&search_source=home&sort=newest_listings",
            country="GB",
            asp=True,
        )
    )
    # extract next.js data and the listings of the first page
    data = extract_next_data(first_page)["initialProps"]["searchResults"]
    listings = data["listings"]["regular"]
    if not listings:  # nothing found - this also avoids dividing by zero below
        return listings
    # then extract total pages
    total_results = data["pagination"]["totalResults"]
    total_pages = math.ceil(total_results / len(listings))
    if total_pages > data["pagination"]["pageNumberMax"]:
        total_pages = data["pagination"]["pageNumberMax"]

    # scrape remaining pages concurrently, with the same proxy country and bypass settings:
    print(f"total {total_results} results, {total_pages} pages")
    other_pages = [
        ScrapeConfig(url=first_page.context["url"] + f"&pn={page}", country="GB", asp=True)
        for page in range(2, total_pages + 1)
    ]
    async for result in scrapfly.concurrent_scrape(other_pages):
        data = extract_next_data(result)["initialProps"]["searchResults"]
        listings.extend(data["listings"]["regular"])
    return listings
