
How to Scrape Realestate.com.au Property Listing Data


When it comes to real estate websites in Australia, there are a few options, and Realestate.com.au is the biggest one. It's a popular website for real estate ads, featuring thousands of property listings across the country. However, it's also a highly protected website, making it challenging to scrape.

In this article, we'll explain how to scrape realestate.com.au for real estate data from property and search pages. We'll also explain how to avoid realestate.com.au web scraping blocking. Let's dive in!

Quick Start

Need a working scraper right now? Clone the maintained Realestate.com.au project with ScrapFly-ready settings:

git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/realestatecom-scraper

The repository contains async clients, pagination helpers, and ScrapFly configuration so you can run a production crawl with minimal edits.

Latest Realestate.com.au Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

What Is Realestate.com.au?

Realestate.com.au aggregates residential and commercial listings across Australia. Each listing page exposes structured data for pricing, address metadata, geocodes, land size, agency details, photos, floor plans, and lister contact information.

All of that data lives in the window.ArgonautExchange JSON cache, which means we can skip brittle DOM selectors and work directly with hidden data.

Why Scrape Realestate.com.au?

  • Market intelligence: Monitor inventory, price trends, and days on market for specific suburbs or postcodes.
  • Agency benchmarking: Track activity per agency or lister to understand competition.
  • Lead generation: Capture structured contact details for outreach or CRM enrichment.
  • Proptech products: Feed automated valuation models, alert systems, or portfolio dashboards.
  • Historical archives: Build private comps by storing daily snapshots of active listings.

For more inspiration see our real estate scraping use case hub.

Challenges of Scraping Realestate.com.au

Realestate.com.au borrows many tactics from modern ecommerce defenses. Expect the following hurdles.

Anti-Bot Defenses

  • TLS fingerprint inspection: Clients that do not mimic browser-grade TLS handshakes are throttled.
  • Header and cookie checks: Reusing the same headers or cookie jars triggers challenges.
  • Geo filtering: Traffic outside Australia sees extra blocks and captchas.
  • Hidden script validation: The site verifies how you access ArgonautExchange to catch naive parsers.

Rate Limiting and IP Hygiene

  • Tight quotas: Even clean Australian IPs hit rate limits if you send bursts of requests.
  • Sequence detection: Unnatural pagination patterns or instant property fetches look robotic.
  • Session freshness: Long-lived sessions are challenged, so refresh tokens often.
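These quotas can be respected client-side with a small throttle. The sketch below caps concurrent requests with a semaphore and adds jittered delays so pagination doesn't look robotic — the limit of 3 in-flight requests and the 2-5 second delay window are assumptions to tune against your own quota:

```python
import asyncio
import random


async def throttled_gather(fetch, urls, limit=3, min_delay=2.0, max_delay=5.0):
    """run fetch(url) for every url, capped at `limit` concurrent requests
    with a randomized pause before each one"""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        async with semaphore:
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            return await fetch(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(worker(u) for u in urls))


async def fake_fetch(url):
    """stand-in for client.get(url) so the sketch runs standalone"""
    return f"fetched {url}"


if __name__ == "__main__":
    pages = ["/list-1", "/list-2", "/list-3"]
    # short delays here only so the demo finishes quickly
    results = asyncio.run(throttled_gather(fake_fetch, pages, min_delay=0.01, max_delay=0.05))
    print(results)
```

In the scrapers later in this article you would pass `client.get` as the `fetch` callable instead of the stand-in.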

Deep Hidden Data

  • Nested JSON layers: Valuable fields are stringified multiple times.
  • ID heavy structure: Media, listers, and features are keyed by IDs that need joining logic.
  • Variant rich listings: Each listing has arrays for media, property features, listers, and more.
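The multiply-stringified layers are unwrapped with repeated json.loads() calls, one per layer. Here is a toy payload mimicking the cache shape — the key names mirror the real site, but the listing values are made up for illustration:

```python
import json

# a fake ArgonautExchange cache: JSON strings nested inside JSON strings
outer = json.dumps({
    "resi-property_listing-experience-web": {
        "urqlClientCache": json.dumps({
            "someCacheKey": {
                "data": json.dumps({"details": {"listing": {"id": "123"}}})
            }
        })
    }
})

data = json.loads(outer)  # layer 1: the ArgonautExchange object itself
cache = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])  # layer 2
listing = json.loads(list(cache.values())[0]["data"])  # layer 3: the actual listing data
print(listing["details"]["listing"]["id"])
```

This is exactly the unwrapping sequence the `parse_hidden_data()` function performs later in the article.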

We will tackle each issue the same way we approached Nordstrom: hidden data parsing plus ScrapFly for unblocking.

For more details, refer to our previous article on real estate web scraping use cases.

Realestate.com.au Scrape Preview

We’ll scrape two key datasets from realestate.com.au: detailed single property listings, and bulk summary data from search results. This gives us both granular and broad views for analysis, and covers all main scraping techniques needed.

Sample property dataset
[
  {
    "id": "143160680",
    "propertyType": "House",
    "description": "Renowned Real Estate proudly presents this sensational opportunity...",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
    "address": {
      "suburb": "Tarneit",
      "state": "Vic",
      "postcode": "3029",
      "display": {
        "shortAddress": "28 Chantelle Parade",
        "fullAddress": "28 Chantelle Parade, Tarneit, Vic 3029"
      }
    },
    "propertySizes": {
      "land": {
        "displayValue": "336",
        "sizeUnit": {
          "displayValue": "m²"
        }
      }
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4
      },
      "bathrooms": {
        "value": 2
      },
      "parkingSpaces": {
        "value": 2
      }
    },
    "propertyFeatures": [
      {
        "featureName": "Built-in wardrobes",
        "value": null
      }
    ],
    "images": [
      "https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg"
    ],
    "listingCompany": {
      "name": "Renowned Real Estate - CRAIGIEBURN",
      "phoneNumber": "0452060566"
    },
    "listers": [
      {
        "name": "Him Raj Parajuli",
        "phoneNumber": {
          "display": "0452060566"
        }
      }
    ]
  }
]
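Note the `{size}` placeholder in the image URLs — each templatedUrl must be rendered with a concrete size string before downloading. A minimal helper; the `"800x600"` default is an assumption, so check which size values the live site actually serves:

```python
def render_image_url(templated_url: str, size: str = "800x600") -> str:
    """substitute the {size} placeholder with a concrete dimension string"""
    return templated_url.replace("{size}", size)


url = render_image_url(
    "https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg"
)
print(url)
```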
Sample search dataset
[
  {
    "id": "143029712",
    "propertyType": "House",
    "description": "Set in the sought-after Aurora Estate...",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-wollert-143029712",
    "address": {
      "display": {
        "shortAddress": "12 Geary Avenue",
        "fullAddress": "12 Geary Avenue, Wollert, Vic 3750"
      },
      "suburb": "Wollert",
      "state": "Vic",
      "postcode": "3750"
    },
    "propertySizes": {
      "building": {
        "displayValue": "195.1"
      },
      "land": {
        "displayValue": "331"
      }
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4
      },
      "bathrooms": {
        "value": 2
      }
    },
    "listingCompany": {
      "name": "Carvera Property",
      "phoneNumber": "0466229631"
    }
  }
]

Scraping Realestate.com.au with Python

We will follow the same hidden data flow we used for Nordstrom: fetch the HTML with httpx, grab the script via Parsel, reshape it with JMESPath, then show the ScrapFly variant.

Project Setup

To scrape realestate.com.au, we'll use a few Python packages:

  • httpx - async HTTP client with HTTP/2.
  • parsel - DOM parser for XPath or CSS queries.
  • JMESPath - JSON query engine used to reshape data.
  • asyncio - Python standard library for concurrency.
  • ScrapFly SDK - optional managed client with ASP.

Install everything with pip; asyncio ships with Python:

pip install httpx parsel jmespath scrapfly-sdk

When creating httpx.AsyncClient, enable http2=True and set browser-grade headers for User-Agent, Accept, and Accept-Language. ScrapFly handles this automatically once asp=True is enabled.

Scrape Realestate.com.au Property Pages

Pick any property such as this townhouse example. Open the page source, search for window.ArgonautExchange, and note the JSON blob. We will automate those steps.

Property listing page on realestate.com.au

How to Scrape Hidden Web Data

The visible HTML doesn't always represent the whole dataset available on the page. In this article, we'll be taking a look at scraping of hidden web data. What is it and how can we scrape it using Python?

Python
ScrapFly
import re
import json
import asyncio
import jmespath
from httpx import AsyncClient, Response
from parsel import Selector
from typing import List, Dict

client = AsyncClient(
    http2=True,
    headers={
        "accept-language": "en-AU,en;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
    }
)


def parse_property_data(data: Dict) -> Dict:
    """reshape property payload with JMESPath"""
    if not data:
        return
    return jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        videos: videos,
        floorplans: floorplans,
        listingCompany: listingCompany.{name: name, id: id, companyLink: _links.canonical.href, phoneNumber: businessPhone, address: address.display.fullAddress, ratingsReviews: ratingsReviews, description: description},
        listers: listers,
        auction: auction
        }""",
        data,
    )


def parse_hidden_data(response: Response) -> Dict:
    """extract window.ArgonautExchange payload"""
    selector = Selector(response.text)
    script = selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


async def scrape_properties(urls: List[str]) -> List[Dict]:
    """scrape listing data from property pages"""
    to_scrape = [client.get(url) for url in urls]
    properties = []
    for response in asyncio.as_completed(to_scrape):
        response = await response
        assert response.status_code == 200, "request has been blocked"
        data = parse_hidden_data(response)["details"]["listing"]
        data = parse_property_data(data)
        properties.append(data)
    print(f"scraped {len(properties)} property listings")
    return properties
import re
import json
import jmespath
from typing import Dict, List
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")


def parse_property_data(data: Dict) -> Dict:
    """reshape property payload with JMESPath"""
    if not data:
        return
    return jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        videos: videos,
        floorplans: floorplans,
        listingCompany: listingCompany.{name: name, id: id, companyLink: _links.canonical.href, phoneNumber: businessPhone, address: address.display.fullAddress, ratingsReviews: ratingsReviews, description: description},
        listers: listers,
        auction: auction
        }""",
        data,
    )


def parse_hidden_data(response: ScrapeApiResponse) -> Dict:
    """extract window.ArgonautExchange payload"""
    script = response.selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


async def scrape_properties(urls: List[str]) -> List[Dict]:
    """scrape listing data using ScrapFly"""
    to_scrape = [ScrapeConfig(url, country="AU", asp=True) for url in urls]
    properties = []
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        data = parse_hidden_data(response)["details"]["listing"]
        data = parse_property_data(data)
        properties.append(data)
    print(f"scraped {len(properties)} property listings")
    return properties
Run the code
async def run():
    data = await scrape_properties(
        urls = [
            "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
            "https://www.realestate.com.au/property-house-vic-bundoora-141557712",
            "https://www.realestate.com.au/property-townhouse-vic-glenroy-143556608",
        ]
    )
    print(json.dumps(data, indent=2))

if __name__ == "__main__":
    asyncio.run(run())

🙋‍ If you see blocks while running the Python tab, switch to the ScrapFly version to inherit ASP, geo routing, and automatic retries.

The helper trio does exactly what we need:

  • parse_hidden_data() extracts the script and repeatedly parses the nested JSON.
  • parse_property_data() uses JMESPath to keep only the fields we need.
  • scrape_properties() queues multiple URLs and awaits them concurrently.
Sample property output
[
  {
    "id": "143160680",
    "propertyType": "House",
    "description": "Renowned Real Estate proudly presents this sensational opportunity with a luxury house in Tarneit.<br/><br/>This beautiful low maintenance home is situated in the well-established suburb of Tarneit...",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
    "address": {
      "suburb": "Tarneit",
      "state": "Vic",
      "postcode": "3029",
      "display": {
        "shortAddress": "28 Chantelle Parade",
        "fullAddress": "28 Chantelle Parade, Tarneit, Vic 3029",
        "geocode": {
          "latitude": -37.85273078,
          "longitude": 144.66332821
        }
      }
    },
    "propertySizes": {
      "land": {
        "displayValue": "336",
        "sizeUnit": {
          "displayValue": "m²"
        }
      }
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4
      },
      "bathrooms": {
        "value": 2
      },
      "parkingSpaces": {
        "value": 2
      }
    },
    "images": [
      "https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg"
    ],
    "listingCompany": {
      "name": "Renowned Real Estate - CRAIGIEBURN",
      "phoneNumber": "0452060566"
    },
    "listers": [
      {
        "name": "Him Raj Parajuli",
        "phoneNumber": {
          "display": "0452060566"
        }
      }
    ]
  }
]

How to Scrape Realestate.com.au Search Pages

Search results expose the same window.ArgonautExchange payload. Inspect the HTML and capture the JSON.

Search pages hidden web data

Pagination uses the /list-{page} suffix. For example, /list-1 is page one, /list-2 is page two, and so on.
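That suffix makes URL generation trivial. A small sketch deriving the remaining page URLs from the first search URL, using the same splitting trick the scraper code applies:

```python
def build_page_urls(first_page_url: str, max_pages: int) -> list[str]:
    """generate /list-2 .. /list-N urls from a /list-1 search url"""
    base = first_page_url.split("/list")[0]  # strip the /list-{page} suffix
    return [f"{base}/list-{page}" for page in range(2, max_pages + 1)]


urls = build_page_urls(
    "https://www.realestate.com.au/buy/in-melbourne+-+northern+region,+vic/list-1",
    max_pages=3,
)
print(urls)  # page-2 and page-3 urls of the same search
```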

Python
ScrapFly
import re
import json
import asyncio
import jmespath
from httpx import AsyncClient, Response
from parsel import Selector
from typing import List, Dict

client = AsyncClient(
    http2=True,
    headers={
        "accept-language": "en-AU,en;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
    }
)


def parse_property_data(data: Dict) -> Dict:
    """reuse property parser"""
    return jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        listingCompany: listingCompany.{name: name, phoneNumber: businessPhone},
        listers: listers,
        auction: auction
        }""",
        data,
    )


def parse_hidden_data(response: Response) -> Dict:
    """extract window.ArgonautExchange payload"""
    selector = Selector(response.text)
    script = selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_search-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


def parse_search_data(data: Dict) -> Dict:
    """reshape search payload"""
    search_data = []
    data = list(data.values())[0]
    for listing in data["results"]["exact"]["items"]:
        search_data.append(parse_property_data(listing["listing"]))
    max_search_pages = data["results"]["pagination"]["maxPageNumberAvailable"]
    return {"search_data": search_data, "max_search_pages": max_search_pages}


async def scrape_search(url: str, max_scrape_pages: int | None = None):
    """scrape property listings from search pages"""
    first_page = await client.get(url)
    assert first_page.status_code == 200, "request has been blocked"
    print(f"scraping search page {url}")
    data = parse_search_data(parse_hidden_data(first_page))
    search_data = data["search_data"]
    max_search_pages = data["max_search_pages"]
    if max_scrape_pages and max_scrape_pages < max_search_pages:
        max_search_pages = max_scrape_pages
    print(f"scraping search pagination, remaining ({max_search_pages - 1} more pages)")
    other_pages = [
        client.get(str(first_page.url).split("/list")[0] + f"/list-{page}")
        for page in range(2, max_search_pages + 1)
    ]
    for response in asyncio.as_completed(other_pages):
        response = await response
        assert response.status_code == 200, "request has been blocked"
        data = parse_search_data(parse_hidden_data(response))
        search_data.extend(data["search_data"])
    print(f"scraped ({len(search_data)}) from {url}")
    return search_data
import re
import json
import jmespath
from typing import Dict, List
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")


def parse_property_data(data: Dict) -> Dict:
    """reuse property parser"""
    return jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        listingCompany: listingCompany.{name: name, phoneNumber: businessPhone},
        listers: listers,
        auction: auction
        }""",
        data,
    )


def parse_hidden_data(response: ScrapeApiResponse) -> Dict:
    """extract window.ArgonautExchange payload"""
    script = response.selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_search-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


def parse_search_data(data: Dict) -> Dict:
    """reshape search payload"""
    search_data = []
    data = list(data.values())[0]
    for listing in data["results"]["exact"]["items"]:
        search_data.append(parse_property_data(listing["listing"]))
    max_search_pages = data["results"]["pagination"]["maxPageNumberAvailable"]
    return {"search_data": search_data, "max_search_pages": max_search_pages}


async def scrape_search(url: str, max_scrape_pages: int | None = None):
    """scrape search pages with ScrapFly"""
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(url, country="AU", asp=True))
    print(f"scraping search page {url}")
    data = parse_search_data(parse_hidden_data(first_page))
    search_data = data["search_data"]
    max_search_pages = data["max_search_pages"]
    if max_scrape_pages and max_scrape_pages < max_search_pages:
        max_search_pages = max_scrape_pages
    print(f"scraping search pagination, remaining ({max_search_pages - 1} more pages)")
    other_pages = [
        ScrapeConfig(
            str(first_page.context["url"]).split("/list")[0] + f"/list-{page}",
            country="AU",
            asp=True,
        )
        for page in range(2, max_search_pages + 1)
    ]
    async for response in SCRAPFLY.concurrent_scrape(other_pages):
        data = parse_search_data(parse_hidden_data(response))
        search_data.extend(data["search_data"])
    print(f"scraped ({len(search_data)}) from {url}")
    return search_data
Run the code
async def run():
    data = await scrape_search(
        url="https://www.realestate.com.au/buy/in-melbourne+-+northern+region,+vic/list-1",
        max_scrape_pages=3
    )
    print(json.dumps(data, indent=2))

if __name__ == "__main__":
    asyncio.run(run())
Sample search output
[
  {
    "id": "143029712",
    "propertyType": "House",
    "description": "Set in the sought-after Aurora Estate and in a prime location close to all amenities including the newly opened Aurora Village and Edgars Creek Secondary School...",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-wollert-143029712",
    "address": {
      "display": {
        "shortAddress": "12 Geary Avenue",
        "fullAddress": "12 Geary Avenue, Wollert, Vic 3750"
      },
      "suburb": "Wollert",
      "state": "Vic",
      "postcode": "3750"
    },
    "propertySizes": {
      "building": {
        "displayValue": "195.1"
      },
      "land": {
        "displayValue": "331"
      }
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4
      },
      "bathrooms": {
        "value": 2
      }
    },
    "listingCompany": {
      "name": "Carvera Property",
      "phoneNumber": "0466229631"
    },
    "listers": [
      {
        "name": "Chad Gamage",
        "phoneNumber": {
          "display": "0424876263"
        }
      }
    ]
  }
]

How to Bypass Realestate.com.au Scraping Blocking

Blocking usually happens when TLS fingerprints look automated, requests come from outside Australia, or you send bursts without delays. ScrapFly hides those signals so you can focus on parsing data.

ScrapFly API workflow

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

Here is how to enable ScrapFly Anti Scraping Protection (ASP) and keep traffic inside Australia:

import httpx
from parsel import Selector

response = httpx.get("https://www.realestate.com.au/property-house-vic-tarneit-143160680")
selector = Selector(response.text)

# ScrapFly version
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient("YOUR SCRAPFLY API KEY")

result = client.scrape(ScrapeConfig(
    "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
    country="AU",
    asp=True,
    cache=True,
    debug=True,
))
selector = result.selector
Sign up for FREE to get your API key!

FAQs

Now let's take a look at some frequently asked questions about realestate.com.au scraping.

How do I extract data from realestate.com.au's hidden JSON data?

Look for window.ArgonautExchange script tags in the page source. Parse the JSON data using json.loads() and navigate through the nested structure to access property details, search results, and pagination information.

What's the best way to handle realestate.com.au's anti-bot protection?

Use Australian residential proxies, implement realistic request delays, rotate user-agents, use headless browsers for JavaScript rendering, and consider anti-bot bypass services like ScrapFly to avoid detection.

How do I scrape multiple search pages from realestate.com.au?

Use the pagination parameter /list-{page_number} in URLs. Parse the maxPageNumberAvailable from the first page's JSON data to determine total pages, then scrape remaining pages concurrently.

Can I scrape historical property data from realestate.com.au?

Realestate.com.au primarily shows current listings. For historical data, you'd need to continuously scrape and store data over time, or look for property history APIs if available.

How do I handle rate limiting when scraping realestate.com.au at scale?

Implement delays between requests (2-5 seconds), use rotating proxies, distribute requests across multiple IP addresses, and consider using a scraping service that handles rate limiting automatically.

Are there alternatives for realestate.com.au?

Yes, there are alternative websites for real estate ads in Australia. Check out our tag #realestate for more options.

Summary

Realestate.com.au is the most popular website for real estate ads in Australia, and it actively detects and blocks web scrapers.

In this article, we went through a step-by-step guide on creating a realestate.com.au scraper for property and search pages using Python, which works by extracting the property listing data directly as JSON hidden in the HTML. We also explained how to avoid realestate.com.au web scraping blocking with ScrapFly.
