
How to Scrape Realestate.com.au Property Listing Data


When it comes to real estate websites in Australia, there are a few options, and Realestate.com.au is the biggest one. It's a popular website for real estate ads, featuring thousands of property listings across the country. However, it's also a highly protected website, making it challenging to scrape.

In this article, we'll explain how to scrape realestate.com.au for real estate data from property and search pages. We'll also explain how to avoid realestate.com.au web scraping blocking. Let's dive in!

Quick Start

Need a working scraper right now? Clone the maintained Realestate.com.au project with ScrapFly-ready settings:

git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/realestatecom-scraper

The repository contains async clients, pagination helpers, and ScrapFly configuration so you can run a production crawl with minimal edits.

Latest Realestate.com.au Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

What Is Realestate.com.au?

Realestate.com.au aggregates residential and commercial listings across Australia. Each listing page exposes structured data for pricing, address metadata, geocodes, land size, agency details, photos, floor plans, and lister contact information.

All of that data lives in the window.ArgonautExchange JSON cache, which means we can skip brittle DOM selectors and work directly with hidden data.

Why Scrape Realestate.com.au?

  • Market intelligence: Monitor inventory, price trends, and days on market for specific suburbs or postcodes.
  • Agency benchmarking: Track activity per agency or lister to understand competition.
  • Lead generation: Capture structured contact details for outreach or CRM enrichment.
  • Proptech products: Feed automated valuation models, alert systems, or portfolio dashboards.
  • Historical archives: Build private comps by storing daily snapshots of active listings.

For more inspiration see our real estate scraping use case hub.

Challenges of Scraping Realestate.com.au

Realestate.com.au borrows many tactics from modern ecommerce defenses. Expect the following hurdles.

Anti-Bot Defenses

  • TLS fingerprint inspection: Clients that do not mimic browser-grade TLS handshakes are throttled.
  • Header and cookie checks: Reusing the same headers or cookie jars triggers challenges.
  • Geo filtering: Traffic outside Australia sees extra blocks and captchas.
  • Hidden script validation: The site verifies how you access ArgonautExchange to catch naive parsers.

Rate Limiting and IP Hygiene

  • Tight quotas: Even clean Australian IPs hit rate limits if you send bursts of requests.
  • Sequence detection: Unnatural pagination patterns or instant property fetches look robotic.
  • Session freshness: Long-lived sessions are challenged, so refresh tokens often.
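These quotas can be respected client-side with a small throttle. The sketch below caps concurrent requests with a semaphore and adds jittered delays so pagination doesn't look robotic — the limit of 3 in-flight requests and the 2-5 second delay window are assumptions to tune against your own quota:

```python
import asyncio
import random


async def throttled_gather(fetch, urls, limit=3, min_delay=2.0, max_delay=5.0):
    """run fetch(url) for every url, capped at `limit` concurrent requests
    with a randomized pause before each one"""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        async with semaphore:
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            return await fetch(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(worker(u) for u in urls))


async def fake_fetch(url):
    """stand-in for client.get(url) so the sketch runs standalone"""
    return f"fetched {url}"


if __name__ == "__main__":
    pages = ["/list-1", "/list-2", "/list-3"]
    # short delays here only so the demo finishes quickly
    results = asyncio.run(throttled_gather(fake_fetch, pages, min_delay=0.01, max_delay=0.05))
    print(results)
```

In the scrapers later in this article you would pass `client.get` as the `fetch` callable instead of the stand-in.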

Deep Hidden Data

  • Nested JSON layers: Valuable fields are stringified multiple times.
  • ID heavy structure: Media, listers, and features are keyed by IDs that need joining logic.
  • Variant rich listings: Each listing has arrays for media, property features, listers, and more.
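The multiply-stringified layers are unwrapped with repeated json.loads() calls, one per layer. Here is a toy payload mimicking the cache shape — the key names mirror the real site, but the listing values are made up for illustration:

```python
import json

# a fake ArgonautExchange cache: JSON strings nested inside JSON strings
outer = json.dumps({
    "resi-property_listing-experience-web": {
        "urqlClientCache": json.dumps({
            "someCacheKey": {
                "data": json.dumps({"details": {"listing": {"id": "123"}}})
            }
        })
    }
})

data = json.loads(outer)  # layer 1: the ArgonautExchange object itself
cache = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])  # layer 2
listing = json.loads(list(cache.values())[0]["data"])  # layer 3: the actual listing data
print(listing["details"]["listing"]["id"])
```

This is exactly the unwrapping sequence the `parse_hidden_data()` function performs later in the article.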

We will tackle each issue the same way we approached Nordstrom: hidden data parsing plus ScrapFly for unblocking.

For more details, refer to our previous article on real estate web scraping use cases.

Realestate.com.au Scrape Preview

We’ll scrape two key datasets from realestate.com.au: detailed single property listings, and bulk summary data from search results. This gives us both granular and broad views for analysis, and covers all main scraping techniques needed.

Sample property dataset
[
  {
    "id": "143160680",
    "propertyType": "House",
    "description": "Renowned Real Estate proudly presents this sensational opportunity...",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
    "address": {
      "suburb": "Tarneit",
      "state": "Vic",
      "postcode": "3029",
      "display": {
        "shortAddress": "28 Chantelle Parade",
        "fullAddress": "28 Chantelle Parade, Tarneit, Vic 3029"
      }
    },
    "propertySizes": {
      "land": {
        "displayValue": "336",
        "sizeUnit": {
          "displayValue": "m²"
        }
      }
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4
      },
      "bathrooms": {
        "value": 2
      },
      "parkingSpaces": {
        "value": 2
      }
    },
    "propertyFeatures": [
      {
        "featureName": "Built-in wardrobes",
        "value": null
      }
    ],
    "images": [
      "https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg"
    ],
    "listingCompany": {
      "name": "Renowned Real Estate - CRAIGIEBURN",
      "phoneNumber": "0452060566"
    },
    "listers": [
      {
        "name": "Him Raj Parajuli",
        "phoneNumber": {
          "display": "0452060566"
        }
      }
    ]
  }
]
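Note the `{size}` placeholder in the image URLs — each templatedUrl must be rendered with a concrete size string before downloading. A minimal helper; the `"800x600"` default is an assumption, so check which size values the live site actually serves:

```python
def render_image_url(templated_url: str, size: str = "800x600") -> str:
    """substitute the {size} placeholder with a concrete dimension string"""
    return templated_url.replace("{size}", size)


url = render_image_url(
    "https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg"
)
print(url)
```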
Sample search dataset
[
  {
    "id": "143029712",
    "propertyType": "House",
    "description": "Set in the sought-after Aurora Estate...",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-wollert-143029712",
    "address": {
      "display": {
        "shortAddress": "12 Geary Avenue",
        "fullAddress": "12 Geary Avenue, Wollert, Vic 3750"
      },
      "suburb": "Wollert",
      "state": "Vic",
      "postcode": "3750"
    },
    "propertySizes": {
      "building": {
        "displayValue": "195.1"
      },
      "land": {
        "displayValue": "331"
      }
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4
      },
      "bathrooms": {
        "value": 2
      }
    },
    "listingCompany": {
      "name": "Carvera Property",
      "phoneNumber": "0466229631"
    }
  }
]

Scraping Realestate.com.au with Python

We will follow the same hidden data flow we used for Nordstrom: fetch the HTML with httpx, grab the script via Parsel, reshape it with JMESPath, then show the ScrapFly variant.

Project Setup

To scrape realestate.com.au, we'll use a few Python packages:

  • httpx - async HTTP client with HTTP/2.
  • parsel - DOM parser for XPath or CSS queries.
  • JMESPath - JSON query engine used to reshape data.
  • asyncio - Python standard library for concurrency.
  • ScrapFly SDK - optional managed client with ASP.

Install everything with pip; asyncio ships with Python:

pip install httpx parsel jmespath scrapfly-sdk

When creating httpx.AsyncClient, enable http2=True and set browser-grade headers for User-Agent, Accept, and Accept-Language. ScrapFly handles this automatically once asp=True is enabled.

Scrape Realestate.com.au Property Pages

Pick any property such as this townhouse example. Open the page source, search for window.ArgonautExchange, and note the JSON blob. We will automate those steps.

Property listing page on realestate.com.au

How to Scrape Hidden Web Data

The visible HTML doesn't always represent the whole dataset available on the page. In this article, we'll be taking a look at scraping of hidden web data. What is it and how can we scrape it using Python?

Python
ScrapFly
import re
import json
import asyncio
import jmespath
from httpx import AsyncClient, Response
from parsel import Selector
from typing import List, Dict

client = AsyncClient(
    http2=True,
    headers={
        "accept-language": "en-AU,en;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
    }
)


def parse_property_data(data: Dict) -> Dict:
    """reshape property payload with JMESPath"""
    if not data:
        return
    return jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        videos: videos,
        floorplans: floorplans,
        listingCompany: listingCompany.{name: name, id: id, companyLink: _links.canonical.href, phoneNumber: businessPhone, address: address.display.fullAddress, ratingsReviews: ratingsReviews, description: description},
        listers: listers,
        auction: auction
        }""",
        data,
    )


def parse_hidden_data(response: Response) -> Dict:
    """extract window.ArgonautExchange payload"""
    selector = Selector(response.text)
    script = selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


async def scrape_properties(urls: List[str]) -> List[Dict]:
    """scrape listing data from property pages"""
    to_scrape = [client.get(url) for url in urls]
    properties = []
    for response in asyncio.as_completed(to_scrape):
        response = await response
        assert response.status_code == 200, "request has been blocked"
        data = parse_hidden_data(response)["details"]["listing"]
        data = parse_property_data(data)
        properties.append(data)
    print(f"scraped {len(properties)} property listings")
    return properties
import re
import json
import jmespath
from typing import Dict, List
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")


def parse_property_data(data: Dict) -> Dict:
    """reshape property payload with JMESPath"""
    if not data:
        return
    return jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        videos: videos,
        floorplans: floorplans,
        listingCompany: listingCompany.{name: name, id: id, companyLink: _links.canonical.href, phoneNumber: businessPhone, address: address.display.fullAddress, ratingsReviews: ratingsReviews, description: description},
        listers: listers,
        auction: auction
        }""",
        data,
    )


def parse_hidden_data(response: ScrapeApiResponse) -> Dict:
    """extract window.ArgonautExchange payload"""
    script = response.selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


async def scrape_properties(urls: List[str]) -> List[Dict]:
    """scrape listing data using ScrapFly"""
    to_scrape = [ScrapeConfig(url, country="AU", asp=True) for url in urls]
    properties = []
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        data = parse_hidden_data(response)["details"]["listing"]
        data = parse_property_data(data)
        properties.append(data)
    print(f"scraped {len(properties)} property listings")
    return properties
Run the code
async def run():
    data = await scrape_properties(
        urls = [
            "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
            "https://www.realestate.com.au/property-house-vic-bundoora-141557712",
            "https://www.realestate.com.au/property-townhouse-vic-glenroy-143556608",
        ]
    )
    print(json.dumps(data, indent=2))

if __name__ == "__main__":
    asyncio.run(run())

🙋‍ If you see blocks while running the Python tab, switch to the ScrapFly version to inherit ASP, geo routing, and automatic retries.

The helper trio does exactly what we need:

  • parse_hidden_data() extracts the script and repeatedly parses the nested JSON.
  • parse_property_data() uses JMESPath to keep only the fields we need.
  • scrape_properties() queues multiple URLs and awaits them concurrently.
Sample property output
[
  {
    "id": "143160680",
    "propertyType": "House",
    "description": "Renowned Real Estate proudly presents this sensational opportunity with a luxury house in Tarneit.<br/><br/>This beautiful low maintenance home is situated in the well-established suburb of Tarneit...",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
    "address": {
      "suburb": "Tarneit",
      "state": "Vic",
      "postcode": "3029",
      "display": {
        "shortAddress": "28 Chantelle Parade",
        "fullAddress": "28 Chantelle Parade, Tarneit, Vic 3029",
        "geocode": {
          "latitude": -37.85273078,
          "longitude": 144.66332821
        }
      }
    },
    "propertySizes": {
      "land": {
        "displayValue": "336",
        "sizeUnit": {
          "displayValue": "m²"
        }
      }
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4
      },
      "bathrooms": {
        "value": 2
      },
      "parkingSpaces": {
        "value": 2
      }
    },
    "images": [
      "https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg"
    ],
    "listingCompany": {
      "name": "Renowned Real Estate - CRAIGIEBURN",
      "phoneNumber": "0452060566"
    },
    "listers": [
      {
        "name": "Him Raj Parajuli",
        "phoneNumber": {
          "display": "0452060566"
        }
      }
    ]
  }
]

How to Scrape Realestate.com.au Search Pages

Search results expose the same window.ArgonautExchange payload. Inspect the HTML and capture the JSON.

Search pages hidden web data

Pagination uses the /list-{page} suffix. For example, /list-1 is page one, /list-2 is page two, and so on.
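That suffix makes URL generation trivial. A small sketch deriving the remaining page URLs from the first search URL, using the same splitting trick the scraper code applies:

```python
def build_page_urls(first_page_url: str, max_pages: int) -> list[str]:
    """generate /list-2 .. /list-N urls from a /list-1 search url"""
    base = first_page_url.split("/list")[0]  # strip the /list-{page} suffix
    return [f"{base}/list-{page}" for page in range(2, max_pages + 1)]


urls = build_page_urls(
    "https://www.realestate.com.au/buy/in-melbourne+-+northern+region,+vic/list-1",
    max_pages=3,
)
print(urls)  # page-2 and page-3 urls of the same search
```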

Python
ScrapFly
import re
import json
import asyncio
import jmespath
from httpx import AsyncClient, Response
from parsel import Selector
from typing import List, Dict

client = AsyncClient(
    http2=True,
    headers={
        "accept-language": "en-AU,en;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
    }
)


def parse_property_data(data: Dict) -> Dict:
    """reuse property parser"""
    return jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        listingCompany: listingCompany.{name: name, phoneNumber: businessPhone},
        listers: listers,
        auction: auction
        }""",
        data,
    )


def parse_hidden_data(response: Response) -> Dict:
    """extract window.ArgonautExchange payload"""
    selector = Selector(response.text)
    script = selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_search-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


def parse_search_data(data: Dict) -> Dict:
    """reshape search payload"""
    search_data = []
    data = list(data.values())[0]
    for listing in data["results"]["exact"]["items"]:
        search_data.append(parse_property_data(listing["listing"]))
    max_search_pages = data["results"]["pagination"]["maxPageNumberAvailable"]
    return {"search_data": search_data, "max_search_pages": max_search_pages}


async def scrape_search(url: str, max_scrape_pages: int | None = None):
    """scrape property listings from search pages"""
    first_page = await client.get(url)
    assert first_page.status_code == 200, "request has been blocked"
    print(f"scraping search page {url}")
    data = parse_search_data(parse_hidden_data(first_page))
    search_data = data["search_data"]
    max_search_pages = data["max_search_pages"]
    if max_scrape_pages and max_scrape_pages < max_search_pages:
        max_search_pages = max_scrape_pages
    print(f"scraping search pagination, remaining ({max_search_pages - 1} more pages)")
    other_pages = [
        client.get(str(first_page.url).split("/list")[0] + f"/list-{page}")
        for page in range(2, max_search_pages + 1)
    ]
    for response in asyncio.as_completed(other_pages):
        response = await response
        assert response.status_code == 200, "request has been blocked"
        data = parse_search_data(parse_hidden_data(response))
        search_data.extend(data["search_data"])
    print(f"scraped ({len(search_data)}) from {url}")
    return search_data
import re
import json
import jmespath
from typing import Dict, List
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")


def parse_property_data(data: Dict) -> Dict:
    """reuse property parser"""
    return jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        listingCompany: listingCompany.{name: name, phoneNumber: businessPhone},
        listers: listers,
        auction: auction
        }""",
        data,
    )


def parse_hidden_data(response: ScrapeApiResponse) -> Dict:
    """extract window.ArgonautExchange payload"""
    script = response.selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_search-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


def parse_search_data(data: Dict) -> Dict:
    """reshape search payload"""
    search_data = []
    data = list(data.values())[0]
    for listing in data["results"]["exact"]["items"]:
        search_data.append(parse_property_data(listing["listing"]))
    max_search_pages = data["results"]["pagination"]["maxPageNumberAvailable"]
    return {"search_data": search_data, "max_search_pages": max_search_pages}


async def scrape_search(url: str, max_scrape_pages: int | None = None):
    """scrape search pages with ScrapFly"""
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(url, country="AU", asp=True))
    print(f"scraping search page {url}")
    data = parse_search_data(parse_hidden_data(first_page))
    search_data = data["search_data"]
    max_search_pages = data["max_search_pages"]
    if max_scrape_pages and max_scrape_pages < max_search_pages:
        max_search_pages = max_scrape_pages
    print(f"scraping search pagination, remaining ({max_search_pages - 1} more pages)")
    other_pages = [
        ScrapeConfig(
            str(first_page.context["url"]).split("/list")[0] + f"/list-{page}",
            country="AU",
            asp=True,
        )
        for page in range(2, max_search_pages + 1)
    ]
    async for response in SCRAPFLY.concurrent_scrape(other_pages):
        data = parse_search_data(parse_hidden_data(response))
        search_data.extend(data["search_data"])
    print(f"scraped ({len(search_data)}) from {url}")
    return search_data
Run the code
async def run():
    data = await scrape_search(
        url="https://www.realestate.com.au/buy/in-melbourne+-+northern+region,+vic/list-1",
        max_scrape_pages=3
    )
    print(json.dumps(data, indent=2))

if __name__ == "__main__":
    asyncio.run(run())
Sample search output
[
  {
    "id": "143029712",
    "propertyType": "House",
    "description": "Set in the sought-after Aurora Estate and in a prime location close to all amenities including the newly opened Aurora Village and Edgars Creek Secondary School...",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-wollert-143029712",
    "address": {
      "display": {
        "shortAddress": "12 Geary Avenue",
        "fullAddress": "12 Geary Avenue, Wollert, Vic 3750"
      },
      "suburb": "Wollert",
      "state": "Vic",
      "postcode": "3750"
    },
    "propertySizes": {
      "building": {
        "displayValue": "195.1"
      },
      "land": {
        "displayValue": "331"
      }
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4
      },
      "bathrooms": {
        "value": 2
      }
    },
    "listingCompany": {
      "name": "Carvera Property",
      "phoneNumber": "0466229631"
    },
    "listers": [
      {
        "name": "Chad Gamage",
        "phoneNumber": {
          "display": "0424876263"
        }
      }
    ]
  }
]

How to Bypass Realestate.com.au Scraping Blocking

Blocking usually happens when TLS fingerprints look automated, requests come from outside Australia, or you send bursts without delays. ScrapFly hides those signals so you can focus on parsing data.

ScrapFly API workflow

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

Here is how to enable ScrapFly Anti Scraping Protection (ASP) and keep traffic inside Australia:

import httpx
from parsel import Selector

response = httpx.get("https://www.realestate.com.au/property-house-vic-tarneit-143160680")
selector = Selector(response.text)

# ScrapFly version
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient("YOUR SCRAPFLY API KEY")

result = client.scrape(ScrapeConfig(
    "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
    country="AU",
    asp=True,
    cache=True,
    debug=True,
))
selector = result.selector
Sign up for FREE to get your API key!

FAQs

Now let's take a look at some frequently asked questions about realestate.com.au scraping.

How do I extract data from realestate.com.au's hidden JSON data?

Look for window.ArgonautExchange script tags in the page source. Parse the JSON data using json.loads() and navigate through the nested structure to access property details, search results, and pagination information.

What's the best way to handle realestate.com.au's anti-bot protection?

Use Australian residential proxies, implement realistic request delays, rotate user-agents, use headless browsers for JavaScript rendering, and consider anti-bot bypass services like ScrapFly to avoid detection.

How do I scrape multiple search pages from realestate.com.au?

Use the pagination parameter /list-{page_number} in URLs. Parse the maxPageNumberAvailable from the first page's JSON data to determine total pages, then scrape remaining pages concurrently.

Can I scrape historical property data from realestate.com.au?

Realestate.com.au primarily shows current listings. For historical data, you'd need to continuously scrape and store data over time, or look for property history APIs if available.

How do I handle rate limiting when scraping realestate.com.au at scale?

Implement delays between requests (2-5 seconds), use rotating proxies, distribute requests across multiple IP addresses, and consider using a scraping service that handles rate limiting automatically.

Are there alternatives for realestate.com.au?

Yes, there are alternative websites for real estate ads in Australia. Check out our tag #realestate for more options.

Summary

Realestate.com.au is the most popular website for real estate ads in Australia, and it actively detects and blocks web scrapers.

In this article, we went through a step-by-step guide on creating a realestate.com.au scraper for property and search pages using Python, which works by extracting the property listing data directly as JSON hidden in the HTML. We also explained how to avoid realestate.com.au web scraping blocking with ScrapFly.
