When it comes to real estate websites in Australia, there are a few options, and Realestate.com.au is the biggest one. It's a popular website for real estate ads featuring thousands of property listings across the country. However, it's a highly protected website, making it challenging to scrape.
In this article, we'll explain how to scrape realestate.com.au for real estate data from property and search pages. We'll also explain how to avoid realestate.com.au web scraping blocking. Let's dive in!
Quick Start
Need a working scraper right now? Clone the maintained Realestate.com.au project with ScrapFly-ready settings:
git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/realestatecom-scraper
The repository contains async clients, pagination helpers, and ScrapFly configuration so you can run a production crawl with minimal edits.
Latest Realestate.com.au Scraper Code
What Is Realestate.com.au?
Realestate.com.au aggregates residential and commercial listings across Australia. Each listing page exposes structured data for pricing, address metadata, geocodes, land size, agency details, photos, floor plans, and lister contact information.
All of that data lives in the window.ArgonautExchange JSON cache, which means we can skip brittle DOM selectors and work directly with hidden data.
Why Scrape Realestate.com.au?
- Market intelligence: Monitor inventory, price trends, and days on market for specific suburbs or postcodes.
- Agency benchmarking: Track activity per agency or lister to understand competition.
- Lead generation: Capture structured contact details for outreach or CRM enrichment.
- Proptech products: Feed automated valuation models, alert systems, or portfolio dashboards.
- Historical archives: Build private comps by storing daily snapshots of active listings.
For more inspiration see our real estate scraping use case hub.
Challenges of Scraping Realestate.com.au
Realestate.com.au borrows many tactics from modern ecommerce defenses. Expect the following hurdles.
Anti Bot Defenses
- TLS fingerprint inspection: Clients that do not mimic browser-grade TLS handshakes are throttled.
- Header and cookie checks: Reusing the same headers or cookie jars triggers challenges.
- Geo filtering: Traffic outside Australia sees extra blocks and captchas.
- Hidden script validation: The site verifies how you access ArgonautExchange to catch naive parsers.
Rate Limiting and IP Hygiene
- Tight quotas: Even clean Australian IPs hit rate limits if you send bursts of requests.
- Sequence detection: Unnatural pagination patterns or instant property fetches look robotic.
- Session freshness: Long-lived sessions are challenged, so refresh tokens often.
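One way to stay under these limits is to cap concurrency and add jittered delays so requests don't arrive machine-timed. A minimal sketch of that pattern; the concurrency cap and delay range are illustrative choices, not documented thresholds:

```python
import asyncio
import random

class PoliteFetcher:
    """Limit concurrent requests and insert jittered delays between them."""

    def __init__(self, max_concurrency: int = 3, min_delay: float = 1.0, max_delay: float = 3.0):
        self.semaphore = asyncio.Semaphore(max_concurrency)
        self.min_delay = min_delay
        self.max_delay = max_delay

    async def fetch(self, client, url: str):
        async with self.semaphore:
            # random pause so pagination requests don't look robotic
            await asyncio.sleep(random.uniform(self.min_delay, self.max_delay))
            return await client.get(url)
```

Wrapping every `client.get()` call in a helper like this keeps the pacing logic in one place as the scraper grows.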
Deep Hidden Data
- Nested JSON layers: Valuable fields are stringified multiple times.
- ID heavy structure: Media, listers, and features are keyed by IDs that need joining logic.
- Variant rich listings: Each listing has arrays for media, property features, listers, and more.
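Because each layer is a JSON string embedded inside the previous one, the parser has to call json.loads more than once. A toy illustration of the pattern; the key names here are simplified stand-ins for the real cache structure shown later in the article:

```python
import json

# the cache nests stringified JSON: a string inside a string inside an object
raw = json.dumps({
    "experience-web": {
        "urqlClientCache": json.dumps({
            "query-1": {"data": json.dumps({"details": {"listing": {"id": "123"}}})}
        })
    }
})

data = json.loads(raw)                                          # layer 1: the cache object
cache = json.loads(data["experience-web"]["urqlClientCache"])   # layer 2: stringified cache
listing = json.loads(list(cache.values())[0]["data"])           # layer 3: stringified query result
print(listing["details"]["listing"]["id"])  # -> 123
```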
We will tackle each issue with the same two-part approach: hidden data parsing plus ScrapFly for unblocking.
For more details, refer to our previous article on real estate web scraping use cases.
Realestate.com.au Scrape Preview
We’ll scrape two key datasets from realestate.com.au: detailed single property listings, and bulk summary data from search results. This gives us both granular and broad views for analysis, and covers all main scraping techniques needed.
Sample property dataset
[
{
"id": "143160680",
"propertyType": "House",
"description": "Renowned Real Estate proudly presents this sensational opportunity...",
"propertyLink": "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
"address": {
"suburb": "Tarneit",
"state": "Vic",
"postcode": "3029",
"display": {
"shortAddress": "28 Chantelle Parade",
"fullAddress": "28 Chantelle Parade, Tarneit, Vic 3029"
}
},
"propertySizes": {
"land": {
"displayValue": "336",
"sizeUnit": {
"displayValue": "m²"
}
}
},
"generalFeatures": {
"bedrooms": {
"value": 4
},
"bathrooms": {
"value": 2
},
"parkingSpaces": {
"value": 2
}
},
"propertyFeatures": [
{
"featureName": "Built-in wardrobes",
"value": null
}
],
"images": [
"https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg"
],
"listingCompany": {
"name": "Renowned Real Estate - CRAIGIEBURN",
"phoneNumber": "0452060566"
},
"listers": [
{
"name": "Him Raj Parajuli",
"phoneNumber": {
"display": "0452060566"
}
}
]
}
]
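Note that the image URLs carry a {size} placeholder that has to be filled before downloading. A minimal helper for that; the "800x600" token is an assumed example value, not a documented list of supported sizes:

```python
def fill_image_size(templated_url: str, size: str = "800x600") -> str:
    """Replace the {size} placeholder in a templated reastatic image URL."""
    return templated_url.replace("{size}", size)

url = "https://i2.au.reastatic.net/{size}/d8d3607342301e4e/image.jpg"
print(fill_image_size(url))
# -> https://i2.au.reastatic.net/800x600/d8d3607342301e4e/image.jpg
```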
Sample search dataset
[
{
"id": "143029712",
"propertyType": "House",
"description": "Set in the sought-after Aurora Estate...",
"propertyLink": "https://www.realestate.com.au/property-house-vic-wollert-143029712",
"address": {
"display": {
"shortAddress": "12 Geary Avenue",
"fullAddress": "12 Geary Avenue, Wollert, Vic 3750"
},
"suburb": "Wollert",
"state": "Vic",
"postcode": "3750"
},
"propertySizes": {
"building": {
"displayValue": "195.1"
},
"land": {
"displayValue": "331"
}
},
"generalFeatures": {
"bedrooms": {
"value": 4
},
"bathrooms": {
"value": 2
}
},
"listingCompany": {
"name": "Carvera Property",
"phoneNumber": "0466229631"
}
}
]
Scraping Realestate.com.au with Python
We will follow a classic hidden data flow: fetch the HTML with httpx, grab the script with parsel, reshape it with JMESPath, and then show the ScrapFly variant.
Project Setup
To scrape realestate.com.au, we'll use a few Python packages:
- httpx - async HTTP client with HTTP/2.
- parsel - DOM parser for XPath or CSS queries.
- JMESPath - JSON query engine used to reshape data.
- asyncio - Python standard library module for concurrency.
- ScrapFly SDK - optional managed client with ASP.
Install everything with pip (asyncio ships with Python):
pip install httpx parsel jmespath scrapfly-sdk
When creating httpx.AsyncClient, enable http2=True and feed it browser-grade headers for User-Agent, Accept, and Accept-Language. ScrapFly handles this automatically once asp=True is set.
Scrape Realestate.com.au Property Pages
Pick any property such as this townhouse example. Open the page source, search for window.ArgonautExchange, and note the JSON blob. We will automate those steps.
How to Scrape Hidden Web Data
The visible HTML doesn't always represent the whole dataset available on the page. In this article, we take a look at scraping hidden web data: what is it and how can we scrape it using Python?
import re
import json
import asyncio
import jmespath
from httpx import AsyncClient, Response
from parsel import Selector
from typing import List, Dict
client = AsyncClient(
http2=True,
headers={
"accept-language": "en-AU,en;q=0.9",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"accept-encoding": "gzip, deflate, br",
}
)
def parse_property_data(data: Dict) -> Dict:
"""reshape property payload with JMESPath"""
if not data:
return
return jmespath.search(
"""{
id: id,
propertyType: propertyType.display,
description: description,
propertyLink: _links.canonical.href,
address: address,
propertySizes: propertySizes,
generalFeatures: generalFeatures,
propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
images: media.images[].templatedUrl,
videos: videos,
floorplans: floorplans,
listingCompany: listingCompany.{name: name, id: id, companyLink: _links.canonical.href, phoneNumber: businessPhone, address: address.display.fullAddress, ratingsReviews: ratingsReviews, description: description},
listers: listers,
auction: auction
}""",
data,
)
def parse_hidden_data(response: Response) -> Dict:
"""extract window.ArgonautExchange payload"""
selector = Selector(response.text)
script = selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
data = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])
data = json.loads(list(data.values())[0]["data"])
return data
async def scrape_properties(urls: List[str]) -> List[Dict]:
"""scrape listing data from property pages"""
to_scrape = [client.get(url) for url in urls]
properties = []
for response in asyncio.as_completed(to_scrape):
response = await response
assert response.status_code == 200, "request has been blocked"
data = parse_hidden_data(response)["details"]["listing"]
data = parse_property_data(data)
properties.append(data)
print(f"scraped {len(properties)} property listings")
return properties
import re
import json
import jmespath
from typing import Dict, List
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")
def parse_property_data(data: Dict) -> Dict:
"""reshape property payload with JMESPath"""
if not data:
return
return jmespath.search(
"""{
id: id,
propertyType: propertyType.display,
description: description,
propertyLink: _links.canonical.href,
address: address,
propertySizes: propertySizes,
generalFeatures: generalFeatures,
propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
images: media.images[].templatedUrl,
videos: videos,
floorplans: floorplans,
listingCompany: listingCompany.{name: name, id: id, companyLink: _links.canonical.href, phoneNumber: businessPhone, address: address.display.fullAddress, ratingsReviews: ratingsReviews, description: description},
listers: listers,
auction: auction
}""",
data,
)
def parse_hidden_data(response: ScrapeApiResponse) -> Dict:
"""extract window.ArgonautExchange payload"""
script = response.selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
data = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])
data = json.loads(list(data.values())[0]["data"])
return data
async def scrape_properties(urls: List[str]) -> List[Dict]:
"""scrape listing data using ScrapFly"""
to_scrape = [ScrapeConfig(url, country="AU", asp=True) for url in urls]
properties = []
async for response in SCRAPFLY.concurrent_scrape(to_scrape):
data = parse_hidden_data(response)["details"]["listing"]
data = parse_property_data(data)
properties.append(data)
print(f"scraped {len(properties)} property listings")
return properties
Run the code
async def run():
data = await scrape_properties(
urls = [
"https://www.realestate.com.au/property-house-vic-tarneit-143160680",
"https://www.realestate.com.au/property-house-vic-bundoora-141557712",
"https://www.realestate.com.au/property-townhouse-vic-glenroy-143556608",
]
)
print(json.dumps(data, indent=2))
if __name__ == "__main__":
asyncio.run(run())
🙋 If you see blocks while running the Python tab, switch to the ScrapFly version to inherit ASP, geo routing, and automatic retries.
The helper trio does exactly what we need:
- parse_hidden_data() extracts the script and repeatedly parses the nested JSON.
- parse_property_data() uses JMESPath to keep only the fields we need.
- scrape_properties() queues multiple URLs and awaits them concurrently.
Sample property output
[
{
"id": "143160680",
"propertyType": "House",
"description": "Renowned Real Estate proudly presents this sensational opportunity with a luxury house in Tarneit.<br/><br/>This beautiful low maintenance home is situated in the well-established suburb of Tarneit...",
"propertyLink": "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
"address": {
"suburb": "Tarneit",
"state": "Vic",
"postcode": "3029",
"display": {
"shortAddress": "28 Chantelle Parade",
"fullAddress": "28 Chantelle Parade, Tarneit, Vic 3029",
"geocode": {
"latitude": -37.85273078,
"longitude": 144.66332821
}
}
},
"propertySizes": {
"land": {
"displayValue": "336",
"sizeUnit": {
"displayValue": "m²"
}
}
},
"generalFeatures": {
"bedrooms": {
"value": 4
},
"bathrooms": {
"value": 2
},
"parkingSpaces": {
"value": 2
}
},
"images": [
"https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg"
],
"listingCompany": {
"name": "Renowned Real Estate - CRAIGIEBURN",
"phoneNumber": "0452060566"
},
"listers": [
{
"name": "Him Raj Parajuli",
"phoneNumber": {
"display": "0452060566"
}
}
]
}
]
How to Scrape Realestate.com.au Search Pages
Search results expose the same window.ArgonautExchange payload. Inspect the HTML and capture the JSON.
Pagination uses the /list-{page} suffix. For example, /list-1 is page one, /list-2 is page two, and so on.
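Given that convention, the remaining page URLs can be generated by swapping the /list-N suffix. A small helper mirroring how the scraper below builds its pagination requests:

```python
def build_page_urls(first_page_url: str, max_pages: int) -> list[str]:
    """Generate /list-N URLs for pages 2..max_pages from the first page URL."""
    base = first_page_url.split("/list")[0]
    return [f"{base}/list-{page}" for page in range(2, max_pages + 1)]

urls = build_page_urls("https://www.realestate.com.au/buy/in-melbourne,+vic/list-1", 3)
# -> [".../list-2", ".../list-3"]
```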
import re
import json
import asyncio
import jmespath
from httpx import AsyncClient, Response
from parsel import Selector
from typing import List, Dict
client = AsyncClient(
http2=True,
headers={
"accept-language": "en-AU,en;q=0.9",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"accept-encoding": "gzip, deflate, br",
}
)
def parse_property_data(data: Dict) -> Dict:
"""reuse property parser"""
return jmespath.search(
"""{
id: id,
propertyType: propertyType.display,
description: description,
propertyLink: _links.canonical.href,
address: address,
propertySizes: propertySizes,
generalFeatures: generalFeatures,
propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
images: media.images[].templatedUrl,
listingCompany: listingCompany.{name: name, phoneNumber: businessPhone},
listers: listers,
auction: auction
}""",
data,
)
def parse_hidden_data(response: Response) -> Dict:
"""extract window.ArgonautExchange payload"""
selector = Selector(response.text)
script = selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
data = json.loads(data["resi-property_search-experience-web"]["urqlClientCache"])
data = json.loads(list(data.values())[0]["data"])
return data
def parse_search_data(data: Dict) -> Dict:
"""reshape search payload"""
search_data = []
data = list(data.values())[0]
for listing in data["results"]["exact"]["items"]:
search_data.append(parse_property_data(listing["listing"]))
max_search_pages = data["results"]["pagination"]["maxPageNumberAvailable"]
return {"search_data": search_data, "max_search_pages": max_search_pages}
async def scrape_search(url: str, max_scrape_pages: int | None = None):
"""scrape property listings from search pages"""
first_page = await client.get(url)
assert first_page.status_code == 200, "request has been blocked"
print(f"scraping search page {url}")
data = parse_search_data(parse_hidden_data(first_page))
search_data = data["search_data"]
max_search_pages = data["max_search_pages"]
if max_scrape_pages and max_scrape_pages < max_search_pages:
max_search_pages = max_scrape_pages
print(f"scraping search pagination, remaining ({max_search_pages - 1} more pages)")
other_pages = [
client.get(str(first_page.url).split("/list")[0] + f"/list-{page}")
for page in range(2, max_search_pages + 1)
]
for response in asyncio.as_completed(other_pages):
response = await response
assert response.status_code == 200, "request has been blocked"
data = parse_search_data(parse_hidden_data(response))
search_data.extend(data["search_data"])
print(f"scraped ({len(search_data)}) from {url}")
return search_data
import re
import json
import jmespath
from typing import Dict, List
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")
def parse_property_data(data: Dict) -> Dict:
"""reuse property parser"""
return jmespath.search(
"""{
id: id,
propertyType: propertyType.display,
description: description,
propertyLink: _links.canonical.href,
address: address,
propertySizes: propertySizes,
generalFeatures: generalFeatures,
propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
images: media.images[].templatedUrl,
listingCompany: listingCompany.{name: name, phoneNumber: businessPhone},
listers: listers,
auction: auction
}""",
data,
)
def parse_hidden_data(response: ScrapeApiResponse) -> Dict:
"""extract window.ArgonautExchange payload"""
script = response.selector.xpath("//script[contains(text(),'window.ArgonautExchange')]/text()").get()
data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
data = json.loads(data["resi-property_search-experience-web"]["urqlClientCache"])
data = json.loads(list(data.values())[0]["data"])
return data
def parse_search_data(data: Dict) -> Dict:
"""reshape search payload"""
search_data = []
data = list(data.values())[0]
for listing in data["results"]["exact"]["items"]:
search_data.append(parse_property_data(listing["listing"]))
max_search_pages = data["results"]["pagination"]["maxPageNumberAvailable"]
return {"search_data": search_data, "max_search_pages": max_search_pages}
async def scrape_search(url: str, max_scrape_pages: int | None = None):
"""scrape search pages with ScrapFly"""
first_page = await SCRAPFLY.async_scrape(ScrapeConfig(url, country="AU", asp=True))
print(f"scraping search page {url}")
data = parse_search_data(parse_hidden_data(first_page))
search_data = data["search_data"]
max_search_pages = data["max_search_pages"]
if max_scrape_pages and max_scrape_pages < max_search_pages:
max_search_pages = max_scrape_pages
print(f"scraping search pagination, remaining ({max_search_pages - 1} more pages)")
other_pages = [
ScrapeConfig(
str(first_page.context["url"]).split("/list")[0] + f"/list-{page}",
country="AU",
asp=True,
)
for page in range(2, max_search_pages + 1)
]
async for response in SCRAPFLY.concurrent_scrape(other_pages):
data = parse_search_data(parse_hidden_data(response))
search_data.extend(data["search_data"])
print(f"scraped ({len(search_data)}) from {url}")
return search_data
Run the code
async def run():
data = await scrape_search(
url="https://www.realestate.com.au/buy/in-melbourne+-+northern+region,+vic/list-1",
max_scrape_pages=3
)
print(json.dumps(data, indent=2))
if __name__ == "__main__":
asyncio.run(run())
Sample search output
[
{
"id": "143029712",
"propertyType": "House",
"description": "Set in the sought-after Aurora Estate and in a prime location close to all amenities including the newly opened Aurora Village and Edgars Creek Secondary School...",
"propertyLink": "https://www.realestate.com.au/property-house-vic-wollert-143029712",
"address": {
"display": {
"shortAddress": "12 Geary Avenue",
"fullAddress": "12 Geary Avenue, Wollert, Vic 3750"
},
"suburb": "Wollert",
"state": "Vic",
"postcode": "3750"
},
"propertySizes": {
"building": {
"displayValue": "195.1"
},
"land": {
"displayValue": "331"
}
},
"generalFeatures": {
"bedrooms": {
"value": 4
},
"bathrooms": {
"value": 2
}
},
"listingCompany": {
"name": "Carvera Property",
"phoneNumber": "0466229631"
},
"listers": [
{
"name": "Chad Gamage",
"phoneNumber": {
"display": "0424876263"
}
}
]
}
]
How to Bypass Realestate.com.au Scraping Blocking
Blocking usually happens when TLS fingerprints look automated, requests come from outside Australia, or you send bursts without delays. ScrapFly hides those signals so you can focus on parsing data.
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - scrape web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- JavaScript rendering - scrape dynamic web pages through cloud browsers.
- Full browser automation - control browsers to scroll, input and click on objects.
- Format conversion - scrape as HTML, JSON, Text, or Markdown.
- Python and Typescript SDKs, as well as Scrapy and no-code tool integrations.
Here is how to enable ScrapFly Anti Scraping Protection (ASP) and keep traffic inside Australia:
import httpx
from parsel import Selector
response = httpx.get("https://www.realestate.com.au/property-house-vic-tarneit-143160680")
selector = Selector(response.text)
# ScrapFly version
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient("YOUR SCRAPFLY API KEY")
result = client.scrape(ScrapeConfig(
"https://www.realestate.com.au/property-house-vic-tarneit-143160680",
country="AU",
asp=True,
cache=True,
debug=True,
))
selector = result.selector
FAQs
Now let's take a look at some frequently asked questions about realestate.com.au scraping.
How do I extract data from realestate.com.au's hidden JSON data?
Look for window.ArgonautExchange script tags in the page source. Parse the JSON data using json.loads() and navigate through the nested structure to access property details, search results, and pagination information.
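In practice that means pulling the assignment out of the script text with a regular expression before parsing. A condensed sketch of the extraction step, using a stand-in script string in place of a real page:

```python
import re
import json

# stand-in for the <script> text found in the page source
script = 'window.ArgonautExchange={"key": {"nested": "value"}};'

# capture the JSON object assigned to window.ArgonautExchange
match = re.search(r"window\.ArgonautExchange=(\{.+\});", script)
data = json.loads(match.group(1))
print(data["key"]["nested"])  # -> value
```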
What's the best way to handle realestate.com.au's anti-bot protection?
Use Australian residential proxies, implement realistic request delays, rotate user-agents, use headless browsers for JavaScript rendering, and consider anti-bot bypass services like ScrapFly to avoid detection.
How do I scrape multiple search pages from realestate.com.au?
Use the pagination parameter /list-{page_number} in URLs. Parse the maxPageNumberAvailable from the first page's JSON data to determine total pages, then scrape remaining pages concurrently.
Can I scrape historical property data from realestate.com.au?
Realestate.com.au primarily shows current listings. For historical data, you'd need to continuously scrape and store data over time, or look for property history APIs if available.
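A simple way to build such an archive is to append timestamped snapshots to a JSON Lines file on each crawl. A minimal sketch; the file path and record schema here are illustrative:

```python
import json
from datetime import date

def append_snapshot(listings: list[dict], path: str = "snapshots.jsonl") -> None:
    """Append one JSON line per listing, tagged with the crawl date."""
    today = date.today().isoformat()
    with open(path, "a", encoding="utf-8") as f:
        for listing in listings:
            f.write(json.dumps({"snapshot_date": today, **listing}) + "\n")
```

Running this after each daily crawl yields an append-only history that can later be grouped by listing id to reconstruct price changes over time.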
How do I handle rate limiting when scraping realestate.com.au at scale?
Implement delays between requests (2-5 seconds), use rotating proxies, distribute requests across multiple IP addresses, and consider using a scraping service that handles rate limiting automatically.
Are there alternatives for realestate.com.au?
Yes, there are alternative websites for real estate ads in Australia. Check out our tag #realestate for more options.
Summary
Realestate.com.au is a popular website for real estate ads in Australia, which can detect and block web scrapers.
In this article, we explained how to avoid realestate.com.au web scraping blocking. We also went through a step-by-step guide on creating a realestate.com.au scraper for property and search pages using Python, which works by extracting the property listing data as JSON directly from the HTML.
Legal Disclaimer and Precautions
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect, and here's a good summary of what not to do:
- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens who are protected by GDPR.
- Do not repurpose entire public datasets, which can be illegal in some countries.