How to Scrape Realestate.com.au Property Listing Data

When it comes to real estate websites in Australia, there are a few options, and Realestate.com.au is the biggest one. It's a popular website for real estate ads, featuring thousands of property listings across the country. However, it's a highly protected website, making it challenging to scrape.

In this article, we'll explain how to scrape realestate.com.au for real estate data from property and search pages. We'll also explain how to avoid realestate.com.au web scraping blocking. Let's dive in!

Latest Realestate.com.au Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

Why Scrape Realestate.com.au?

Realestate.com.au includes thousands of property listing pages, and navigating them manually is tedious and time-consuming. Scraping realestate.com.au makes it possible to collect and search through a large number of property listings quickly.

Web scraping realestate.com.au enables businesses, traders and buyers to analyze market trends. This leads to a better understanding of the market and a competitive edge, supporting better decisions and wiser investments.

For more details, refer to our previous article on real estate web scraping use cases.

Project Setup

To scrape realestate.com.au, we'll use a few Python packages:

  • httpx: An HTTP client for sending requests to the website.
  • parsel: A parsing module for extracting data from the HTML using XPath and CSS selectors.
  • JMESPath: A module for parsing and refining JSON datasets.
  • asyncio: A module that allows running our web scraping code asynchronously.
  • scrapfly-sdk: Python SDK for ScrapFly, a web scraping API that allows for scraping at scale without getting blocked.

Since asyncio is part of Python's standard library, you only have to install the other libraries using the following pip command. Note that the http2 extra is needed because our httpx client enables HTTP/2:

pip install "httpx[http2]" parsel jmespath scrapfly-sdk

How to Scrape Realestate.com.au Property Pages

Let's begin by scraping property pages on realestate.com.au. Go to any property listing on the website and you will get a page similar to this:

Property listing page on realestate.com.au

Instead of parsing this page's data using selectors, we'll use the hidden web data method.

How to Scrape Hidden Web Data

Learn about hidden data, some common examples and how to scrape it using regular expressions and other clever parsing algorithms.

To view this data, open the browser developer tools by pressing the F12 key, view the page HTML and scroll down to the script tag that starts with the window.ArgonautExchange text. The raw JSON is messy, but after parsing it looks like this:

Property pages hidden web data

To scrape realestate.com.au property pages, we'll select this script tag and parse the JSON data inside it:

Python
ScrapFly
import re
import json
import asyncio
import jmespath
from httpx import AsyncClient, Response
from parsel import Selector
from typing import List, Dict

client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser headers to minimize blocking chances
    headers={
        "accept-language": "en-US,en;q=0.9",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
    }
)

    
def parse_property_data(data: Dict) -> Dict:
    """refine property data from JSON"""
    if not data:
        return
    result = jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,            
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        videos: videos,
        floorplans: floorplans,        
        listingCompany: listingCompany.{name: name, id: id, companyLink: _links.canonical.href, phoneNumber: businessPhone, address: address.display.fullAddress, ratingsReviews: ratingsReviews, description: description},
        listers: listers,
        auction: auction
        }
        """,
        data,
    )
    return result

    
def parse_hidden_data(response: Response) -> Dict:
    """parse JSON data from script tag"""
    selector = Selector(response.text)
    script = selector.xpath(
        "//script[contains(text(),'window.ArgonautExchange')]/text()"
    ).get()
    # data needs to be parsed multiple times
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data

    
async def scrape_properties(urls: List[str]) -> List[Dict]:
    """scrape listing data from property pages"""
    # add the property pages URLs to a scraping list
    to_scrape = [client.get(url) for url in urls]
    properties = []
    # scrape all the property pages concurrently
    for response in asyncio.as_completed(to_scrape):
        response = await response
        assert response.status_code == 200, "request has been blocked"
        data = parse_hidden_data(response)["details"]["listing"]
        data = parse_property_data(data)
        properties.append(data)
    print(f"scraped {len(properties)} property listings")
    return properties    

# ScrapFly version of the same scraper
import re
import json
import asyncio
import jmespath
from typing import Dict, List
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")


def parse_property_data(data: Dict) -> Dict:
    """refine property data from JSON"""
    if not data:
        return
    result = jmespath.search(
        """{
        id: id,
        propertyType: propertyType.display,
        description: description,            
        propertyLink: _links.canonical.href,
        address: address,
        propertySizes: propertySizes,
        generalFeatures: generalFeatures,
        propertyFeatures: propertyFeatures[].{featureName: displayLabel, value: value},
        images: media.images[].templatedUrl,
        videos: videos,
        floorplans: floorplans,        
        listingCompany: listingCompany.{name: name, id: id, companyLink: _links.canonical.href, phoneNumber: businessPhone, address: address.display.fullAddress, ratingsReviews: ratingsReviews, description: description},
        listers: listers,
        auction: auction
        }
        """,
        data,
    )
    return result


def parse_hidden_data(response: ScrapeApiResponse) -> Dict:
    """parse JSON data from script tag"""
    selector = response.selector
    script = selector.xpath(
        "//script[contains(text(),'window.ArgonautExchange')]/text()"
    ).get()
    # data needs to be parsed multiple times
    data = json.loads(re.findall(r"window\.ArgonautExchange=(\{.+\});", script)[0])
    data = json.loads(data["resi-property_listing-experience-web"]["urqlClientCache"])
    data = json.loads(list(data.values())[0]["data"])
    return data


async def scrape_properties(urls: List[str]) -> List[Dict]:
    """scrape listing data from property pages"""
    # add the property pages URLs to a scraping list
    to_scrape = [ScrapeConfig(url, country="AU", asp=True) for url in urls]
    properties = []
    # scrape all the property pages concurrently
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        data = parse_hidden_data(response)["details"]["listing"]
        data = parse_property_data(data)
        properties.append(data)
    print(f"scraped {len(properties)} property listings")
    return properties    
Run the code
async def run():
    data = await scrape_properties(
        urls = [
            "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
            "https://www.realestate.com.au/property-house-vic-bundoora-141557712",
            "https://www.realestate.com.au/property-townhouse-vic-glenroy-143556608",
        ]
    )
    # print the data in JSON format
    print(json.dumps(data, indent=2))

if __name__ == "__main__":
    asyncio.run(run())

🙋‍ If you are getting errors while running the Python code tabs, this is likely due to getting blocked. Run the ScrapFly code tabs to avoid getting blocked.

In the above code, we use three functions. Let's break them down:

  • parse_hidden_data() for extracting the JSON data from the script tag and parsing it as a valid JSON object.
  • parse_property_data() for refining the JSON data we got and excluding the unnecessary details.
  • scrape_properties() for scraping the property pages by adding the page URLs into a scraping list and scraping them concurrently.
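
The repeated json.loads() calls in parse_hidden_data() can look odd at first. Here is a minimal offline sketch of why they are needed, assuming the page nests JSON-encoded strings inside one another as the function above implies. The script text, the cache key and the listing values below are made up for illustration, and decode_exchange() is a hypothetical helper name:

```python
import json
import re

# Build a synthetic script body: the innermost payload is JSON-encoded,
# stored as a string inside another JSON document, which is encoded again.
# All values here are invented for the demonstration.
listing = {"details": {"listing": {"id": "123", "propertyType": {"display": "House"}}}}
cache = {"some-cache-key": {"data": json.dumps(listing)}}
exchange = {"resi-property_listing-experience-web": {"urqlClientCache": json.dumps(cache)}}
script = f"window.ArgonautExchange={json.dumps(exchange)};"


def decode_exchange(script_text: str) -> dict:
    """Unwrap the three JSON layers found in the script text."""
    match = re.search(r"window\.ArgonautExchange=(\{.+\});", script_text)
    if match is None:
        raise ValueError("ArgonautExchange assignment not found")
    # layer 1: the object assigned to window.ArgonautExchange
    outer = json.loads(match.group(1))
    # layer 2: the client cache is stored as a JSON string
    inner_cache = json.loads(outer["resi-property_listing-experience-web"]["urqlClientCache"])
    # layer 3: each cache entry keeps its payload as a JSON string under "data"
    first_entry = next(iter(inner_cache.values()))
    return json.loads(first_entry["data"])


decoded = decode_exchange(script)
print(decoded["details"]["listing"]["id"])
```

Raising an explicit error when the assignment is missing makes a blocked or changed page easier to spot than an IndexError from an empty regex result.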

Here is a sample output of the result we got:

Sample output
[
  {
    "id": "143160680",
    "propertyType": "House",
    "description": "Renowned Real Estate proudly presents this sensational opportunity with a luxury house in Tarneit.<br/><br/>This beautiful low maintenance home is situated in the well-established suburb of Tarneit.<br/>Suitable for Various Buyers: This property is ideal for young families, downsizers, and investors.<br/>Convenience: It's conveniently located a short distance from the Tarneit west shopping center, local parks, public transport, and well-known primary and secondary schools, including the Islamic College of Melbourne.<br/><br/>Spacious Layout: The house features four generous-sized bedrooms, 1 lounge, and a dining area.<br/>Master Bedroom: The master bedroom includes an ensuite and a walk-in robe for added convenience.<br/>Contemporary Kitchen: The kitchen is modern and overlooks the low maintenance backyard and formal lounge.<br/>Stainless Steel Appliances: It is equipped with stainless steel appliances and ample storage space.<br/><br/>Additional Features:<br/>➡️Open plan living area.<br/>➡️Designated meals area connected with the kitchen and formal lounge.<br/>➡️Ducted heating .<br/>➡️Split system air conditioning in the formal lounge.<br/>➡️Low maintenance front and backyard.<br/><br/>Contact Information: For more information and to schedule an inspection, please contact Himraj at 0452060566",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-tarneit-143160680",
    "address": {
      "suburb": "Tarneit",
      "state": "Vic",
      "postcode": "3029",
      "display": {
        "shortAddress": "28 Chantelle Parade",
        "__typename": "AddressDisplay",
        "fullAddress": "28 Chantelle Parade, Tarneit, Vic 3029",
        "geocode": {
          "latitude": -37.85273078,
          "longitude": 144.66332821,
          "__typename": "GeocodeDisplay"
        }
      },
      "__typename": "Address"
    },
    "propertySizes": {
      "building": null,
      "land": {
        "displayValue": "336",
        "sizeUnit": {
          "displayValue": "m²",
          "__typename": "PropertySizeUnit"
        },
        "__typename": "PropertySize"
      },
      "preferred": {
        "sizeType": "LAND",
        "size": {
          "displayValue": "336",
          "sizeUnit": {
            "displayValue": "m²",
            "__typename": "PropertySizeUnit"
          },
          "__typename": "PropertySize"
        },
        "__typename": "PreferredPropertySize"
      },
      "__typename": "PropertySizes"
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4,
        "__typename": "IntValue"
      },
      "bathrooms": {
        "value": 2,
        "__typename": "IntValue"
      },
      "parkingSpaces": {
        "value": 2,
        "__typename": "IntValue"
      },
      "studies": {
        "value": 0,
        "__typename": "IntValue"
      },
      "__typename": "GeneralFeatures"
    },
    "propertyFeatures": [
      {
        "featureName": "Built-in wardrobes",
        "value": null
      },
      {
        "featureName": "Dishwasher",
        "value": null
      },
      {
        "featureName": "Ducted heating",
        "value": null
      },
      {
        "featureName": "Ensuites",
        "value": {
          "__typename": "NumericFeatureValue",
          "displayValue": "1"
        }
      },
      {
        "featureName": "Evaporative cooling",
        "value": null
      },
      {
        "featureName": "Floorboards",
        "value": null
      },
      {
        "featureName": "Fully fenced",
        "value": null
      },
      {
        "featureName": "Garage spaces",
        "value": {
          "__typename": "NumericFeatureValue",
          "displayValue": "2"
        }
      },
      {
        "featureName": "Land size",
        "value": {
          "__typename": "MeasurementFeatureValue",
          "displayValue": "336",
          "sizeUnit": {
            "id": "SQUARE_METRES",
            "displayValue": "m²",
            "__typename": "PropertySizeUnit"
          }
        }
      },
      {
        "featureName": "Living areas",
        "value": {
          "__typename": "NumericFeatureValue",
          "displayValue": "1"
        }
      },
      {
        "featureName": "Remote garage",
        "value": null
      },
      {
        "featureName": "Secure parking",
        "value": null
      },
      {
        "featureName": "Solar panels",
        "value": null
      },
      {
        "featureName": "Toilets",
        "value": {
          "__typename": "NumericFeatureValue",
          "displayValue": "2"
        }
      }
    ],
    "images": [
      "https://i2.au.reastatic.net/{size}/d8d3607342301e4e1b5b4cb84e3fc3d8cf48849a6311dd38e44bf3977fc593d8/image.jpg",
      "https://i2.au.reastatic.net/{size}/7d26afd862a3d1d58501a724c3532493c4fa7cd2bd297b2ab334039fd40e6c9c/image.jpg",
      "https://i2.au.reastatic.net/{size}/cbd580874f3f6aedbf263d77b6de3d0e5e2504925f72502b12838b8228cfdd45/image.jpg",
      "https://i2.au.reastatic.net/{size}/12d8b6d3bb5eb40170647f1b81839156eb8526b4c05392158bdbcc6e362a60af/image.jpg",
      "https://i2.au.reastatic.net/{size}/c4658347028f409f3e694de3c11d8c84644d5ee4229187cc418bccc26c93dfb7/image.jpg",
      "https://i2.au.reastatic.net/{size}/303f8e158603d35ea3c945c5839b437a1548cebec2b7a81eb9bad67593dcc603/image.jpg",
      "https://i2.au.reastatic.net/{size}/520ad964d73b7e386c607fc052741ab5fc3b01a2b7b72dc326e614d09bc2d3a5/image.jpg",
      "https://i2.au.reastatic.net/{size}/2ac18df655fa961410a2e80d239006ba3860732f1a26d0df4b1f5e51486662f2/image.jpg",
      "https://i2.au.reastatic.net/{size}/f53337ce77b54ab95b1a5ea4f679550224defcacdf2344ae8652680382c424cb/image.jpg",
      "https://i2.au.reastatic.net/{size}/5249ce376abccad84d0b4f3ce3254579761b4aaffc0ef09c587cf884e6008efc/image.jpg",
      "https://i2.au.reastatic.net/{size}/a740d6d1e484c3ae3c51b3670f02a967929ad61771383332998271f69050460c/image.jpg",
      "https://i2.au.reastatic.net/{size}/cc1255b415aaee3c4ea82a12aaf653141614dc0297ffe434726a82aeed4b6f75/image.jpg"
    ],
    "videos": null,
    "floorplans": null,
    "listingCompany": {
      "name": "Renowned Real Estate - CRAIGIEBURN",
      "id": "PGCQAA",
      "companyLink": "https://www.realestate.com.au/agency/renowned-real-estate-craigieburn-PGCQAA?cid={cid}",
      "phoneNumber": "0452060566",
      "address": "9 Gauja Street, CRAIGIEBURN, VIC 3064",
      "ratingsReviews": {
        "avgRating": null,
        "totalReviews": 0,
        "__typename": "AgencyRatingsReviews"
      },
      "description": null
    },
    "listers": [
      {
        "id": "3307736",
        "name": "Him Raj Parajuli",
        "photo": {
          "templatedUrl": "https://i2.au.reastatic.net/{size}/03527ad948f2ec46b10b220c44fa1007b0dc0eded8119733c9135b0be21547f8/main.jpg",
          "__typename": "Image"
        },
        "phoneNumber": {
          "display": "0452060566",
          "showDisclaimer": false,
          "__typename": "PhoneNumber"
        },
        "_links": {
          "canonical": {
            "href": "https://www.realestate.com.au/agent/him-raj-parajuli-3307736?cid={cid}",
            "__typename": "AbsoluteLinks"
          },
          "__typename": "ListerLinks"
        },
        "__typename": "Lister",
        "agentId": null,
        "jobTitle": "OIEC/Director",
        "showInMediaViewer": false,
        "listerRatingsReviews": {
          "avgRating": null,
          "totalReviews": 0,
          "__typename": "ListerRatingsReviews"
        }
      },
      {
        "id": "3307760",
        "name": "Aman Pakhrin",
        "photo": {
          "templatedUrl": "https://i2.au.reastatic.net/{size}/6b365a8a0ffa9ec976671759a15d136b796ba44f8b973a105b8aabac7ca857e9/main.jpg",
          "__typename": "Image"
        },
        "phoneNumber": {
          "display": "0450939749",
          "showDisclaimer": false,
          "__typename": "PhoneNumber"
        },
        "_links": {
          "canonical": {
            "href": "https://www.realestate.com.au/agent/aman-pakhrin-3307760?cid={cid}",
            "__typename": "AbsoluteLinks"
          },
          "__typename": "ListerLinks"
        },
        "__typename": "Lister",
        "agentId": null,
        "jobTitle": "Sales Director",
        "showInMediaViewer": false,
        "listerRatingsReviews": {
          "avgRating": null,
          "totalReviews": 0,
          "__typename": "ListerRatingsReviews"
        }
      }
    ],
    "auction": null
  }
]
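
The sample output still needs a little cleanup before use: image links carry a {size} placeholder and descriptions contain <br/> tags. Below is a rough post-processing sketch. The helper names are hypothetical, and the "800x600" size value is only an assumption about what the image server accepts, so verify it against a real image URL from the website first:

```python
import re
from html import unescape


def fill_image_size(templated_url: str, size: str = "800x600") -> str:
    """Replace the {size} placeholder; the default size value is an assumption."""
    return templated_url.replace("{size}", size)


def description_to_text(description: str) -> str:
    """Turn <br/> tags into newlines and drop any other HTML tags."""
    text = re.sub(r"<br\s*/?>", "\n", description, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)
    # decode HTML entities such as &amp; if any are present
    return unescape(text).strip()


# shortened, made-up values in the same shape as the scraped fields
image = fill_image_size("https://i2.au.reastatic.net/{size}/abc/image.jpg")
text = description_to_text("Spacious Layout.<br/><br/>Contemporary Kitchen.")
print(image)
print(text)
```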

Our realestate.com.au scraper can successfully scrape property pages. Next, let's scrape search pages so we can discover properties that match our preferences!

How to Scrape Realestate.com.au Search Pages

Just like property pages, we can find the search page data as JSON under script tags. To see this data, let's take the same approach we did earlier. Search for any properties on the website, inspect the page HTML using developer tools and scroll down to the script tag with the text window.ArgonautExchange.

After parsing the data inside the script tag, the data should look like this:

Search pages hidden web data

The URL used for the above search page is the following:

https://www.realestate.com.au/buy/in-melbourne+-+northern+region,+vic/list-1

The /list-1 path segment at the end of the URL represents the search page number. We'll use it within our scraper to scrape multiple search pages:

Python
ScrapFly
import re
import json
import asyncio
import jmespath
from httpx import AsyncClient, Response
from parsel import Selector
from typing import List, Dict

client = AsyncClient(
    # the remaining client config
)

def parse_property_data(data: Dict) -> Dict:
    """refine property data from JSON"""
    # the rest of the function

def parse_hidden_data(response: Response) -> Dict:
    """parse JSON data from script tag"""
    # the rest of the function

def parse_search_data(data: Dict) -> Dict:
    """refine search data"""
    search_data = []
    data = list(data.values())[0]
    for listing in data["results"]["exact"]["items"]:
        # refine each property listing in the search results
        search_data.append(parse_property_data(listing["listing"]))
    max_search_pages = data["results"]["pagination"]["maxPageNumberAvailable"]
    return {"search_data": search_data, "max_search_pages": max_search_pages}


async def scrape_search(url: str, max_scrape_pages: int = None):
    """scrape property listings from search pages"""
    first_page = await client.get(url)
    assert first_page.status_code == 200, "request has been blocked"
    print(f"scraping search page {url}")
    data = parse_hidden_data(first_page)
    data = parse_search_data(data)
    search_data = data["search_data"]
    # get the number of maximum search pages
    max_search_pages = data["max_search_pages"]
    # scrape all available pages if max_scrape_pages is not set or exceeds max_search_pages
    if not max_scrape_pages or max_scrape_pages > max_search_pages:
        max_scrape_pages = max_search_pages
    print(f"scraping search pagination, remaining ({max_scrape_pages - 1} more pages)")
    # add the remaining search pages in a scraping list, starting from page 2
    base_url = str(first_page.url).split("/list")[0]
    other_pages = [client.get(f"{base_url}/list-{page}") for page in range(2, max_scrape_pages + 1)]
    # scrape the remaining search pages concurrently
    for response in asyncio.as_completed(other_pages):
        response = await response
        assert response.status_code == 200, "request has been blocked"
        data = parse_hidden_data(response)
        search_data.extend(parse_search_data(data)["search_data"])
    print(f"scraped ({len(search_data)}) from {url}")
    return search_data    

# ScrapFly version of the same scraper
import re
import json
import asyncio
import jmespath
from typing import Dict, List
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_property_data(data: Dict) -> Dict:
    """refine property data from JSON"""
    # the rest of the function

def parse_hidden_data(response: ScrapeApiResponse) -> Dict:
    """parse JSON data from script tag"""
    # the rest of the function

def parse_search_data(data: Dict) -> Dict:
    """refine search data"""
    search_data = []
    data = list(data.values())[0]
    for listing in data["results"]["exact"]["items"]:
        # refine each property listing in the search results
        search_data.append(parse_property_data(listing["listing"]))
    max_search_pages = data["results"]["pagination"]["maxPageNumberAvailable"]
    return {"search_data": search_data, "max_search_pages": max_search_pages}


async def scrape_search(url: str, max_scrape_pages: int = None):
    """scrape property listings from search pages"""
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(url, country="AU", asp=True))
    print(f"scraping search page {url}")
    data = parse_hidden_data(first_page)
    data = parse_search_data(data)
    search_data = data["search_data"]
    # get the number of maximum search pages
    max_search_pages = data["max_search_pages"]
    # scrape all available pages if max_scrape_pages is not set or exceeds max_search_pages
    if not max_scrape_pages or max_scrape_pages > max_search_pages:
        max_scrape_pages = max_search_pages
    print(f"scraping search pagination, remaining ({max_scrape_pages - 1} more pages)")
    # add the remaining search pages in a scraping list
    other_pages = [
        ScrapeConfig(
            str(first_page.context["url"]).split("/list")[0] + f"/list-{page}",
            country="AU", asp=True
        )
        for page in range(2, max_scrape_pages + 1)
    ]
    # scrape the remaining search pages concurrently
    async for response in SCRAPFLY.concurrent_scrape(other_pages):
        data = parse_hidden_data(response)
        search_data.extend(parse_search_data(data)["search_data"])
    print(f"scraped ({len(search_data)}) from {url}")
    return search_data    
Run the code
async def run():
    data = await scrape_search(
        url="https://www.realestate.com.au/buy/in-melbourne+-+northern+region,+vic/list-1",
        max_scrape_pages=3
    )
    # print the data in JSON format
    print(json.dumps(data, indent=2))

if __name__ == "__main__":
    asyncio.run(run())    

This code is almost the same as the previous one, but we added two new functions:

  • parse_search_data() to refine the search results using the JMESPath expression we created earlier.
  • scrape_search() to crawl over search pages by scraping the first search page first, then scraping the remaining search pages concurrently.
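
The pagination step inside scrape_search() is easy to test on its own without sending any requests. Here is a small sketch of that logic pulled out into two helpers. The names clamp_pages() and build_page_urls() are hypothetical and not part of the scraper above, and the page count of 80 is a made-up value:

```python
from typing import List, Optional


def clamp_pages(requested: Optional[int], available: int) -> int:
    """Use the requested page count unless it is unset or exceeds what is available."""
    if requested and requested < available:
        return requested
    return available


def build_page_urls(first_page_url: str, total_pages: int) -> List[str]:
    """Build URLs for pages 2..total_pages from the first search page URL."""
    # same approach as the scraper: cut the URL at "/list" and append the page number
    base = first_page_url.split("/list")[0]
    return [f"{base}/list-{page}" for page in range(2, total_pages + 1)]


first_url = "https://www.realestate.com.au/buy/in-melbourne+-+northern+region,+vic/list-1"
urls = build_page_urls(first_url, clamp_pages(requested=3, available=80))
for url in urls:
    print(url)
```

Page 1 is skipped because it has already been scraped to read the pagination details. Keep in mind that splitting on "/list" assumes this text appears only once in the URL.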

The result is a list containing property listings found on three search pages, similar to this:

Sample output
[
  {
    "id": "143029712",
    "propertyType": "House",
    "description": "Set in the sought-after Aurora Estate and in a prime location close to all amenities including the newly opened Aurora Village and Edgars Creek Secondary School, Epping plaza, Northern Hospital and easy freeway access, everything you need is just a stone’s throw away!<br/><br/>This spacious home comprises of four generous sized bedrooms all with built in robes (master with walk-in robe and full en-suite), light filled kitchen with 900mm stainless steel appliances, stone benchtops, open plan generous sized meals/living area, multiple living zones, central bathroom with separate shower/bath and stone benchtop, ample storage space, ducted heating, alarm system, double garage with internal access and low maintenance front and rear yards.<br/><br/>This home is sure to impress, inspections will not disappoint!<br/><br/>What's more to love?<br/>- Low maintenance<br/>- 900mm stainless steel appliances<br/>- Evaporative cooling<br/>- Central heating<br/>- Multiple living zones<br/><br/>POTENTIAL RENTAL INCOME: $550 A WEEK",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-wollert-143029712",
    "address": {
      "display": {
        "shortAddress": "12 Geary Avenue",
        "fullAddress": "12 Geary Avenue, Wollert, Vic 3750",
        "__typename": "AddressDisplay"
      },
      "suburb": "Wollert",
      "state": "Vic",
      "postcode": "3750",
      "__typename": "Address"
    },
    "propertySizes": {
      "building": {
        "displayValue": "195.1",
        "sizeUnit": {
          "displayValue": "m²",
          "__typename": "PropertySizeUnit"
        },
        "__typename": "PropertySize"
      },
      "land": {
        "displayValue": "331",
        "sizeUnit": {
          "displayValue": "m²",
          "__typename": "PropertySizeUnit"
        },
        "__typename": "PropertySize"
      },
      "preferred": {
        "sizeType": "LAND",
        "size": {
          "displayValue": "331",
          "sizeUnit": {
            "displayValue": "m²",
            "__typename": "PropertySizeUnit"
          },
          "__typename": "PropertySize"
        },
        "__typename": "PreferredPropertySize"
      },
      "__typename": "PropertySizes"
    },
    "generalFeatures": {
      "bedrooms": {
        "value": 4,
        "__typename": "IntValue"
      },
      "bathrooms": {
        "value": 2,
        "__typename": "IntValue"
      },
      "parkingSpaces": {
        "value": 2,
        "__typename": "IntValue"
      },
      "studies": {
        "value": 0,
        "__typename": "IntValue"
      },
      "__typename": "GeneralFeatures"
    },
    "propertyFeatures": null,
    "images": [
      "https://i2.au.reastatic.net/{size}/a69720736c21a81214fb1ae5f2469bf22cd3cd90967f650013536bcb5cc00094/image.jpg",
      "https://i2.au.reastatic.net/{size}/ffa1c7249947822b15a3c59a7b939792310922152aeebed7b8166fc6e1dca217/image.jpg",
      "https://i2.au.reastatic.net/{size}/9f4256aecccc71331d7b8aab9a2bca15760c4e054e76290e2cf26850f260a2d3/image.jpg",
      "https://i2.au.reastatic.net/{size}/fa5c52de77979f2d972b4382f45d21b89231b50c7687820941452ce8928bb69b/image.jpg",
      "https://i2.au.reastatic.net/{size}/cebccbfd72ca5cb0c24161540b298cf6985532b87cc89210e41b6301eb008b77/image.jpg",
      "https://i2.au.reastatic.net/{size}/0bbc9779f0ce181bf8138cddeec69e9e25639ac45eabfdd4c60a99f795c07065/image.jpg",
      "https://i2.au.reastatic.net/{size}/4e18b9cd82baf5b68855edd9a247d6ba032f0099a79ed17924ae3fe11ab0db32/image.jpg",
      "https://i2.au.reastatic.net/{size}/862f6671e3fb644655f0385b0b8b55bd8fd17458def73afbbda4648e1cd89072/image.jpg",
      "https://i2.au.reastatic.net/{size}/af79d30f3a6a4387c71be878db32a7383b62d8bf0ab8da92a4567658756352cd/image.jpg",
      "https://i2.au.reastatic.net/{size}/e7da34de1128125377c71883fedd6288ef1c65543723e16049b6c327a5e2a324/image.jpg",
      "https://i2.au.reastatic.net/{size}/269a976c3c0a2a0273e1b47139c3861a3653e957b83aea3262bcd5f2a7541313/image.jpg",
      "https://i2.au.reastatic.net/{size}/4f9f311dc06ffee98c7c5da82e07d26e1f80e5a17819e5bd44e53c888c01224e/image.jpg"
    ],
    "videos": null,
    "floorplans": null,
    "listingCompany": {
      "name": "Carvera Property",
      "id": "ORNIKX",
      "companyLink": "https://www.realestate.com.au/agency/carvera-property-ORNIKX?cid={cid}",
      "phoneNumber": "0466229631",
      "address": "G01/6-8 Montrose St, HAWTHORN EAST, VIC 3123",
      "ratingsReviews": {
        "avgRating": 5,
        "totalReviews": 23,
        "__typename": "AgencyRatingsReviews"
      },
      "description": null
    },
    "listers": [
      {
        "id": "3084543",
        "name": "Chad Gamage",
        "photo": {
          "templatedUrl": "https://i2.au.reastatic.net/{size}/f8a10fa6c4ce2df0d8901c087ece63b07a32fc21362d73c6702e9fc65090d780/main.jpg",
          "__typename": "Image"
        },
        "phoneNumber": {
          "display": "0424876263",
          "showDisclaimer": false,
          "__typename": "PhoneNumber"
        },
        "_links": {
          "canonical": {
            "href": "https://www.realestate.com.au/agent/chad-gamage-3084543?cid={cid}",
            "__typename": "AbsoluteLinks"
          },
          "__typename": "ListerLinks"
        },
        "__typename": "Lister",
        "agentId": null,
        "jobTitle": "Sales Manager",
        "showInMediaViewer": false
      },
      {
        "id": "3243944",
        "name": "Stalon Ablahad",
        "photo": {
          "templatedUrl": "https://i2.au.reastatic.net/{size}/e8b77f7268a0aa114c0f3d0caed4392e4b06d13978a11644527bcf4a2cf39da5/main.jpg",
          "__typename": "Image"
        },
        "phoneNumber": {
          "display": "0466659650",
          "showDisclaimer": false,
          "__typename": "PhoneNumber"
        },
        "_links": {
          "canonical": {
            "href": "https://www.realestate.com.au/agent/stalon-ablahad-3243944?cid={cid}",
            "__typename": "AbsoluteLinks"
          },
          "__typename": "ListerLinks"
        },
        "__typename": "Lister",
        "agentId": null,
        "jobTitle": "Sales Executive",
        "showInMediaViewer": false
      }
    ],
    "auction": null
  }
]
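
Nested results like this are awkward to compare side by side. As one possible follow-up step, here is a sketch that flattens each listing into a summary row and writes the rows as CSV text. The field paths follow the sample output above, while the helper names and the chosen columns are only assumptions for illustration:

```python
import csv
import io
from typing import Dict, List


def summarize(listing: Dict) -> Dict:
    """Pick a few fields from a refined listing, tolerating missing or null values."""
    features = listing.get("generalFeatures") or {}
    display = (listing.get("address") or {}).get("display") or {}
    return {
        "id": listing.get("id"),
        "type": listing.get("propertyType"),
        "address": display.get("fullAddress"),
        "bedrooms": (features.get("bedrooms") or {}).get("value"),
        "bathrooms": (features.get("bathrooms") or {}).get("value"),
        "parking": (features.get("parkingSpaces") or {}).get("value"),
        "link": listing.get("propertyLink"),
    }


def to_csv(listings: List[Dict]) -> str:
    """Write summary rows as CSV text, skipping empty results."""
    rows = [summarize(item) for item in listings if item]
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# a trimmed copy of the first search result shown above
sample = {
    "id": "143029712",
    "propertyType": "House",
    "propertyLink": "https://www.realestate.com.au/property-house-vic-wollert-143029712",
    "address": {"display": {"fullAddress": "12 Geary Avenue, Wollert, Vic 3750"}},
    "generalFeatures": {
        "bedrooms": {"value": 4},
        "bathrooms": {"value": 2},
        "parkingSpaces": {"value": 2},
    },
}
print(to_csv([sample]))
```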

We can successfully scrape real estate listing data from realestate.com.au search and property pages. However, our scraper will likely get blocked after sending a few additional requests. Let's take a look at a solution!

How to Bypass Realestate.com.au Scraping Blocking

To bypass web scraping blocking, we need to pay attention to several details, including IP address, TLS handshakes, headers and cookies. This is where Scrapfly can lend you a hand!

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

For example, here is how we can use the ScrapFly asp feature to scrape realestate.com.au without getting blocked:

import httpx
from parsel import Selector

response = httpx.get("some realestate.com.au url")
selector = Selector(response.text)

# in ScrapFly SDK becomes
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient("Your ScrapFly API key")

result = client.scrape(ScrapeConfig(
    "some realestate.com.au url",
    # set the proxy location to australia
    country="AU",
    # enable the anti scraping protection bypass
    asp=True
))
selector = result.selector

Sign up for FREE to get your API key!

FAQ

To wrap up this guide, let's take a look at some frequently asked questions.

Is it legal to scrape realestate.com.au?

Scraping publicly available real estate data is legal. However, check the website's Terms of Service to confirm whether they apply to you and your use case. For more, see our web scraping legality page.

Is there a public API for realestate.com.au?

At the time of writing, there is no public API available for realestate.com.au. However, scraping realestate.com.au is straightforward and you can use it to create your own web scraping API.

Are there alternatives for realestate.com.au?

Yes, there are alternative websites for real estate ads in Australia. Check out our tag #realestate for more options.

How to Scrape Realestate.com.au - Summary

Realestate.com.au is a popular website for real estate ads in Australia, which can detect and block web scrapers.

In this article, we explained how to avoid realestate.com.au web scraping blocking. We also went through a step-by-step guide on creating a realestate.com.au scraper for property and search pages using Python, which works by extracting the property listing data directly as JSON from the HTML.
