How to Scrape Zillow Real Estate Data in 2025
Learn how to scrape Zillow property, search, and real estate listing data using Python, directly as JSON.
In this web scraping tutorial, we'll look at how to scrape property data from Zillow - the biggest real estate marketplace in the United States.
In this Zillow data scraper, we'll extract real estate data, including rent and sale property information such as prices, addresses, photos, and other listing details. We'll start with a brief overview of how the website works and how to navigate it. Then, we'll explain how to use its search system for effective Zillow real estate data discovery, and finally, we'll extract the full property details. Let's get started!
This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect.
Scrapfly does not offer legal advice, but these are good general rules to follow in web scraping; for more, you should consult a lawyer.
Zillow.com contains a massive real estate dataset: prices, locations, contact information, etc. This is valuable information for market analytics, the study of the housing industry, and a general competitor overview.
This means that by web scraping Zillow, we have access to the biggest real estate market in the US!
For further details on data scraping use cases, refer to our extensive guide.
In this tutorial, we'll scrape Zillow using Python with two community packages:
- httpx - an HTTP client library we'll use to request Zillow's pages and search API.
- parsel - an HTML parsing library we'll use to select elements with XPath and CSS selectors.
Optionally, we'll also use loguru, a logging library that will allow us to track our Zillow data scraper.
These packages can be installed using the following pip command:
$ pip install httpx parsel loguru
Alternatively, feel free to replace httpx with any other HTTP client package, such as requests, as we'll only send basic HTTP requests. As for parsel, another great alternative is the beautifulsoup package.
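For example, here's what the same kind of data extraction could look like with requests and beautifulsoup4 instead. This is only a minimal sketch of the swap, not the approach used in the rest of this tutorial, and the property URL is a placeholder:

import json
import requests
from bs4 import BeautifulSoup

# request any Zillow property page (placeholder URL)
response = requests.get(
    "https://www.zillow.com/homedetails/...",
    headers={"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
)
soup = BeautifulSoup(response.text, "html.parser")
# select the hidden web data script tag (covered in the next section)
script = soup.find("script", id="__NEXT_DATA__")
data = json.loads(script.text) if script else None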
If you're new to web scraping with Python, we recommend checking out our full introduction tutorial with the common best practices.
To start, let's explore scraping Zillow data from property pages. First, let's locate the data in the HTML of a given Zillow page, like this one.
To scrape this page data, we can parse every detail using XPath or CSS selectors. However, there is a better approach: hidden web data. To find this data, follow the below steps:
1. Open the browser developer tools by pressing the F12 key.
2. Search the page HTML for the //script[@id='__NEXT_DATA__'] XPath selector.
After following the above steps, you will find the property dataset hidden in a JavaScript variable matched by the above XPath selector:
The above real estate data is the same data seen on the page, but captured before it gets rendered into the HTML. This is commonly known as hidden web data.
Learn what hidden data is through some common examples. You will also learn how to scrape it using regular expressions and other clever parsing algorithms.
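For instance, a hidden dataset like the one above can also be extracted with a regular expression instead of an HTML parser. A minimal sketch, assuming the dataset lives in a script tag with the __NEXT_DATA__ id:

import json
import re

def extract_next_data(html: str) -> dict:
    """find the hidden __NEXT_DATA__ JSON dataset in raw HTML"""
    match = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.+?)</script>', html, re.DOTALL)
    if not match:
        raise ValueError("hidden dataset not found - layout change or blocked request")
    return json.loads(match.group(1))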
Let's power our Zillow data scraper with requesting and parsing logic for property pages:
import asyncio
from typing import List
import httpx
import json
from parsel import Selector
client = httpx.AsyncClient(
# enable http2
http2=True,
# add basic browser like headers to prevent being blocked
headers={
"accept-language": "en-US,en;q=0.9",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"accept-language": "en-US;en;q=0.9",
"accept-encoding": "gzip, deflate, br",
},
)
async def scrape_properties(urls: List[str]):
"""scrape zillow property pages for property data"""
to_scrape = [client.get(url) for url in urls]
results = []
for response in asyncio.as_completed(to_scrape):
response = await response
assert response.status_code == 200, "request has been blocked"
selector = Selector(response.text)
data = selector.css("script#__NEXT_DATA__::text").get()
if data:
# Option 1: some properties are located in NEXT DATA cache
data = json.loads(data)
property_data = json.loads(data["props"]["pageProps"]["componentProps"]["gdpClientCache"])
property_data = property_data[list(property_data)[0]]['property']
else:
# Option 2: other times it's in Apollo cache
data = selector.css("script#hdpApolloPreloadedData::text").get()
data = json.loads(json.loads(data)["apiCache"])
property_data = next(
v["property"] for k, v in data.items() if "ForSale" in k
)
results.append(property_data)
return results
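To run this httpx version, we can use the same example entry point as in the ScrapFly code below:

async def run():
    data = await scrape_properties(
        ["https://www.zillow.com/homedetails/1625-E-13th-St-APT-3K-Brooklyn-NY-11229/245001606_zpid/"]
    )
    print(json.dumps(data, indent=2))

if __name__ == "__main__":
    asyncio.run(run())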
import asyncio
import json
from typing import List
from scrapfly import ScrapeConfig, ScrapflyClient
scrapfly = ScrapflyClient(key="Your ScrapFly API key")
async def scrape_properties(urls: List[str]):
"""scrape zillow property pages for property data"""
to_scrape = [ScrapeConfig(url, asp=True, country="US") for url in urls]
results = []
async for result in scrapfly.concurrent_scrape(to_scrape):
data = result.selector.css("script#__NEXT_DATA__::text").get()
if data:
# Option 1: some properties are located in NEXT DATA cache
data = json.loads(data)
property_data = json.loads(data["props"]["pageProps"]["componentProps"]["gdpClientCache"])
property_data = property_data[list(property_data)[0]]['property']
else:
# Option 2: other times it's in Apollo cache
data = result.selector.css("script#hdpApolloPreloadedData::text").get()
data = json.loads(json.loads(data)["apiCache"])
property_data = next(v["property"] for k, v in data.items() if "ForSale" in k)
results.append(property_data)
return results
async def run():
data = await scrape_properties(
["https://www.zillow.com/homedetails/1625-E-13th-St-APT-3K-Brooklyn-NY-11229/245001606_zpid/"]
)
print(json.dumps(data, indent=2))
if __name__ == "__main__":
asyncio.run(run())
Let's break down the above code for scraping Zillow. We start by defining an httpx client with standard browser headers to avoid blocking. Then, we define a scrape_properties function, which requests each property page concurrently, parses the HTML, and loads the JSON dataset from the hidden script tag.
Here is what the extracted data from Zillow looks like:
[
{
"address": {
"streetAddress": "1065 2nd Ave",
"city": "New York",
"state": "NY",
"zipcode": "10022",
"__typename": "Address",
"neighborhood": null
},
"description": "Inspired by Alvar Aaltos iconic vase, Aalto57s sculptural architecture reflects classic concepts of design both inside and out. Each residence in this boutique rental building features clean modern finishes. Amenities such as a landscaped terrace with gas grills, private and group dining areas, sun loungers, and fire feature as well as an indoor rock climbing wall, basketball court, game room, childrens playroom, guest suite, and a fitness center make Aalto57 a home like no other.",
"photos": [
"https://photos.zillowstatic.com/fp/0c1099a1882a904acc8cedcd83ebd9dc-p_d.jpg",
"..."
],
"zipcode": "10022",
"phone": "646-681-3805",
"name": "Aalto57",
"floor_plans": [
{
"zpid": "2096631846",
"__typename": "FloorPlan",
"availableFrom": "1657004400000",
"baths": 1,
"beds": 1,
"floorPlanUnitPhotos": [],
"floorplanVRModel": null,
"maxPrice": 6200,
"minPrice": 6200,
"name": "1 Bed/1 Bath-1D",
...
}
...
]
}]
Cool! Our Zillow scraper can extract various details from the property web pages, including price, address, photos, and property structure. Next, let's explore extracting data from search pages!
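Since the full hidden dataset is quite large, it's often handy to reduce it to just the fields of interest. Here's a minimal sketch of such a parse step, using field names taken from the example output above:

def parse_property(property_data: dict) -> dict:
    """reduce the full hidden property dataset to a few fields of interest"""
    return {
        "address": property_data.get("address"),
        "description": property_data.get("description"),
        "phone": property_data.get("phone"),
        "photos": property_data.get("photos"),
        "floor_plans": property_data.get("floor_plans"),
    }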
Our previous code for scraping Zillow can extract data from a property page. In this section, we'll explore finding real estate listings using Zillow's search bar. Here is how the search system works under the hood:
Above, we can see that upon submitting a search query, a background request is sent to Zillow's search API. The search query includes the map coordinates as well as other comprehensive details. However, only a few query parameters are actually required:
{
"searchQueryState":{
"pagination":{},
"usersSearchTerm":"New Haven, CT",
"mapBounds":
{
"west":-73.03037621240235,
"east":-72.82781578759766,
"south":41.23043771298298,
"north":41.36611033618769
},
},
"wants": {
"cat1":["mapResults"]
},
"requestId": 2
}
The Zillow search API is really powerful, allowing us to find listings in any map area defined by a bounding box of four direction values: north, west, south, and east.
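If we only have a center coordinate, we can approximate such a bounding box ourselves. Note that make_map_bounds and its delta value are our own hypothetical helper, not part of Zillow's API:

def make_map_bounds(lat: float, lon: float, delta: float = 0.1) -> dict:
    """approximate a search bounding box around a center point (hypothetical helper)"""
    return {
        "west": lon - delta,
        "east": lon + delta,
        "south": lat - delta,
        "north": lat + delta,
    }

# e.g. a rough box around New Haven, CT:
bounds = make_map_bounds(41.298, -72.929)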
Let's replicate the logic for finding properties by location in our Zillow scraping code using the latitude and longitude values:
import json
import httpx
# we should use browser-like request headers to prevent being instantly blocked
BASE_HEADERS = {
"accept-language": "en-US,en;q=0.9",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"accept-language": "en-US;en;q=0.9",
"accept-encoding": "gzip, deflate, br",
}
url = "https://www.zillow.com/async-create-search-page-state"
body = {
"searchQueryState": {
"pagination": {},
"usersSearchTerm": "New Haven, CT",
# map coordinates that indicate New Haven city's area
"mapBounds": {
"west": -73.03037621240235,
"east": -72.82781578759766,
"south": 41.23043771298298,
"north": 41.36611033618769,
},
},
"wants": {"cat1": ["listResults", "mapResults"], "cat2": ["total"]},
"requestId": 2,
}
response = httpx.put(url, headers={**BASE_HEADERS, "content-type": "application/json"}, content=json.dumps(body))
assert response.status_code == 200, "request has been blocked"
data = response.json()
results = data["cat1"]["searchResults"]["mapResults"]
print(json.dumps(results, indent=2))
print(f"found {len(results)} property results")
import json
from scrapfly import ScrapeConfig, ScrapflyClient
scrapfly = ScrapflyClient(key="Your ScrapFly API key")
url = "https://www.zillow.com/async-create-search-page-state"
body = {
"searchQueryState": {
"pagination": {},
"usersSearchTerm": "New Haven, CT",
# map coordinates that indicate New Haven city's area
"mapBounds": {
"west": -73.03037621240235,
"east": -72.82781578759766,
"south": 41.23043771298298,
"north": 41.36611033618769,
},
},
"wants": {"cat1": ["listResults", "mapResults"], "cat2": ["total"]},
"requestId": 2,
}
response = scrapfly.scrape(
ScrapeConfig(
url,
asp=True,
country="US",
headers={"content-type": "application/json"},
body=json.dumps(body),
method="PUT",
)
)
data = json.loads(response.content)
results = data["cat1"]["searchResults"]["mapResults"]
print(json.dumps(results, indent=2))
print(f"found {len(results)} property results")
We can successfully replicate the search request. Next, we'll utilize it to scrape the search pages.
To scrape Zillow search, we need the geographical location details, which can be challenging to get. Therefore, we'll extract the location's geographical details from an easier user interface: search pages. To illustrate this, go to any search URL on Zillow, like zillow.com/homes/New-Haven,-CT_rb/. You will find the geographical details hidden in the HTML:
The geographical details exist in the script tag. Let's use it to scrape Zillow data from search pages:
import json
import httpx
import random
import asyncio
from loguru import logger as log
from parsel import Selector
def create_search_payload(
query_data: dict, filters: dict = None, page_number: int = None
):
"""create a search payload for Zillow's search API"""
payload = {
"searchQueryState": query_data,
"wants": {"cat1": ["listResults", "mapResults"], "cat2": ["total"]},
"requestId": random.randint(2, 10),
}
if filters:
query_data["filterState"] = filters
if page_number:
payload["searchQueryState"]["pagination"] = {"currentPage": page_number}
return json.dumps(payload)
async def _search(
query: str,
session: httpx.AsyncClient,
filters: dict = None,
max_scrape_pages: int = None,
):
"""base search function which is used by sale and rent search functions"""
html_response = await session.get(f"https://www.zillow.com/homes/{query}_rb/")
    assert html_response.status_code == 200, "request is blocked"
selector = Selector(html_response.text)
# find query data in script tags
try:
script_data = json.loads(
selector.xpath("//script[@id='__NEXT_DATA__']/text()").get()
)
    except Exception:
log.error("request is blocked, use Scrapfly code tab")
return
query_data = script_data["props"]["pageProps"]["searchPageState"]["queryState"]
# scrape search API
url = "https://www.zillow.com/async-create-search-page-state"
search_data = []
    api_response = await session.put(
        url,
        headers={"content-type": "application/json"},
        content=create_search_payload(query_data=query_data, filters=filters),
    )
data = api_response.json()
property_data = data["cat1"]["searchResults"]["listResults"]
search_data.extend(property_data)
_total_pages = data["cat1"]["searchList"]["totalPages"]
# if no pagination data, return
if _total_pages == 1:
log.success(f"scraped {len(search_data)} properties from search pages")
return search_data
# else paginate remaining pages
if max_scrape_pages and max_scrape_pages < _total_pages:
_total_pages = max_scrape_pages
log.info(f"scraping search pagination, {_total_pages} more pages remaining")
to_scrape = [
await session.put(
url,
headers={"content-type": "application/json"},
body=create_search_payload(
query_data=query_data, filters=filters, page_number=page
),
)
for page in range(2, _total_pages + 1)
]
for response in asyncio.as_completed(to_scrape):
response = await response
data = api_response.json()
property_data = data["cat1"]["searchResults"]["listResults"]
search_data.extend(property_data)
log.success(f"scraped {len(search_data)} properties from search pages")
return search_data
# Example usages 👇
async def search_sale(query: str, session: httpx.AsyncClient):
"""search properties that are for sale"""
log.info(f"scraping sale search for: {query}")
return await _search(query=query, session=session, max_scrape_pages=3)
async def search_rent(query: str, session: httpx.AsyncClient):
"""search properites that are for rent"""
log.info(f"scraping rent search for: {query}")
filters = {
"isForSaleForeclosure": {"value": False},
"isMultiFamily": {"value": False},
"isAllHomes": {"value": True},
"isAuction": {"value": False},
"isNewConstruction": {"value": False},
"isForRent": {"value": True},
"isLotLand": {"value": False},
"isManufactured": {"value": False},
"isForSaleByOwner": {"value": False},
"isComingSoon": {"value": False},
"isForSaleByAgent": {"value": False},
}
return await _search(
query=query, session=session, filters=filters, max_scrape_pages=3
)
BASE_HEADERS = {
"accept-language": "en-US,en;q=0.9",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"accept-language": "en-US;en;q=0.9",
"accept-encoding": "gzip, deflate, br",
}
async def run():
limits = httpx.Limits(max_connections=5)
async with httpx.AsyncClient(limits=limits, timeout=httpx.Timeout(15.0), headers=BASE_HEADERS) as session:
data = await search_rent("New Haven, CT", session)
print(json.dumps(data, indent=2))
if __name__ == "__main__":
asyncio.run(run())
import json
import random
import asyncio
from typing import List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient
SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")
BASE_CONFIG = {
# Zillow.com requires Anti Scraping Protection bypass feature:
"asp": True,
"country": "US",
}
def create_search_payload(
query_data: dict, filters: dict = None, page_number: int = None
):
"""create a search payload for Zillow's search API"""
payload = {
"searchQueryState": query_data,
"wants": {"cat1": ["listResults", "mapResults"], "cat2": ["total"]},
"requestId": random.randint(2, 10),
}
if filters:
query_data["filterState"] = filters
if page_number:
payload["searchQueryState"]["pagination"] = {"currentPage": page_number}
return json.dumps(payload)
async def _search(
query: str, filters: dict = None, max_scrape_pages: int = None
) -> List[dict]:
"""base search function which is used by sale and rent search functions"""
search_data = []
url = f"https://www.zillow.com/homes/{query}_rb/"
log.info(f"scraping search: {url}")
# first scrape the search HTML page and find query variables for this search
html_result = await SCRAPFLY.async_scrape(ScrapeConfig(url, **BASE_CONFIG))
script_data = json.loads(
html_result.selector.xpath("//script[@id='__NEXT_DATA__']/text()").get()
)
query_data = script_data["props"]["pageProps"]["searchPageState"]["queryState"]
# then scrape Zillow's backend API for all query results:
_backend_url = "https://www.zillow.com/async-create-search-page-state"
api_result = await SCRAPFLY.async_scrape(
ScrapeConfig(
_backend_url,
**BASE_CONFIG,
headers={"content-type": "application/json"},
body=create_search_payload(query_data=query_data, filters=filters),
method="PUT",
)
)
data = json.loads(api_result.content)
property_data = data["cat1"]["searchResults"]["listResults"]
search_data.extend(property_data)
_total_pages = data["cat1"]["searchList"]["totalPages"]
# if no pagination data, return
if _total_pages == 1:
log.success(f"scraped {len(search_data)} properties from search pages")
return search_data
# else paginate remaining pages
if max_scrape_pages and max_scrape_pages < _total_pages:
_total_pages = max_scrape_pages
    log.info(f"scraping search pagination, {_total_pages - 1} more pages remaining")
to_scrape = [
ScrapeConfig(
_backend_url,
**BASE_CONFIG,
headers={"content-type": "application/json"},
body=create_search_payload(
query_data=query_data, filters=filters, page_number=page
),
method="PUT",
)
for page in range(2, _total_pages + 1)
]
async for result in SCRAPFLY.concurrent_scrape(to_scrape):
property_data = json.loads(result.content)["cat1"]["searchResults"][
"listResults"
]
search_data.extend(property_data)
log.success(f"scraped {len(search_data)} properties from search pages")
return search_data
# Example usages 👇
async def search_sale(query: str):
"""search properties that are for sale"""
log.info(f"scraping sale search for: {query}")
return await _search(query=query, max_scrape_pages=3)
async def search_rent(query: str):
"""search properites that are for rent"""
log.info(f"scraping rent search for: {query}")
filters = {
"isForSaleForeclosure": {"value": False},
"isMultiFamily": {"value": False},
"isAllHomes": {"value": True},
"isAuction": {"value": False},
"isNewConstruction": {"value": False},
"isForRent": {"value": True},
"isLotLand": {"value": False},
"isManufactured": {"value": False},
"isForSaleByOwner": {"value": False},
"isComingSoon": {"value": False},
"isForSaleByAgent": {"value": False},
}
return await _search(query=query, filters=filters, max_scrape_pages=3)
async def run():
data = await search_rent("New Haven, CT")
print(json.dumps(data, indent=2))
if __name__ == "__main__":
asyncio.run(run())
Let's break down the above Zillow scraper code. We use the _search
function to request the search page first and parse the HTML for the location details. Then, we use the location details to request the search API, either for sale or rent real estate data.
Executing the above scraping code will extract the following data from Zillow:
[
{
"zpid": "2052939967",
"rawHomeStatusCd": "ForSale",
"marketingStatusSimplifiedCd": "For Sale by Agent",
"imgSrc": "https://photos.zillowstatic.com/fp/2a7635a58d8e19762acc923e6c938551-p_e.jpg",
"hasImage": true,
"detailUrl": "/homedetails/755-757-Sutter-St-San-Francisco-CA-94109/2052939967_zpid/",
"statusType": "FOR_SALE",
"statusText": "Multi-family home for sale",
"price": "$7,800,000",
"priceLabel": "$7.80M",
"address": "755-757 Sutter St, San Francisco, CA 94109",
"baths": 0.0,
"area": 22572,
"latLong": {
"latitude": 37.78844,
"longitude": -122.41274
},
"variableData": {
"type": "TIME_ON_INFO",
"text": "1 hour ago",
"data": {
"isRead": null,
"isFresh": false
}
},
"hdpData": {
"homeInfo": {
"zpid": 2052939967,
"streetAddress": "755-757 Sutter St",
"zipcode": "94109",
"city": "San Francisco",
"state": "CA",
"latitude": 37.78844,
"longitude": -122.41274,
"price": 7800000.0,
"bathrooms": 0.0,
"livingArea": 22572.0,
"homeType": "MULTI_FAMILY",
"homeStatus": "FOR_SALE",
"daysOnZillow": -1,
"isFeatured": false,
"shouldHighlight": false,
"listing_sub_type": {
"is_FSBA": true
},
"isUnmappable": false,
"isPreforeclosureAuction": false,
"homeStatusForHDP": "FOR_SALE",
"priceForHDP": 7800000.0,
"timeOnZillow": 5508000,
"isNonOwnerOccupied": true,
"isPremierBuilder": false,
"isZillowOwned": false,
"currency": "USD",
"country": "USA",
"lotAreaValue": 5568.0,
"lotAreaUnit": "sqft",
"isShowcaseListing": false
}
},
"isUserClaimingOwner": false,
"isUserConfirmedClaim": false,
"pgapt": "ForSale",
"sgapt": "For Sale (Broker)",
"shouldShowZestimateAsPrice": false,
"has3DModel": false,
"hasVideo": false,
"isHomeRec": false,
"hasAdditionalAttributions": true,
"isFeaturedListing": false,
"isShowcaseListing": false,
"listingType": "",
"isFavorite": false,
"visited": false,
"info6String": "Dustin Dolby DRE #01963487",
"brokerName": "Colliers International",
"timeOnZillow": 5508000
},
...
]
The search results provided valuable information about each listing, such as the address, geolocation, and metadata. However, in order to obtain all of the relevant listing details, we must scrape each individual property listing page, which can be found in the detailUrl
field.
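Since detailUrl is relative, we can join it with the Zillow domain and feed the result straight into the scrape_properties function from the first section. A short sketch combining the two (using the httpx variants; scrape_search_properties is our own hypothetical helper):

async def scrape_search_properties(query: str, session: httpx.AsyncClient):
    """find sale listings for a query, then scrape each property page"""
    search_results = await search_sale(query, session)
    # detailUrl is relative, e.g. "/homedetails/...", so prepend the domain
    property_urls = ["https://www.zillow.com" + result["detailUrl"] for result in search_results]
    return await scrape_properties(property_urls)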
Note the Zillow search is limited to 500 properties per query. Therefore, we have to use smaller geographical zones to scrape real estate data precisely. For this, refer to the Zillow zip code index page.
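One way to stay under this cap is to split a large bounding box into a grid of smaller mapBounds boxes and run a search query per cell. A hypothetical helper sketch:

def split_map_bounds(bounds: dict, rows: int = 2, cols: int = 2) -> list:
    """split a mapBounds box into a rows x cols grid of smaller boxes (hypothetical helper)"""
    lat_step = (bounds["north"] - bounds["south"]) / rows
    lon_step = (bounds["east"] - bounds["west"]) / cols
    cells = []
    for i in range(rows):
        for j in range(cols):
            cells.append({
                "south": bounds["south"] + i * lat_step,
                "north": bounds["south"] + (i + 1) * lat_step,
                "west": bounds["west"] + j * lon_step,
                "east": bounds["west"] + (j + 1) * lon_step,
            })
    return cells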
Our scraping Zillow code can successfully extract Zillow real estate data from property and search pages. However, running the scrape at scale will lead the website to block our HTTP requests. Let's have a look at avoiding Zillow web scraping blocking next!
Creating a Zillow data scraper doesn't seem to be complicated. However, scraping blocking will get in our way, such as CAPTCHAs or IP address blocking. This is where Scrapfly can lend a hand!
ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
For example, here's how we can scrape Zillow without getting blocked. All we have to do is replace our HTTP client with the ScrapFly API client, enable the asp parameter, and select a proxy country:
# standard web scraping code
import httpx
from parsel import Selector
response = httpx.get("some zillow.com URL")
selector = Selector(response.text)
# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient
# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")
response = scrapfly.scrape(ScrapeConfig(
url="website URL",
asp=True, # enable the anti scraping protection to bypass blocking
country="US", # set the proxy location to a specfic country
render_js=True # enable rendering JavaScript (like headless browsers) to scrape dynamic content if needed
))
# use the built in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']
To wrap this guide up, let's take a look at some frequently asked questions about web scraping Zillow real estate data:
Is web scraping Zillow legal?
Yes. Zillow's data is publicly available; we're not extracting anything personal or private. Scraping Zillow.com at slow, respectful rates would fall under the ethical scraping definition.
That being said, attention should be paid to GDPR compliance in the EU when scraping personal data of non-agent listings (seller's name, phone number, etc.). For more, see our Is Web Scraping Legal? article.
Does Zillow have a public API?
Zillow does offer some official APIs, but they're extremely limited, not suitable for dataset collection, and not available for general public use. Instead, we can scrape Zillow data with Python and httpx.
How to crawl Zillow?
We can easily create a Zillow crawler with the subjects we've covered in this tutorial. Instead of searching for properties explicitly, we can crawl Zillow properties from seed links (any Zillow URLs) and follow the related properties mentioned in a loop, as sketched below. For more on crawling, see How to Crawl the Web with Python.
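Here's a rough sketch of such a crawl loop. It reuses the httpx client defined earlier and discovers new property links with a simple regex; treat it as an illustration rather than a production crawler:

import re

async def crawl_properties(seed_urls: list, max_properties: int = 20):
    """breadth-first crawl: scrape seed pages and follow discovered /homedetails/ links"""
    seen = set(seed_urls)
    queue = list(seed_urls)
    crawled = []
    while queue and len(crawled) < max_properties:
        url = queue.pop(0)
        response = await client.get(url)  # the httpx client defined earlier
        crawled.append(url)
        # discover related property links in the raw HTML
        for path in set(re.findall(r'"(/homedetails/[^"]+)"', response.text)):
            full_url = "https://www.zillow.com" + path
            if full_url not in seen:
                seen.add(full_url)
                queue.append(full_url)
    return crawled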
Are there alternatives to Zillow?
Yes, Redfin is another popular real estate marketplace in the United States. We have covered scraping Redfin in a previous guide. For more guides on real estate target websites, refer to our #realestate blog tag.
In this guide, we explained scraping real estate data from Zillow.
We searched for real estate properties for sale or rent in any region and used hidden web data scraping to extract Zillow's state cache from the HTML pages, capturing property details such as prices, building information, and contact details.
For this, we used Python with the httpx and parsel packages, and to avoid Zillow scraper blocking, we used ScrapFly's API, which smartly configures every web scraper connection to avoid being blocked.