How to Scrape Redfin Real Estate Property Data in Python


In this web scraping tutorial, we'll be taking a look at how to scrape Redfin - a popular real estate listing website.

We'll be scraping real estate data such as pricing info, addresses, photos and phone numbers displayed on Redfin property pages.

To scrape Redfin properties we'll be using the hidden web data scraping method. We'll also take a look at how to find real estate properties using Redfin's search and sitemap system to collect the entire real estate dataset available on the website.

Finally, we'll also cover property tracking by continuously scraping for newly listed or updated properties - giving us an upper hand in real estate bidding. We'll be using Python with a few community libraries - let's dive in!

Hands on Python Web Scraping Tutorial and Example Project

If you're new to web scraping with Python we recommend checking out our full introduction tutorial to web scraping with Python and common best practices.


Why Scrape Redfin.com?

Redfin.com is one of the biggest real estate websites in the United States, making it one of the biggest public real estate datasets out there. It contains fields like real estate prices, listing locations, sale dates and general property information.

This is valuable information for market analytics, the study of the housing industry, and a general competitor overview. By web scraping Redfin we can easily have access to a major real estate dataset.
See our Scraping Use Cases guide for more.

How to Scrape Real Estate Property Data using Python

For more real estate scrape guides see our hub article which covers scraping of Zillow, Realtor.com, Idealista and other popular platforms.


Available Data Fields

We can scrape Redfin for several popular real estate data fields and targets:

  • Properties for sale
  • Land for sale
  • Open house events
  • Properties for rent
  • Real estate agent info

In this guide, we'll focus on scraping real estate property listings (rent and sale), though everything we'll learn can be easily applied to other pages.

Setup

In this tutorial, we'll be using Python with three community packages:

  • httpx - HTTP client library which will let us communicate with Redfin.com's servers
  • parsel - HTML parsing library which will help us to parse our web scraped HTML files.
  • jmespath - JSON parsing library which allows us to write XPath-like rules for JSON.

These packages can be easily installed via the pip install command:

$ pip install httpx parsel jmespath

Alternatively, feel free to swap httpx out with any other HTTP client package such as requests, as we'll only need basic HTTP functions, which are almost interchangeable across libraries. As for parsel, another great alternative is the beautifulsoup package.

Scraping Property Data

To start let's take a look at how to scrape property data of a single listing page.

Redfin is using Next.js for rendering its pages. We can take advantage of this fact and scrape the hidden web data instead of parsing the HTML directly. This might appear a bit complex so if you're unfamiliar with hidden web data scraping see our introduction article:

How to Scrape Hidden Web Data

Introduction to scraping hidden web data - what is it and best ways to parse it in Python


Redfin's hidden dataset contains all of the property data and more. In this scenario, the property data is located in the JavaScript variable __reactServerState.InitialContext:

illustration of the page source of a redfin property

If we click "view source" and scroll to the bottom, we can see a script element with the page cache.

To extract the whole dataset we will:

  1. Find the script element which contains this javascript variable
  2. Use regular expressions to find the variable's value
  3. Load it as Python dictionary and clean up the dataset

Let's see it in action:

import re
import json
import asyncio
from typing import List
from httpx import AsyncClient, Response
from parsel import Selector

session = AsyncClient(headers={
    # use same headers as a popular web browser (Chrome on Mac in this case)
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Language": "en-US,en;q=0.9",
})


def extract_cache(react_initial_context):
    """extract microservice cache data from the react server agent"""
    result = {}
    for name, cache in react_initial_context["ReactServerAgent.cache"]["dataCache"].items():
        # first we retrieve cached response and see whether it's a success
        try:
            cache_response = cache["res"]
        except KeyError:  # empty cache
            continue
        if cache_response.get("status") != 200:
            print("skipping non 200 cache")
            continue
        # then extract cached response body and interpret it as a JSON
        cache_data = cache_response.get("body", {}).get("payload")
        if not cache_data:
            cache_data = json.loads(cache_response["text"].split("&&", 1)[-1]).get("payload")
        if not cache_data:
            # skip empty caches
            continue
        # for Redfin we can cleanup cache names for home data endpoints:
        if "/home/details" in name:
            name = name.split("/home/details/")[-1]
        result[name.replace("/", "")] = cache_data
        # ^note: we sanitize name to avoid slashes as they are not allowed in JMESPath
    return result


def parse_property(response: Response):
    selector = Selector(response.text)
    script = selector.xpath('//script[contains(.,"ServerState.InitialContext")]/text()').get()
    initial_context = re.findall(r"ServerState.InitialContext = (\{.+\});", script)
    if not initial_context:
        print(f"page {response.url} is not a property listing page")
        return
    return extract_cache(json.loads(initial_context[0]))


async def scrape_properties(urls: List[str]) -> List[dict]:
    to_scrape = [session.get(url) for url in urls]
    properties = []
    for response in asyncio.as_completed(to_scrape):
        properties.append(parse_property(await response))
    return properties

To run our scraper all we have to do is call the asyncio coroutine:

urls = [
    "https://www.redfin.com/FL/Cape-Coral/402-SW-28th-St-33914/home/61856041",
    "https://www.redfin.com/FL/Cape-Coral/4202-NW-16th-Ter-33993/home/62053611",
    "https://www.redfin.com/FL/Cape-Coral/1415-NW-38th-Pl-33993/home/62079956",
]

if __name__ == "__main__":
    asyncio.run(scrape_properties(urls))

Above, we used httpx to retrieve the HTML page and load it as a parsel.Selector. Then, we find the script element which contains the javascript cache variable. To extract the cache we use a simple regular expression that captures the text between the InitialContext keyword and the }; characters.
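To illustrate the extraction step in isolation, here's the same regular expression applied to a minimal, made-up script snippet (the script text below is illustrative, not real Redfin markup):

```python
import re
import json

# a minimal stand-in for the page's script element text (illustrative, not real Redfin markup)
script_text = 'window.__reactServerState.InitialContext = {"ReactServerAgent.cache": {"dataCache": {}}};'

# capture everything between "InitialContext = " and the trailing "};"
match = re.findall(r"ServerState.InitialContext = (\{.+\});", script_text)
data = json.loads(match[0])
print(data["ReactServerAgent.cache"])  # {'dataCache': {}}
```

The greedy `.+` works here because the variable assignment ends with `};` at the end of the script - on the real page the same pattern captures the whole cache object.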

This results in a colossal Redfin property dataset, and since it's an internal web dataset it's full of technical data fields that need a bit of cleanup. Let's parse it!

Parsing Data

The dataset we scraped is huge and contains loads of useless information. To parse it down to something we can digest we'll be using JMESPath - a popular JSON parsing syntax.

JMESPath is a bit similar to XPath or CSS selectors but for JSON. Using it, we can create path rules of where to find the data fields we want to keep.

For example, for price we'll be using JMESPath path:

aboveTheFold.addressSectionInfo.priceInfo.amount
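Conceptually, a dotted path like this is just a chain of dictionary lookups. Here's a rough pure-Python sketch of what such a simple path does (JMESPath itself supports much more, like projections and multi-select hashes; `search_path` is a hypothetical helper for illustration):

```python
def search_path(path: str, data: dict):
    """naive equivalent of a simple dotted JMESPath lookup"""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None  # JMESPath returns null for missing paths
        data = data[key]
    return data

# sample shaped like the Redfin cache (values illustrative)
sample = {"aboveTheFold": {"addressSectionInfo": {"priceInfo": {"amount": 311485}}}}
print(search_path("aboveTheFold.addressSectionInfo.priceInfo.amount", sample))  # 311485
```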

Let's take a look at the whole parser:

from typing import Dict, List, TypedDict
import jmespath

class PropertyResult(TypedDict):
    """type hint for property result. i.e. Defines what fields are expected in property dataset"""

    photos: List[str]
    videos: List[str]
    price: int
    info: Dict[str, str]
    amenities: List[Dict[str, str]]
    records: Dict[str, str]
    history: Dict[str, str]
    floorplan: Dict[str, str]
    activity: Dict[str, str]


def parse_redfin_proprety_cache(data_cache) -> PropertyResult:
    """parse Redfin's cache data for proprety information"""
    # here we define field name to JMESPath mapping
    parse_map = {
        # from top area of the page: basic info, videos and photos
        "photos": "aboveTheFold.mediaBrowserInfo.photos[*].photoUrls.fullScreenPhotoUrl",
        "videos": "aboveTheFold.mediaBrowserInfo.videos[*].videoUrl",
        "price": "aboveTheFold.addressSectionInfo.priceInfo.amount",
        "info": """aboveTheFold.addressSectionInfo.{
            bed_num: beds,
            bath_numr: baths,
            full_baths_num: numFullBaths,
            sqFt: sqFt,
            year_built: yearBuilt,
            city: city,
            state: state,
            zip: zip,
            country_code: countryCode,
            fips: fips,
            apn: apn,
            redfin_age: timeOnRedfin,
            cumulative_days_on_market: cumulativeDaysOnMarket,
            property_type: propertyType,
            listing_type: listingType,
            url: url
        }
        """,
        # from bottom area of the page: amenities, records and event history
        "amenities": """belowTheFold.amenitiesInfo.superGroups[].amenityGroups[].amenityEntries[].{
            name: amenityName, values: amenityValues
        }""",
        "records": "belowTheFold.publicRecordsInfo",
        "history": "belowTheFold.propertyHistoryInfo",
        # other: sometimes there are floorplans
        "floorplan": r"listingfloorplans.floorPlans",
        # and there's always internal Redfin performance info: views, saves, etc.
        "activity": "activityInfo",
    }
    results = {}
    for key, path in parse_map.items():
        value = jmespath.search(path, data_cache)
        results[key] = value
    return results
Example Output
  {
    "photos": [
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_1_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_2_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_3_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_4_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_5_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_6_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_7_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_8_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_9_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_10_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_11_0.jpg",
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_12_0.jpg"
    ],
    "videos": [],
    "price": 311485,
    "info": {
      "bed_num": 3,
      "bath_numr": 2.5,
      "full_baths_num": 2,
      "sqFt": {
        "displayLevel": 1,
        "value": 1636
      },
      "year_built": null,
      "city": "Cape Coral",
      "state": "FL",
      "zip": "33909",
      "country_code": "US",
      "fips": "12071",
      "apn": "304324C2137060780",
      "redfin_age": 489909873,
      "cumulative_days_on_market": 6,
      "property_type": 13,
      "listing_type": 1,
      "url": "/FL/Cape-Coral/1445-Weeping-Willow-Ct-33909/home/178539241"
    },
    "amenities": [
      {
        "name": "Parking",
        "values": [
          "2+ Spaces",
          "Driveway Paved"
        ]
      },
      {
        "name": "Amenities",
        "values": [
          "Basketball",
          "Business Center",
          "Clubhouse",
          "Community Pool",
          "Community Room",
          "Community Spa/Hot tub",
          "Exercise Room",
          "Pickleball",
          "Play Area",
          "Sidewalk",
          "Tennis Court",
          "Underground Utility",
          "Volleyball"
        ]
      },
    ... // truncated for blog
    ],
    "records": {
      "basicInfo": {
        "propertyTypeName": "Townhouse",
        "lotSqFt": 1965,
        "apn": "304324C2137060780",
        "propertyLastUpdatedDate": 1669759070845,
        "displayTimeZone": "US/Eastern"
      },
      "taxInfo": {},
      "allTaxInfo": [],
      "addressInfo": {
        "isFMLS": false,
        "street": "1445 Weeping Willow Ct",
        "city": "Cape Coral",
        "state": "FL",
        "zip": "33909",
        "countryCode": "US"
      },
      "mortgageCalculatorInfo": {
        "displayLevel": 1,
        "dataSourceId": 192,
        "listingPrice": 311485,
        "downPaymentPercentage": 20.0,
        "monthlyHoaDues": 312,
        "propertyTaxRate": 1.29,
        "homeInsuranceRate": 1.17,
        "mortgageInsuranceRate": 0.75,
        "creditScore": 740,
        "loanType": 1,
        "mortgageRateInfo": {
          "fifteenYearFixed": 5.725,
          "fiveOneArm": 5.964,
          "thirtyYearFixed": 6.437,
          "isFromBankrate": true
        },
        "countyId": 471,
        "stateId": 19,
        "countyName": "Lee County",
        "stateName": "Florida",
        "mortgageRatesPageLinkText": "View all rates",
        "baseMortgageRatesPageURL": "/mortgage-rates?location=33909&locationType=4&locationId=14465",
        "zipCode": "33909",
        "isCoop": false
      },
      "countyUrl": "/county/471/FL/Lee-County",
      "countyName": "Lee County",
      "countyIsActive": true,
      "sectionPreviewText": "County data refreshed on 11/29/2022"
    },
    "history": {
      "isHistoryStillGrowing": true,
      "hasAdminContent": false,
      "hasLoginContent": false,
      "dataSourceId": 192,
      "canSeeListing": true,
      "listingIsNull": false,
      "hasPropertyHistory": true,
      "showLogoInLists": false,
      "definitions": [],
      "displayTimeZone": "US/Eastern",
      "isAdminOnlyView": false,
      "events": [
        {
          "isEventAdminOnly": false,
          "price": 311485,
          "isPriceAdminOnly": false,
          "eventDescription": "Listed",
          "mlsDescription": "Active",
          "source": "BEARMLS",
          "sourceId": "222084966",
          "dataSourceDisplay": {
            "dataSourceId": 192,
            "dataSourceDescription": "Bonita Springs Association of Realtors (BEARMLS)",
            "dataSourceName": "BEARMLS",
            "shouldShowLargerLogo": false
          },
          "priceDisplayLevel": 1,
          "historyEventType": 1,
          "eventDate": 1669708800000
        }
      ],
      "mediaBrowserInfoBySourceId": {},
      "addressInfo": {
        "isFMLS": false,
        "street": "1445 Weeping Willow Ct",
        "city": "Cape Coral",
        "state": "FL",
        "zip": "33909",
        "countryCode": "US"
      },
      "isFMLS": false,
      "historyHasHiddenRows": false,
      "priceEstimates": {
        "displayLevel": 1,
        "priceHomeUrl": "/what-is-my-home-worth?estPropertyId=178539241&src=ldp-estimates"
      },
      "sectionPreviewText": "Details will be added when we have them"
    },
    "floorplan": [
      "https://ssl.cdn-redfin.com/photo/192/bigphoto/966/222084966_1_0.jpg",
    ],
    "activity": {
      "viewCount": 28,
      "favoritesCount": 1,
      "totalFavoritesCount": 1,
      "xOutCount": 0,
      "totalXOutCount": 0,
      "tourCount": 0,
      "totalTourCount": 0,
      "addressInfo": {
        "isFMLS": false,
        "street": "1445 Weeping Willow Ct",
        "city": "Cape Coral",
        "state": "FL",
        "zip": "33909",
        "countryCode": "US"
      },
      "sectionPreviewText": "1 people favorited this home"
    }
  }

We've reduced the thousands-of-lines-long Redfin property dataset to just a few dozen of the most important fields using JMESPath and Python.

We can see how easy it is to scrape modern websites with modern scraping tools - to scrape a Redfin property we used only a few lines of Python code. Next, let's take a look at how we can find listings to scrape.

Finding Properties

There are several ways to find listings on Redfin for scraping. Though the most obvious and fastest way is to use Redfin's sitemaps.

Redfin offers an extensive sitemap system that contains sitemaps for listings by US state, neighborhood, school district and so on. For that, let's take a look at the /robots.txt page, specifically the sitemap section.

For example, there are sitemaps for all location directories:

And for rental data:

Finally, we have sitemaps for non-listing objects such as agents.
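To discover these sitemap URLs programmatically, we can parse the Sitemap: lines of robots.txt. A small sketch with an illustrative robots.txt body (the real file lists many more sitemaps and directives):

```python
# illustrative robots.txt content - the real Redfin file lists many more sitemaps
robots_txt = """User-agent: *
Disallow: /stingray/
Sitemap: https://www.redfin.com/newest_listings.xml
"""

# keep everything after the first colon of each "Sitemap:" line
sitemaps = [
    line.split(":", 1)[1].strip()
    for line in robots_txt.splitlines()
    if line.lower().startswith("sitemap:")
]
print(sitemaps)  # ['https://www.redfin.com/newest_listings.xml']
```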

Following Changes

To keep track of new Redfin listings we can use sitemap feeds for the newest and updated listings:

  • newest which signals when new listings are posted.
  • latest which signals when listings are updated or changed.

To find new listings and updates we'll be scraping these two sitemaps which provide a listing URL and timestamp when it was listed or updated:

<url>
  <loc>https://www.redfin.com/NH/Boscawen/1-Sherman-Dr-03303/home/96531826</loc>
  <lastmod>2022-12-01T00:53:20.426-08:00</lastmod>
  <changefreq>daily</changefreq>
  <priority>1.0</priority>
</url>

⌚ Note that this sitemap is using the UTC-8 timezone. It's indicated by the offset at the end of the datetime string: -08:00.
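Python's standard library can parse this offset-aware timestamp directly, so we can normalize everything to UTC without extra packages:

```python
from datetime import datetime, timezone

lastmod = "2022-12-01T00:53:20.426-08:00"
dt = datetime.fromisoformat(lastmod)  # offset-aware datetime (UTC-8)
# convert to UTC for consistent comparisons across feeds
print(dt.astimezone(timezone.utc).isoformat())  # 2022-12-01T08:53:20.426000+00:00
```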

To scrape these Redfin feeds in Python we'll be using the httpx and parsel libraries we've used before:

import arrow  # for handling datetime: pip install arrow
from datetime import datetime
from typing import Dict
from httpx import AsyncClient
from parsel import Selector

session = AsyncClient(headers={
    # use same headers as a popular web browser (Chrome on Mac in this case)
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Language": "en-US,en;q=0.9",
})


async def scrape_feed(url) -> Dict[str, datetime]:
    """scrape Redfin sitemap and return url:datetime dictionary"""
    result = await session.get(url)
    selector = Selector(text=result.text, type="xml")
    results = {}
    for item in selector.xpath("//url"):
        url = item.xpath("loc/text()").get()
        pub_date = item.xpath("lastmod/text()").get()
        results[url] = arrow.get(pub_date).datetime
    return results

We can then use the Python Redfin scraper we wrote earlier to scrape these URLs for property datasets.
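For example, to track only fresh listings we might filter the url:datetime results down to the last 24 hours (filter_recent is a hypothetical helper, not part of Redfin's feeds or the libraries above):

```python
from datetime import datetime, timedelta, timezone

def filter_recent(feed: dict, hours: int = 24) -> dict:
    """keep only feed entries modified within the last N hours"""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    return {url: dt for url, dt in feed.items() if dt >= cutoff}

# usage with illustrative feed data:
feed = {
    "https://www.redfin.com/NH/Boscawen/1-Sherman-Dr-03303/home/96531826": datetime.now(timezone.utc) - timedelta(hours=2),
    "https://www.redfin.com/FL/Cape-Coral/402-SW-28th-St-33914/home/61856041": datetime.now(timezone.utc) - timedelta(days=30),
}
recent = filter_recent(feed)
print(len(recent))  # 1
```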

Avoiding Blocking with ScrapFly

Scraping Redfin.com seems straightforward, though when scraping at scale our scrapers are very likely to be blocked or asked to solve captchas.

Redfin.com can block web scrapers: 'our usage analysis algorithms think that you might be a robot'

To get around this, let's take advantage of the ScrapFly API, which can avoid all of these blocks for us! ScrapFly offers several powerful features that'll help us get around Redfin's web scraper blocking.

For this, we'll be using the scrapfly-sdk python package and the Anti Scraping Protection Bypass feature. First, let's install scrapfly-sdk using pip:

$ pip install scrapfly-sdk

To take advantage of ScrapFly's API in our Redfin.com web scraper, all we need to do is replace our httpx session code with scrapfly-sdk client requests:

import httpx

response = httpx.get("some redfin.com url")
# in ScrapFly SDK becomes
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient("YOUR SCRAPFLY KEY")
result = client.scrape(ScrapeConfig(
    "some Redfin.ocm url",
    # we can select specific proxy country
    country="US",
    # and enable anti scraping protection bypass:
    asp=True
))

For more on how to scrape Redfin.com using ScrapFly, see the Full Scraper Code section.

FAQ

To wrap this guide up, let's take a look at some frequently asked questions about web scraping Redfin data:

Is it legal to scrape Redfin.com?

Yes. Redfin.com's data is publicly available; we're not collecting anything private. Scraping Redfin at slow, respectful rates falls under the ethical scraping definition.
That being said, attention should be paid to GDPR compliance in the EU when storing personal data such as the seller's name, phone number etc. For more, see our Is Web Scraping Legal? article.

Does Redfin.com have an API?

No, there's no Redfin API for real estate data, though Redfin does publish market summary datasets in its data-center section. For detailed property data, we can scrape Redfin using Python.

How to crawl Redfin.com?

Besides scraping, we can also crawl redfin.com by following the related property pages listed on every property page. To write a Redfin crawler, see the related properties field in the datasets scraped in this tutorial.

Summary

In this tutorial, we built a Redfin scraper in Python with a few free community packages. We started by taking a look at how to scrape a single property page by extracting hidden web cache data.

To parse property data we used the JMESPath JSON parsing language to write a few simple rules which reduced the scraped dataset to the vital property data fields.

Finally, to find property listings and track new/updated ones we explored Redfin's sitemap system.

For this Redfin data scraper we used Python with httpx, parsel and jmespath packages. To avoid being blocked we used ScrapFly's API which smartly configures every web scraper connection to avoid being blocked.
For more about ScrapFly, see our documentation and try it out for FREE!

Full Scraper Code

Here's the full Redfin web scraper code with ScrapFly integration:

💙 This code should only be used as a reference. To scrape data from Redfin at scale you'll need some error handling, logging and retrying logic.

import asyncio
import json
import re
from datetime import datetime
from pathlib import Path
from typing import Dict, List

import arrow
import jmespath
from scrapfly import ScrapeApiResponse, ScrapeConfig, ScrapflyClient
from typing_extensions import TypedDict

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY", max_concurrency=2)

def extract_cache(react_initial_context):
    """extract microservice cache data from the react server agent"""
    result = {}
    for name, cache in react_initial_context["ReactServerAgent.cache"]["dataCache"].items():
        # first we retrieve cached response and see whether it's a success
        try:
            cache_response = cache["res"]
        except KeyError:  # empty cache
            continue
        if cache_response.get("status") != 200:
            print("skipping non 200 cache")
            continue
        # then extract cached response body and interpret it as a JSON
        cache_data = cache_response.get("body", {}).get("payload")
        if not cache_data:
            cache_data = json.loads(cache_response["text"].split("&&", 1)[-1]).get("payload")
        if not cache_data:
            # skip empty caches
            continue
        # for Redfin we can cleanup cache names for home data endpoints:
        if "/home/details" in name:
            name = name.split("/home/details/")[-1]
        result[name.replace("/", "")] = cache_data
        # ^note: we sanitize name to avoid slashes as they are not allowed in JMESPath
    return result


class PropertyResult(TypedDict):
    """type hint for property result. i.e. Defines what fields are expected in property dataset"""

    photos: List[str]
    videos: List[str]
    price: int
    info: Dict[str, str]
    amenities: List[Dict[str, str]]
    records: Dict[str, str]
    history: Dict[str, str]
    floorplan: Dict[str, str]
    activity: Dict[str, str]


def parse_redfin_proprety_cache(data_cache) -> PropertyResult:
    """parse Redfin's cache data for proprety information"""
    # here we define field name to JMESPath mapping
    parse_map = {
        # from top area of the page: basic info, videos and photos
        "photos": "aboveTheFold.mediaBrowserInfo.photos[*].photoUrls.fullScreenPhotoUrl",
        "videos": "aboveTheFold.mediaBrowserInfo.videos[*].videoUrl",
        "price": "aboveTheFold.addressSectionInfo.priceInfo.amount",
        "info": """aboveTheFold.addressSectionInfo.{
            bed_num: beds,
            bath_numr: baths,
            full_baths_num: numFullBaths,
            sqFt: sqFt,
            year_built: yearBuilt,
            city: city,
            state: state,
            zip: zip,
            country_code: countryCode,
            fips: fips,
            apn: apn,
            redfin_age: timeOnRedfin,
            cumulative_days_on_market: cumulativeDaysOnMarket,
            property_type: propertyType,
            listing_type: listingType,
            url: url
        }
        """,
        # from bottom area of the page: amenities, records and event history
        "amenities": """belowTheFold.amenitiesInfo.superGroups[].amenityGroups[].amenityEntries[].{
            name: amenityName, values: amenityValues
        }""",
        "records": "belowTheFold.publicRecordsInfo",
        "history": "belowTheFold.propertyHistoryInfo",
        # other: sometimes there are floorplans
        "floorplan": r"listingfloorplans.floorPlans",
        # and there's always internal Redfin performance info: views, saves, etc.
        "activity": "activityInfo",
    }
    results = {}
    for key, path in parse_map.items():
        value = jmespath.search(path, data_cache)
        results[key] = value
    return results


def parse_property(result: ScrapeApiResponse) -> PropertyResult:
    script = result.selector.xpath('//script[contains(.,"ServerState.InitialContext")]/text()').get()
    initial_context = re.findall(r"ServerState.InitialContext = (\{.+\});", script)
    if not initial_context:
        print(f"page {result.context['url']} is not a property listing page")
        return
    return parse_redfin_proprety_cache(extract_cache(json.loads(initial_context[0])))


async def scrape_properties(urls: List[str]) -> List[PropertyResult]:
    to_scrape = [ScrapeConfig(url=url, asp=True, country="US", cache=True) for url in urls]
    properties = []
    async for result in scrapfly.concurrent_scrape(to_scrape):
        properties.append(parse_property(result))
    return properties



async def scrape_feed(url) -> Dict[str, datetime]:
    """Scrape Redfin sitemap for URLs"""
    result = await scrapfly.async_scrape(ScrapeConfig(url=url, country="US", cache=True, asp=True))
    results = {}
    for item in result.selector.xpath("//url"):
        url = item.xpath("loc/text()").get()
        pub_date = item.xpath("lastmod/text()").get()
        results[url] = arrow.get(pub_date).datetime
    return results

async def example_run():
    urls = [
        "https://www.redfin.com/FL/Cape-Coral/402-SW-28th-St-33914/home/61856041",
        "https://www.redfin.com/FL/Cape-Coral/4202-NW-16th-Ter-33993/home/62053611",
        "https://www.redfin.com/FL/Cape-Coral/1415-NW-38th-Pl-33993/home/62079956",
        "https://www.redfin.com/FL/Cape-Coral/1026-NE-34th-Ln-33909/home/67830364",
        "https://www.redfin.com/FL/Cape-Coral/1022-NE-34th-Ln-33909/home/62069246",
        "https://www.redfin.com/FL/Cape-Coral/4132-NE-21st-Ave-33909/home/67818227",
        "https://www.redfin.com/FL/Cape-Coral/2115-NW-8th-Ter-33993/home/62069405",
        "https://www.redfin.com/FL/Cape-Coral/1451-Weeping-Willow-Ct-33909/home/178539244",
        "https://www.redfin.com/FL/Cape-Coral/1449-Weeping-Willow-Ct-33909/home/178539243",
        "https://www.redfin.com/FL/Cape-Coral/5431-SW-6th-Ave-33914/home/61888403",
        "https://www.redfin.com/FL/Cape-Coral/1445-Weeping-Willow-Ct-33909/home/178539241",
    ]
    properties = await scrape_properties(urls)
    city_feed = await scrape_feed("https://www.redfin.com/stingray/api/gis-cms/city-sitemap/CA/San-Francisco?channel=buy")
    newest_feed = await scrape_feed("https://www.redfin.com/newest_listings.xml")


if __name__ == "__main__":
    asyncio.run(example_run())

Related Posts

How to Scrape RightMove Real Estate Property Data with Python

In this scrape guide we'll be taking a look at scraping RightMove.co.uk - one of the most popular real estate listing websites in the United Kingdom. We'll be scraping hidden web data and backend APIs directly using Python.

How to Scrape Google Search with Python

In this scrape guide we'll be taking a look at how to scrape Google Search - the biggest index of public web. We'll cover dynamic HTML parsing and SERP collection itself.

How to Scrape Ebay using Python

In this scrape guide we'll be taking a look at Ebay.com - the biggest peer-to-peer e-commerce portal in the world. We'll be scraping product details and product search.