Real estate data is one of the most popular web scraping targets. The web is full of this type of information, and in this article we'll take a look at how to scrape real estate data using Python for free!
We'll start with a quick overview of the use cases and the sort of data we can scrape in this niche. Then, we'll look at the most popular web scraping targets for real estate property data and how to scrape them. Let's dive in!
Why Scrape Real Estate Data?
The web is full of public housing property data listed by people or agencies. Taken together, this is the biggest and most complete market dataset out there, which makes it vital for business analytics and market analysis.
Want to know the current trends in the New York real estate market? Scraping all New York properties and applying a little bit of data analytics will get you there!
Using these vast public datasets we can follow housing market trends very closely. What sort of houses are in demand? Which areas are becoming more popular? We can even track competing agencies, as their performance is visible in this data.
This information can even be used in more niche scenarios like observing architectural trends or enforcing regulations, as many property listings include detailed data points like floor plans, annotated images and exact specifications.
By scraping property data ourselves using Python we don't need to pay for real estate data APIs, which are expensive and often offer incomplete and stale data compared to the live web pages.
What Kind of Property Data is out There?
The public data available varies by source (be it Zillow, Redfin, Realtor.com etc.), though most sources share a set of common data points and add a few unique ones:
- Price Data (current and historical)
- Architectural Details and Features
- Photos
- Architectural Plans
- Listing Performance
- Listing Ratings and Scores
- Tax Records
- Geographical Data - location, address, latitude, longitude
- Seller Information - phone numbers, names, meta information
It doesn't take a lot of imagination to take advantage of these data points! With persistent tracking, we can also observe how a listing changes over time.
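To illustrate the tracking idea, here is a minimal sketch that compares two scrapes of the same listing and reports what changed. The `ListingSnapshot` class, its field names and the sample values are all assumptions made up for this example; a real scraper would fill them from whatever the source actually exposes.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ListingSnapshot:
    """One scrape of a listing; all field names here are illustrative."""
    listing_id: str
    price: int
    status: str
    views: int


def diff_snapshots(old: ListingSnapshot, new: ListingSnapshot) -> dict:
    """Return {field: (old_value, new_value)} for every field that changed."""
    old_data, new_data = asdict(old), asdict(new)
    return {
        field: (old_data[field], new_data[field])
        for field in old_data
        if old_data[field] != new_data[field]
    }


# made-up data standing in for two scrapes of the same listing a week apart
last_week = ListingSnapshot("demo-1", price=450_000, status="for_sale", views=120)
today = ListingSnapshot("demo-1", price=435_000, status="for_sale", views=310)

changes = diff_snapshots(last_week, today)
print(changes)
```

Storing one snapshot per scrape and diffing consecutive ones is only one possible design, though it keeps the full history available for later analysis.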
What are Some Popular Web Scraping Targets?
There are many public real estate data sources. Let's take a quick look at the most popular ones and how to scrape them.
1. Zillow.com
Zillow is by far the biggest real property listing source in the United States and it's surprisingly easy to scrape. Zillow also offers unique features like "Zestimate", which estimates property prices in the current and historical markets, as well as its own property and neighborhood ratings.
Zillow also offers pricing history and engagement statistics like how many times the listing has been viewed or saved. All of this data is publicly available and can be easily scraped using Python.
How to Scrape Zillow Real Estate Property Data in Python
Tutorial on how to scrape Zillow.com sale and rent property data, using Python and how to avoid blocking to scrape at scale.
2. Realtor.com
Realtor.com is the second biggest real property listing source in the United States. It offers a dataset very similar to Zillow's, including premium data points like price history as well as property and neighborhood ratings.
When it comes to web scraping, Realtor.com is very similar to Zillow.com (both websites use the same web technologies), making it another easy source to scrape with Python.
How to Scrape Realtor.com - Real Estate Property Data
In this scrape guide we'll be taking a look at real estate property scraping from Realtor.com. We'll also build a tracker scraper that checks for new listings or price changes.
3. Redfin
Redfin.com is another big real property listing source in the US. Just like Zillow and Realtor.com, Redfin contains a very similar dataset that includes not only property data but also region meta information, agent contact details and popularity metadata (e.g. view and save counts).
How to Scrape Redfin Real Estate Property Data in Python
Tutorial on how to scrape Redfin.com sale and rent property data, using Python and how to avoid blocking to scrape at scale.
4. Idealista
Idealista is the biggest real property listing source in Southern Europe. It is most popular in Spain, though it is also available in Italy and Portugal.
There are somewhat fewer data points available in the European markets compared to Zillow and Realtor.com, though Idealista still contains unique details like detailed floor plans.
Web scraping Idealista in Python is not any more difficult than other sources either.
How to Scrape Idealista.com
In this scrape guide we'll be taking a look at Idealista.com, the biggest real estate website in Spain, Portugal and Italy.
5. RightMove
RightMove is the biggest real property listing source in the UK. It offers a dataset very similar to Zillow and Realtor.com and is easy to scrape using the hidden web data approach.
How to Scrape RightMove Real Estate Property Data
In this scrape guide we'll be taking a look at scraping RightMove.co.uk - one of the most popular real estate listing websites in the United Kingdom. We'll be scraping hidden web data and backend APIs directly using Python.
Real Estate Property Platforms by Country
While the US market is dominated by a few big players like Zillow and Realtor.com, markets in the rest of the world are much more diverse. Here's a list of popular real estate data scrape targets by country:
Country | Sources |
---|---|
Europe | |
🇧🇾 Belarus | realt.by |
🇧🇪 Belgium | immoweb.be |
🇨🇿 Czech Republic | sreality.cz |
🇩🇰 Denmark | boligsiden.dk |
🇪🇪 Estonia | kv.ee |
🇫🇮 Finland | etuovi.com |
🇫🇷 France | seloger.com |
🇩🇪 Germany | ImmobilienScout24.de |
🇮🇸 Iceland | visir.is |
🇮🇪 Ireland | daft.ie |
🇮🇹 Italy | idealista.com, immobiliare.it |
🇳🇱 Netherlands | funda.nl |
🇳🇴 Norway | finn.no |
🇵🇹 Portugal | idealista.com |
🇪🇸 Spain | idealista.com |
🇸🇪 Sweden | hemnet.se |
🇨🇭 Switzerland | homegate.ch |
🇬🇧 United Kingdom | rightmove.co.uk |
🇦🇹 Austria | Immobilienscout24.at, immowelt.at |
🇧🇬 Bulgaria | imot.bg |
🇭🇷 Croatia | oglasnik.hr |
🇬🇷 Greece | spitogatos.gr |
🇭🇺 Hungary | ingatlan.com |
🇱🇻 Latvia | city24.lv |
🇱🇹 Lithuania | aruodas.lt |
🇵🇱 Poland | otodom.pl |
🇷🇴 Romania | storia.ro, imobiliare.ro |
🇷🇺 Russia | cian.ru, domclick.ru |
🇷🇸 Serbia | 4zida.rs |
🇸🇰 Slovakia | nehnutelnosti.sk |
🇸🇮 Slovenia | nepremicnine.net |
🇺🇦 Ukraine | dom.ria.com |
Others | |
🇦🇲 Armenia | estate.am |
🇦🇺 Australia | realestate.com.au |
🇦🇿 Azerbaijan | bina.az |
🇧🇭 Bahrain | propertyfinder.com.bh |
🇰🇭 Cambodia | realestate.com.kh |
🇨🇳 China | anjuke.com, fang.com, lianjia.com |
🇬🇪 Georgia | myhome.ge |
🇮🇩 Indonesia | 99.co/id, rumah.com |
🇮🇷 Iran | kilid.com, 2nabsh.com |
🇮🇶 Iraq | iq.opensooq.com |
🇮🇱 Israel | madlan.co.il |
🇯🇵 Japan | suumo.jp |
🇯🇴 Jordan | bayut.jo |
🇰🇼 Kuwait | kw.opensooq.com |
🇱🇦 Laos | banlao.la |
🇱🇧 Lebanon | propertyfinder.com.lb |
🇲🇾 Malaysia | iproperty.com.my, propertyguru.com.my |
🇳🇿 New Zealand | realestate.co.nz |
🇴🇲 Oman | mawa.om |
🇵🇭 Philippines | lamudi.com.ph, dotproperty.com.ph |
🇶🇦 Qatar | propertyfinder.com.qa |
🇸🇦 Saudi Arabia | sa.aqar.fm |
🇸🇬 Singapore | propertyguru.com.sg, 99.co |
🇰🇷 South Korea | land.naver.com |
🇹🇼 Taiwan | 591.com.tw |
🇹🇭 Thailand | ddproperty.com |
🇹🇷 Turkey | emlakjet.com, hepsiemlak.com |
🇦🇪 UAE | bayut.com, propertyfinder.com.ae |
🇻🇳 Vietnam | batdongsan.com.vn, alonhadat.com.vn |
🇾🇪 Yemen | ye.opensooq.com |
🤖 Scrape this table?
To scrape tables like this we can use Python and XPath selectors:

# For this example we'll be using 2 community packages:
# pip install httpx parsel
import httpx
from parsel import Selector

response = httpx.get("https://scrapfly.io/blog/how-to-scrape-real-estate-property-data-using-python/")
selector = Selector(text=response.text)
results = {}
# find the table that follows the "by country" heading
table = selector.xpath('//h3[contains(@id,"by-country")]/following-sibling::table[1]')
for row in table.xpath('tbody/tr'):
    country = (row.xpath('td[1]/text()').get() or "").strip()
    # join all text nodes of the second cell, then split into clean domain names
    cell_text = "".join(row.xpath('td[2]//text()').getall())
    urls = [url.strip() for url in cell_text.split(",") if url.strip()]
    if urls:  # skip separator rows such as "Europe" and "Others"
        results[country] = urls
print(results)
All of these real estate property websites can be scraped using Python and a few popular community libraries.
Web Scraping with Python
Introduction tutorial to web scraping with Python. How to collect and parse public data. Challenges, best practices and an example project.
Real Estate Scraping Tips
The number one tip for scraping real estate property websites is to look out for hidden web data. Many real estate platforms are powered by JavaScript front-ends such as Next.js, which often store the whole dataset in the HTML itself, hidden away in script elements. For more see:
How to Scrape Hidden Web Data
The visible HTML doesn't always represent the whole dataset available on the page. In this article, we'll be taking a look at scraping of hidden web data. What is it and how can we scrape it using Python?
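As a rough illustration of the hidden web data tip, the sketch below pulls the JSON that Next.js front-ends conventionally embed in a `<script id="__NEXT_DATA__">` element. The HTML sample and the key path inside the JSON are made up for this example; every website structures its page state differently, so the path has to be found by inspecting the real page.

```python
import json
import re

# made-up page imitating the Next.js convention of embedding page state
# in a <script id="__NEXT_DATA__"> element
html = """
<html><body>
<div id="app">Loading...</div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"listing": {"price": 435000, "beds": 3, "city": "New York"}}}}
</script>
</body></html>
"""


def extract_next_data(page_html: str) -> dict:
    """Find the __NEXT_DATA__ script element and decode its JSON body."""
    match = re.search(
        r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
        page_html,
        flags=re.DOTALL,
    )
    if match is None:
        raise ValueError("no __NEXT_DATA__ script found")
    return json.loads(match.group(1))


data = extract_next_data(html)
listing = data["props"]["pageProps"]["listing"]  # key path is hypothetical
print(listing["price"], listing["city"])
```

This version uses only the standard library so it runs without extra packages. With parsel, the same script body could be selected using `selector.css('script#__NEXT_DATA__::text').get()` before passing it to `json.loads`.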
Another tip: to find all properties for a complete dataset, try checking the /robots.txt location for a sitemap. Since real estate websites want to be indexed by crawlers, they often publish detailed sitemaps that list all of the property links, sometimes split into categories by location or features.
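The sitemap tip could be sketched roughly as follows, using only Python's standard library. The robots.txt and sitemap contents, along with the example.com URLs, are made up so the example runs offline; in a real scraper both documents would be downloaded from the target website first.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# made-up robots.txt and sitemap contents; real ones would be downloaded
robots_txt = """
User-agent: *
Disallow: /account/
Sitemap: https://example.com/sitemaps/listings.xml
"""
sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/property/1</loc></url>
  <url><loc>https://example.com/property/2</loc></url>
</urlset>"""


def sitemap_urls_from_robots(robots_text: str) -> list:
    """Collect the Sitemap: entries declared in a robots.txt document."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when empty


def page_urls_from_sitemap(xml_text: str) -> list:
    """Collect every <loc> value from a sitemap XML document."""
    namespaces = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", namespaces)]


sitemaps = sitemap_urls_from_robots(robots_txt)
pages = page_urls_from_sitemap(sitemap_xml)
print(sitemaps)
print(pages)
```

Large websites often publish a sitemap index that links to further sitemaps. Those entries also use `<loc>` elements, so the second function would return the nested sitemap links, which then need to be fetched and parsed in turn.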
Real Estate Scraping Challenges
By far the biggest challenge when it comes to scraping real estate data is scraper blocking. Some property listing websites only allow connections from specific countries and some use anti web scraping technologies to block scrapers.
To scrape these sources, the ScrapFly web scraping API can be used, which retrieves public web pages for you.
ScrapFly offers several powerful features that help to scrape hard to reach web pages:
- Anti Scraping Protection Bypass
- 190M Pool of Residential Proxies
- Javascript Rendering - use cloud browsers to scrape pages, click buttons and input text.
ScrapFly comes with a convenient Python SDK package that implements all of these features in a Python client.
FAQ
To wrap this article up let's take a look at some common questions about scraping in real estate:
Is it illegal to scrape real estate listings?
No, scraping public data is perfectly legal. Scraping real estate property data at respectful rates is legal and ethical. That being said, extra attention should be paid when scraping personal details like seller names and phone numbers in the EU (see GDPR). For more, see our Is Web Scraping Legal? article.
My scraper can't find data that is visible on the page - why?
Many real estate property websites load content dynamically with JavaScript, which plain HTTP scrapers do not execute. To scrape this data, either use hidden web data scraping or scrape with a web browser, which renders all dynamic content just as a regular browser would.
Real Estate Scraping Summary
In this quick introduction, we've taken a look at real estate web scraping. We noted how important hidden web data parsing is in this scraping area and covered the most popular property websites like Zillow, Realtor.com, Idealista and dozens more.
Note that web scraping real estate data is perfectly legal and easily achievable using just Python, though if you'd like to scale up, check out ScrapFly's Python SDK for free!