Homegate.ch is one of the most popular real estate websites in Switzerland, listing thousands of properties.
In this article, we'll explore how to scrape homegate.ch search and property pages, and how to avoid getting blocked while doing so. Let's dig in!
This tutorial covers popular web scraping techniques for educational purposes. Interacting with public servers requires diligence and respect. Here's a good summary of what not to do:
Do not scrape at rates that could damage the website.
Do not scrape data that's not available publicly.
Do not store PII of EU citizens who are protected by GDPR.
Do not repurpose entire public datasets, which can be illegal in some countries.
Scrapfly does not offer legal advice, but these are good general rules to follow in web scraping. For more, you should consult a lawyer.
Why Scrape Homegate.ch?
Homegate.ch offers a comprehensive overview of the real estate market in Switzerland, including different property types, price trends, and geographical variations.
Manually exploring these property listings can be time-consuming. Web scraping homegate.ch automates this process, allowing data to be retrieved quickly and reliably.
Scraping homegate.ch can also help investors and buyers with market research and analysis: they can identify market trends and evaluate property values, allowing for better decision-making.
Project Setup
In this guide on homegate.ch web scraping, we'll use a few Python libraries:
httpx: HTTP client used for sending requests.
parsel: HTML parsing library for selecting elements using XPath and CSS selectors.
scrapfly-sdk: A Python SDK for a web scraping API that allows for scraping at scale without blocking.
Note that asyncio is part of Python's standard library, so it doesn't need to be installed. Install the other libraries using the following pip command:
pip install httpx parsel scrapfly-sdk
How to Scrape Homegate.ch Property Pages?
Let's begin by scraping homegate.ch property pages. Go to any property listing page and you will get a page similar to this:
Instead of selecting each data point from the HTML using selectors, we'll extract all the data directly as JSON from a script tag. This is the same data that appears on the rendered page, stored in the page source before it gets rendered into HTML, which is often known as hidden web data.
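To make the idea of hidden web data concrete, here is a minimal offline sketch. It assumes the page assigns its data to window.__INITIAL_STATE__ inside an inline script tag, as the scrapers in this guide do. The SAMPLE_HTML string and the extract_initial_state() helper are hypothetical, and the sketch uses a standard-library regular expression in place of parsel so it runs without any installs:

```python
import json
import re

# Hypothetical, heavily trimmed page source; real pages embed a much larger object
SAMPLE_HTML = """
<html><body>
<div id="app"></div>
<script>window.__INITIAL_STATE__={"listing":{"listing":{"id":"4000203103","offerType":"RENT"}}}</script>
</body></html>
"""

PREFIX = "window.__INITIAL_STATE__="


def extract_initial_state(html: str):
    """Return the JSON object assigned to window.__INITIAL_STATE__, or None if absent."""
    # find the inline script whose body starts with the state assignment
    pattern = r"<script[^>]*>\s*" + re.escape(PREFIX) + r"(.*?)</script>"
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return None
    # drop surrounding whitespace and an optional trailing semicolon before decoding
    payload = match.group(1).strip().rstrip(";")
    return json.loads(payload)


state = extract_initial_state(SAMPLE_HTML)
print(state["listing"]["listing"]["id"])  # prints 4000203103
```

Because the whole object is decoded in one step, every field is available as a regular Python dictionary, with no per-field selectors to maintain.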
To view this data on the property page, press the F12 key to open the developer tools and scroll down to the script tag that looks like the following HTML:
We can see all the property data in this script as a JSON dataset. Let's select and parse it within our scraper:
Python
import asyncio
import json
from typing import Dict, List, Optional

from httpx import AsyncClient, Response
from parsel import Selector

client = AsyncClient(
    headers={
        # use the same headers as a popular web browser (Chrome on macOS in this case)
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Language": "en-US,en;q=0.9",
    }
)


def parse_next_data(response: Response) -> Optional[Dict]:
    """parse hidden JSON data from script tags"""
    selector = Selector(response.text)
    # extract data in JSON from script tags
    next_data = selector.xpath("//script[contains(text(),'window.__INITIAL_STATE__')]/text()").get()
    if not next_data:
        return None
    # remove the non-JSON prefix and load the data into a JSON object
    # (str.strip() treats its argument as a set of characters, so split on the prefix instead)
    next_data_json = json.loads(next_data.split("window.__INITIAL_STATE__=", 1)[-1])
    return next_data_json


async def scrape_properties(urls: List[str]) -> List[Dict]:
    """scrape listing data from homegate property pages"""
    # add the property pages to a scraping list
    to_scrape = [client.get(url) for url in urls]
    properties = []
    # scrape all property pages concurrently
    for response in asyncio.as_completed(to_scrape):
        data = parse_next_data(await response)
        # handle expired property pages, which don't include the listing data
        try:
            properties.append(data["listing"]["listing"])
        except (KeyError, TypeError):
            print("expired property page")
    return properties

ScrapFly
import asyncio
import json
from typing import Dict, List, Optional

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="Your ScrapFly API key")


def parse_next_data(response: ScrapeApiResponse) -> Optional[Dict]:
    """parse hidden JSON data from script tags"""
    selector = response.selector
    # extract data in JSON from script tags
    next_data = selector.xpath("//script[contains(text(),'window.__INITIAL_STATE__')]/text()").get()
    if not next_data:
        return None
    # remove the non-JSON prefix and load the data into a JSON object
    next_data_json = json.loads(next_data.split("window.__INITIAL_STATE__=", 1)[-1])
    return next_data_json


async def scrape_properties(urls: List[str]) -> List[Dict]:
    """scrape listing data from homegate property pages"""
    # add the property pages to a scraping list
    to_scrape = [ScrapeConfig(url, asp=True, country="CH") for url in urls]
    properties = []
    # scrape all property pages concurrently
    async for response in scrapfly.concurrent_scrape(to_scrape):
        data = parse_next_data(response)
        # handle expired property pages, which don't include the listing data
        try:
            properties.append(data["listing"]["listing"])
        except (KeyError, TypeError):
            print("expired property page")
    return properties
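The scrape_properties() function relies on two ideas: asyncio.as_completed() hands back results in the order they finish, and pages without listing data are skipped rather than crashing the run. The following offline sketch illustrates both. The fake_fetch() coroutine, its delays, and the URL labels are hypothetical stand-ins for real HTTP requests:

```python
import asyncio
from typing import Dict, List, Optional


async def fake_fetch(url: str, delay: float) -> Optional[Dict]:
    """Hypothetical stand-in for an HTTP request plus parsing; None means an expired page."""
    await asyncio.sleep(delay)
    if "expired" in url:
        return None
    return {"listing": {"listing": {"url": url}}}


async def collect(jobs: Dict[str, float]) -> List[Dict]:
    pending = [fake_fetch(url, delay) for url, delay in jobs.items()]
    results = []
    # as_completed yields awaitables in the order they finish, not the order submitted
    for finished in asyncio.as_completed(pending):
        data = await finished
        try:
            results.append(data["listing"]["listing"])
        except (KeyError, TypeError):
            # a None or incomplete payload is treated as an expired listing
            continue
    return results


jobs = {"slow": 0.2, "expired": 0.1, "fast": 0.0}
results = asyncio.run(collect(jobs))
print([item["url"] for item in results])  # prints ['fast', 'slow']
```

Catching only KeyError and TypeError keeps unrelated failures, such as network errors, visible instead of silently reporting them as expired pages.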
Now that our code can scrape homegate.ch property pages, let's scrape search pages to discover the desired property listings.
How to Scrape Homegate.ch Search Pages?
In this section, we'll create a homegate.ch scraper to scrape search pages of any search query. We'll also integrate pagination support. The pagination is controlled through the ep URL parameter, so the first page for properties in Bern, Switzerland looks like this:
As for the data itself, just like in property pages, the search page data can also be found in a script tag as a JSON dataset:
To scrape search pages, we'll use code similar to the homegate.ch scraper we wrote earlier:
Python
import asyncio
import json
from typing import Dict, List, Literal, Optional

from httpx import AsyncClient, Response
from parsel import Selector

client = AsyncClient(
    headers={
        # use the same headers as a popular web browser (Chrome on macOS in this case)
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Language": "en-US,en;q=0.9",
    }
)


def parse_next_data(response: Response) -> Optional[Dict]:
    """parse hidden JSON data from script tags"""
    selector = Selector(response.text)
    # extract data in JSON from script tags
    next_data = selector.xpath("//script[contains(text(),'window.__INITIAL_STATE__')]/text()").get()
    if not next_data:
        return None
    # remove the non-JSON prefix and load the data into a JSON object
    next_data_json = json.loads(next_data.split("window.__INITIAL_STATE__=", 1)[-1])
    return next_data_json


async def scrape_search(query_type: Literal["rent", "buy"] = "rent") -> List[Dict]:
    """scrape listing data from homegate search pages"""
    # change the below URL to the desired search but validate it in the browser first
    url = f"https://www.homegate.ch/{query_type}/real-estate/city-bern/matching-list"
    # scrape the first search page first
    first_page = await client.get(url)
    data = parse_next_data(first_page)["resultList"]["search"]["fullSearch"]["result"]
    search_data = data["listings"]
    # get the number of maximum search pages available
    max_search_pages = data["pageCount"]
    print(f"scraped first search page, remaining ({max_search_pages - 1} search pages)")
    # add the remaining search pages to a scraping list
    other_pages = [client.get(url=str(first_page.url) + f"?ep={page}") for page in range(2, max_search_pages + 1)]
    # scrape the remaining search pages concurrently
    for response in asyncio.as_completed(other_pages):
        data = parse_next_data(await response)
        search_data.extend(data["resultList"]["search"]["fullSearch"]["result"]["listings"])
    print(f"scraped {len(search_data)} property listings from search")
    return search_data

ScrapFly
import asyncio
import json
from typing import Dict, List, Literal, Optional

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="Your ScrapFly API key")


def parse_next_data(response: ScrapeApiResponse) -> Optional[Dict]:
    """parse hidden JSON data from script tags"""
    selector = response.selector
    # extract data in JSON from script tags
    next_data = selector.xpath("//script[contains(text(),'window.__INITIAL_STATE__')]/text()").get()
    if not next_data:
        return None
    # remove the non-JSON prefix and load the data into a JSON object
    next_data_json = json.loads(next_data.split("window.__INITIAL_STATE__=", 1)[-1])
    return next_data_json


async def scrape_search(query_type: Literal["rent", "buy"] = "rent") -> List[Dict]:
    """scrape listing data from homegate search pages"""
    # change the below URL to the desired search but validate it in the browser first
    url = f"https://www.homegate.ch/{query_type}/real-estate/city-bern/matching-list"
    # scrape the first search page first
    first_page = await scrapfly.async_scrape(ScrapeConfig(url, asp=True, country="CH"))
    data = parse_next_data(first_page)["resultList"]["search"]["fullSearch"]["result"]
    search_data = data["listings"]
    # get the number of maximum search pages available
    max_search_pages = data["pageCount"]
    print(f"scraped first search page, remaining ({max_search_pages - 1} search pages)")
    # add the remaining search pages to a scraping list
    other_pages = [
        ScrapeConfig(first_page.context["url"] + f"?ep={page}", asp=True, country="CH")
        for page in range(2, max_search_pages + 1)
    ]
    # scrape the remaining search pages concurrently
    async for response in scrapfly.concurrent_scrape(other_pages):
        data = parse_next_data(response)
        search_data.extend(data["resultList"]["search"]["fullSearch"]["result"]["listings"])
    return search_data
Here, we use the scrape_search() function to scrape the first search page data by extracting it from the script tag. Then, we extract the total number of search pages available to scrape. Next, we add the remaining search pages to a scraping list and scrape them concurrently for faster scraping.
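The pagination step can be looked at in isolation. This small sketch builds the URLs for the remaining pages from the first page's URL and the pageCount value. The build_page_urls() helper is a hypothetical name, and it assumes the base URL carries no other query parameters:

```python
from typing import List
from urllib.parse import urlencode


def build_page_urls(base_url: str, page_count: int) -> List[str]:
    """Return URLs for pages 2..page_count, assuming base_url has no query string yet."""
    return [f"{base_url}?{urlencode({'ep': page})}" for page in range(2, page_count + 1)]


base = "https://www.homegate.ch/rent/real-estate/city-bern/matching-list"
for url in build_page_urls(base, page_count=4):
    print(url)
```

The range starts at 2 because the first page has already been scraped to read the page count, so a search with a single page yields an empty list and no extra requests.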
The result is a list containing all property listings on the search pages, similar to this:
Example output
[
  {
    "listingType": {
      "type": "PREMIUM"
    },
    "listing": {
      "address": {
        "geoCoordinates": {
          "accuracy": "HIGH",
          "manual": true,
          "latitude": 46.958851562115,
          "longitude": 7.427321501252
        },
        "locality": "Bern",
        "postalCode": "3012",
        "street": "Forstweg 71"
      },
      "categories": [
        "APARTMENT",
        "ATTIC_FLAT"
      ],
      "characteristics": {
        "hasNiceView": true,
        "hasBalcony": true,
        "hasElevator": true,
        "livingSpace": 150,
        "numberOfRooms": 5.5,
        "floor": 3,
        "isQuiet": true,
        "yearBuilt": 1972,
        "hasGarage": true
      },
      "id": "4000203103",
      "localization": {
        "de": {
          "urls": [],
          "text": {
            "title": "6 1/2 Zi Maisonette-Wohnung in Bern",
            "description": "Das Objekt liegt im L\u00e4ngasse Quartier (Endstation Bus) mit wunderbarer Sicht auf die Bergen. Die Gallerie mit Chemin\u00e9e oder das grosse Wohnzimmer laden zum verweilen ein. Grosser Balkon sowie Estrich und Keller vorhanden. Wunderbare heimelige originale \"Fonduestube\". Der Bahnhof und \u00d6V-Anbindungen (Bus) wie auch diverse Einkaufsm\u00f6glichkeiten liegen in attraktiver Entfernung. Schulen und Bremgartenwald liegen in unmittelbarer Umgebung (2-3 Min zu Fuss). Einstellhallenplatz auf Wunsch ebenfalls verf\u00fcgbar."
          },
          "attachments": [
            {
              "type": "IMAGE",
              "url": "https://media2.homegate.ch/listings/v2/hgonif/4000203103/image/035b210b03b055ffc83b819da5b7f165.jpg",
              "file": "43e1277268.jpg"
            },
            {
              "type": "IMAGE",
              "url": "https://media2.homegate.ch/listings/v2/hgonif/4000203103/image/9fcec0c69d5a40a9bc0bd8c4752aaa15.jpg",
              "file": "f3b8f6eb43.jpg"
            },
            {
              "type": "IMAGE",
              "url": "https://media2.homegate.ch/listings/v2/hgonif/4000203103/image/3c7e560405524a99035e78113c12061d.jpg",
              "file": "3be62e1182.jpg"
            },
            {
              "type": "IMAGE",
              "url": "https://media2.homegate.ch/listings/v2/hgonif/4000203103/image/a903b777eaf5971c791d800f7a138bf1.jpg",
              "file": "cff94726e1.jpg"
            },
            {
              "type": "IMAGE",
              "url": "https://media2.homegate.ch/listings/v2/hgonif/4000203103/image/dd57c261a15084abcb0305c7a0bfde6d.jpg",
              "file": "ee01538c86.jpg"
            }
          ]
        },
        "primary": "de"
      },
      "meta": {
        "createdAt": "2023-10-11T17:39:09.556Z"
      },
      "offerType": "RENT",
      "platforms": [
        "homegate",
        "alleimmobilien",
        "home",
        "immostreet"
      ],
      "prices": {
        "rent": {
          "interval": "WEEK",
          "gross": 4240
        },
        "currency": "CHF",
        "buy": {}
      },
      "valueAddedServices": {}
    },
    "listingCard": {
      "size": "L"
    },
    "id": "4000203103",
    "remoteViewing": false
  }
]
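Each search result is deeply nested, so it often helps to flatten the fields of interest into a single-level record before storing or analyzing them. Below is a minimal sketch based on the field names in the example output above. The summarize_listing() helper and its output keys are hypothetical, and SAMPLE_RESULT is a trimmed copy of the example:

```python
from typing import Any, Dict

# Trimmed copy of one search result, keeping only the fields used below
SAMPLE_RESULT = {
    "id": "4000203103",
    "listing": {
        "address": {"locality": "Bern", "postalCode": "3012", "street": "Forstweg 71"},
        "characteristics": {"livingSpace": 150, "numberOfRooms": 5.5},
        "localization": {
            "de": {"text": {"title": "6 1/2 Zi Maisonette-Wohnung in Bern"}},
            "primary": "de",
        },
        "offerType": "RENT",
        "prices": {"rent": {"interval": "WEEK", "gross": 4240}, "currency": "CHF", "buy": {}},
    },
}


def summarize_listing(result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one search result into a single-level record; missing fields become None."""
    listing = result.get("listing", {})
    address = listing.get("address", {})
    characteristics = listing.get("characteristics", {})
    prices = listing.get("prices", {})
    localization = listing.get("localization", {})
    # titles are stored per language; use the language the listing marks as primary
    text = localization.get(localization.get("primary"), {}).get("text", {})
    return {
        "id": result.get("id"),
        "title": text.get("title"),
        "street": address.get("street"),
        "postal_code": address.get("postalCode"),
        "locality": address.get("locality"),
        "rooms": characteristics.get("numberOfRooms"),
        "living_space": characteristics.get("livingSpace"),
        "offer_type": listing.get("offerType"),
        "rent_gross": prices.get("rent", {}).get("gross"),
        "rent_interval": prices.get("rent", {}).get("interval"),
        "currency": prices.get("currency"),
    }


record = summarize_listing(SAMPLE_RESULT)
print(record["title"], record["rent_gross"], record["currency"])
```

Using .get() with defaults throughout means a listing that lacks a field, such as a price on request, produces None for that key instead of raising an error.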
We can successfully scrape homegate.ch property and search pages. However, after sending a few requests, our homegate.ch scraper will likely get blocked. Let's take a look at a solution!
How to Avoid Homegate.ch Web Scraping Blocking?
To scale up homegate.ch scraping, check out Scrapfly!
For example, here is how we can use the asp feature with the ScrapFly Python SDK to avoid homegate.ch web scraping blocking:
import httpx
from parsel import Selector

response = httpx.get("some homegate.ch url")
selector = Selector(response.text)

# in ScrapFly SDK becomes
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly_client = ScrapflyClient("Your ScrapFly API key")
result: ScrapeApiResponse = scrapfly_client.scrape(ScrapeConfig(
    # some homegate.ch URL
    "https://www.homegate.ch/rent/4000269209",
    # we can select specific proxy country
    country="CH",
    # and enable anti scraping protection bypass:
    asp=True,
    # allows JavaScript rendering similar to headless browsers
    render_js=True
))
# use the built-in parsel selector
selector = result.selector
To wrap up this guide on scraping homegate.ch, let's take a look at some frequently asked questions.
Is it legal to scrape homegate.ch?
Yes, all data on homegate.ch is publicly available, so it's legal to scrape homegate.ch as long as you keep your scraping rate reasonable. However, using scraped personal data (like private real estate agent details) from homegate.ch commercially may violate GDPR requirements in EU countries. For more information, refer to our previous article on web scraping legality.
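One way to keep the scraping rate reasonable is to cap how many requests run at the same time. The sketch below does this with asyncio.Semaphore. The MAX_CONCURRENCY value, the polite_fetch() coroutine, and the page labels are assumptions for illustration, and asyncio.sleep() stands in for the real HTTP request:

```python
import asyncio
from typing import Dict, List

MAX_CONCURRENCY = 2  # assumed limit; tune it to what the target site tolerates
stats: Dict[str, int] = {"active": 0, "peak": 0}


async def polite_fetch(url: str, limiter: asyncio.Semaphore) -> str:
    """Hypothetical fetch that only runs while holding a semaphore slot."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.05)  # stand-in for the real HTTP request
        stats["active"] -= 1
    return url


async def crawl(urls: List[str]) -> List[str]:
    # create the semaphore inside the running event loop
    limiter = asyncio.Semaphore(MAX_CONCURRENCY)
    return await asyncio.gather(*(polite_fetch(url, limiter) for url in urls))


pages = asyncio.run(crawl([f"page-{n}" for n in range(6)]))
print(len(pages), "pages fetched, peak concurrency:", stats["peak"])
```

All six tasks are scheduled at once, but at most two hold a slot at any moment, so the site never sees more than MAX_CONCURRENCY simultaneous requests from the scraper.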
Is there a public API for homegate.ch?
There is no public API for homegate.ch. However, scraping homegate.ch is straightforward using Python, as described in this article. Further, the scrapers can be easily turned into APIs using FastAPI and real-time scraping.
How to avoid homegate.ch web scraping blocking?
There are a lot of factors that contribute to web scraping blocking, including IP addresses, security handshakes, cookies and headers. To avoid homegate.ch scraping blocking, you need to consider these factors. For more information, refer to our previous guide on avoiding web scraping blocking.
In this article, we wrote a short Homegate scraper using Python. We looked into scraping Homegate property pages as well as search pages to discover property listings.
For that, we focused on hidden web data scraping and extracted property JSON datasets directly from script tags in the HTML source. Finally, we took a look at how to bypass Homegate scraper blocking using ScrapFly.