# How to Scrape Imovelweb.com

 by [Ziad Shamndy](https://scrapfly.io/blog/author/ziad) Apr 18, 2026 13 min read [\#beautifulsoup](https://scrapfly.io/blog/tag/beautifulsoup) [\#python](https://scrapfly.io/blog/tag/python) [\#requests](https://scrapfly.io/blog/tag/requests) [\#scrapeguide](https://scrapfly.io/blog/tag/scrapeguide) [\#tools](https://scrapfly.io/blog/tag/tools) 


Imovelweb is one of Brazil’s biggest real estate marketplaces. If you’re comparing prices, tracking supply, or building a lead pipeline, scraping it can save days of manual work. The catch: Imovelweb uses modern protections (including DataDome and regional controls), so a naïve scraper will often get blocked or see limited content from non‑Brazil IPs.

In this guide, we’ll build a clean, function‑based scraper in Python to extract:

- listings from search pages (title, price, area, bedrooms, link, thumbnail)
- complete details from property pages (address, price, size, amenities, images)
- structured data from JSON‑LD when available
- pagination flow so you can iterate across multiple pages

We'll start with requests + BeautifulSoup to keep things simple. Then we'll show a production approach using Scrapfly with Brazilian geolocation and JavaScript rendering, which sidesteps most blocking headaches. For reference, you can also apply the same ideas you've seen in our other how‑to guides like [Algolia scraping](https://scrapfly.io/blog/posts/how-to-scrape-algolia-search) and [Allegro scraping](https://scrapfly.io/blog/posts/how-to-scrape-allegro).

## Key Takeaways

This guide covers scraping Imovelweb with Python for Brazilian real estate analysis. In short, you will:

- Extract structured property data (prices, locations, features) from listing and detail pages, preferring embedded JSON‑LD over raw HTML
- Handle pagination and search URLs to collect data across many result pages
- Configure realistic headers, delays, and retries to reduce detection and rate limiting
- Use Scrapfly with Brazilian geolocation and JavaScript rendering when plain requests get blocked
- Validate extracted records and handle errors for a reliable pipeline

### What we'll build

- A listings scraper that extracts summary data from search pages and follows pagination
- A property details scraper that prefers JSON‑LD but falls back to HTML selectors
- Anti‑block tactics (headers, retry, delays) and a practical geolocation setup
- A reliable, production path with Scrapfly (BR IP, JS rendering, session/cookie handling)

If you want the full code as a single file, check `content/posts/how-to-scrape-imovelweb/code.py` in this article’s folder.


## Prerequisites

Install the basics:



```bash
pip install requests beautifulsoup4 lxml
```



We’ll use `requests` for HTTP and `BeautifulSoup` for parsing. The `lxml` parser makes parsing faster and more tolerant. If you plan to use Scrapfly for production scraping:



```bash
pip install scrapfly-sdk
```



If you only need an HTTP API, you can also call Scrapfly’s API endpoint with `requests`. We’ll show both.

## Anatomy of an Imovelweb page

Imovelweb pages often ship structured data via JSON‑LD (`<script type="application/ld+json">`). When present, this is the cleanest way to pull price, address, number of rooms, and images. If JSON‑LD is missing or partial, we’ll extract from the HTML.

Common elements to look for:

- title: `h1` or a header container near the top
- price: a price container in BRL (R$); sometimes in JSON‑LD `offers.price`
- area: square meters (m²) and land size
- bedrooms/bathrooms/parking: small icon + label sets near the price
- address: neighborhood, city, state, sometimes full address
- gallery: `img` tags or a slider; JSON‑LD often includes image URLs
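To see what JSON‑LD extraction looks like before we bring in BeautifulSoup, here's a minimal stdlib-only sketch that pulls `application/ld+json` blocks out of raw HTML. The sample snippet and its field names are illustrative, not copied from a live page:

```python
import json
import re

def extract_jsonld_blocks(html: str) -> list:
    """Return every parseable JSON-LD object found in the HTML."""
    pattern = re.compile(
        r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        re.S | re.I,
    )
    blocks = []
    for raw in pattern.findall(html):
        try:
            blocks.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the page
    return blocks

# Illustrative snippet resembling what a listing page might embed
sample = """
<script type="application/ld+json">
{"@type": "Product", "name": "Casa no Sacomã",
 "offers": {"price": 600000, "priceCurrency": "BRL"}}
</script>
"""
data = extract_jsonld_blocks(sample)
print(data[0]["offers"]["price"])  # 600000
```

A regex pass like this is fine for quick inspection; the BeautifulSoup version we build later is more robust against attribute ordering and nesting.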

## A simple, requests-based scraper

We’ll keep code modular so you can reuse parts. First we import dependencies, then create a session with realistic headers, and add a tiny fetch helper with retries.



```python
import time
import json
import random
import re
from typing import Any, Dict, List, Optional

import requests
from bs4 import BeautifulSoup
```



These imports are all you need for a plain HTML workflow with `requests` and `BeautifulSoup`.



```python
def create_session() -> requests.Session:
    """Create a requests session with realistic headers and BR Portuguese preferences."""
    session = requests.Session()
    user_agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    ]
    session.headers.update({
        "User-Agent": random.choice(user_agents),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "pt-BR,pt;q=0.9,en;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "DNT": "1",
    })
    return session
```



This session sets a realistic `User-Agent`, localizes language to `pt-BR`, and mirrors a normal browser’s headers to avoid sticking out.



```python
def get_html(session: requests.Session, url: str, max_retries: int = 3, delay_range: tuple = (1.0, 2.5)) -> Optional[str]:
    """Fetch URL with basic retry and small random delay. Returns HTML text or None."""
    for attempt in range(1, max_retries + 1):
        try:
            time.sleep(random.uniform(*delay_range))
            resp = session.get(url, timeout=30)
            if resp.status_code == 200 and "text/html" in resp.headers.get("Content-Type", ""):
                return resp.text
            if resp.status_code in (403, 429) or "datadome" in resp.text.lower():
                return None
        except requests.RequestException:
            pass
        if attempt < max_retries:
            time.sleep(random.uniform(2.0, 4.0))
    return None
```



This helper adds a tiny random pause, retries on transient errors, and bails out on 403/429 or DataDome pages.

### Extracting listing cards

Selectors change over time, but here we’ll stick to the exact, current hooks (no fallbacks). Cards use `div.postingsList-module__card-container > div.postingCardLayout-module__posting-card-layout` with:

- **price**: `div.postingPrices-module__price[data-qa="POSTING_CARD_PRICE"]`
- **features**: `h3[data-qa="POSTING_CARD_FEATURES"] span` (e.g., `1250 m² tot.`, `4 quartos`, `4 ban.`, `5 vagas`)
- **title/description link**: `h3[data-qa="POSTING_CARD_DESCRIPTION"] a`
- **URL**: also present in `data-to-posting` on `.postingCardLayout-module__posting-card-layout`
- **thumbnail**: the first `img` inside `.postingGallery-module__gallery-container`



```python
def extract_listings(html: str) -> List[Dict[str, Any]]:
    """Parse a listings page and return a list of listing summaries."""
    soup = BeautifulSoup(html, "lxml")
    listings: List[Dict[str, Any]] = []

    # Use the current card container selector only
    card_candidates = soup.select(".postingCardLayout-module__posting-card-layout")

    for card in card_candidates:
        try:
            # Title/description (anchor inside description block)
            title_el = card.select_one('[data-qa="POSTING_CARD_DESCRIPTION"] a')

            # Price
            price_block = card.select_one('[data-qa="POSTING_CARD_PRICE"]')
            price_el = price_block.get_text(strip=True) if price_block else None

            # Main features: area (m²), bedrooms (quartos), bathrooms (ban./banheiros)
            feature_spans = card.select('[data-qa="POSTING_CARD_FEATURES"] span')
            area_el = None
            beds_el = None
            baths_el = None
            for sp in feature_spans:
                txt = sp.get_text(strip=True)
                if not area_el and "m²" in txt:
                    area_el = txt
                elif not beds_el and re.search(r"\bquartos?\b", txt, flags=re.I):
                    beds_el = txt
                elif not baths_el and re.search(r"\bban(\.|heiros?)\b", txt, flags=re.I):
                    baths_el = txt

            # URL from data-to-posting attribute
            link = card.get("data-to-posting")
            if link and link.startswith("/"):
                link = f"https://www.imovelweb.com.br{link}"

            # Thumbnail from gallery (first image src)
            thumb_el = card.select_one(".postingGallery-module__gallery-container img")
            thumb = thumb_el.get("src") if thumb_el else None

            listings.append({
                "title": (title_el.get_text(strip=True) if title_el else None),
                "price": (price_el.strip() if isinstance(price_el, str) else price_el),
                "area": (area_el.strip() if isinstance(area_el, str) else area_el),
                "bedrooms": (beds_el.strip() if isinstance(beds_el, str) else beds_el),
                "bathrooms": (baths_el.strip() if isinstance(baths_el, str) else baths_el),
                "url": link,
                "thumbnail": thumb,
            })
        except Exception:
            continue

    return [l for l in listings if l.get("url")]
```



This extracts the basics you see on a card: title, price, area, room counts, the link, and a thumbnail. We now use only `data-qa` hooks (`POSTING_CARD_PRICE`, `POSTING_CARD_FEATURES`, `POSTING_CARD_DESCRIPTION`) and `data-to-posting` for the URL, with no fallbacks.
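The card fields come back as display strings (`"R$ 600.000"`, `"4 quartos"`). For analysis you'll usually want numbers; here's a small normalizer sketch that assumes Brazilian number formatting (`.` for thousands, `,` for decimals):

```python
import re
from typing import Optional

def parse_brl_price(text: Optional[str]) -> Optional[float]:
    """Convert 'R$ 600.000' style display strings to a float (BRL)."""
    if not text:
        return None
    m = re.search(r"R\$\s*([\d\.]+(?:,\d+)?)", text)
    if not m:
        return None
    # Brazilian format: '.' separates thousands, ',' is the decimal mark
    return float(m.group(1).replace(".", "").replace(",", "."))

def parse_count(text: Optional[str]) -> Optional[int]:
    """Pull the leading integer out of '4 quartos', '1 ban.' and similar."""
    if not text:
        return None
    m = re.search(r"\d+", text)
    return int(m.group()) if m else None

print(parse_brl_price("R$ 600.000"))  # 600000.0
print(parse_count("4 quartos"))       # 4
```

Run these over the dicts returned by `extract_listings` before storage, and keep the original strings too so you can debug parsing regressions later.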

### Extracting property details (JSON‑LD first)

On detail pages, JSON‑LD (if present) is the cleanest source. We’ll parse JSON blocks and look for objects that smell like real estate listings (they often include `offers`, `address`, and `image`).



```python
def parse_first_jsonld(soup: BeautifulSoup) -> Optional[Dict[str, Any]]:
    for script in soup.find_all("script", attrs={"type": "application/ld+json"}):
        try:
            data = json.loads(script.string or "{}")
        except json.JSONDecodeError:
            continue
        # Sometimes JSON-LD is a list
        if isinstance(data, list):
            for item in data:
                if isinstance(item, dict) and (item.get("@type") or item.get("offers")):
                    return item
        elif isinstance(data, dict) and (data.get("@type") or data.get("offers")):
            return data
    return None
```



This scans `<script type="application/ld+json">` blocks and returns the first relevant object that looks like a listing.



```python
def extract_property_details(html: str) -> Dict[str, Any]:
    """Extract rich property data from a detail page using JSON‑LD with HTML fallbacks."""
    soup = BeautifulSoup(html, "lxml")
    out: Dict[str, Any] = {}

    jsonld = parse_first_jsonld(soup)
    if jsonld:
        out["jsonld"] = jsonld
        # Common fields
        out["title"] = jsonld.get("name") or jsonld.get("headline")
        if isinstance(jsonld.get("offers"), dict):
            price = jsonld["offers"].get("price")
            currency = jsonld["offers"].get("priceCurrency", "BRL")
            out["price"] = f"{price} {currency}" if price else None
        addr = jsonld.get("address")
        if isinstance(addr, dict):
            out["address"] = ", ".join(filter(None, [
                addr.get("streetAddress"),
                addr.get("addressLocality"),
                addr.get("addressRegion"),
                addr.get("postalCode"),
            ])) or None
        images = jsonld.get("image")
        if isinstance(images, list):
            out["images"] = images
        elif isinstance(images, str):
            out["images"] = [images]

    # HTML fallbacks
    title_el = soup.select_one("h1")
    if title_el and not out.get("title"):
        out["title"] = title_el.get_text(strip=True)

    price_text = soup.find(string=re.compile(r"R\$\s?\d"))
    if price_text and not out.get("price"):
        out["price"] = price_text.strip()

    # Room counts (Portuguese labels vary: quartos, suítes, banheiros, vagas)
    def find_label(pattern: str) -> Optional[str]:
        el = soup.find(string=re.compile(pattern, re.I))
        return el.strip() if isinstance(el, str) else None

    out.setdefault("bedrooms", find_label(r"\bquartos?\b|\bdormitórios?\b"))
    out.setdefault("bathrooms", find_label(r"\bbanheiros?\b"))
    out.setdefault("parking", find_label(r"\bvagas?\b"))
    out.setdefault("area", find_label(r"\d+[\.,]?\d*\s*m²"))

    # Description
    desc_el = soup.select_one("[data-testid='description'], .description, #description")
    if desc_el:
        out["description"] = desc_el.get_text(" ", strip=True)

    return out
```



This prefers JSON‑LD for clean fields (title, price, address, images) and fills any gaps with simple HTML lookups.
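The takeaways above also mention data validation. Here's a small sketch that flags incomplete records before they enter your pipeline; the field names match the dicts built in this article, while the required set and format check are suggestions to adapt:

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("title", "price")  # minimum for a usable record

def validate_property(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    price = record.get("price") or ""
    # Our extractors emit either 'R$ ...' (HTML) or '<number> BRL' (JSON-LD)
    if price and "R$" not in price and "BRL" not in price:
        problems.append("price has unexpected format")
    return problems

print(validate_property({"title": "Casa", "price": "R$ 600.000"}))  # []
print(validate_property({"price": "600000 BRL"}))  # ['missing title']
```

Logging these problems per URL makes it obvious when a selector breaks, long before your dataset silently fills with nulls.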

### Pagination helper

We’ll look for a “next” link and return an absolute URL.



```python
from urllib.parse import urljoin

def find_next_page(html: str, current_url: str) -> Optional[str]:
    soup = BeautifulSoup(html, "lxml")
    next_link = soup.find("a", attrs={"rel": "next"}) or soup.find("a", string=re.compile(r"Próxima|Seguinte|Próximo", re.I))
    if next_link and next_link.get("href"):
        return urljoin(current_url, next_link["href"])
    return None
```



This looks for a `rel="next"` link or a localized “Próximo” label and returns an absolute URL.
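If no next link is present in the HTML (common when pagination is JS-driven), you can often construct page URLs directly. At the time of writing, Imovelweb listing URLs appear to follow a `-pagina-N.html` suffix convention, but this is an assumption you should verify against live URLs before relying on it:

```python
def build_page_url(base_url: str, page: int) -> str:
    """Build a listing page URL from the page-1 URL.

    Assumes the '-pagina-N.html' suffix convention; confirm on live pages.
    """
    if page <= 1:
        return base_url
    return base_url.replace(".html", f"-pagina-{page}.html")

print(build_page_url("https://www.imovelweb.com.br/casas-venda-sao-paulo-sp.html", 3))
# https://www.imovelweb.com.br/casas-venda-sao-paulo-sp-pagina-3.html
```

Prefer the `find_next_page` approach when it works, since it tracks whatever the site actually links to; use constructed URLs as a fallback.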

### Putting it together: scrape N listing pages

The function below walks listing pages, collects summary data, and yields property URLs to process downstream.



```python
def scrape_list_pages(start_url: str, max_pages: int = 3) -> List[Dict[str, Any]]:
    session = create_session()
    url = start_url
    all_listings: List[Dict[str, Any]] = []

    for _ in range(max_pages):
        html = get_html(session, url)
        if not html:
            break
        page_listings = extract_listings(html)
        all_listings.extend(page_listings)
        nxt = find_next_page(html, url)
        if not nxt:
            break
        url = nxt

    # Deduplicate by URL
    seen: set = set()
    unique: List[Dict[str, Any]] = []
    for item in all_listings:
        u = item.get("url")
        if u and u not in seen:
            seen.add(u)
            unique.append(item)
    return unique
```



Example usage:



```python
if __name__ == "__main__":
    # Example: São Paulo houses for sale (adjust filters on site and paste the URL)
    start = "https://www.imovelweb.com.br/casas-venda-sao-paulo-sp.html"
    listings = scrape_list_pages(start, max_pages=2)
    print(f"Collected {len(listings)} listings")
    for it in listings[:5]:
        print(it["title"], it["price"], it["url"])  # sample
```



Here’s how the output might look:

Example output:

```json
{
  "title": "Rua Luigi Alamanni",
  "price": "R$ 600.000",
  "area": "90 m²",
  "bedrooms": "1 quartos",
  "bathrooms": "1 banheiros",
  "url": "https://www.imovelweb.com.br/propriedades/casa-a-venda-sacoma-1-quarto-90-m-sao-paulo-3000738541.html",
  "thumbnail": "https://imgbr.imovelwebcdn.com/avisos/2/30/00/73/85/41/360x266/4556613970.jpg?isFirstImage=true"
}
```



## Scraping property details

Given a list of property URLs from the step above, fetch each page and extract details. Prefer JSON‑LD, then fall back to HTML.



```python
def scrape_property(url: str) -> Optional[Dict[str, Any]]:
    session = create_session()
    html = get_html(session, url)
    if not html:
        return None
    return extract_property_details(html)
```



This fetches a single property page with your session and returns a dict built by `extract_property_details`.
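From here you'll want to persist results. A minimal CSV sketch: the fieldnames match the listing dicts built earlier, and `extrasaction="ignore"` drops any extra keys (like the raw `jsonld` blob) so rows stay tabular:

```python
import csv
from typing import Any, Dict, List

FIELDS = ["title", "price", "area", "bedrooms", "bathrooms", "url", "thumbnail"]

def save_listings_csv(records: List[Dict[str, Any]], path: str) -> int:
    """Write listing dicts to a CSV file, ignoring extra keys. Returns row count."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for rec in records:
            writer.writerow(rec)  # missing keys become empty cells
    return len(records)

rows = save_listings_csv(
    [{"title": "Casa", "price": "R$ 600.000", "url": "https://example.com/1"}],
    "listings.csv",
)
print(rows)  # 1
```

Swap in a database writer once volumes grow; the dict shape stays the same either way.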

## Handling anti‑bot, DataDome, and geolocation (BR IPs)

Imovelweb inspects browser fingerprints and often enforces Brazilian geolocation. If you hit 403/429 responses or a DataDome page, switch to a rendering proxy with country targeting. The most straightforward path is Scrapfly.

### Scrapfly

When you need reliable BR IPs and JavaScript rendering, use the Scrapfly SDK to fetch pages consistently.



```python
# pip install scrapfly-sdk

import os
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key=os.getenv("SCRAPFLY_API_KEY"))
cfg = ScrapeConfig(url="https://www.imovelweb.com.br/casas-venda-sao-paulo-sp.html", render_js=True, country="br")
result = client.scrape(cfg)

print(result.content)
```



With Scrapfly you get session handling, high‑quality Brazilian IPs, and automatic mitigation for common bot challenges. You can then pass the returned HTML into the same `extract_listings` / `extract_property_details` functions you already wrote.
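As promised earlier, here's the plain-`requests` route without the SDK. Scrapfly exposes an HTTP endpoint at `https://api.scrapfly.io/scrape`; the parameter names and response shape below follow its docs at the time of writing, so double-check them against the current documentation before use:

```python
import os

def build_scrapfly_params(url: str) -> dict:
    """Build query parameters for a GET request to Scrapfly's scrape endpoint."""
    return {
        "key": os.getenv("SCRAPFLY_API_KEY", ""),
        "url": url,
        "render_js": "true",   # enable browser rendering
        "country": "br",       # route through Brazilian IPs
    }

def fetch_via_scrapfly(url: str) -> str:
    """Fetch a page through Scrapfly's HTTP API and return the rendered HTML."""
    import requests  # deferred so the params helper stays dependency-free
    resp = requests.get(
        "https://api.scrapfly.io/scrape",
        params=build_scrapfly_params(url),
        timeout=90,
    )
    resp.raise_for_status()
    # The API responds with JSON; the page HTML lives under result.content
    return resp.json()["result"]["content"]
```

Either route (SDK or HTTP) returns HTML you can feed straight into `extract_listings` and `extract_property_details`.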

Check out [Scrapfly's web scraping API](https://scrapfly.io/web-scraping-api) for all the details.

## Tips to avoid blocks

- Keep headers realistic and localized (`Accept-Language: pt-BR`).
- Add short, random delays between requests; back off on error spikes.
- Use retries for network hiccups; don’t retry instantly on 403/429.
- Prefer JSON‑LD over brittle CSS selectors.
- For multi‑page crawls, use a render proxy with BR geolocation.
- Cache pages during development so you debug parser logic offline.
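For that last tip, here's a minimal disk-cache sketch: it keys files by a hash of the URL, so during development each page is fetched once and every re-run hits the parser with local files (the cache directory name is arbitrary):

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional

CACHE_DIR = Path(".page_cache")

def cached_fetch(url: str, fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return cached HTML for a URL, fetching and storing it on a cache miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    if html is not None:  # only cache successful fetches
        path.write_text(html, encoding="utf-8")
    return html

# Usage with any fetcher, e.g. the get_html helper defined earlier:
# html = cached_fetch(url, lambda u: get_html(session, u))
```

Delete the cache directory when you want fresh pages; never use a stale cache for production runs.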

For a deep dive into anti‑block tactics (TLS fingerprints, headless browsers, etc.), see

[How to Bypass Anti-Bot Protection When Web ScrapingLearn how anti-bot systems detect scrapers and 5 universal bypass techniques including proxy rotation, fingerprinting, and fortified headless browsers.](https://scrapfly.io/blog/posts/how-to-bypass-anti-bot-protection-when-web-scraping)



## FAQ

**Why do I see a DataDome page or get 403/429?**

Your requests likely lack a real browser fingerprint or the IP isn’t in Brazil. Switch to a render proxy with BR geolocation (e.g., Scrapfly with `country="br"` and `render_js=True`).

**Do I need JavaScript rendering for Imovelweb?**

Often yes, especially for listings and dynamic widgets. JSON‑LD can still load server‑side, but JS rendering increases success rates and consistency.

**Is HTML parsing enough if JSON‑LD is missing?**

Yes. Target stable containers around price, area (m²), and room counts. Keep multiple selectors and review them periodically as the site evolves.









## Summary

We put together a small, readable toolkit for Imovelweb: a listings scraper, a detail extractor that prefers JSON‑LD, simple pagination, and a few guardrails (headers, delays, retries). The code is intentionally plain so you can change selectors fast when the UI shifts, and you can drop each function into your own pipeline without dragging along extra structure.

When you need reliability at scale, fetch pages through Scrapfly with Brazilian geolocation and, when necessary, JavaScript rendering. Add modest rate limiting, retries with backoff, and a cache for debugging. From here, wire results into your storage (CSV/DB), schedule runs, and keep an eye on error rates so you can adjust selectors and pacing before anything breaks.

**Legal Disclaimer and Precautions**

This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect:

- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens protected by GDPR.
- Do not repurpose *entire* public datasets which can be illegal in some countries.

Scrapfly does not offer legal advice but these are good general rules to follow. For more you should consult a lawyer.



 
