# How to Scrape Instagram in 2026

 by [Bernardas Alisauskas](https://scrapfly.io/blog/author/bernardas) Jun 25, 2026 25 min read [\#python](https://scrapfly.io/blog/tag/python) [\#scrapeguide](https://scrapfly.io/blog/tag/scrapeguide) 


Instagram holds valuable data for businesses. You can extract competitor insights, customer sentiment, and market trends from profiles, posts, and comments. However, Instagram makes this data hard to scrape. In 2026, its anti-bot defenses have grown more aggressive: mandatory login walls, GraphQL obfuscation, and rapid IP flagging mean that simple scripts with BeautifulSoup or basic HTTP clients no longer work.

In this guide, you'll learn how Instagram blocks scrapers, what data you can extract, and why building your own solution usually fails. We'll also show you a better approach using ScrapFly's maintained Instagram scraper that handles all the blocking challenges automatically.

## Key Takeaways

- Instagram scraping in 2026 means calling the same hidden REST and GraphQL endpoints Instagram's web app uses, not parsing HTML. The `web_profile_info` endpoint returns full profile JSON; posts and comments go through `graphql/query` with a `doc_id` parameter.
- Three defenses kill DIY scrapers: datacenter IPs are blocked on the first request, Python's `requests` and `httpx` are fingerprinted at the TLS layer, and a single residential IP is capped at roughly 200 requests per hour.
- A working scraper breaks every 2-4 weeks because Instagram rotates its GraphQL `doc_id` values as a deliberate anti-scraping measure; ongoing maintenance is the real cost of building your own.
- Only public data is reachable: profiles, posts, reels, and comments. No scraper can access private accounts, full follower lists, or current stories, and Instagram's native search now sits behind the login wall.
- ScrapFly's open-source [Instagram scraper](https://github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper) ships current endpoints and doc\_ids and runs on the [Web Scraping API](https://scrapfly.io/web-scraping-api) with anti-bot bypass and residential proxies included, so an Instagram change becomes a `git pull` instead of a re-reverse-engineering project.

[**View Source Code**: github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper](https://github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper)


## What Instagram Data Can You Scrape?

Instagram's public data provides business intelligence when extracted systematically. Here's what you can scrape and why it matters:

**Profiles** - Extract bio, follower/following counts, verification status, business contact info, and post statistics. Use case: Build lead lists by scraping verified business profiles in specific niches, then reach out using their public email addresses.

**Posts** - Capture captions, images, videos, likes, view counts, timestamps, location tags, and tagged users. Use case: Analyze your competitor's top-performing content to understand what resonates with your shared audience and replicate successful formats.

**Reels** - Access video URLs, play counts, music attribution, duration, and engagement metrics. Use case: Track trending audio clips and formats in your industry to inform your own content strategy before trends peak.

**Comments** - Scrape comment text, nested replies, timestamps, author profiles, and like counts. Use case: Perform sentiment analysis on competitor posts to identify customer pain points and service gaps you can address.

**Hashtags** - Native hashtag browsing is login-gated as of 2026. Discovery workaround: use Google's `site:instagram.com/p/ keyword` search to find public posts by topic, then scrape those URLs directly. For structured hashtag data, the Instagram Graph API requires Facebook-login authentication. Use case: Find micro-influencers by searching niche keywords via Google and ranking results by engagement rate.

Before you start scraping, it's worth knowing what Instagram restricts at a structural level. Some data is simply off limits, regardless of which scraper or API you use.

## What You Can't Scrape from Instagram (2026 Limits)

These restrictions apply regardless of which tool you use. They are structural limits, not anti-bot defenses:

- **Private accounts**: no scraper, API, or tool can access them.
- **Full follower and following lists**: behind login. The official API returns only `followers_count`, not the list itself.
- **Current stories**: login-only. Story highlights on public profiles remain accessible.
- **Native hashtag and keyword browsing**: removed from logged-out access in 2024. See "How to Find Instagram Posts Without Native Search" below for the workaround.
- **Contact info beyond the public bio**: only visible if a business profile has made it public.

Understanding these hard limits prevents wasted effort. With the structural ceiling defined, let's look at how Instagram actively blocks access to the data that is public.



## How Instagram Blocks Scrapers (Anti-Bot Detection Explained)

Instagram uses a [multi-layered blocking system](https://scrapfly.io/blog/posts/how-to-know-what-anti-bot-website-uses) designed to identify and block automated scraping. Understanding these systems shows why manual scraping solutions fail and require constant maintenance.

### Rate Limiting & IP Blocking

Instagram enforces strict request quotas to prevent aggressive scraping:

- Request limits: ~200 requests per hour per IP address for non-authenticated users
- Throttling response: After exceeding limits, you receive an [HTTP 429](https://scrapfly.io/blog/posts/what-is-http-error-429-too-many-requests) response
- Block duration: Your IP gets temporarily rate-limited for hours or days depending on violation severity
- Progressive penalties: Repeated violations lead to longer blocks and eventually permanent IP bans

Even if you implement delays and respect rate limits, you're still limited to scraping ~4,800 profiles per day per IP. This is insufficient for serious data collection.
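The arithmetic behind that ceiling is easy to sketch. The helper below is a rough planning aid, assuming the ~200 requests per hour figure above holds and that each profile costs one request; the function names are illustrative, not part of any library.

```python
import math

# Approximate per-IP limit quoted above; an estimate, not a guaranteed quota
REQUESTS_PER_HOUR_PER_IP = 200


def daily_capacity(ip_count: int, per_hour: int = REQUESTS_PER_HOUR_PER_IP) -> int:
    """Upper bound on requests per day for a pool of IPs running around the clock."""
    return ip_count * per_hour * 24


def ips_needed(target_per_day: int, per_hour: int = REQUESTS_PER_HOUR_PER_IP) -> int:
    """Smallest pool size whose daily capacity covers the target."""
    return math.ceil(target_per_day / (per_hour * 24))


print(daily_capacity(1))    # 4800
print(ips_needed(100_000))  # 21
```

In practice the pool needs headroom on top of this, since the estimate ignores retries, blocked IPs, and any delay added between requests.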

### IP Quality Detection

Instagram analyzes your IP address quality before even processing your request:

- Datacenter IPs blocked instantly: Requests from AWS, DigitalOcean, Google Cloud, and other hosting providers are flagged immediately
- Residential IPs required: Instagram expects requests from genuine consumer ISPs (Comcast, AT&T, etc.)
- ASN reputation checking: Instagram maintains blocklists of ASNs (Autonomous System Numbers) associated with proxies and VPNs
- This runs BEFORE rate limits: A datacenter IP gets blocked on the first request, regardless of how slowly you scrape

This is why you cannot deploy your scraper to a cloud server and expect it to work. Instagram blocks it before you even hit the rate limit.

### Browser Fingerprinting

Instagram analyzes dozens of browser characteristics to detect automation tools:

- TLS/SSL fingerprinting: Python's `requests` library has a unique [TLS handshake signature](https://scrapfly.io/blog/posts/how-to-avoid-web-scraping-blocking-tls) that Instagram flags as a bot instantly
- HTTP/2 fingerprinting: The order and format of HTTP/2 frames reveals whether you're using a real browser or a scripting library
- Header order consistency: Real browsers send headers in a specific order; scrapers often randomize or alphabetize them
- [Canvas/WebGL fingerprinting](https://scrapfly.io/blog/posts/browser-fingerprinting-with-creepjs): When JavaScript is enabled, Instagram tests how your browser renders graphics. Automation frameworks produce consistent, detectable signatures

Even if you copy all the correct headers from a real browser, the TLS handshake alone will expose you as a bot within seconds.

### Request Pattern Detection

Instagram's behavioral analysis identifies non-human usage patterns:

- Timing patterns: Perfect 3-second delays between requests look robotic; humans vary their timing
- Request sequencing: Real users navigate naturally (view profile → scroll → click post); bots often access API endpoints directly without realistic browsing
- Session validation: Instagram expects correlated requests (CSS, images, analytics) alongside your API calls; scraping just the data endpoints is suspicious
- Cookie behavior: Missing, malformed, or inconsistent cookies signal automation

Instagram's machine learning models are trained on millions of real user sessions. Any deviation from natural human behavior raises red flags.
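The timing point can be illustrated with a small jitter helper. This is a minimal sketch: the 3-second base and 50% spread are arbitrary assumptions, and randomized delays alone will not defeat the other detection layers described here.

```python
import random
from typing import Optional


def jittered_delay(base: float = 3.0, spread: float = 0.5,
                   rng: Optional[random.Random] = None) -> float:
    """Return a delay drawn uniformly from base*(1-spread) to base*(1+spread)."""
    source = rng or random  # the random module exposes the same uniform() method
    return source.uniform(base * (1 - spread), base * (1 + spread))


# Five delays between 1.5s and 4.5s instead of a fixed 3.0s every time
print([round(jittered_delay(), 2) for _ in range(5)])
```

Pass the result to `time.sleep()` (or `asyncio.sleep()` in async code) between requests.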



**Instagram's Multi-Layered Anti-Bot Defense System**

Even a well-built scraper gets blocked within hours without serious anti-detection infrastructure. Instagram updates its blocking systems frequently, so scrapers that work today can break by next week. The detection stack it uses is similar to what [Cloudflare](https://scrapfly.io/blog/posts/how-to-bypass-cloudflare-anti-scraping) and other anti-bot systems apply against automated traffic.



## Instagram Scraper API vs Building Your Own: Which Should You Use?

Once you understand why Instagram blocks scrapers, the real question is who maintains the fix when it breaks. Here is how the four main approaches compare:

| Approach | What it gets you | Rate limit | Maintenance | Cost | Pick when |
|---|---|---|---|---|---|
| DIY Python (`curl_cffi` + residential proxies) | Full control, any public endpoint | ~200 req/hr/IP | doc\_id + fingerprint upkeep every 2-4 weeks | Proxies at $1-3/GB | Learning or tiny volume |
| Instaloader (open source) | Profiles and posts, login optional | Same IP limits | Community-maintained | Free | Research, under 1K req/day |
| Official Graph API | Your own Business or Creator accounts only | 200 calls/hr/token | Stable, requires Meta app review | Free | You own the account |
| ScrapFly scraper + Web Scraping API | All public data, anti-bot and proxies handled | Managed | Repository updated for you | Per-request | Production or third-party data |

If you own the account, use the Graph API. For any account you don't own, the public web endpoints are the only path. The real choice is who maintains the scraper when Instagram changes it: you, or someone whose job it is.



## How to Scrape Instagram with ScrapFly (The Easy Way)

ScrapFly's [web scraping API](https://scrapfly.io/web-scraping-api) provides the complete Instagram scraping solution: working scraper code + anti-blocking infrastructure. Clone the repository, configure your API key, and start scraping in 5 minutes.

### What You Get

- Working scraper code: Open source, actively maintained, updated within hours when Instagram changes
- Built-in anti-blocking infrastructure: TLS fingerprinting, header rotation, and behavioral mimicry handled automatically
- Residential proxy network included: 50M+ IPs from real consumer ISPs with no separate proxy bills or configuration
- Automatic updates: When Instagram changes doc\_ids or endpoints, we update the scraper immediately
- Cost optimization: Proxy Saver feature reduces residential proxy costs by 30-50% through intelligent caching

[**Get Started in 5 Minutes →**](https://scrapfly.io/pricing)

```bash
# Clone the scraper repository
git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/instagram-scraper

# Configure your ScrapFly API key
export SCRAPFLY_KEY="your_key_here"

# Install dependencies
poetry install

# Start scraping
poetry run python run.py
```



### How ScrapFly Bypasses Every Defense

Anti-Blocking Bypass: ScrapFly rotates TLS fingerprints to match real Chrome/Firefox browsers, orders HTTP headers correctly, and mimics genuine browser behavior. Instagram sees legitimate browser traffic, not a scraper.

Proxy Management: Our network of [50M+ residential IPs](https://scrapfly.io/blog/posts/best-proxy-providers-for-web-scraping) automatically rotates with each request. Instagram sees requests from real consumer devices across different ISPs and locations, exactly like genuine users.

Rate Limit Handling: Smart throttling and exponential backoff automatically slow down when Instagram pushes back. The scraper adjusts its speed dynamically to stay under the radar.

Proxy Saver: Reduces residential proxy costs by 30-50% by intelligently caching static content and only using premium residential IPs for the actual API calls. For a 10,000 profile scraping job, that works out to a few dollars in proxy costs (see the breakdown under Instagram Proxy Costs).



## How Instagram's Scraping API Works

Instagram does not provide an official API for public data on accounts you don't own, but its web and mobile apps communicate with backend APIs that we can access directly. Instagram uses two API architectures:

- REST API: Simple endpoints for basic data (e.g., `/api/v1/users/web_profile_info/` for profiles)
- [GraphQL](https://scrapfly.io/blog/posts/web-scraping-graphql-with-python) API: Complex query system for posts, comments, and paginated data

Instagram uses REST APIs for straightforward requests where the data structure is simple, and GraphQL for complex queries involving nested data, filtering, or pagination.

### Finding Instagram's Hidden Endpoints

When Instagram updates their platform, endpoints change. Here's how to discover current endpoints when they break:

Step 1: Open Instagram in Chrome/Firefox and open DevTools (F12)

Step 2: Go to Network tab and filter by "Fetch/XHR" to see API calls

Step 3: Navigate Instagram normally (visit a profile, view a post, scroll comments)

Step 4: Watch for API requests to domains like:

- `i.instagram.com/api/v1/` (REST endpoints)
- `www.instagram.com/graphql/query` (GraphQL endpoints)

Step 5: Click on an API request to inspect:

- Request headers (especially `x-ig-app-id`)
- Request payload (for GraphQL, look for `variables` and `doc_id`)
- Response structure (to understand data format)

REST Example: When viewing a profile, you will see a request to:

```
https://i.instagram.com/api/v1/users/web_profile_info/?username=google
```



GraphQL Example: When viewing a post, you will see a POST request to:

```
https://www.instagram.com/graphql/query
```



With a payload containing `doc_id` and `variables` parameters.

### Understanding doc\_id

The `doc_id` parameter is critical for GraphQL scraping but poorly understood. Here's what you need to know:

What is doc\_id?

- Instagram's internal identifier for specific GraphQL query structures
- Maps to a predefined query on Instagram's backend (you cannot define custom queries)
- Example: `doc_id=8845758582119845` retrieves post details

Why doc\_ids exist:

- Performance: Pre-defined queries are optimized and cached on Instagram's servers
- Security: Prevents custom queries that could overload the database
- Anti-scraping: Changing doc\_ids regularly breaks scrapers

Why doc\_ids change:

- Instagram updates their GraphQL schema every 2-4 weeks
- Changes are a deliberate anti-scraping measure
- No public documentation of current values. You must discover them yourself

How to find current doc\_ids:

1. Open DevTools → Network tab → filter for "graphql"
2. Trigger the action on Instagram (view post, load comments, etc.)
3. Inspect the request payload for `doc_id=` parameter
4. Note the numeric value (e.g., `8845758582119845`)
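Once a request is captured, the `doc_id` and `variables` can be pulled out of the raw form body programmatically rather than by eye. A minimal sketch using only the standard library; the captured string below is a hand-built example mirroring the values used in this guide, not a real capture.

```python
import json
from urllib.parse import parse_qs


def parse_graphql_body(raw_body: str) -> tuple[str, dict]:
    """Split a form-encoded GraphQL request body into (doc_id, variables)."""
    fields = parse_qs(raw_body)  # also undoes the percent-encoding
    doc_id = fields["doc_id"][0]
    variables = json.loads(fields["variables"][0])
    return doc_id, variables


# Example body in the shape DevTools shows under the request payload (illustrative)
captured = "variables=%7B%22shortcode%22%3A%22CuE2WNQs6vH%22%7D&doc_id=8845758582119845"
doc_id, variables = parse_graphql_body(captured)
print(doc_id)
print(variables)
```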

Different operations require different doc\_ids:

- Profile posts: `9310670392322965`
- Post details: `8845758582119845`
- Comments pagination: (changes frequently)
- User search: (changes frequently)

The DIY Pain: You must monitor doc\_ids manually and update your scraper every time Instagram changes them (every 2-4 weeks). Miss an update and your scraper breaks silently.

ScrapFly Solution: Our open-source repository is updated within hours of Instagram changes. You pull the latest code and keep scraping with no detective work required.
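One way to keep a stale `doc_id` from failing silently is to validate the response shape before using it. The sketch below assumes only that a healthy post response carries `data.xdt_shortcode_media`, as in the post scraper later in this guide; what Instagram actually returns for an outdated `doc_id` is not documented here, so the check is deliberately generic. `StaleDocIdError` and `extract_media` are hypothetical names.

```python
class StaleDocIdError(RuntimeError):
    """Raised when a GraphQL response lacks the data we expect."""


def extract_media(payload: dict, key: str = "xdt_shortcode_media") -> dict:
    """Return payload['data'][key], or raise loudly if it is missing or empty."""
    data = payload.get("data") or {}
    media = data.get(key)
    if not media:
        raise StaleDocIdError(
            f"response has no data.{key}; the doc_id may be outdated "
            f"(top-level keys: {sorted(payload)})"
        )
    return media


# A response without the expected key now fails at the call site
try:
    extract_media({"status": "fail"})
except StaleDocIdError as error:
    print(f"scraper needs attention: {error}")
```

Raising here turns a silent stream of empty results into an error you can alert on.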

### Required Headers

[Headers are not just formalities](https://scrapfly.io/blog/posts/python-requests-headers-guide). Instagram validates them strictly. Here is what you need and why:

Critical Headers:

```python
{
    "x-ig-app-id": "936619743392459",  # Instagram web app identifier
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "*/*",
}
```



Why each header matters:

- x-ig-app-id: Identifies your request as coming from Instagram's web app (not the mobile app or an unauthorized client). A wrong value triggers an instant 403 error.
- User-Agent: Must match a real browser signature. Python's default User-Agent screams "bot" and gets blocked immediately.
- Accept-Language: Instagram tracks inconsistent language preferences across requests. Keep it stable per session.
- Accept-Encoding: Real browsers always accept compression. Omitting this is suspicious.
- Accept: Wildcard is fine, but must be present.

What happens with wrong headers:

- 403 Forbidden: TLS fingerprint or app-id mismatch detected
- 400 Bad Request: Malformed headers or missing required fields
- No response: Your IP was flagged and silently dropped

Header consistency requirement: Instagram correlates requests within a session. If your User-Agent changes mid-session or headers conflict with your TLS fingerprint, you are flagged instantly.



## How to Find Instagram Posts Without Native Search

Instagram removed hashtag and keyword browsing from logged-out access in 2024. Without an account, you cannot use Instagram's search to discover posts or profiles by topic. This creates a discovery gap that no scraper solves on its own.

The workaround is Google. Instagram posts are indexed publicly, so standard `site:` search syntax lets you find post URLs by keyword without touching Instagram directly:

```
site:instagram.com/p/ coffee roasters
site:instagram.com/reel/ electric vehicles
```



This returns a list of public post URLs matching your keyword. Feed those URLs into the scraper below to pull captions, engagement metrics, and author profile data. For scraping the Google results themselves, ScrapFly's [Google search scraper](https://scrapfly.io/blog/posts/how-to-scrape-google) handles pagination and anti-bot detection automatically.

Coverage is not complete. Google's index is not real-time and won't surface every post. Use this method to build a seed list of relevant accounts, then scrape their post history directly once you have their profile URLs.
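The query-building and URL-filtering steps can be sketched with the standard library. This example only constructs the search URL and extracts shortcodes from result URLs you already have; fetching Google results is out of scope here. The helper names and the shortcode pattern are assumptions for illustration.

```python
import re
from urllib.parse import urlencode

# Matches /p/<shortcode> and /reel/<shortcode> in Instagram URLs (assumed pattern)
SHORTCODE_RE = re.compile(r"instagram\.com/(?:p|reel)/([A-Za-z0-9_-]+)")


def google_query_url(keyword: str, kind: str = "p") -> str:
    """Build a Google search URL for public posts (kind='p') or reels (kind='reel')."""
    query = f"site:instagram.com/{kind}/ {keyword}"
    return "https://www.google.com/search?" + urlencode({"q": query})


def extract_shortcodes(urls: list[str]) -> list[str]:
    """Pull unique shortcodes out of result URLs, preserving first-seen order."""
    seen: dict[str, None] = {}
    for url in urls:
        match = SHORTCODE_RE.search(url)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


print(google_query_url("coffee roasters"))
print(extract_shortcodes([
    "https://www.instagram.com/p/CuE2WNQs6vH/",
    "https://www.instagram.com/p/CuE2WNQs6vH/?img_index=2",  # duplicate post
    "https://www.instagram.com/google/",                     # profile, skipped
]))
```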



## How to Scrape Instagram Profiles

Instagram profiles contain valuable business intelligence: follower counts, bio information, business contact details, and recent posts. We will use Instagram's REST API endpoint that returns profile data as JSON.

What you can extract:

- **Full name, username, user ID, verification status**
- **Bio text and external links**
- **Follower and following counts**
- **Profile picture URL**
- **Business category, phone number, email** (for business accounts)
- **First 12 posts with preview data**

**The approach:** We make a GET request to Instagram's profile API endpoint with the username as a parameter. The response includes complete profile data in JSON format.

ScrapFly's scraper handles:

- **Proper header formatting and x-ig-app-id rotation**
- **Residential proxy rotation to avoid IP blocks**
- **TLS fingerprint matching to bypass blocking detection**
- **Automatic retry with exponential backoff on rate limits**

Code snippet from the ScrapFly scraper (it uses the ScrapFly Python SDK; a DIY equivalent would use an HTTP client such as the [httpx library](https://scrapfly.io/blog/posts/web-scraping-with-python-httpx)):

```python
import asyncio
import json

from scrapfly import ScrapeConfig, ScrapflyClient

scrapfly = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")

INSTAGRAM_APP_ID = "936619743392459"
BASE_CONFIG = {
    "asp": True,  # Anti Scraping Protection bypass
    "country": "US",  # Use US residential proxies
}

async def scrape_profile(username: str) -> dict:
    """Scrape Instagram profile data"""
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
            headers={"x-ig-app-id": INSTAGRAM_APP_ID},
            **BASE_CONFIG,
        )
    )
    data = json.loads(result.content)
    return data["data"]["user"]

async def main():
    profile = await scrape_profile("google")
    print(f"Followers: {profile['edge_followed_by']['count']}")

# Example usage: `await` only works inside a coroutine, so run it through asyncio
asyncio.run(main())
```



Key implementation details:

- The **`asp=True`** parameter activates ScrapFly's anti-blocking bypass (TLS fingerprinting, header rotation)
- **Residential proxies** (`country="US"`) prevent datacenter IP blocks
- The endpoint returns **up to 12 recent posts** embedded in the profile response
- **Business accounts** expose email/phone in `business_email` and `business_phone_number` fields

[**Full profile scraper code in our repository →**](https://github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper)
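The returned `user` object is large, so it usually helps to reduce it to the fields you need. A minimal sketch: `id`, `edge_followed_by`, `business_email`, and `business_phone_number` come from the examples in this guide, while `username` and `is_verified` are assumed field names and should be checked against a live response. The sample values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileSummary:
    user_id: Optional[str]
    followers: Optional[int]
    business_email: Optional[str]
    business_phone: Optional[str]
    username: Optional[str] = None      # assumed field name
    is_verified: Optional[bool] = None  # assumed field name


def summarize_profile(user: dict) -> ProfileSummary:
    """Reduce the `user` object to a flat record; missing keys become None."""
    followed_by = user.get("edge_followed_by") or {}
    return ProfileSummary(
        user_id=user.get("id"),
        followers=followed_by.get("count"),
        business_email=user.get("business_email"),
        business_phone=user.get("business_phone_number"),
        username=user.get("username"),
        is_verified=user.get("is_verified"),
    )


# Invented sample in the shape returned by scrape_profile()
sample_user = {
    "id": "123",
    "username": "example_shop",
    "edge_followed_by": {"count": 4200},
    "business_email": "hello@example.com",
}
print(summarize_profile(sample_user))
```

Using `.get()` throughout means a personal account without business fields produces `None` values instead of a `KeyError`.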



## How to Scrape Instagram Posts

Post data includes captions, media URLs, engagement metrics, comments, and tagged users. Instagram uses GraphQL for post queries, requiring proper doc\_id values and request formatting.

What you can extract:

- **Post shortcode, ID, and timestamp**
- **Image/video URLs** (full resolution)
- **Captions and hashtags**
- **Like counts, view counts** (for videos), **play counts** (for reels)
- **First page of comments** (with pagination cursor for more)
- **Tagged users and location data**
- **Related posts**

**The approach:** We send a POST request to Instagram's GraphQL endpoint with a payload containing the post shortcode and the correct doc\_id. Instagram returns complete post data including engagement metrics and comments.

ScrapFly's scraper handles:

- **Current doc\_id values** (updated within hours when Instagram changes them)
- **GraphQL payload formatting and URL encoding**
- **Comment pagination** for posts with 100+ comments
- **Different post types**: photos, videos, reels, carousels

Code snippet from the ScrapFly scraper:

```python
import asyncio
import json
from urllib.parse import quote

from scrapfly import ScrapeConfig, ScrapflyClient

scrapfly = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")

INSTAGRAM_POST_DOC_ID = "8845758582119845"  # Updated regularly
BASE_CONFIG = {"asp": True, "country": "US"}

async def scrape_post(url_or_shortcode: str) -> dict:
    """Scrape single Instagram post data"""
    # Extract shortcode from a /p/ URL or use the value directly
    if "http" in url_or_shortcode:
        shortcode = url_or_shortcode.split("/p/")[-1].split("/")[0]
    else:
        shortcode = url_or_shortcode

    # Build GraphQL request payload: compact JSON, then URL-encode it
    variables = quote(json.dumps({
        'shortcode': shortcode,
        'fetch_tagged_user_count': None,
        'hoisted_comment_id': None,
        'hoisted_reply_id': None
    }, separators=(',', ':')))

    body = f"variables={variables}&doc_id={INSTAGRAM_POST_DOC_ID}"

    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url="https://www.instagram.com/graphql/query",
            method="POST",
            body=body,
            headers={"content-type": "application/x-www-form-urlencoded"},
            **BASE_CONFIG,
        )
    )

    data = json.loads(result.content)
    return data["data"]["xdt_shortcode_media"]

async def main():
    post = await scrape_post("https://www.instagram.com/p/CuE2WNQs6vH/")
    print(f"Likes: {post['edge_media_preview_like']['count']}")

# Example usage: `await` only works inside a coroutine, so run it through asyncio
asyncio.run(main())
```



Key implementation details:

- The **shortcode** is the unique post identifier (e.g., `CuE2WNQs6vH` from URL `/p/CuE2WNQs6vH/`)
- **GraphQL requires URL-encoded JSON** in the request body
- The response includes **nested structures for comments** (`edge_media_to_parent_comment`)
- **Carousel posts** have multiple images in `edge_sidecar_to_children`

[**Full post scraper code with pagination →**](https://github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper)
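Carousel posts need one extra step, because their media sits one level down in `edge_sidecar_to_children`. The sketch below flattens a post into a list of media URLs. `edge_sidecar_to_children` comes from the notes above; `display_url`, `video_url`, and `is_video` are assumed field names, and the sample data is invented.

```python
def media_urls(post: dict) -> list[str]:
    """Return media URLs for a single post or every slide of a carousel."""
    children = (post.get("edge_sidecar_to_children") or {}).get("edges") or []
    # Single-media posts have no children, so treat the post itself as the only node
    nodes = [edge["node"] for edge in children] or [post]
    urls = []
    for node in nodes:
        url = node.get("video_url") if node.get("is_video") else node.get("display_url")
        if url:
            urls.append(url)
    return urls


# Invented sample: a two-slide carousel with one image and one video
carousel = {
    "display_url": "https://example.com/cover.jpg",
    "edge_sidecar_to_children": {"edges": [
        {"node": {"is_video": False, "display_url": "https://example.com/1.jpg"}},
        {"node": {"is_video": True, "display_url": "https://example.com/2.jpg",
                  "video_url": "https://example.com/2.mp4"}},
    ]},
}
print(media_urls(carousel))
```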



## How to Scrape Instagram Comments

Comments provide sentiment data, user engagement patterns, and conversation threads. Instagram paginates comments, requiring multiple requests to extract full comment sections.

What you can extract:

- **Comment text and timestamp**
- **Commenter username, profile, verification status**
- **Like counts per comment**
- **Nested replies** (threaded conversations)
- **Pagination cursors** for loading more comments

**The approach:** Comments are included in the initial post data (first ~12 comments), but posts with hundreds of comments require pagination. We use the `end_cursor` value from `page_info` to load subsequent pages through additional GraphQL requests.

ScrapFly's scraper handles:

- **Nested pagination** (comments and their replies have separate cursors)
- **Rate limit respect** (Instagram throttles aggressive comment scraping)
- **Proper doc\_id** for comment pagination queries
- **Reply threading** and parent-child comment relationships

Code snippet for comment pagination:

```python
# Continues from the post scraper above: reuses scrapfly, BASE_CONFIG, json, and quote
INSTAGRAM_COMMENTS_DOC_ID = "YOUR_COMMENTS_DOC_ID"  # placeholder: changes frequently, look up the current value

async def scrape_post_comments(shortcode: str, max_comments: int = 100):
    """Scrape comments from Instagram post with pagination"""
    comments = []
    cursor = None  # None requests the first page

    while len(comments) < max_comments:
        variables = quote(json.dumps({
            'shortcode': shortcode,
            'first': 50,  # Comments per page
            'after': cursor,  # Pagination cursor
        }, separators=(',', ':')))

        body = f"variables={variables}&doc_id={INSTAGRAM_COMMENTS_DOC_ID}"

        result = await scrapfly.async_scrape(
            ScrapeConfig(
                url="https://www.instagram.com/graphql/query",
                method="POST",
                body=body,
                headers={"content-type": "application/x-www-form-urlencoded"},
                **BASE_CONFIG,
            )
        )

        data = json.loads(result.content)
        comment_data = data["data"]["xdt_shortcode_media"]["edge_media_to_parent_comment"]

        # Extract comments from this page
        for edge in comment_data["edges"]:
            comments.append(edge["node"])

        # Check for more comments
        page_info = comment_data["page_info"]
        if not page_info["has_next_page"]:
            break

        cursor = page_info["end_cursor"]

    # The last page can overshoot the limit, so trim
    return comments[:max_comments]
```



Key implementation details:

- The **`first` parameter** controls comments per page (max ~50)
- Each comment includes **`edge_threaded_comments`** for nested replies
- **Replies have their own pagination system** requiring separate requests
- The scraper **respects Instagram's rate limits** by adding delays between pagination requests

[**Full comment scraper with reply handling →**](https://github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper)
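For sentiment analysis it is often easier to work with a flat table of comments than with the nested response. A minimal sketch that flattens top-level comments and the replies already embedded in `edge_threaded_comments` (it does not fetch further reply pages). The `id`, `text`, and `owner.username` field names are assumptions, and the sample data is invented.

```python
from typing import Optional


def _row(node: dict, parent_id: Optional[str]) -> dict:
    """Flatten one comment node; parent_id is None for top-level comments."""
    return {
        "id": node.get("id"),
        "parent_id": parent_id,
        "author": (node.get("owner") or {}).get("username"),
        "text": node.get("text"),
    }


def flatten_comments(nodes: list[dict]) -> list[dict]:
    """Turn comment nodes plus their embedded replies into one flat list."""
    rows = []
    for node in nodes:
        rows.append(_row(node, None))
        replies = (node.get("edge_threaded_comments") or {}).get("edges") or []
        for edge in replies:
            rows.append(_row(edge["node"], node.get("id")))
    return rows


# Invented sample: one comment with one reply
sample = [{
    "id": "c1", "text": "Love this", "owner": {"username": "alice"},
    "edge_threaded_comments": {"edges": [
        {"node": {"id": "r1", "text": "Same here", "owner": {"username": "bob"}}},
    ]},
}]
for row in flatten_comments(sample):
    print(row)
```

The output of `scrape_post_comments()` can be passed straight to `flatten_comments()`, since it returns the same list of comment nodes.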



## How to Scrape Instagram with Proxies

Proxies are mandatory for Instagram scraping at any scale. Instagram's IP quality detection blocks datacenter IPs instantly, and rate limits force you to rotate residential IPs to maintain scraping speed.

### Best Proxies for Instagram Scraping (Residential vs Datacenter)

**Datacenter Proxies: Do not use**

- **Blocked instantly** by Instagram's IP quality checks
- **No request volume possible**. Banned on first request
- Cheaper per GB, but **100% failure rate** makes cost irrelevant

**Residential Proxies: Required**

- **IPs from real consumer ISPs** (Comcast, Verizon, AT&T, etc.)
- **Pass Instagram's IP quality detection**
- Each IP allows **~200 requests/hour** before rate limiting
- **Geographic targeting** (e.g., US-only IPs for US-focused scraping)

**Mobile Proxies: Premium Option**

- **IPs from mobile carriers** (4G/5G networks)
- **Highest trust score**. Instagram rarely blocks mobile IPs
- **Better rate limits** (~300 requests/hour per IP)
- **More expensive** ($60-120/month per IP vs $1-3 for residential)

Recommendation: [Residential proxies](https://scrapfly.io/blog/posts/introduction-to-proxies-in-web-scraping) are the sweet spot for Instagram scraping. Mobile proxies offer marginal improvement at 10-20x the cost. Not worth it unless you are scraping millions of profiles daily.

### How to Rotate Proxies for Instagram Scraping

Proxy rotation strategies determine your scraping speed and block rate:

**Sticky Sessions (Recommended):**

- Use the **same IP for 5-10 minutes**, then rotate
- **Mimics real user behavior** (one person does not change IPs every 10 seconds)
- Allows **~15-30 requests per IP** before rotation
- Instagram's behavioral analysis **flags instant IP changes** as suspicious
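A sticky session can be sketched as a small pool that hands out the same proxy until a time-to-live expires. This is an illustrative class, not a ScrapFly or proxy-provider API; the 300-second default reflects the low end of the 5-10 minute window above, and the proxy URLs are placeholders.

```python
import itertools
import time
from typing import Callable, Iterable


class StickyProxyPool:
    """Keep one proxy for `ttl_seconds`, then move to the next in round-robin order."""

    def __init__(self, proxies: Iterable[str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        proxy_list = list(proxies)
        if not proxy_list:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxy_list)
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the rotation logic can be tested
        self._current = next(self._cycle)
        self._since = clock()

    def get(self) -> str:
        """Return the current proxy, rotating first if its session has expired."""
        if self._clock() - self._since >= self._ttl:
            self.rotate()
        return self._current

    def rotate(self) -> str:
        """Force a switch to the next proxy and restart the session timer."""
        self._current = next(self._cycle)
        self._since = self._clock()
        return self._current


pool = StickyProxyPool(["http://proxy-a:8000", "http://proxy-b:8000"])
print(pool.get())  # same proxy on every call for the next 5 minutes
```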

**Request-Level Rotation (Aggressive):**

- **New IP for every single request**
- **Maximizes speed** but looks unnatural to Instagram
- **Higher block rate**. Use only with anti-bot bypass (like ScrapFly)
- **Necessary when scraping 10,000+ profiles/hour**

**Smart Rotation Based on Response:**

- **Rotate immediately** on 429 (rate limit) or 403 (block)
- **Continue using same IP** while responses are 200 OK
- **Implements exponential backoff**: 2s delay → 4s → 8s → 16s before rotating
- **Reduces wasted proxy bandwidth**
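The response-based rules can be expressed as a small decision function. This sketch takes one reading of the list above, which is an assumption: a 403 rotates immediately, while a 429 first walks the 2s, 4s, 8s, 16s backoff schedule on the same IP and rotates only once that schedule is exhausted. The names are illustrative.

```python
BACKOFF_SECONDS = (2, 4, 8, 16)  # schedule quoted in the list above


def next_step(status_code: int, consecutive_429s: int) -> tuple[str, float]:
    """Decide what to do after a response.

    Returns (action, delay_seconds) where action is "keep", "wait", or "rotate".
    `consecutive_429s` counts rate-limit responses already seen on the current IP.
    """
    if status_code == 403:
        return ("rotate", 0.0)  # blocked: switch IP right away
    if status_code == 429:
        if consecutive_429s < len(BACKOFF_SECONDS):
            return ("wait", float(BACKOFF_SECONDS[consecutive_429s]))
        return ("rotate", 0.0)  # backoff exhausted, give up on this IP
    return ("keep", 0.0)  # anything else, including 200: stay on the current IP


print(next_step(200, 0))
print(next_step(429, 0))
print(next_step(429, 4))
```

The caller is expected to reset the 429 counter after a rotation or a successful response.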

**ScrapFly's automatic proxy management:**

- **Intelligent rotation** using sticky sessions by default
- **Instant rotation** on rate limits or blocks
- **Geographic pinning** (keep requests in same country/region)
- **Proxy pool health monitoring** (removes dead IPs automatically)

### Instagram Proxy Costs

Residential proxies are billed per GB of bandwidth consumed. Here's what Instagram scraping costs:

**Data usage per request type:**

- **Profile scrape**: ~50-100 KB per profile
- **Post scrape**: ~30-80 KB per post
- **Comment scrape**: ~20-50 KB per comment page

**Example scraping job: 10,000 Instagram profiles**

- 10,000 profiles × 75 KB average = **750 MB**
- Standard residential proxy cost: **$10-15 per GB**
- **Total cost: $7.50-11.25** in proxy bandwidth

**ScrapFly's Proxy Saver feature:**

- **Caches static content** (profile images, CSS, JavaScript)
- **Only uses residential bandwidth** for actual API calls
- **Reduces bandwidth consumption by 30-50%**
- Same 10,000 profile job: **$5.25-7.88 with Proxy Saver**
- **Savings: $2.25-3.37 per 10K profiles** (a 30% reduction, the low end of the range)

For serious Instagram scraping (**100K+ profiles/month**), Proxy Saver saves **$50-100+ monthly** in proxy costs alone.
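The same estimate can be reproduced for other job sizes with a few lines of arithmetic. A minimal sketch assuming decimal units (1 GB = 1,000,000 KB) and the per-GB prices used in the example above; the function names are illustrative.

```python
def bandwidth_gb(request_count: int, avg_kb: float) -> float:
    """Total transfer in GB, using decimal units (1 GB = 1,000,000 KB)."""
    return request_count * avg_kb / 1_000_000


def proxy_cost(request_count: int, avg_kb: float, price_per_gb: float,
               saver_reduction: float = 0.0) -> float:
    """Bandwidth cost in dollars; saver_reduction is a fraction such as 0.3 for 30%."""
    return bandwidth_gb(request_count, avg_kb) * price_per_gb * (1 - saver_reduction)


# The 10,000 profile example: 75 KB average at $10 and $15 per GB
low, high = proxy_cost(10_000, 75, 10), proxy_cost(10_000, 75, 15)
print(f"Without Proxy Saver: ${low:.2f} to ${high:.2f}")

# Same job with a 30% bandwidth reduction
low_saved = proxy_cost(10_000, 75, 10, saver_reduction=0.3)
print(f"With a 30% reduction, low end: ${low_saved:.2f}")
```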



## FAQ

### Does Instagram have an official API in 2026?

The Basic Display API shut down in December 2024. The Instagram Graph API only covers accounts you own, caps at 200 calls per hour, and never returns follower lists. For any account you don't own, scraping the public web endpoints is the only path.







### How to get Instagram user ID from username?

Scrape the user profile using the **`/api/v1/users/web_profile_info/`** endpoint and extract the **`id`** field:

```python
import httpx

username = "google"
response = httpx.get(
    f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"},
)
data = response.json()
user_id = data["data"]["user"]["id"]
print(user_id)  # Output: 1067259270
```









### How to get Instagram username from user ID?

Use Instagram's **public mobile API endpoint**:

```python
import httpx

iphone_api = "https://i.instagram.com/api/v1/users/{}/info/"
iphone_user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Mobile/14G60 Instagram 12.0.0.16.90"

response = httpx.get(
    iphone_api.format("1067259270"),
    headers={"User-Agent": iphone_user_agent}
)
username = response.json()['user']['username']
print(username)  # Output: google
```









### How do I handle Instagram's rate limiting when scraping at scale?

Instagram rate limiting requires a **three-part strategy**:

1. **Residential proxy rotation:** Use **50-100+ residential IPs** and rotate them in sticky sessions (**5-10 minutes per IP**). Each IP allows **~200 requests/hour**.
2. **Realistic delays:** Space requests **2-5 seconds apart** with random variance. Perfect timing intervals look robotic.
3. **Exponential backoff:** When you receive a **429 error**, back off exponentially (wait 2s, then 4s, then 8s, etc.) before retrying.

ScrapFly handles all three automatically: you specify your desired scraping speed, and we manage rate limits, retries, and proxy rotation.
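If you are rolling your own client instead, the delay and backoff parts of that strategy can be sketched in a few lines. This is a minimal illustration, not a production retry layer: `request_with_backoff` and `human_pause` are hypothetical helper names, and `send` is assumed to be any zero-argument callable that returns a response object with a `status_code` attribute (as `httpx` responses have).

```python
import random
import time
from typing import Any, Callable


def request_with_backoff(
    send: Callable[[], Any],
    max_retries: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send(); on HTTP 429 wait 2s, 4s, 8s, ... and try again."""
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_retries:
            # Double the wait after every rate-limited attempt
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still rate limited after {max_retries} retries")


def human_pause(
    low: float = 2.0,
    high: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random 2-5 second interval so request timing is not uniform."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay
```

With `httpx` this would be called as `request_with_backoff(lambda: client.get(url))`, followed by `human_pause()` before the next request. The `sleep` parameter is injectable only so the timing logic can be checked without actually waiting.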







### Can I scrape Instagram stories or reels data?

Reels, yes. They go through a dedicated GraphQL endpoint with its own `doc_id` (`25981206651899035`, subject to change) and return play counts, audio attribution, and video duration. Public accounts expose reels without login. Current stories are a different matter: they fall outside the public data a logged-out scraper can reach, so treat them as out of scope.







### How do I extract Instagram comments and engagement metrics?

Comments live under `edge_media_to_parent_comment` in the post response. The first request returns ~12 comments and an `end_cursor`; paginate with that cursor to get the rest, and pull nested replies from each comment's `edge_threaded_comments`.

Engagement metrics are in the same post object: `edge_media_preview_like.count` for likes, `edge_media_to_parent_comment.count` for comment count, `video_view_count` for video views, and `video_play_count` for reel plays.
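As a rough sketch of how those fields could be read from an already-fetched post object: the top-level field names below come from the description above, while the `edges`, `node`, `text`, `page_info`, and `has_next_page` keys are assumptions about the response layout, and all three function names are hypothetical. Check them against a real response before relying on them.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def extract_engagement(post: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Collect the engagement counters from a post object (None if a field is absent)."""
    return {
        "likes": post.get("edge_media_preview_like", {}).get("count"),
        "comments": post.get("edge_media_to_parent_comment", {}).get("count"),
        "video_views": post.get("video_view_count"),
        "reel_plays": post.get("video_play_count"),
    }


def iter_comments(post: Dict[str, Any]) -> Iterator[Tuple[Optional[str], bool]]:
    """Yield (text, is_reply) for each comment and nested reply on the current page."""
    # Assumed layout: {"edges": [{"node": {...}}, ...]} under each edge_* field
    for edge in post.get("edge_media_to_parent_comment", {}).get("edges", []):
        node = edge["node"]
        yield node.get("text"), False
        for reply in node.get("edge_threaded_comments", {}).get("edges", []):
            yield reply["node"].get("text"), True


def next_comment_cursor(post: Dict[str, Any]) -> Optional[str]:
    """Return the end_cursor for the next page, or None when there are no more pages."""
    page_info = post.get("edge_media_to_parent_comment", {}).get("page_info", {})
    if page_info.get("has_next_page"):
        return page_info.get("end_cursor")
    return None
```

Using `.get()` throughout means a photo post without `video_view_count` simply yields `None` rather than raising, which keeps a batch job running when post types are mixed.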







### What are the most common Instagram scraping challenges?

- **`doc_id` changes** (every 2-4 weeks) - monitor the [open-source scraper](https://github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper) for updates.
- **IP blocks** - datacenter IPs are banned instantly; use residential proxies with rotation.
- **TLS fingerprinting** - Python libraries have detectable signatures; use a tool like ScrapFly that rotates fingerprints.
- **Rate limits** (200 req/hour per IP) - rotate across residential IPs with sticky sessions.
- **Behavioral detection** - add random delays and mimic realistic browsing sequences.
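For the rate-limit item in particular, rotation with sticky sessions and a per-IP hourly budget could look something like the sketch below. `StickyProxyPool` is a hypothetical class, not part of any library; the 5-10 minute session length and 200 requests/hour cap are the approximate figures quoted in this guide, and the proxy strings are whatever identifiers your provider gives you.

```python
import random
import time
from collections import deque


class StickyProxyPool:
    """Keep one proxy for a sticky session and cap requests per proxy per hour."""

    def __init__(self, proxies, sticky_range=(300, 600), hourly_cap=200, clock=time.monotonic):
        if not proxies:
            raise ValueError("need at least one proxy")
        self.proxies = list(proxies)
        self.sticky_range = sticky_range  # session length in seconds (min, max)
        self.hourly_cap = hourly_cap
        self.clock = clock
        self.history = {proxy: deque() for proxy in self.proxies}  # request timestamps
        self.index = -1
        self.current = None
        self.expires_at = 0.0

    def _has_budget(self, proxy, now):
        used = self.history[proxy]
        while used and now - used[0] >= 3600:  # forget requests older than an hour
            used.popleft()
        return len(used) < self.hourly_cap

    def _rotate(self, now):
        # Walk the pool round-robin until a proxy with remaining budget is found
        for _ in range(len(self.proxies)):
            self.index = (self.index + 1) % len(self.proxies)
            candidate = self.proxies[self.index]
            if self._has_budget(candidate, now):
                self.current = candidate
                self.expires_at = now + random.uniform(*self.sticky_range)
                return
        raise RuntimeError("every proxy has used its hourly budget")

    def acquire(self):
        """Return the proxy to use for the next request and record the request."""
        now = self.clock()
        expired = self.current is None or now >= self.expires_at
        if expired or not self._has_budget(self.current, now):
            self._rotate(now)
        self.history[self.current].append(now)
        return self.current
```

Call `pool.acquire()` before each request and pass the result to your HTTP client's proxy setting. When it raises `RuntimeError`, the whole pool is exhausted for the hour, which is the signal to slow down or add IPs rather than retry harder.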







### Why does my Python Instagram scraper get 403 errors immediately?

TLS fingerprinting. Python's `requests` and `httpx` have handshake signatures that Instagram recognizes as bots before a single HTTP header is sent, so fixing headers does nothing. Your options are browser automation (real fingerprints, but slow), [curl\_cffi](https://scrapfly.io/blog/posts/curl-impersonate-scrape-chrome-firefox-tls-http2-fingerprint) (mimics Chrome's TLS), or ScrapFly (rotates TLS fingerprints automatically).







### Is scraping Instagram data legal?

US courts have generally allowed scraping public, logged-out data (Meta v. Bright Data, 2024; hiQ v. LinkedIn). Login-walled or private content is a different matter, and accessing it violates Instagram's ToS. Collecting personal data of people in the EU triggers GDPR regardless of where you operate. This is not legal advice.









## Related Guides

- [Scrape TikTok](https://scrapfly.io/blog/posts/how-to-scrape-tiktok-python-json) - Extract profiles, videos, and comments from TikTok using similar anti-bot bypass techniques
- [Scrape Facebook Marketplace](https://scrapfly.io/blog/posts/how-to-scrape-facebook) - Navigate Facebook's authentication and anti-bot measures for marketplace and event data
- [Scrape X.com (Twitter)](https://scrapfly.io/blog/posts/how-to-scrape-twitter) - Access Twitter's GraphQL API for posts, profiles, and real-time data
- [Social Media Scraping Guide](https://scrapfly.io/blog/posts/social-media-scraping) - Comprehensive overview of scraping strategies across all major social platforms



## Summary

Instagram scraping in 2026 requires navigating complex blocking systems: IP quality detection, TLS fingerprinting, rate limiting, and behavioral analysis. Building a scraper from scratch means constant maintenance, because Instagram rotates its `doc_id` values every 2-4 weeks and evolves its blocking systems weekly.

We covered Instagram's multi-layered blocking system and why manual scrapers fail within hours, how to access hidden REST and GraphQL APIs for profiles, posts, and comments, why residential proxies are mandatory (datacenter IPs blocked instantly), how doc\_id parameters work and change every 2-4 weeks, and why Python libraries get blocked immediately due to TLS fingerprinting.

The smart approach: Start with [ScrapFly's working Instagram scraper](https://github.com/scrapfly/scrapfly-scrapers/tree/main/instagram-scraper) that includes anti-blocking bypass, residential proxies, and automatic updates when Instagram changes. This saves you hundreds of hours in maintenance and debugging.



**Legal Disclaimer and Precautions**

This tutorial covers popular web scraping techniques for education. Interacting with public servers requires diligence and respect:

- Do not scrape at rates that could damage the website.
- Do not scrape data that's not available publicly.
- Do not store PII of EU citizens protected by GDPR.
- Do not repurpose *entire* public datasets, which can be illegal in some countries.

Scrapfly does not offer legal advice, but these are good general rules to follow. For more, you should consult a lawyer.

 

