
How to Scrape Instagram in 2025


What if you could extract insights from millions of Instagram profiles, posts, and comments to understand market trends, analyze competitors, or generate leads? Instagram holds a goldmine of public data, but accessing it programmatically is deliberately difficult. Instagram employs sophisticated anti-bot defenses—TLS fingerprinting, IP quality detection, and behavioral analysis—designed to stop scrapers in their tracks. In this guide, we'll reveal exactly how Instagram blocks scrapers and why building a manual solution is a losing battle. More importantly, we'll show you the production-ready approach that bypasses every defense: ScrapFly's maintained Instagram scraper with built-in anti-bot infrastructure. You'll learn what data you can extract, how Instagram's hidden APIs work, and why the smart approach is starting with battle-tested code rather than building from scratch.

Key Takeaways

Master Instagram scraping in 2025 with production-ready solutions that bypass anti-bot defenses, access hidden GraphQL APIs, and scale reliably for business intelligence.

  • Understand Instagram's multi-layered anti-bot system: IP quality checks, TLS fingerprinting, rate limits, and behavioral detection
  • Access Instagram's hidden REST and GraphQL APIs to extract profiles, posts, comments, and engagement metrics
  • Use ScrapFly's open-source Instagram scraper with built-in anti-blocking to start scraping in 5 minutes
  • Implement proper proxy rotation with residential IPs to avoid instant datacenter IP blocks
  • Monitor and update doc_id parameters that Instagram changes every 2-4 weeks to break scrapers
  • Extract business intelligence: competitor analysis, sentiment tracking, influencer metrics, and lead generation data

Latest Instagram Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

What Instagram Data Can You Scrape?

Instagram's public data offers powerful business intelligence when extracted systematically. Here's what you can scrape and why it matters:

Profiles - Extract bio, follower/following counts, verification status, business contact info, and post statistics. Use case: Build targeted lead lists by scraping verified business profiles in specific niches, then reach out using their public email addresses.

Posts - Capture captions, images, videos, likes, view counts, timestamps, location tags, and tagged users. Use case: Analyze your competitor's top-performing content to understand what resonates with your shared audience and replicate successful formats.

Reels - Access video URLs, play counts, music attribution, duration, and engagement metrics. Use case: Track trending audio clips and formats in your industry to inform your own content strategy before trends peak.

Comments - Scrape comment text, nested replies, timestamps, author profiles, and like counts. Use case: Perform sentiment analysis on competitor posts to identify customer pain points and service gaps you can address.

Hashtags - Aggregate posts by hashtag, trending scores, and usage patterns. Use case: Discover emerging micro-influencers by scraping posts from industry hashtags and ranking by engagement rate rather than follower count.

But getting this data is a two-part challenge: finding the right API endpoints AND not getting blocked by Instagram's anti-bot defenses. Let's tackle both.

How Instagram Blocks Scrapers (Anti-Bot Detection Explained)

Instagram employs a sophisticated, multi-layered anti-bot system designed to identify and block automated scraping. Understanding these defenses reveals why manual scraping solutions fail and require constant maintenance.

Rate Limiting & IP Blocking

Instagram enforces strict request quotas to prevent aggressive scraping:

  • Request limits: ~200 requests per hour per IP address for non-authenticated users
  • Throttling response: After exceeding limits, you'll receive HTTP 429 "Too Many Requests" errors
  • Block duration: Your IP gets temporarily rate-limited for hours or days depending on violation severity
  • Progressive penalties: Repeated violations lead to longer blocks and eventually permanent IP bans

Even if you perfectly implement delays and respect rate limits, you're still limited to scraping ~4,800 profiles per day per IP—insufficient for any serious data collection.
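If you do handle rate limits yourself, the standard pattern is exponential backoff on HTTP 429. Here is a minimal sketch; the `fetch` callable, retry budget, and jitter window are illustrative choices, not Instagram-specific values:

```python
import random
import time
from typing import Callable

def backoff_delays(retries: int = 5, base: float = 2.0) -> list[float]:
    """Exponential backoff schedule: 2s, 4s, 8s, ... plus up to 1s of random jitter."""
    return [base * (2 ** i) + random.uniform(0, 1) for i in range(retries)]

def request_with_backoff(fetch: Callable[[], int], retries: int = 5) -> int:
    """Call `fetch` (returns an HTTP status code); on 429, wait and retry with growing delays."""
    status = fetch()
    for delay in backoff_delays(retries):
        if status != 429:
            break
        time.sleep(delay)  # wait out the throttle before retrying
        status = fetch()
    return status
```

The jitter matters: identical retry intervals across many clients are themselves a detectable pattern.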

IP Quality Detection

Instagram analyzes your IP address quality before even processing your request:

  • Datacenter IPs blocked instantly: Requests from AWS, DigitalOcean, Google Cloud, and other hosting providers are flagged immediately
  • Residential IPs required: Instagram expects requests from genuine consumer ISPs (Comcast, AT&T, etc.)
  • ASN reputation checking: Instagram maintains blocklists of ASNs (Autonomous System Numbers) associated with proxies and VPNs
  • This runs BEFORE rate limits: A datacenter IP gets blocked on the first request, regardless of how slowly you scrape

This is why you can't just deploy your scraper to a cloud server and expect it to work—Instagram blocks it before you even hit the rate limit.

Browser Fingerprinting

Instagram analyzes dozens of browser characteristics to detect automation tools:

  • TLS/SSL fingerprinting: Python's requests library has a unique TLS handshake signature that Instagram flags as a bot instantly
  • HTTP/2 fingerprinting: The order and format of HTTP/2 frames reveals whether you're using a real browser or a scripting library
  • Header order consistency: Real browsers send headers in a specific order; scrapers often randomize or alphabetize them
  • Canvas/WebGL fingerprinting: When JavaScript is enabled, Instagram tests how your browser renders graphics—automation frameworks produce consistent, detectable signatures

Even if you copy all the correct headers from a real browser, the TLS handshake alone will expose you as a bot within seconds.
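You can inspect one ingredient of that fingerprint yourself: the cipher suites Python's `ssl` module offers during the handshake, which differ in content and ordering from Chrome's and feed directly into JA3-style fingerprints:

```python
import ssl

# Python/OpenSSL advertises its own cipher list and ordering during the TLS
# handshake; servers can hash these handshake parameters (JA3-style) to
# distinguish a Python client from a real Chrome browser.
ctx = ssl.create_default_context()
ciphers = [c["name"] for c in ctx.get_ciphers()]
print(len(ciphers), ciphers[:3])
```

Tools like curl_cffi (via its impersonate option) or ScrapFly's anti-bot bypass sidestep this by presenting a browser-matching handshake instead.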

Request Pattern Detection

Instagram's behavioral analysis identifies non-human usage patterns:

  • Timing patterns: Perfect 3-second delays between requests look robotic; humans vary their timing
  • Request sequencing: Real users navigate naturally (view profile → scroll → click post); bots often access API endpoints directly without realistic browsing
  • Session validation: Instagram expects correlated requests (CSS, images, analytics) alongside your API calls; scraping just the data endpoints is suspicious
  • Cookie behavior: Missing, malformed, or inconsistent cookies signal automation

Instagram's machine learning models are trained on millions of real user sessions—any deviation from natural human behavior raises red flags.
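A simple countermeasure for the timing signal is jittered delays instead of a fixed interval. A sketch, where the 3±2 second window is an illustrative choice rather than a documented threshold:

```python
import random

def jittered_delay(base: float = 3.0, variance: float = 2.0) -> float:
    """Return a randomized inter-request delay so intervals don't form a robotic pattern."""
    return max(0.5, base + random.uniform(-variance, variance))

# A caller would time.sleep(jittered_delay()) between requests.
```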

Instagram's Multi-Layered Anti-Bot Defense System (diagram showing the four layers: Rate Limiting, IP Quality Detection, Browser Fingerprinting, and Request Pattern Detection)

The Bottom Line: You can build a perfect scraper, but without professional anti-detection infrastructure, it'll get blocked within hours. Instagram updates these defenses weekly, meaning even working scrapers break constantly.

How to Scrape Instagram with ScrapFly (The Easy Way)

ScrapFly provides the complete Instagram scraping solution: production-ready scraper code + anti-blocking infrastructure. Clone the repository, configure your API key, and start scraping in 5 minutes.

What You Get

  • Production-ready scraper code: Open source, actively maintained, updated within hours when Instagram changes
  • Built-in anti-bot infrastructure: TLS fingerprinting, header rotation, and behavioral mimicry handled automatically
  • Residential proxy network included: 50M+ IPs from real consumer ISPs—no separate proxy bills or configuration
  • Automatic updates: When Instagram changes doc_ids or endpoints, we update the scraper immediately
  • Cost optimization: Proxy Saver feature reduces residential proxy costs by 30-50% through intelligent caching

Get Started in 5 Minutes →

# Clone the scraper repository
git clone https://github.com/scrapfly/scrapfly-scrapers.git
cd scrapfly-scrapers/instagram-scraper

# Configure your ScrapFly API key
export SCRAPFLY_KEY="your_key_here"

# Install dependencies
poetry install

# Start scraping
poetry run python run.py

How ScrapFly Bypasses Every Defense

Anti-Bot Bypass: ScrapFly rotates TLS fingerprints to match real Chrome/Firefox browsers, orders HTTP headers correctly, and mimics genuine browser behavior. Instagram sees legitimate browser traffic, not a scraper.

Proxy Management: Our network of 50M+ residential IPs automatically rotates with each request. Instagram sees requests from real consumer devices across different ISPs and locations—exactly like genuine users.

Rate Limit Handling: Smart throttling and exponential backoff automatically slow down when Instagram pushes back. The scraper adjusts its speed dynamically to stay under the radar.

Proxy Saver: Reduces residential proxy costs by 30-50% by intelligently caching static content and only using premium residential IPs for the actual API calls. For a 10,000 profile scraping job, this saves roughly $2-3 in proxy bandwidth.

How Instagram's Scraping API Works

Instagram doesn't provide official public APIs, but their web and mobile apps communicate with backend APIs we can access directly. Instagram uses two API architectures:

  • REST API: Simple endpoints for basic data (e.g., /api/v1/users/web_profile_info/ for profiles)
  • GraphQL API: Complex query system for posts, comments, and paginated data

Instagram uses REST APIs for straightforward requests where the data structure is simple, and GraphQL for complex queries involving nested data, filtering, or pagination.

Finding Instagram's Hidden Endpoints

When Instagram updates their platform, endpoints change. Here's how to discover current endpoints when they break:

Step 1: Open Instagram in Chrome/Firefox and open DevTools (F12)

Step 2: Go to Network tab and filter by "Fetch/XHR" to see API calls

Step 3: Navigate Instagram normally (visit a profile, view a post, scroll comments)

Step 4: Watch for API requests to domains like:

  • i.instagram.com/api/v1/ (REST endpoints)
  • www.instagram.com/graphql/query (GraphQL endpoints)

Step 5: Click on an API request to inspect:

  • Request headers (especially x-ig-app-id)
  • Request payload (for GraphQL, look for variables and doc_id)
  • Response structure (to understand data format)

REST Example: When viewing a profile, you'll see a request to:

https://i.instagram.com/api/v1/users/web_profile_info/?username=google

GraphQL Example: When viewing a post, you'll see a POST request to:

https://www.instagram.com/graphql/query

With a payload containing doc_id and variables parameters.

Understanding doc_id

The doc_id parameter is critical for GraphQL scraping but poorly understood. Here's what you need to know:

What is doc_id?

  • Instagram's internal identifier for specific GraphQL query structures
  • Maps to a predefined query on Instagram's backend (you can't define custom queries)
  • Example: doc_id=8845758582119845 retrieves post details

Why doc_ids exist:

  • Performance: Pre-defined queries are optimized and cached on Instagram's servers
  • Security: Prevents custom queries that could overload the database
  • Anti-scraping: Changing doc_ids regularly breaks scrapers

Why doc_ids change:

  • Instagram updates their GraphQL schema every 2-4 weeks
  • Changes are a deliberate anti-scraping measure
  • No public documentation of current values—you must discover them yourself

How to find current doc_ids:

  1. Open DevTools → Network tab → filter for "graphql"
  2. Trigger the action on Instagram (view post, load comments, etc.)
  3. Inspect the request payload for doc_id= parameter
  4. Note the numeric value (e.g., 8845758582119845)

Different operations require different doc_ids:

  • Profile posts: 9310670392322965
  • Post details: 8845758582119845
  • Comments pagination: (changes frequently)
  • User search: (changes frequently)

The DIY Pain: You must monitor doc_ids manually and update your scraper every time Instagram changes them (every 2-4 weeks). Miss an update and your scraper breaks silently.

ScrapFly Solution: Our open-source repository is updated within hours of Instagram changes. You pull the latest code and keep scraping—no detective work required.

Required Headers

Headers aren't just formalities—Instagram validates them strictly. Here's what you need and why:

Critical Headers:

{
    "x-ig-app-id": "936619743392459",  # Instagram web app identifier
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "*/*",
}

Why each header matters:

  • x-ig-app-id: Identifies your request as coming from Instagram's web app (not mobile app or unauthorized client). Wrong value = instant 403 error.
  • User-Agent: Must match a real browser signature. Python's default User-Agent screams "bot" and gets blocked immediately.
  • Accept-Language: Instagram tracks inconsistent language preferences across requests—keep it stable per session.
  • Accept-Encoding: Real browsers always accept compression; omitting this is suspicious.
  • Accept: Wildcard is fine, but must be present.

What happens with wrong headers:

  • 403 Forbidden: TLS fingerprint or app-id mismatch detected
  • 400 Bad Request: Malformed headers or missing required fields
  • No response: Your IP was flagged and silently dropped

Header consistency requirement:
Instagram correlates requests within a session. If your User-Agent changes mid-session or headers conflict with your TLS fingerprint, you're flagged instantly.
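One way to satisfy this requirement is to build the full header set once per session and reuse it for every request. A sketch, where the profile list and helper are illustrative (in practice, keep the headers matched to the TLS fingerprint your HTTP client actually presents):

```python
import random

# Hypothetical browser profiles; a session picks one and never changes it.
BROWSER_PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    },
]

def session_headers() -> dict:
    """Pick one browser profile per session and pin it for all requests in that session."""
    profile = random.choice(BROWSER_PROFILES)
    return {
        **profile,
        "x-ig-app-id": "936619743392459",
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate, br",
    }
```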

How to Scrape Instagram Profiles

Instagram profiles contain valuable business intelligence: follower counts, bio information, business contact details, and recent posts. We'll use Instagram's REST API endpoint that returns profile data as JSON.

What you can extract:

  • Full name, username, user ID, verification status
  • Bio text and external links
  • Follower and following counts
  • Profile picture URL
  • Business category, phone number, email (for business accounts)
  • First 12 posts with preview data

The approach:
We make a GET request to Instagram's profile API endpoint with the username as a parameter. The response includes comprehensive profile data in JSON format.

ScrapFly's scraper handles:

  • Proper header formatting and x-ig-app-id rotation
  • Residential proxy rotation to avoid IP blocks
  • TLS fingerprint matching to bypass bot detection
  • Automatic retry with exponential backoff on rate limits

Code snippet from the ScrapFly scraper:

from scrapfly import ScrapeConfig, ScrapflyClient
import json

scrapfly = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")

INSTAGRAM_APP_ID = "936619743392459"
BASE_CONFIG = {
    "asp": True,  # Anti Scraping Protection bypass
    "country": "US",  # Use US residential proxies
}

async def scrape_profile(username: str):
    """Scrape Instagram profile data"""
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
            headers={"x-ig-app-id": INSTAGRAM_APP_ID},
            **BASE_CONFIG,
        )
    )
    data = json.loads(result.content)
    return data["data"]["user"]

# Example usage (run inside an async function or event loop)
profile = await scrape_profile("google")
print(f"Followers: {profile['edge_followed_by']['count']}")

Key implementation details:

  • The asp=True parameter activates ScrapFly's anti-bot bypass (TLS fingerprinting, header rotation)
  • Residential proxies (country="US") prevent datacenter IP blocks
  • The endpoint returns up to 12 recent posts embedded in the profile response
  • Business accounts expose email/phone in business_email and business_phone_number fields

Full profile scraper code in our repository →

How to Scrape Instagram Posts

Post data includes captions, media URLs, engagement metrics, comments, and tagged users. Instagram uses GraphQL for post queries, requiring proper doc_id values and request formatting.

What you can extract:

  • Post shortcode, ID, and timestamp
  • Image/video URLs (full resolution)
  • Captions and hashtags
  • Like counts, view counts (for videos), play counts (for reels)
  • First page of comments (with pagination cursor for more)
  • Tagged users and location data
  • Related posts

The approach:
We send a POST request to Instagram's GraphQL endpoint with a payload containing the post shortcode and the correct doc_id. Instagram returns comprehensive post data including engagement metrics and comments.

ScrapFly's scraper handles:

  • Current doc_id values (updated within hours when Instagram changes them)
  • GraphQL payload formatting and URL encoding
  • Comment pagination for posts with 100+ comments
  • Different post types: photos, videos, reels, carousels

Code snippet from the ScrapFly scraper:

from scrapfly import ScrapeConfig, ScrapflyClient
import json
from urllib.parse import quote

scrapfly = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")

INSTAGRAM_POST_DOC_ID = "8845758582119845"  # Updated regularly
BASE_CONFIG = {"asp": True, "country": "US"}

async def scrape_post(url_or_shortcode: str):
    """Scrape single Instagram post data"""
    # Extract shortcode from URL or use directly
    if "http" in url_or_shortcode:
        shortcode = url_or_shortcode.split("/p/")[-1].split("/")[0]
    else:
        shortcode = url_or_shortcode

    # Build GraphQL request payload
    variables = quote(json.dumps({
        'shortcode': shortcode,
        'fetch_tagged_user_count': None,
        'hoisted_comment_id': None,
        'hoisted_reply_id': None
    }, separators=(',', ':')))

    body = f"variables={variables}&doc_id={INSTAGRAM_POST_DOC_ID}"

    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url="https://www.instagram.com/graphql/query",
            method="POST",
            body=body,
            headers={"content-type": "application/x-www-form-urlencoded"},
            **BASE_CONFIG,
        )
    )

    data = json.loads(result.content)
    return data["data"]["xdt_shortcode_media"]

# Example usage (run inside an async function or event loop)
post = await scrape_post("https://www.instagram.com/p/CuE2WNQs6vH/")
print(f"Likes: {post['edge_media_preview_like']['count']}")

Key implementation details:

  • The shortcode is the unique post identifier (e.g., CuE2WNQs6vH from URL /p/CuE2WNQs6vH/)
  • GraphQL requires URL-encoded JSON in the request body
  • The response includes nested structures for comments (edge_media_to_parent_comment)
  • Carousel posts have multiple images in edge_sidecar_to_children
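The split-based shortcode extraction in the snippet above only handles /p/ URLs; a regex that also covers reel and IGTV links (the helper name is ours):

```python
import re

def parse_shortcode(url_or_code: str) -> str:
    """Extract the shortcode from a post, reel, or IGTV URL; pass bare shortcodes through."""
    match = re.search(r"/(?:p|reel|tv)/([A-Za-z0-9_-]+)", url_or_code)
    return match.group(1) if match else url_or_code

print(parse_shortcode("https://www.instagram.com/reel/CuE2WNQs6vH/"))  # → CuE2WNQs6vH
```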

Full post scraper code with pagination →

How to Scrape Instagram Comments

Comments provide sentiment data, user engagement patterns, and conversation threads. Instagram paginates comments, requiring multiple requests to extract full comment sections.

What you can extract:

  • Comment text and timestamp
  • Commenter username, profile, verification status
  • Like counts per comment
  • Nested replies (threaded conversations)
  • Pagination cursors for loading more comments

The approach:
Comments are included in the initial post data (first ~12 comments), but posts with hundreds of comments require pagination. We use the end_cursor value from page_info to load subsequent pages through additional GraphQL requests.

ScrapFly's scraper handles:

  • Nested pagination (comments and their replies have separate cursors)
  • Rate limit respect (Instagram throttles aggressive comment scraping)
  • Proper doc_id for comment pagination queries
  • Reply threading and parent-child comment relationships

Code snippet for comment pagination:

async def scrape_post_comments(shortcode: str, max_comments: int = 100):
    """Scrape comments from Instagram post with pagination"""
    comments = []
    cursor = None

    while len(comments) < max_comments:
        variables = quote(json.dumps({
            'shortcode': shortcode,
            'first': 50,  # Comments per page
            'after': cursor,  # Pagination cursor
        }, separators=(',', ':')))

        body = f"variables={variables}&doc_id={INSTAGRAM_COMMENTS_DOC_ID}"

        result = await scrapfly.async_scrape(
            ScrapeConfig(
                url="https://www.instagram.com/graphql/query",
                method="POST",
                body=body,
                headers={"content-type": "application/x-www-form-urlencoded"},
                **BASE_CONFIG,
            )
        )

        data = json.loads(result.content)
        comment_data = data["data"]["xdt_shortcode_media"]["edge_media_to_parent_comment"]

        # Extract comments from this page
        for edge in comment_data["edges"]:
            comments.append(edge["node"])

        # Check for more comments
        page_info = comment_data["page_info"]
        if not page_info["has_next_page"]:
            break

        cursor = page_info["end_cursor"]

    return comments[:max_comments]

Key implementation details:

  • The first parameter controls comments per page (max ~50)
  • Each comment includes edge_threaded_comments for nested replies
  • Replies have their own pagination system requiring separate requests
  • The scraper respects Instagram's rate limits by adding delays between pagination requests
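Once the pages are collected, the raw comment nodes are deeply nested. A small flattener keeping the commonly used fields; the key names follow the GraphQL response shape described in this section, so verify them against a live response before relying on them:

```python
def flatten_comment(node: dict) -> dict:
    """Reduce a raw GraphQL comment node to the flat fields most analyses need."""
    return {
        "id": node.get("id"),
        "text": node.get("text"),
        "created_at": node.get("created_at"),
        "username": node.get("owner", {}).get("username"),
        "likes": node.get("edge_liked_by", {}).get("count", 0),
        "reply_count": node.get("edge_threaded_comments", {}).get("count", 0),
    }
```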

Full comment scraper with reply handling →

How to Scrape Instagram with Proxies

Proxies are mandatory for Instagram scraping at any scale. Instagram's IP quality detection blocks datacenter IPs instantly, and rate limits force you to rotate residential IPs to maintain scraping speed.

Best Proxies for Instagram Scraping (Residential vs Datacenter)

Datacenter Proxies: Don't Bother

  • ❌ Blocked instantly by Instagram's IP quality checks
  • ❌ No request volume possible—banned on first request
  • ❌ Cheaper per GB, but 100% failure rate makes cost irrelevant

Residential Proxies: Required

  • ✅ IPs from real consumer ISPs (Comcast, Verizon, AT&T, etc.)
  • ✅ Pass Instagram's IP quality detection
  • ✅ Each IP allows ~200 requests/hour before rate limiting
  • ✅ Geographic targeting (e.g., US-only IPs for US-focused scraping)

Mobile Proxies: Premium Option

  • ✅ IPs from mobile carriers (4G/5G networks)
  • ✅ Highest trust score—Instagram rarely blocks mobile IPs
  • ✅ Better rate limits (~300 requests/hour per IP)
  • ❌ More expensive ($60-120/month per IP vs $1-3 for residential)

Recommendation: Residential proxies are the sweet spot for Instagram scraping. Mobile proxies offer marginal improvement at 10-20x the cost—not worth it unless you're scraping millions of profiles daily.

How to Rotate Proxies for Instagram Scraping

Proxy rotation strategies determine your scraping speed and block rate:

Sticky Sessions (Recommended):

  • Use the same IP for 5-10 minutes, then rotate
  • Mimics real user behavior (one person doesn't change IPs every 10 seconds)
  • Allows ~15-30 requests per IP before rotation
  • Instagram's behavioral analysis flags instant IP changes as suspicious

Request-Level Rotation (Aggressive):

  • New IP for every single request
  • Maximizes speed but looks unnatural to Instagram
  • Higher block rate—use only with anti-bot bypass (like ScrapFly)
  • Necessary when scraping 10,000+ profiles/hour

Smart Rotation Based on Response:

  • Rotate immediately on 429 (rate limit) or 403 (block)
  • Continue using same IP while responses are 200 OK
  • Implements exponential backoff: 2s delay → 4s → 8s → 16s before rotating
  • Reduces wasted proxy bandwidth
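The response-driven strategy reduces to a small decision function. A sketch with illustrative thresholds (the 30-requests-per-IP budget is our assumption, not a documented limit):

```python
def next_proxy_action(status_code: int, requests_on_ip: int, max_per_ip: int = 30) -> str:
    """Decide whether to keep the current sticky IP or rotate, based on the last response."""
    if status_code in (403, 429):
        return "rotate"  # blocked or rate limited: switch IPs immediately
    if requests_on_ip >= max_per_ip:
        return "rotate"  # sticky-session budget spent: rotate proactively
    return "keep"        # healthy responses: stay on the same IP

print(next_proxy_action(429, 3))   # → rotate
print(next_proxy_action(200, 3))   # → keep
```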

ScrapFly's automatic proxy management:

  • Intelligent rotation using sticky sessions by default
  • Instant rotation on rate limits or blocks
  • Geographic pinning (keep requests in same country/region)
  • Proxy pool health monitoring (removes dead IPs automatically)

Instagram Proxy Costs

Residential proxies are billed per GB of bandwidth consumed. Here's what Instagram scraping costs:

Data usage per request type:

  • Profile scrape: ~50-100 KB per profile
  • Post scrape: ~30-80 KB per post
  • Comment scrape: ~20-50 KB per comment page

Example scraping job: 10,000 Instagram profiles

  • 10,000 profiles × 75 KB average = 750 MB
  • Standard residential proxy cost: $10-15 per GB
  • Total cost: $7.50-11.25 in proxy bandwidth
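The arithmetic above generalizes to a one-line estimator (using decimal units, 1 GB = 1,000,000 KB, to match the figures in this section):

```python
def proxy_cost_usd(items: int, kb_per_item: float, usd_per_gb: float) -> float:
    """Estimate residential proxy bandwidth cost for a scraping job."""
    gb = items * kb_per_item / 1_000_000  # decimal KB → GB
    return round(gb * usd_per_gb, 2)

print(proxy_cost_usd(10_000, 75, 10))  # → 7.5
```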

ScrapFly's Proxy Saver feature:

  • Caches static content (profile images, CSS, JavaScript)
  • Only uses residential bandwidth for actual API calls
  • Reduces bandwidth consumption by 30-50%
  • Same 10,000 profile job: $5.25-7.88 with Proxy Saver
  • Savings: $2.25-3.37 per 10K profiles (30-40% reduction)

For serious Instagram scraping (100K+ profiles/month), Proxy Saver saves $50-100+ monthly in proxy costs alone.

FAQ

Are there public APIs for Instagram?

No. Instagram discontinued public API access in 2020 and now only offers limited APIs for verified business partners through the Instagram Graph API. However, Instagram's web and mobile apps communicate with internal REST and GraphQL APIs that we can access directly through reverse engineering. These "hidden" APIs provide far more data than the official API ever did.

Is it legal to scrape Instagram?

Yes, scraping publicly accessible Instagram data is generally legal under current precedent (hiQ Labs v. LinkedIn, 2022). Courts have ruled that accessing public data doesn't violate the Computer Fraud and Abuse Act. However, you must:

  • Only scrape public data (no accessing private accounts)
  • Respect rate limits and don't overload Instagram's servers
  • Comply with GDPR, CCPA, and other data privacy laws when handling personal information
  • Recognize that Instagram's Terms of Service prohibit scraping; breaching the ToS is a contractual matter, not typically a criminal one

For more details, see our web scraping legality guide.

How to get Instagram user ID from username?

Scrape the user profile using the /api/v1/users/web_profile_info/ endpoint and extract the id field:

import httpx
import json

username = "google"
response = httpx.get(
    f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
    headers={"x-ig-app-id": "936619743392459"}
)
data = json.loads(response.content)
user_id = data["data"]["user"]["id"]
print(user_id)  # Output: 1067259270

How to get Instagram username from user ID?

Use Instagram's public mobile API endpoint:

import httpx

iphone_api = "https://i.instagram.com/api/v1/users/{}/info/"
iphone_user_agent = "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Mobile/14G60 Instagram 12.0.0.16.90"

response = httpx.get(
    iphone_api.format("1067259270"),
    headers={"User-Agent": iphone_user_agent}
)
username = response.json()['user']['username']
print(username)  # Output: google

How do I handle Instagram's rate limiting when scraping at scale?

Instagram rate limiting requires a three-part strategy:

  1. Residential proxy rotation: Use 50-100+ residential IPs and rotate them in sticky sessions (5-10 minutes per IP). Each IP allows ~200 requests/hour.

  2. Realistic delays: Space requests 2-5 seconds apart with random variance. Perfect timing intervals look robotic.

  3. Exponential backoff: When you receive a 429 error, back off exponentially (wait 2s, then 4s, then 8s, etc.) before retrying.

ScrapFly handles all three automatically—you specify your desired scraping speed and we manage rate limits, retries, and proxy rotation.

What's the difference between scraping Instagram profiles vs posts?

Profiles use a simple REST API endpoint (/api/v1/users/web_profile_info/) that returns JSON with a single GET request. The response includes profile data plus the first 12 posts.

Posts use Instagram's GraphQL API (/graphql/query) requiring POST requests with specific doc_id values and URL-encoded JSON payloads. The response structure is more complex with nested data for comments, likes, and tagged users.

Pagination: Profile data is single-page, while post comment scraping requires pagination through multiple requests using cursor values.

Can I scrape Instagram stories or reels data?

Yes, stories and reels use dedicated GraphQL endpoints with their own doc_id values:

Stories: Ephemeral (24-hour) content requires the user's ID and a stories-specific doc_id. Stories include view counts, replies, and media URLs.

Reels: Similar to post scraping but with video-specific fields (play counts, audio attribution, video duration). Reels use doc_id 25981206651899035 (subject to change).

Both require authentication for some accounts, but public accounts expose this data without login.

How do I extract Instagram comments and engagement metrics?

Comments are nested in post data under edge_media_to_parent_comment. The initial post request returns the first ~12 comments plus a pagination cursor (end_cursor).

To extract all comments:

  1. Scrape the post to get initial comments and end_cursor
  2. Make paginated GraphQL requests with the cursor to load more
  3. Extract nested replies from each comment's edge_threaded_comments

Engagement metrics are in the post data:

  • Likes: edge_media_preview_like.count
  • Comments: edge_media_to_parent_comment.count
  • Views (videos): video_view_count
  • Plays (reels): video_play_count
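Those fields can be pulled in one pass from a scraped post payload; the key names mirror the GraphQL response described in this article, so verify them against a live response:

```python
def engagement_metrics(post: dict) -> dict:
    """Collect the headline engagement numbers from a scraped post payload."""
    return {
        "likes": post.get("edge_media_preview_like", {}).get("count"),
        "comments": post.get("edge_media_to_parent_comment", {}).get("count"),
        "views": post.get("video_view_count"),   # videos only
        "plays": post.get("video_play_count"),   # reels only
    }
```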

What are the most common Instagram scraping challenges?

1. Doc_id changes: Instagram updates GraphQL doc_id values every 2-4 weeks, breaking scrapers. Solution: Monitor our open-source scraper for updates.

2. IP blocks: Datacenter IPs are banned instantly. Solution: Use residential proxies with rotation.

3. TLS fingerprinting: Python libraries have detectable signatures. Solution: Use anti-bot bypass tools like ScrapFly that rotate fingerprints.

4. Rate limits: 200 requests/hour per IP. Solution: Rotate through 50+ residential IPs with sticky sessions.

5. Behavioral detection: Unnatural request patterns get flagged. Solution: Add random delays, mimic real browsing sequences.

Why does my Python Instagram scraper get 403 errors immediately?

This is TLS fingerprinting blocking. Python's requests and httpx libraries have unique TLS handshake signatures that Instagram detects as bots within the first request.

Solutions:

  1. Use browser automation (Selenium, Playwright), which presents real browser fingerprints but is roughly 10x slower
  2. Use curl_cffi library which mimics Chrome's TLS fingerprint
  3. Use ScrapFly which rotates TLS fingerprints automatically

Don't waste time trying to fix headers—the TLS handshake happens before HTTP headers are even sent. You need a tool that controls the TLS layer.


Summary

Instagram scraping in 2025 requires navigating sophisticated anti-bot defenses: IP quality detection, TLS fingerprinting, rate limiting, and behavioral analysis. Building a scraper from scratch means constant maintenance as Instagram updates doc_ids every 2-4 weeks and evolves their blocking systems weekly.

We covered Instagram's multi-layered anti-bot system and why manual scrapers fail within hours, how to access hidden REST and GraphQL APIs for profiles, posts, and comments, why residential proxies are mandatory (datacenter IPs blocked instantly), how doc_id parameters work and change every 2-4 weeks, and why Python libraries get blocked immediately due to TLS fingerprinting.

The smart approach: Start with ScrapFly's production-ready Instagram scraper that includes anti-bot bypass, residential proxies, and automatic updates when Instagram changes—saving you hundreds of hours in maintenance and debugging.
