How to Scrape X.com (Twitter) in 2025

X.com (formerly Twitter) killed its free API in 2023. Since then, the platform has been rolling out defensive changes every 2-4 weeks that break DIY scrapers. Guest tokens expire, doc_ids rotate, and rate limits shift. If you're building a scraper from scratch today, you're signing up for 10-15 hours of monthly maintenance just to keep it working.

This guide shows you exactly what breaks and why, then introduces ScrapFly's maintained scraper as the only practical solution if you need X.com data at scale.

Key Takeaways

  • X.com's free API is gone; the paid version costs $42,000/year minimum
  • Guest tokens, doc_ids, and IP blocks break scrapers every 2-4 weeks
  • Manual scraper maintenance demands 10-15 hours per month to stay functional
  • ScrapFly monitors changes automatically and updates its scraper within 24 hours
  • Get started by cloning the working scraper and adding your API key

Latest X.com Scraper Code

Clone and run: https://github.com/scrapfly/scrapfly-scrapers/

Why Scrape X.com?

X.com remains valuable for:

  • Real-time News: Monitor breaking stories and trending topics before they spread elsewhere
  • Market Signals: Track sentiment and announcements in finance and crypto communities
  • Brand Monitoring: See what people say about your product or company in real time
  • Competitor Research: Watch what competitors post and how audiences engage with their content

The data is publicly available and valuable. The problem is just getting it consistently.

For similar challenges with other platforms, see Instagram scraping, Reddit scraping, and YouTube scraping.

The X.com Scraping Problem

API Is Dead

In February 2023, X.com shut down free API access. The official paid API now starts at $42,000 per year for meaningful read access.

For anyone scraping at scale, the paid API is a non-starter. This forced developers to find alternatives.

Breaking Changes Timeline (2023-2025)

X.com has released significant defensive changes roughly every 2-4 weeks:

  • February 2023: Free API access ends. Paid tiers introduced with strict cost barriers
  • March 2023: New API rate limits severely restricted free tier usage; many apps stopped working
  • June 2023: Guest token acquisition methods changed; existing scrapers broke
  • August 2023: Rate limits reduced from 450 to 300 requests per hour; datacenter IP blocking increased
  • November 2023: GraphQL endpoint changes required doc_id updates across all query types
  • January 2024: Guest token format and expiration timing changed; TLS fingerprinting detection tightened
  • April 2024: doc_ids rotated again; anti-scraping headers added to responses
  • July 2024: Cookie validation requirements changed; session handling became more strict
  • October 2024: IP reputation scoring tightened; rotating proxies now flagged earlier
  • January 2025: Guest token binding to browser fingerprints implemented; datacenter IPs permanently banned

This isn't a stable target.

Three Things That Break Constantly

1. Guest Tokens

Every API call to X.com's GraphQL backend requires a guest token. These tokens:

  • Expire every 2-4 hours
  • Are tied to your IP address
  • Have acquisition methods that change every few weeks
  • Require new reverse engineering each time X.com shifts its approach

When a guest token expires, your scraper stops cold. Acquiring a new one becomes a game of catching up with X.com's latest obfuscation. See handling cookies and session management for related concepts.
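
For context, here's a minimal sketch of what the acquisition step has historically looked like: the frontend POSTs to a guest activation endpoint using the public bearer token embedded in X.com's JavaScript bundle. The bearer value below is a placeholder and the flow changes often, so treat this as illustrative rather than current:

import requests

# Placeholder: the real value is the public web-app bearer token embedded
# in X.com's JavaScript bundle; it changes whenever the bundle does.
PUBLIC_BEARER = "AAAA...placeholder..."

# Historically, the frontend activated guest sessions via this endpoint
resp = requests.post(
    "https://api.twitter.com/1.1/guest/activate.json",
    headers={"Authorization": f"Bearer {PUBLIC_BEARER}"},
    timeout=10,
)
resp.raise_for_status()
guest_token = resp.json()["guest_token"]  # expires after roughly 2-4 hours
print(guest_token)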

2. doc_ids

X.com's GraphQL queries use doc_ids, which are identifiers that tell the backend which operation to execute. These:

  • Rotate every 2-4 weeks
  • Require tracking 8-12 different IDs simultaneously
  • Need reverse engineering from X.com's frontend JavaScript
  • Have no public documentation

Without current doc_ids, your queries fail silently or return empty results. Similar blocking techniques are documented in detecting anti-bot systems.
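
To make that concrete, here's a sketch of the request shape. The doc_id below is a placeholder; real values must be pulled from the current frontend bundle:

import json
import urllib.parse

# Placeholder doc_id: real ones rotate every 2-4 weeks
DOC_ID = "AbCdEfGh123_placeholder"
variables = {"screen_name": "Scrapfly_dev"}

# GraphQL operations are addressed as /i/api/graphql/<doc_id>/<OperationName>
url = (
    f"https://x.com/i/api/graphql/{DOC_ID}/UserByScreenName"
    f"?variables={urllib.parse.quote(json.dumps(variables))}"
)
print(url)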

3. Rate Limits & Blocking

X.com enforces:

  • 300 requests per hour per IP address
  • Instant blocking of datacenter IPs (detected within 1-2 requests)
  • Increasing TLS fingerprinting checks to detect browser automation
  • Cookie validation that flags rotating proxy behavior

Even with all the tokens and doc_ids correct, you'll hit blocks faster than you think. For handling rate limits, see rate limiting async requests and avoiding IP blocking.
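
If you do roll your own, the minimum viable approach is pacing requests under the per-IP budget and backing off on HTTP 429 responses. A minimal sketch, with the function name and pacing values as assumptions:

import time
import requests

def fetch_with_backoff(url: str, max_retries: int = 5) -> requests.Response:
    base_delay = 12.0  # 3600 s / 300 requests: one request every ~12 s per IP
    for attempt in range(max_retries):
        resp = requests.get(url, timeout=10)
        if resp.status_code == 429:
            # Rate limited: back off exponentially before retrying
            time.sleep(base_delay * (2 ** attempt))
            continue
        time.sleep(base_delay)  # pace successful calls under the hourly budget
        return resp
    raise RuntimeError(f"still rate-limited after {max_retries} attempts: {url}")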

The Maintenance Reality

Running a DIY X.com scraper demands:

  • Monitoring: Watch X.com's API responses for failures
  • Reverse Engineering: Extract new doc_ids when they rotate
  • Guest Token Fixes: Update acquisition logic when X.com changes endpoints
  • Rate Limit Handling: Implement backoff strategies that actually work
  • Proxy Rotation: Maintain residential proxy pools and rotation logic

This typically requires 10-15 hours per month just to keep a scraper running. For teams with only one engineer, it becomes a permanent side job.

Public View Limitations

Scraping without authentication gives you limited data. You can get:

  • Public tweets and profiles
  • Follower counts and basic engagement metrics

You cannot get:

  • Private/protected account data
  • Detailed search results (limited without login)
  • User timelines (incomplete without authentication)
  • Bookmarks, lists, or other account-specific data

This is why many scrapers require login, which introduces its own set of problems (account suspension risk, session management complexity).

ScrapFly's Solution

Instead of maintaining a scraper, use one that's maintained for you.

Clone the working scraper:

$ git clone https://github.com/scrapfly/scrapfly-scrapers.git
$ cd scrapfly-scrapers/twitter-scraper
$ poetry install
$ poetry run python run.py

ScrapFly's scraper:

  • Updates within 24 hours of X.com changes
  • Handles guest tokens automatically (acquires and rotates without your code)
  • Includes residential proxies (built-in IP rotation)
  • Tracks doc_ids automatically (reverse-engineers and updates them behind the scenes)
  • Bypasses anti-bot measures (realistic browser fingerprints and request patterns)
  • Supports all public clients (profiles, tweets, search, threads, engagement)

You clone it, add your API key, and it works. When X.com changes, ScrapFly updates the code. You don't maintain anything.

What Data You Can Scrape

Profiles: Username, bio, follower count, verification status, profile picture URL.

Tweets: Text content, timestamps, media URLs, engagement counts (likes, retweets, replies).

Search Results: Tweets matching your query, ranked by relevance and recency.

Threads: Replies, quote tweets, and conversation chains connected to a parent tweet.

Engagement Metrics: Likes, retweets, replies, quotes, and bookmark counts.

Followers: User lists following a given account (where publicly available).

How X.com's API Works

X.com is a React application that loads minimal HTML, then uses JavaScript to fetch data via GraphQL queries.

Here's the flow:

  1. Load X.com page
  2. JavaScript initializes and requests a guest token
  3. Guest token is returned (valid for 2-4 hours)
  4. Page JavaScript makes GraphQL queries using the token
  5. Queries include doc_ids to identify which operation to execute
  6. Backend returns JSON data, frontend renders it

Without a guest token, you can't make queries. Without the right doc_id, your query doesn't match any backend operation. Without working residential proxies and rate limiting, X.com blocks your IP.

A scraper needs to replicate this flow: get token, craft queries with current doc_ids, handle rate limits, and rotate IPs intelligently.
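
Here's a hedged sketch of what replicating that flow means in DIY code. The bearer value is a placeholder, the header and URL shapes are as historically observed, and the two stubbed helpers stand in for the reverse-engineering work covered in the next two sections:

import time
import requests

PUBLIC_BEARER = "AAAA...placeholder..."  # public web-app bearer token (placeholder)

def acquire_guest_token() -> str:
    # Steps 2-3: request a guest token (see the next section); stubbed here
    raise NotImplementedError

def current_doc_id(operation: str) -> str:
    # Step 5: extract the current doc_id from the frontend bundle; stubbed here
    raise NotImplementedError

def fetch_profile(screen_name: str) -> dict:
    doc_id = current_doc_id("UserByScreenName")
    resp = requests.get(
        f"https://x.com/i/api/graphql/{doc_id}/UserByScreenName",
        params={"variables": f'{{"screen_name": "{screen_name}"}}'},
        headers={
            "Authorization": f"Bearer {PUBLIC_BEARER}",
            "x-guest-token": acquire_guest_token(),  # step 4: token on every query
        },
        timeout=10,
    )
    resp.raise_for_status()
    time.sleep(12)  # stay under ~300 requests/hour per IP
    return resp.json()  # step 6: raw JSON payload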

Guest Tokens Explained

What they are: Temporary credentials that mark your session as a legitimate visitor rather than a bot.

Why they expire: X.com limits token lifetime to prevent token reuse and API abuse.

How they're tied to IP: X.com validates that requests using a token come from the same IP that requested it. Rotating IPs breaks token validity.

How ScrapFly handles them: Automatically acquires new tokens, maintains them per IP session, and rotates them before expiration. Your code simply calls the scraper while token management happens behind the scenes.

Code example (ScrapFly handles this internally):

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")
result = client.scrape(ScrapeConfig(
    "https://x.com/Scrapfly_dev",
    render_js=True,  # Enable JavaScript rendering
    asp=True,        # Enable anti-scraping bypass
))

ScrapFly's SDK manages guest tokens, IP sessions, and retries automatically. You provide the URL; it handles the rest.

doc_ids Explained

What they are: Unique identifiers for GraphQL operations. Each query type (fetch profile, get tweets, search) has its own doc_id.

Why they change: X.com rotates doc_ids every 2-4 weeks to break reverse-engineered scrapers. There's no pattern to predict: they're essentially random identifiers.

How many are active: You typically need to track 8-12 doc_ids simultaneously:

  • User info queries
  • Tweet detail queries
  • Search queries
  • Timeline queries
  • Engagement queries

How to find them: Reverse-engineer X.com's JavaScript bundle, intercept GraphQL calls with browser dev tools, or monitor X.com's API patterns. It's manual work, and it repeats every few weeks.
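
One way to do the dev-tools step programmatically is to load a page in a headless browser and record the GraphQL request URLs as they fire. A sketch using Playwright (an assumption here, not part of ScrapFly's scraper; headless visits may themselves be blocked):

import re
from playwright.sync_api import sync_playwright

doc_ids: dict[str, str] = {}

def record_doc_id(request):
    # GraphQL calls follow the pattern /i/api/graphql/<doc_id>/<OperationName>
    match = re.search(r"/i/api/graphql/([\w-]+)/(\w+)", request.url)
    if match:
        doc_ids[match.group(2)] = match.group(1)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.on("request", record_doc_id)
    page.goto("https://x.com/Scrapfly_dev", wait_until="networkidle")
    browser.close()

print(doc_ids)  # e.g. {"UserByScreenName": "<current doc_id>", ...}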

ScrapFly's approach: Monitors X.com automatically and updates doc_ids within 24 hours of rotation. When you clone the scraper, all current doc_ids are already in place.

Scraping Profiles

Extract user profile information using ScrapFly:

import asyncio
import json

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

async def main():
    result = await client.async_scrape(ScrapeConfig(
        "https://x.com/Scrapfly_dev",
        render_js=True,
        asp=True,
        wait_for_selector="[data-testid='primaryColumn']",
    ))

    # Capture the background XHR calls the page made and keep the
    # UserBy* GraphQL responses that carry the profile data
    xhr_calls = result.scrape_result["browser_data"]["xhr_call"]
    user_calls = [f for f in xhr_calls if "UserBy" in f["url"]]

    for xhr in user_calls:
        data = json.loads(xhr["response"]["body"])
        # parse_profile is the field-flattening helper from the linked repo
        profile = parse_profile(data["data"]["user"]["result"])
        print(json.dumps(profile, indent=2))

asyncio.run(main())

What you get:

  • Display name and username
  • Bio and description
  • Follower/following counts
  • Verification status
  • Media count and listed count
  • Profile banner and creation date
  • Website URL and location

Example Output:
{
  "id": "VXNlcjoxMzEwNjIzMDgxMzAwNDAyMTc4",
  "rest_id": "1310623081300402178",
  "verified": true,
  "default_profile": true,
  "default_profile_image": false,
  "description": "API products for developers:\n- Web Scraping API: scrape any page\n- Screenshot API: screenshot any website\n- Extraction API: parse data using AI",
  "entities": {
    "description": {
      "urls": []
    },
    "url": {
      "urls": [
        {
          "display_url": "scrapfly.io",
          "expanded_url": "https://scrapfly.io",
          "url": "https://t.co/1Is3k6KzyM",
          "indices": [0, 23]
        }
      ]
    }
  },
  "fast_followers_count": 0,
  "favourites_count": 41,
  "followers_count": 281,
  "friends_count": 5,
  "has_custom_timelines": true,
  "is_translator": false,
  "listed_count": 3,
  "media_count": 38,
  "normal_followers_count": 281,
  "pinned_tweet_ids_str": ["1863616315174551787"],
  "possibly_sensitive": false,
  "profile_banner_url": "https://pbs.twimg.com/profile_banners/1310623081300402178/1601320645",
  "profile_interstitial_type": "",
  "statuses_count": 186,
  "translator_type": "none",
  "url": "https://t.co/1Is3k6KzyM",
  "withheld_in_countries": []
}

Link to full implementation: ScrapFly X.com Scraper GitHub

Scraping Tweets

Extract tweets and their metadata:

import asyncio
import json

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

async def main():
    result = await client.async_scrape(ScrapeConfig(
        "https://x.com/Scrapfly_dev/status/1664267318053179398",
        render_js=True,
        asp=True,
        wait_for_selector="[data-testid='tweet']",
    ))

    # Capture background XHR calls and keep the TweetResultByRestId
    # GraphQL responses that carry the tweet payload
    xhr_calls = result.scrape_result["browser_data"]["xhr_call"]
    tweet_calls = [f for f in xhr_calls if "TweetResultByRestId" in f["url"]]

    for xhr in tweet_calls:
        data = json.loads(xhr["response"]["body"])
        # parse_tweet is the field-flattening helper from the linked repo
        tweet = parse_tweet(data["data"]["tweetResult"]["result"])
        print(json.dumps(tweet, indent=2))

asyncio.run(main())

What you get:

  • Tweet text and full content
  • Creation timestamp
  • Like, retweet, reply, and quote counts
  • View count
  • Media URLs (images, videos)
  • Attached URLs with expanded links
  • Tagged hashtags and mentioned users
  • Author information (nested user object)
  • Engagement data (bookmarks, conversation ID)

Example Output:
{
  "created_at": "Thu Jun 01 13:47:03 +0000 2023",
  "attached_urls": [
    "https://scrapfly.io/blog/top-10-web-scraping-libraries-in-python/"
  ],
  "attached_media": [
    "https://pbs.twimg.com/media/FxiqTffWIAALf7O.png"
  ],
  "tagged_users": [],
  "tagged_hashtags": [],
  "favorite_count": 8,
  "bookmark_count": 1,
  "quote_count": 0,
  "reply_count": 7,
  "retweet_count": 1,
  "text": "A new blog post has been published! \n\nTop 10 Web Scraping Packages for Python 🤖\n\nCheckout it out 👇\nhttps://t.co/d2iFdAV2LJ https://t.co/zLjDlxdKee",
  "is_quote": false,
  "is_retweet": false,
  "language": "en",
  "user_id": "1310623081300402178",
  "id": "1664267318053179398",
  "conversation_id": "1664267318053179398",
  "source": "Zapier.com",
  "views": "2296",
  "poll": {},
  "user": {
    "id": "VXNlcjoxMzEwNjIzMDgxMzAwNDAyMTc4",
    "rest_id": "1310623081300402178",
    "verified": true,
    "default_profile": true,
    "default_profile_image": false,
    "description": "API products for developers:\n- Web Scraping API: scrape any page\n- Screenshot API: screenshot any website\n- Extraction API: parse data using AI",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {
        "urls": [
          {
            "display_url": "scrapfly.io",
            "expanded_url": "https://scrapfly.io",
            "url": "https://t.co/1Is3k6KzyM",
            "indices": [0, 23]
          }
        ]
      }
    },
    "fast_followers_count": 0,
    "favourites_count": 41,
    "followers_count": 281,
    "friends_count": 5,
    "has_custom_timelines": true,
    "is_translator": false,
    "listed_count": 3,
    "media_count": 38,
    "normal_followers_count": 281,
    "pinned_tweet_ids_str": ["1863616315174551787"],
    "possibly_sensitive": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/1310623081300402178/1601320645",
    "profile_interstitial_type": "",
    "statuses_count": 186,
    "translator_type": "none",
    "url": "https://t.co/1Is3k6KzyM",
    "withheld_in_countries": []
  }
}

Link to full implementation: ScrapFly X.com Scraper GitHub

Proxies

Why You Need Them

Scraping X.com without proxies will fail in minutes:

  • Rate limit: 300 requests per hour per IP
  • IP blocking: Datacenter IPs are blocked instantly (within 1-2 requests)
  • Detection: X.com tracks request patterns and flags suspicious behavior

You cannot scrape X.com at any meaningful scale using a single datacenter IP. For more context, see avoiding IP address blocking and proxy introduction.

Best Type: Residential Proxies

Why residential: Requests come from real residential IP addresses, making them indistinguishable from regular users. X.com's detection systems accept them.

Cost: $1-3 per gigabyte of traffic.

Rotation strategy: Use sticky sessions that last 10-15 minutes. This keeps the guest token and IP session stable while giving you enough coverage to avoid triggering rate limits on a single IP. For more on rotation, see proxy rotation techniques.
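
With ScrapFly's SDK, stickiness is expressed through its session feature. A minimal sketch, where the session name is an arbitrary label you choose and rotate:

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# Reusing the same session name keeps the same exit IP (and with it the
# guest-token binding) across consecutive requests
for handle in ["Scrapfly_dev", "another_handle"]:
    result = client.scrape(ScrapeConfig(
        f"https://x.com/{handle}",
        render_js=True,
        asp=True,
        session="x-sticky-1",  # arbitrary label; rotate it every 10-15 minutes
    ))
    print(handle, "scraped")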

Maintenance: ScrapFly includes residential proxies in its service, so you don't manage pools or rotation logic. Check best proxy providers and advanced proxy optimization for alternatives.

Cost Example

Scraping 10,000 tweets:

  • Page loads: ~500-1,000 browser requests (timeline and search pages return tweets in batches, so you don't fetch each tweet individually)
  • Data transferred: roughly 2-3 GB with full browser rendering
  • Proxy cost: $5-8 at standard residential rates ($1-3/GB)

For price-tracking or sentiment-analysis projects, this is reasonable. For continuous monitoring, ScrapFly's monthly plan is more cost-effective than managing your own infrastructure. See top residential proxy providers for comparison.
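
As a back-of-envelope check on those numbers (every value below is an assumed approximation, not a quote):

# Rough cost arithmetic for the example above; all values are estimates
page_loads = 750      # midpoint of ~500-1,000 browser loads for 10,000 tweets
mb_per_load = 3.5     # full browser load: HTML, JS bundle, JSON payloads
cost_per_gb = 2.0     # mid-range residential proxy rate in $/GB

gb_total = page_loads * mb_per_load / 1024
print(f"~{gb_total:.1f} GB -> ~${gb_total * cost_per_gb:.2f}")  # ~2.6 GB -> ~$5.13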

FAQ

How often does X.com change its defenses?

Every 2-4 weeks, X.com rolls out changes to guest tokens, doc_ids, rate limits, or detection patterns. There's no predictable schedule; you typically find out when your scraper breaks.

Why can't I just use a datacenter proxy?

X.com blocks datacenter IPs on sight. They detect datacenter IP ranges and reject requests immediately. Even with perfect tokens and doc_ids, a datacenter proxy will fail within seconds.

How does ScrapFly know when X.com changes?

ScrapFly monitors X.com's API continuously, tests scraper functionality hourly, and detects failures before customers do. When failures are detected, engineers reverse-engineer the new changes and push updates within 24 hours.

Can I scrape X.com without authentication?

Yes, public tweets and profiles are scrapable without login. However, you'll have limited access to search, timelines, and engagement data. Full functionality requires authenticated sessions and cookies, which carries account suspension risk. Related alternatives: Playwright automation or Selenium automation.

Is scraping X.com legal?

Scraping public data is generally legal, but X.com's terms of service prohibit automated access without permission, so it remains a legal gray area. Use scraped data responsibly and review compliance requirements for your use case. For other platforms, see social media scraping in 2025.

How much does ScrapFly cost?

Pricing depends on data volume and features. Visit ScrapFly's pricing page for current rates. Most projects start at $20-100/month depending on request volume.

X.com Scraping Summary

The free API is gone. Manual scraping breaks every 2-4 weeks due to guest tokens expiring, doc_ids rotating, and rate limits shifting. Maintaining a DIY scraper costs 10-15 hours per month.

ScrapFly solves this by maintaining one scraper for you. Guest tokens are handled automatically. doc_ids are tracked and updated within 24 hours of rotation. Residential proxies are included. Rate limiting and anti-bot bypass are built in.

Clone the repo, add your API key, start scraping. When X.com changes, ScrapFly updates the code. You don't maintain anything.

Latest X.com Scraper Code
Ready to use: https://github.com/scrapfly/scrapfly-scrapers/

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.
