How To Scrape TikTok in 2025

TikTok's anti-bot defenses have evolved dramatically in 2025. What worked in 2023 is completely broken today. If you've tried scraping TikTok recently, you've likely hit 403 errors, CAPTCHAs, or mysterious "X-Gorgon" header requirements.

TikTok now employs some of the most advanced anti-scraping measures in social media. But the data is still accessible. You just need the right approach.

This guide shows you both the techniques (for learning) and the production-ready solution (for results). We extract profiles, videos, comments, and search data using TikTok's hidden JSON APIs, then show you why DIY scraping requires constant maintenance while ScrapFly handles it automatically.

Key Takeaways

  • Access TikTok's hidden JSON APIs for profiles, videos, and comments without authentication
  • Understand TikTok's 2025 anti-bot defenses: X-Gorgon headers, IP quality checks, and CAPTCHA challenges
  • Bypass rate limiting (100 req/hour) using residential proxies and smart throttling
  • Parse complex JSON responses with JMESPath to extract only valuable data
  • Handle session management for cookie-protected search endpoints
  • Implement exponential backoff and retry logic for production reliability
  • Use ScrapFly's maintained scraper to avoid 8-15 hours/month of maintenance

Latest TikTok Scraper Code

https://github.com/scrapfly/scrapfly-scrapers/

What TikTok Data Can You Scrape?

TikTok offers rich data for business intelligence, research, and marketing. Here's what you can extract and why it's valuable:

User Profiles

What: Username, follower count, bio, interests, demographics, verification status
Why valuable: Identify influencers, analyze audience demographics, track competitor growth
Example: Find micro-influencers in your niche with 10K-100K followers for cost-effective partnerships

Video Content

What: Video metadata, captions, hashtags, upload dates, video URLs
Why valuable: Track trending content, analyze hashtag performance, monitor brand mentions
Example: Track how your brand is mentioned across TikTok to measure social media impact

Engagement Metrics

What: Likes, comments, shares, view counts, comment text and timestamps
Why valuable: Measure content performance, analyze audience sentiment, identify viral patterns
Example: Analyze which types of content get the most engagement in your industry

Search Results

What: Trending videos, hashtag performance, user discovery
Why valuable: Discover trending topics, find new creators, track keyword performance
Example: Monitor trending hashtags in your industry to create timely content

Why TikTok Scraping is Uniquely Difficult

TikTok employs some of the most advanced anti-bot defenses in social media. What worked in 2023 is completely broken in 2025. Here's what you're up against:

Recent Breaking Changes (Timeline)

Jan 2025: X-Gorgon header became a requirement. The X-Gorgon header is an 84-bit encryption signature that validates the request's authenticity.

Jan 2025: X-Khronos header was added to prevent timestamp-based replay attacks.

Jan 15, 2025: The Search API, which was previously open, now requires valid session cookies.

Feb 2025: The general IP rate limit was reduced from ~150 to ~100 requests/hour.

Feb 2025: Comment pagination was capped at 5,000 comments per post (down from 10,000 in late 2024).

Rate Limiting & IP Blocking

TikTok aggressively blocks IP addresses that make too many requests. The limits are strict:

  • 100 requests per hour per IP address
  • Geographic restrictions - some endpoints only work from specific countries
  • IP quality detection - datacenter IPs are blocked faster than residential IPs
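To stay under limits like these, a DIY scraper needs client-side throttling and retries. Here's a minimal sketch of a retry helper with exponential backoff and jitter (the `fetch_with_backoff` name and delay values are our own illustration, not a TikTok requirement):

```python
import asyncio
import random


async def fetch_with_backoff(client, url: str, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a GET request, backing off exponentially when blocked (429/403)."""
    for attempt in range(max_retries):
        response = await client.get(url)
        if response.status_code not in (429, 403):
            return response
        # exponential backoff with jitter: ~1s, ~2s, ~4s, ...
        delay = base_delay * 2 ** attempt + random.uniform(0, 0.5)
        await asyncio.sleep(delay)
    raise RuntimeError(f"still blocked after {max_retries} retries: {url}")
```

This works with any client exposing an async `get()`, such as the httpx AsyncClient used later in this article.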

Encrypted Header Requirements (New in 2025)

TikTok now requires two encrypted headers that change frequently:

# These headers are now mandatory for most requests
headers = {
    "X-Gorgon": "encrypted_signature_here",  # 84-bit encryption
    "X-Khronos": "timestamp_signature_here"  # Anti-replay protection
}

The X-Gorgon header is particularly complex - it's generated from request details, device fingerprint, and a rotating secret key that TikTok updates regularly.

Browser Fingerprinting

TikTok analyzes your browser's fingerprint to detect automation:

  • WebGL fingerprinting - detects headless browsers
  • Canvas fingerprinting - identifies bot behavior patterns
  • Audio context fingerprinting - checks for real browser audio capabilities
  • WebRTC fingerprinting - validates network characteristics

Session & Cookie Requirements

Many TikTok endpoints now require valid session cookies:

  • Search API requires authentication cookies
  • Comments API validates session tokens
  • Sessions are IP-locked - cookies only work from the IP that created them
  • 24-hour expiration - sessions expire daily
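Because sessions expire daily, a DIY scraper has to track cookie age and refresh stale sessions. A minimal sketch (the helper names and storage format are our own choice; the 24-hour TTL mirrors the behavior described above):

```python
import json
import time
from pathlib import Path


def save_session(path: str, cookies: dict) -> None:
    """Persist cookies together with a creation timestamp."""
    Path(path).write_text(json.dumps({"created_at": time.time(), "cookies": cookies}))


def load_session(path: str, ttl_hours: float = 24.0):
    """Return saved cookies, or None if missing or older than the TTL."""
    file = Path(path)
    if not file.exists():
        return None
    session = json.loads(file.read_text())
    if time.time() - session["created_at"] > ttl_hours * 3600:
        return None  # session expired - a fresh one must be created
    return session["cookies"]
```

Remember that sessions are also IP-locked, so the cookies must be reused from the same proxy IP that created them.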

The bottom line: DIY scraping of TikTok requires constant monthly maintenance. ScrapFly monitors these changes and updates its systems automatically, so your scrapers don't break.

How to Scrape TikTok with ScrapFly

ScrapFly handles all the complexity of TikTok's anti-bot defenses automatically. Here's how it bypasses every protection:

Anti-Scraping Protection

  • Automatic X-Gorgon generation - We reverse engineer TikTok's encryption algorithms
  • Real browser fingerprinting - Uses actual Chrome/Firefox browsers, not headless detection
  • CAPTCHA solving - Built-in automatic CAPTCHA resolution
  • Session management - Handles cookie authentication automatically

Proxy & Rate Limiting

  • Residential proxy rotation - 100M+ IPs from real devices worldwide
  • Smart throttling - Respects rate limits while maximizing throughput
  • Geographic targeting - Route requests through specific countries
  • IP quality optimization - Prioritizes high-quality residential IPs

Maintenance-Free Operation

  • Weekly updates - We monitor TikTok changes and update our systems
  • Zero configuration - Works out of the box with your API key
  • 99%+ success rate - Reliable data extraction at scale
  • 24/7 monitoring - We detect and fix issues before they affect you

Setup

To web scrape TikTok, we use a few Python libraries:

  • httpx: For sending HTTP requests to TikTok and getting the data in either HTML or JSON.
  • parsel: For parsing the HTML and extracting elements using selectors, such as XPath and CSS.
  • JMESPath: For parsing and refining the JSON datasets to exclude unnecessary details.
  • loguru: For monitoring and logging our TikTok scraper in beautiful terminal outputs.
  • scrapfly-sdk: For scraping TikTok pages that require JavaScript rendering and using advanced scraping features using ScrapFly.
  • asyncio: For speeding up our web scraping by running our code asynchronously.

Note that asyncio comes pre-installed in Python. Use the following command to install the other packages:

pip install httpx parsel jmespath loguru scrapfly-sdk

How to Scrape TikTok Profiles

VERIFIED: Oct 7, 2025 | STATUS: Working

The easiest data to scrape from TikTok is public profile data. TikTok embeds a full JSON object of all profile data directly into the HTML source of the page in a <script> tag.

Discovering Hidden Profile Data

Location: Find the <script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"> tag in the HTML.

How to find it:

  1. Open any TikTok profile in your browser
  2. Open Developer Tools (F12) and go to the "Elements" tab
  3. Search (Ctrl+F) for __UNIVERSAL_DATA_FOR_REHYDRATION__

XPath Selector: //script[@id='__UNIVERSAL_DATA_FOR_REHYDRATION__']/text()

This method has been the most stable way to get profile data and has not changed since 2021.

Hidden JSON data on profile pages

The profile data is located at webapp.user-detail.userInfo in the JSON structure. This data is commonly referred to as hidden web data - the same data seen on the page, but captured before it's rendered into HTML.

Scraping Profile Code

Python
import asyncio
import json
from typing import List, Dict
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log

# initialize an async httpx client
client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser like headers to prevent being blocked
    headers={
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
    },
)

def parse_profile(response: Response):
    """parse profile data from hidden scripts on the HTML"""
    assert response.status_code == 200, "request is blocked, use the ScrapFly code tab"
    selector = Selector(response.text)
    data = selector.xpath("//script[@id='__UNIVERSAL_DATA_FOR_REHYDRATION__']/text()").get()
    profile_data = json.loads(data)["__DEFAULT_SCOPE__"]["webapp.user-detail"]["userInfo"]  
    return profile_data


async def scrape_profiles(urls: List[str]) -> List[Dict]:
    """scrape tiktok profiles data from their URLs"""
    to_scrape = [client.get(url) for url in urls]
    data = []
    # scrape the URLs concurrently
    for response in asyncio.as_completed(to_scrape):
        response = await response
        profile_data = parse_profile(response)
        data.append(profile_data)
    log.success(f"scraped {len(data)} profiles from profile pages")
    return data
ScrapFly
import asyncio
import json
from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_profile(response: ScrapeApiResponse):
    """parse profile data from hidden scripts on the HTML"""
    selector = response.selector
    data = selector.xpath("//script[@id='__UNIVERSAL_DATA_FOR_REHYDRATION__']/text()").get()
    profile_data = json.loads(data)["__DEFAULT_SCOPE__"]["webapp.user-detail"]["userInfo"]  
    return profile_data


async def scrape_profiles(urls: List[str]) -> List[Dict]:
    """scrape tiktok profiles data from their URLs"""
    to_scrape = [ScrapeConfig(url, asp=True, country="US") for url in urls]
    data = []
    # scrape the URLs concurrently
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        profile_data = parse_profile(response)
        data.append(profile_data)
    log.success(f"scraped {len(data)} profiles from profile pages")
    return data
Run the code
async def run():
    profile_data = await scrape_profiles(
        urls=[
            "https://www.tiktok.com/@oddanimalspecimens"
        ]
    )
    # save the result to a JSON file
    with open("profile_data.json", "w", encoding="utf-8") as file:
        json.dump(profile_data, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())

The code works as follows:

  • Create an async httpx with basic browser headers to avoid blocking.
  • Define a parse_profile function to select the script tag and parse the profile data.
  • Define a scrape_profiles function to request the profile URLs concurrently while parsing the data from each page.

Running the above TikTok scraper will create a JSON file named profile_data.json. The output looks like this:

Output
[
  {
    "user": {
      "id": "6976999329680589829",
      "shortId": "",
      "uniqueId": "oddanimalspecimens",
      "nickname": "Odd Animal Specimens",
      "avatarLarger": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7327535918275887147~c5_1080x1080.jpeg?lk3s=a5d48078&x-expires=1709280000&x-signature=1rRtT4jX0Tk5hK6cpSsDcqeU7cM%3D",
      "avatarMedium": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7327535918275887147~c5_720x720.jpeg?lk3s=a5d48078&x-expires=1709280000&x-signature=WXYAMT%2BIs9YV52R6jrg%2F1ccwdcE%3D",
      "avatarThumb": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7327535918275887147~c5_100x100.jpeg?lk3s=a5d48078&x-expires=1709280000&x-signature=rURTqWGfKNEiwl42nGtc8ufRIOw%3D",
      "signature": "YOUTUBE: Odd Animal Specimens\nCONTACT: OddAnimalSpecimens@whalartalent.com",
      "createTime": 0,
      "verified": false,
      "secUid": "MS4wLjABAAAAmiTQjtyN2Q_JQji6RgtgX2fKqOA-gcAAUU4SF9c7ktL3uPoWu0nLpBfqixgacB8u",
      "ftc": false,
      "relation": 0,
      "openFavorite": false,
      "bioLink": {
        "link": "linktr.ee/oddanimalspecimens",
        "risk": 0
      },
      "commentSetting": 0,
      "commerceUserInfo": {
        "commerceUser": false
      },
      "duetSetting": 0,
      "stitchSetting": 0,
      "privateAccount": false,
      "secret": false,
      "isADVirtual": false,
      "roomId": "",
      "uniqueIdModifyTime": 0,
      "ttSeller": false,
      "region": "US",
      "profileTab": {
        "showMusicTab": false,
        "showQuestionTab": true,
        "showPlayListTab": true
      },
      "followingVisibility": 1,
      "recommendReason": "",
      "nowInvitationCardUrl": "",
      "nickNameModifyTime": 0,
      "isEmbedBanned": false,
      "canExpPlaylist": true,
      "profileEmbedPermission": 1,
      "language": "en",
      "eventList": [],
      "suggestAccountBind": false
    },
    "stats": {
      "followerCount": 2600000,
      "followingCount": 6,
      "heart": 44600000,
      "heartCount": 44600000,
      "videoCount": 124,
      "diggCount": 0,
      "friendCount": 3
    },
    "itemList": []
  }
]

We can successfully scrape TikTok for profile data. However, we are missing the profile's video data. Now we extract it.

How to Scrape TikTok Channels

VERIFIED: Oct 7, 2025 | STATUS: Working

Scraping a channel (a user's full video feed) is more complex. While profile data is in the initial HTML, video data is loaded dynamically as you scroll.

Why Channels Are Different

  • Profile Pages: Static HTML with all data embedded
  • Channel Pages: Video data is loaded from a background API endpoint (/api/post/item_list/) via XHR calls as the user scrolls

A simple HTTP request won't get the video data. You need to simulate scrolling in a real browser.

These background XHR calls are triggered while scrolling down the page. They are sent to the endpoint /api/post/item_list/, which returns the channel's video data in batches.

To scrape channel data, we could request the /api/post/item_list/ API endpoint directly. However, this endpoint requires many different parameters, which can be challenging to maintain. Therefore, we extract the data from the XHR calls instead.

Web Scraping Background Requests with Headless Browsers

In this tutorial we'll be taking a look at a rather new and popular web scraping technique - capturing background requests using headless browsers.


TikTok allows non-logged-in users to view profile pages. However, it restricts most actions unless you are logged in, meaning we can't scroll down using mouse actions. Therefore, we scroll down using JavaScript code that gets executed upon sending a request:

function scrollToEnd(i) {
    // check if already at the bottom and stop if there aren't more scrolls
    if (window.innerHeight + window.scrollY >= document.body.scrollHeight) {
        console.log("Reached the bottom.");
        return;
    }

    // scroll down
    window.scrollTo(0, document.body.scrollHeight);

    // set a maximum of 15 iterations
    if (i < 15) {
        setTimeout(() => scrollToEnd(i + 1), 3000);
    } else {
        console.log("Reached the end of iterations.");
    }
}

scrollToEnd(0);

Here, we create a JavaScript function to scroll down and wait between each scroll iteration for the XHR requests to finish loading. It has a maximum of 15 scrolls, which is sufficient for most profiles.

Use the above JavaScript code to scrape TikTok channel data from XHR calls:

import jmespath
import asyncio
import json
from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

js_scroll_function = """
function scrollToEnd(i) {
    // check if already at the bottom and stop if there aren't more scrolls
    if (window.innerHeight + window.scrollY >= document.body.scrollHeight) {
        console.log("Reached the bottom.");
        return;
    }

    // scroll down
    window.scrollTo(0, document.body.scrollHeight);

    // set a maximum of 15 iterations
    if (i < 15) {
        setTimeout(() => scrollToEnd(i + 1), 3000);
    } else {
        console.log("Reached the end of iterations.");
    }
}

scrollToEnd(0);
"""

def parse_channel(response: ScrapeApiResponse):
    """parse channel video data from XHR calls"""
    # extract the xhr calls and extract the ones for videos
    _xhr_calls = response.scrape_result["browser_data"]["xhr_call"]
    post_calls = [c for c in _xhr_calls if "/api/post/item_list/" in c["url"]]
    post_data = []
    for post_call in post_calls:
        try:
            data = json.loads(post_call["response"]["body"])["itemList"]
        except Exception as e:
            raise Exception("Post data couldn't load") from e
        post_data.extend(data)
    # parse all the data using jmespath
    parsed_data = []
    for post in post_data:
        result = jmespath.search(
            """{
            createTime: createTime,
            desc: desc,
            id: id,
            stats: stats,
            contents: contents[].{desc: desc, textExtra: textExtra[].{hashtagName: hashtagName}},
            video: video
            }""",
            post
        )
        parsed_data.append(result)    
    return parsed_data


async def scrape_channel(url: str) -> List[Dict]:
    """scrape video data from a channel (profile with videos)"""
    log.info(f"scraping channel page with the URL {url} for post data")
    response = await SCRAPFLY.async_scrape(
        ScrapeConfig(
            url,
            asp=True,
            country="AU",
            render_js=True,
            rendering_wait=5000,
            js=js_scroll_function,
            wait_for_selector="//div[@id='main-content-video_detail']",
        )
    )
    data = parse_channel(response)
    log.success(f"scraped {len(data)} posts data")
    return data
Run the code
async def run():
    channel_data = await scrape_channel(
        url="https://www.tiktok.com/@oddanimalspecimens"
    )
    # save the result to a JSON file
    with open("channel_data.json", "w", encoding="utf-8") as file:
        json.dump(channel_data, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())

The execution flow works as follows:

  • A request with a headless browser is sent to the profile page.
  • The JavaScript scroll function gets executed.
  • More channel video data are loaded through background XHR calls.
  • The parse_channel function iterates over the responses of all the XHR calls and saves the video data into the post_data array.
  • The channel data are refined using JMESPath to exclude the unnecessary details.

We extracted only a small portion of each video's data from the responses. The full response includes further details that might be useful. Sample output:

Sample output
[
    {
        "createTime": 1675963028,
        "desc": "Mouse to Whale Vertebrae - What bone should I do next? How big is a mouse vertebra? How big is a whale vertebrae? A lot bigger, but all vertebrae share the same shape. Specimen use made possible by the University of Michigan Museum of Zoology. #animals #science #learnontiktok ",
        "id": "7198206283571285294",
        "stats": {
            "collectCount": 92400,
            "commentCount": 5464,
            "diggCount": 1500000,
            "playCount": 14000000,
            "shareCount": 11800
        },
        "contents": [
            {
                "desc": "Mouse to Whale Vertebrae - What bone should I do next? How big is a mouse vertebra? How big is a whale vertebrae? A lot bigger, but all vertebrae share the same shape. Specimen use made possible by the University of Michigan Museum of Zoology. #animals #science #learnontiktok ",
                "textExtra": [
                    {
                        "hashtagName": "animals"
                    },
                    {
                        "hashtagName": "science"
                    },
                    {
                        "hashtagName": "learnontiktok"
                    }
                ]
            }
        ],
        "video": {
            "bitrate": 441356,
            "bitrateInfo": [
                ....
            ],
            "codecType": "h264",
            "cover": "https://p16-sign.tiktokcdn-us.com/obj/tos-useast5-p-0068-tx/3a2c21cd21ad4410b8ad7ab606aa0f45_1675963028?x-expires=1709287200&x-signature=Iv3PLyTi3PIWT4QUewp6MPnRU9c%3D",
            "definition": "540p",
            "downloadAddr": "https://v16-webapp-prime.tiktok.com/video/tos/maliva/tos-maliva-ve-0068c799-us/ed00b2ad6b9b4248ab0a4dd8494b9cfc/?a=1988&ch=0&cr=3&dr=0&lr=tiktok_m&cd=0%7C0%7C1%7C&cv=1&br=932&bt=466&bti=ODszNWYuMDE6&cs=0&ds=3&ft=4fUEKMvt8Zmo0K4Mi94jVhstrpWrKsd.&mime_type=video_mp4&qs=0&rc=ZTs1ZTw8aTZmZzU8ZGdpNkBpanFrZWk6ZmlsaTMzZzczNEBgLmJgYTQ0NjQxYDQuXi81YSNtMjZocjRvZ2ZgLS1kMS9zcw%3D%3D&btag=e00088000&expire=1709138858&l=20240228104720CEC3E63CBB78C407D3AE&ply_type=2&policy=2&signature=b86d518a02194c8bd389986d95b546a8&tk=tt_chain_token",
            "duration": 16,
            "dynamicCover": "https://p19-sign.tiktokcdn-us.com/obj/tos-useast5-p-0068-tx/348b414f005f4e49877e6c5ebe620832_1675963029?x-expires=1709287200&x-signature=xJyE12Y5TPj2IYQJF6zJ6%2FALwVw%3D",
            "encodeUserTag": "",
            "encodedType": "normal",
            "format": "mp4",
            "height": 1024,
            "id": "7198206283571285294",
            "originCover": "https://p16-sign.tiktokcdn-us.com/obj/tos-useast5-p-0068-tx/3f677464b38a4457959a7b329002defe_1675963028?x-expires=1709287200&x-signature=KX5gLesyY80rGeHg6ywZnKVOUnY%3D",
            "playAddr": "https://v16-webapp-prime.tiktok.com/video/tos/maliva/tos-maliva-ve-0068c799-us/e9748ee135d04a7da145838ad43daa8e/?a=1988&ch=0&cr=3&dr=0&lr=unwatermarked&cd=0%7C0%7C0%7C&cv=1&br=862&bt=431&bti=ODszNWYuMDE6&cs=0&ds=6&ft=4fUEKMvt8Zmo0K4Mi94jVhstrpWrKsd.&mime_type=video_mp4&qs=0&rc=OzRlNzNnPDtlOTxpZjMzNkBpanFrZWk6ZmlsaTMzZzczNEAzYzI0MC1gNl8xMzUxXmE2YSNtMjZocjRvZ2ZgLS1kMS9zcw%3D%3D&btag=e00088000&expire=1709138858&l=20240228104720CEC3E63CBB78C407D3AE&ply_type=2&policy=2&signature=21ea870dc90edb60928080a6bdbfd23a&tk=tt_chain_token",
            "ratio": "540p",
            "subtitleInfos": [
                ....
            ],
            "videoQuality": "normal",
            "volumeInfo": {
                "Loudness": -15.3,
                "Peak": 0.79433
            },
            "width": 576,
            "zoomCover": {
                "240": "https://p16-sign.tiktokcdn-us.com/tos-useast5-p-0068-tx/3a2c21cd21ad4410b8ad7ab606aa0f45_1675963028~tplv-photomode-zoomcover:240:240.avif?x-expires=1709287200&x-signature=UV1mNc2EHUy6rf9eRQvkS%2FX%2BuL8%3D",
                "480": "https://p16-sign.tiktokcdn-us.com/tos-useast5-p-0068-tx/3a2c21cd21ad4410b8ad7ab606aa0f45_1675963028~tplv-photomode-zoomcover:480:480.avif?x-expires=1709287200&x-signature=PT%2BCf4%2F4MC70e2VWHJC40TNv%2Fbc%3D",
                "720": "https://p19-sign.tiktokcdn-us.com/tos-useast5-p-0068-tx/3a2c21cd21ad4410b8ad7ab606aa0f45_1675963028~tplv-photomode-zoomcover:720:720.avif?x-expires=1709287200&x-signature=3t7Dxca4pBoNYtzoYzui8ZWdALM%3D",
                "960": "https://p16-sign.tiktokcdn-us.com/tos-useast5-p-0068-tx/3a2c21cd21ad4410b8ad7ab606aa0f45_1675963028~tplv-photomode-zoomcover:960:960.avif?x-expires=1709287200&x-signature=aKcJ0jxPTQx3YMV5lPLRlLMrkso%3D"
            }
        }
    },
    ....
]

The above code extracted data for over a hundred videos with a few lines of code in less than a minute. That's pretty powerful!

How to Scrape TikTok Posts

VERIFIED: Oct 7, 2025 | STATUS: Working

Good news: scraping individual posts uses the exact same hidden JSON technique as scraping profiles.

The only differences are:

  • URL Format: https://www.tiktok.com/@username/video/{ID}
  • JSON Path: The data is located at webapp.video-detail.itemInfo.itemStruct instead of webapp.user-detail

What about comments? The comments for a video are not included in this initial HTML. They are loaded from a separate, hidden API, which we explore next.

Discovering Hidden Post Data

Go to any video on TikTok, inspect the page and search for the following selector:

//script[@id='__UNIVERSAL_DATA_FOR_REHYDRATION__']/text()

The post data in the above script tag looks like this:

Hidden data on TikTok posts

Scraping Post Code

Python
import jmespath
import asyncio
import json
from typing import List, Dict
from httpx import AsyncClient, Response
from parsel import Selector
from loguru import logger as log

# initialize an async httpx client
client = AsyncClient(
    # enable http2
    http2=True,
    # add basic browser like headers to prevent being blocked
    headers={
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
    },
)

def parse_post(response: Response) -> Dict:
    """parse hidden post data from HTML"""
    assert response.status_code == 200, "request is blocked, use the ScrapFly code tab"
    selector = Selector(response.text)
    data = selector.xpath("//script[@id='__UNIVERSAL_DATA_FOR_REHYDRATION__']/text()").get()
    post_data = json.loads(data)["__DEFAULT_SCOPE__"]["webapp.video-detail"]["itemInfo"]["itemStruct"]
    parsed_post_data = jmespath.search(
        """{
        id: id,
        desc: desc,
        createTime: createTime,
        video: video.{duration: duration, ratio: ratio, cover: cover, playAddr: playAddr, downloadAddr: downloadAddr, bitrate: bitrate},
        author: author.{id: id, uniqueId: uniqueId, nickname: nickname, avatarLarger: avatarLarger, signature: signature, verified: verified},
        stats: stats,
        locationCreated: locationCreated,
        diversificationLabels: diversificationLabels,
        suggestedWords: suggestedWords,
        contents: contents[].{textExtra: textExtra[].{hashtagName: hashtagName}}
        }""",
        post_data
    )
    return parsed_post_data


async def scrape_posts(urls: List[str]) -> List[Dict]:
    """scrape tiktok posts data from their URLs"""
    to_scrape = [client.get(url) for url in urls]
    data = []
    for response in asyncio.as_completed(to_scrape):
        response = await response
        post_data = parse_post(response)
        data.append(post_data)
    log.success(f"scraped {len(data)} posts from post pages")
    return data
ScrapFly
import jmespath
import asyncio
import json
from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_post(response: ScrapeApiResponse) -> Dict:
    """parse hidden post data from HTML"""
    selector = response.selector
    data = selector.xpath("//script[@id='__UNIVERSAL_DATA_FOR_REHYDRATION__']/text()").get()
    post_data = json.loads(data)["__DEFAULT_SCOPE__"]["webapp.video-detail"]["itemInfo"]["itemStruct"]
    parsed_post_data = jmespath.search(
        """{
        id: id,
        desc: desc,
        createTime: createTime,
        video: video.{duration: duration, ratio: ratio, cover: cover, playAddr: playAddr, downloadAddr: downloadAddr, bitrate: bitrate},
        author: author.{id: id, uniqueId: uniqueId, nickname: nickname, avatarLarger: avatarLarger, signature: signature, verified: verified},
        stats: stats,
        locationCreated: locationCreated,
        diversificationLabels: diversificationLabels,
        suggestedWords: suggestedWords,
        contents: contents[].{textExtra: textExtra[].{hashtagName: hashtagName}}
        }""",
        post_data
    )
    return parsed_post_data


async def scrape_posts(urls: List[str]) -> List[Dict]:
    """scrape tiktok posts data from their URLs"""
    to_scrape = [ScrapeConfig(url, country="US", asp=True) for url in urls]
    data = []
    async for response in SCRAPFLY.concurrent_scrape(to_scrape):
        post_data = parse_post(response)
        data.append(post_data)
    log.success(f"scraped {len(data)} posts from post pages")
    return data
Run the code
async def run():
    post_data = await scrape_posts(
        urls=[
            "https://www.tiktok.com/@oddanimalspecimens/video/7198206283571285294"
        ]
    )
    # save the result to a JSON file
    with open("post_data.json", "w", encoding="utf-8") as file:
        json.dump(post_data, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())

In the above code, we define two functions. Here's how they work:

  • parse_post: For parsing the post data from the script tag and refining it with JMESPath to extract the useful details only.
  • scrape_posts: For scraping multiple post pages concurrently by adding the URLs to a scraping list and requesting them concurrently.

The created post_data file looks like this:

Output
[
  {
    "id": "7198206283571285294",
    "desc": "Mouse to Whale Vertebrae - What bone should I do next? How big is a mouse vertebra? How big is a whale vertebrae? A lot bigger, but all vertebrae share the same shape. Specimen use made possible by the University of Michigan Museum of Zoology. #animals #science #learnontiktok ",
    "createTime": "1675963028",
    "video": {
      "duration": 16,
      "ratio": "540p",
      "cover": "https://p16-sign.tiktokcdn-us.com/obj/tos-useast5-p-0068-tx/3a2c21cd21ad4410b8ad7ab606aa0f45_1675963028?x-expires=1709290800&x-signature=YP7J1o2kv1dLnyjv3hqwBBk487g%3D",
      "playAddr": "https://v16-webapp-prime.tiktok.com/video/tos/maliva/tos-maliva-ve-0068c799-us/e9748ee135d04a7da145838ad43daa8e/?a=1988&ch=0&cr=3&dr=0&lr=unwatermarked&cd=0%7C0%7C0%7C&cv=1&br=862&bt=431&bti=ODszNWYuMDE6&cs=0&ds=6&ft=4fUEKMUj8Zmo0Qnqi94jVZgzZpWrKsd.&mime_type=video_mp4&qs=0&rc=OzRlNzNnPDtlOTxpZjMzNkBpanFrZWk6ZmlsaTMzZzczNEAzYzI0MC1gNl8xMzUxXmE2YSNtMjZocjRvZ2ZgLS1kMS9zcw%3D%3D&btag=e00088000&expire=1709142489&l=202402281147513D9DCF4EE8518C173598&ply_type=2&policy=2&signature=c0c4220f863ca89053ec2a71b180f226&tk=tt_chain_token",
      "downloadAddr": "https://v16-webapp-prime.tiktok.com/video/tos/maliva/tos-maliva-ve-0068c799-us/ed00b2ad6b9b4248ab0a4dd8494b9cfc/?a=1988&ch=0&cr=3&dr=0&lr=tiktok_m&cd=0%7C0%7C1%7C&cv=1&br=932&bt=466&bti=ODszNWYuMDE6&cs=0&ds=3&ft=4fUEKMUj8Zmo0Qnqi94jVZgzZpWrKsd.&mime_type=video_mp4&qs=0&rc=ZTs1ZTw8aTZmZzU8ZGdpNkBpanFrZWk6ZmlsaTMzZzczNEBgLmJgYTQ0NjQxYDQuXi81YSNtMjZocjRvZ2ZgLS1kMS9zcw%3D%3D&btag=e00088000&expire=1709142489&l=202402281147513D9DCF4EE8518C173598&ply_type=2&policy=2&signature=779a4044a0768f870abed13e1401608f&tk=tt_chain_token",
      "bitrate": 441356
    },
    "author": {
      "id": "6976999329680589829",
      "uniqueId": "oddanimalspecimens",
      "nickname": "Odd Animal Specimens",
      "avatarLarger": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7327535918275887147~c5_1080x1080.jpeg?lk3s=a5d48078&x-expires=1709290800&x-signature=F8hu8G4VOFyd%2F0TN7QEZcGLNmW0%3D",
      "signature": "YOUTUBE: Odd Animal Specimens\nCONTACT: OddAnimalSpecimens@whalartalent.com",
      "verified": false
    },
    "stats": {
      "diggCount": 1500000,
      "shareCount": 11800,
      "commentCount": 5471,
      "playCount": 14000000,
      "collectCount": "92420"
    },
    "locationCreated": "US",
    "diversificationLabels": [
      "Science",
      "Education",
      "Culture & Education & Technology"
    ],
    "suggestedWords": [],
    "contents": [
      {
        "textExtra": [
          {
            "hashtagName": "animals"
          },
          {
            "hashtagName": "science"
          },
          {
            "hashtagName": "learnontiktok"
          }
        ]
      }
    ]
  }
]

The above TikTok scraping code has successfully extracted the video data from its page. However, the comments are still missing! We'll scrape them in the following section.

How to Scrape TikTok Comments

VERIFIED: Oct 7, 2025 | STATUS: Working

Comment data isn't present in a post's HTML. Instead, it's loaded from a hidden API as you scroll through the comments section.

Discovering the Hidden Comments API

How to find it:

  1. Open Developer Tools and go to the Network tab
  2. Load a TikTok video page
  3. Scroll down to the comments section to trigger the API call
  4. Filter the requests for /api/comment/list/
Hidden comments API seen in the browser developer tools' Network tab

The API request was sent to the endpoint https://www.tiktok.com/api/comment/list/ with these required parameters:

{
    "aweme_id": 7198206283571285294,  # the post ID
    "count": 20,                      # number of comments per call
    "cursor": 0                       # pagination offset
}

Important: As of Feb 2025, comment pagination is capped at 5,000 comments per post (down from 10,000 in late 2024).
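To avoid scheduling requests past that cap, you can clamp the total before building pagination URLs. A minimal sketch (the MAX_COMMENT_CURSOR constant and clamp_comment_total helper are illustrative, not part of TikTok's API):

```python
MAX_COMMENT_CURSOR = 5000  # TikTok's pagination cap per post (Feb 2025)

def clamp_comment_total(total_comments: int) -> int:
    """Cap the comment count so pagination never requests past cursor=5000."""
    return min(total_comments, MAX_COMMENT_CURSOR)
```

Apply it to the `total_comments` value returned by the first API page before computing the remaining cursors.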

Scraping Comments Code

import jmespath
import asyncio
import json
from urllib.parse import urlencode, urlparse, parse_qs
from typing import Dict, List
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

BASE_CONFIG = {
    # bypass tiktok.com web scraping blocking
    "asp": True,
    # set the proxy country to US
    "country": "US",
}

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")


def parse_comments(response: ScrapeApiResponse) -> Dict:
    """parse comments data from the API response"""
    data = json.loads(response.scrape_result["content"])
    comments_data = data["comments"]
    total_comments = data["total"]
    parsed_comments = []
    # refine the comments with JMESPath
    for comment in comments_data:
        result = jmespath.search(
            """{
            text: text,
            comment_language: comment_language,
            digg_count: digg_count,
            reply_comment_total: reply_comment_total,
            author_pin: author_pin,
            create_time: create_time,
            cid: cid,
            nickname: user.nickname,
            unique_id: user.unique_id,
            aweme_id: aweme_id
            }""",
            comment
        )
        parsed_comments.append(result)
    return {"comments": parsed_comments, "total_comments": total_comments}


async def retrieve_comment_params(post_url: str) -> Dict:
    """retrieve query parameters for the comments API"""
    response = await SCRAPFLY.async_scrape(
        ScrapeConfig(
            post_url, **BASE_CONFIG, render_js=True,
            rendering_wait=5000, wait_for_selector="//div[@id='main-content-video_detail']"
        )
    )

    _xhr_calls = response.scrape_result["browser_data"]["xhr_call"]
    for i in _xhr_calls:
        if "api/comment/list" not in i["url"]:
            continue
        url = urlparse(i["url"])
        qs = parse_qs(url.query)
        # remove the params we'll override
        for key in ["count", "cursor"]:
            _ = qs.pop(key, None)
        api_params = {key: value[0] for key, value in qs.items()}
        return api_params


async def scrape_comments(post_url: str, comments_count: int = 20, max_comments: int = None) -> List[Dict]:
    """scrape comments from tiktok posts using hidden APIs"""
    post_id = post_url.split("/video/")[1].split("?")[0]
    api_params = await retrieve_comment_params(post_url)

    def form_api_url(cursor: int):
        """form the reviews API URL and its pagination values"""
        base_url = "https://www.tiktok.com/api/comment/list/?"
        params = {"count": comments_count, "cursor": cursor, **api_params}  # the index to start from
        return base_url + urlencode(params)

    log.info("scraping the first comments batch")
    first_page = await SCRAPFLY.async_scrape(
        ScrapeConfig(form_api_url(cursor=0), **BASE_CONFIG, headers={"content-type": "application/json"})
    )
    data = parse_comments(first_page)
    comments_data = data["comments"]
    total_comments = data["total_comments"]

    # get the maximum number of comments to scrape
    if max_comments and max_comments < total_comments:
        total_comments = max_comments

    # scrape the remaining comments concurrently
    log.info(f"scraping comments pagination, remaining {total_comments // comments_count - 1} more pages")
    _other_pages = [
        ScrapeConfig(form_api_url(cursor=cursor), **BASE_CONFIG, headers={"content-type": "application/json"})
        for cursor in range(comments_count, total_comments + comments_count, comments_count)
    ]
    async for response in SCRAPFLY.concurrent_scrape(_other_pages):
        data = parse_comments(response)["comments"]
        comments_data.extend(data)

    log.success(f"scraped {len(comments_data)} from the comments API from the post with the ID {post_id}")
    return comments_data

Run the code

async def run():
    comment_data = await scrape_comments(
        # the post/video URL containing the comments
        post_url="https://www.tiktok.com/@oddanimalspecimens/video/7198206283571285294",
        # total comments to scrape; omitting it will scrape all the available comments
        max_comments=24,
        # default is 20; it can be overridden to scrape more comments per call, but not more than the post's total comments
        comments_count=20
    )
    # save the result to a JSON file
    with open("comment_data.json", "w", encoding="utf-8") as file:
        json.dump(comment_data, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())

The above code scrapes TikTok comments data using two main functions:

  • scrape_comments: Builds the comments API URL with the desired offset and requests it to get the comment data in JSON.
  • parse_comments: Parses the comments API responses and extracts the useful data using JMESPath.

Sample output of the comment data:

Sample output
[
  {
    "text": "Dude give 'em back",
    "comment_language": "en",
    "digg_count": 72009,
    "reply_comment_total": 131,
    "author_pin": false,
    "create_time": 1675963633,
    "cid": "7198208855277060910",
    "nickname": "GrandMoffJames",
    "unique_id": "grandmoffjames",
    "aweme_id": "7198206283571285294"
  },
  {
    "text": "Dudes got everyone's back",
    "comment_language": "en",
    "digg_count": 36982,
    "reply_comment_total": 100,
    "author_pin": false,
    "create_time": 1675966520,
    "cid": "7198221275168719662",
    "nickname": "Scott",
    "unique_id": "troutfishmanjr",
    "aweme_id": "7198206283571285294"
  },
  {
    "text": "do human backbone",
    "comment_language": "en",
    "digg_count": 18286,
    "reply_comment_total": 99,
    "author_pin": false,
    "create_time": 1676553505,
    "cid": "7200742421726216987",
    "nickname": "www",
    "unique_id": "ksjekwjkdbw",
    "aweme_id": "7198206283571285294"
  },
  {
    "text": "casually has a backbone in his inventory",
    "comment_language": "en",
    "digg_count": 20627,
    "reply_comment_total": 9,
    "author_pin": false,
    "create_time": 1676106562,
    "cid": "7198822535374734126",
    "nickname": "*",
    "unique_id": "angelonextdoor",
    "aweme_id": "7198206283571285294"
  },
  {
    "text": "😧",
    "comment_language": "",
    "digg_count": 7274,
    "reply_comment_total": 20,
    "author_pin": false,
    "create_time": 1675963217,
    "cid": "7198207091995132698",
    "nickname": "Son Bi'",
    "unique_id": "son_bisss",
    "aweme_id": "7198206283571285294"
  },
  ....
]

The above TikTok scraper code can scrape dozens of comments in mere seconds, because requesting TikTok's hidden APIs is much faster than parsing data from HTML.

How to Scrape TikTok Search

VERIFIED: Oct 7, 2025 | STATUS: Working
Difficulty: Hard. The most complex endpoint to scrape.

Critical Update (Jan 15, 2025)

A major breaking change occurred in January 2025: the search API now requires valid session cookies. Before this, the endpoint was open. If your old search scraper broke, this is why.

Before Jan 15, 2025            After Jan 15, 2025
Cookies optional               Cookies REQUIRED
Direct API access worked       403 Forbidden without session

Session Management

To get valid cookies, we must first visit the search page in a real browser to establish a session, then reuse that session's cookies for direct API calls. ScrapFly's session parameter automates this.

async def obtain_session(keyword):
    """Create a ScrapFly session to store cookies for the search API."""
    session_id = f"tiktok_search_{keyword}"
    search_url = f"https://www.tiktok.com/search?q={keyword}"
    await SCRAPFLY.async_scrape(ScrapeConfig(
        search_url,
        asp=True,
        render_js=True,  # Use a real browser to get valid cookies
        session=session_id,
    ))
    return session_id

  • Session Lifetime: cookies are valid for about 24 hours
  • IP Locking: sessions are often locked to the IP address that created them
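Given that lifetime, it's worth caching the session id and recreating it only when stale. A minimal sketch (SessionCache is an illustrative helper, not part of the ScrapFly SDK):

```python
import time

SESSION_TTL = 24 * 60 * 60  # cookies reportedly stay valid for ~24 hours

class SessionCache:
    """Reuse one scrape session id until its cookies are likely expired."""

    def __init__(self, ttl: float = SESSION_TTL, clock=time.time):
        self.ttl = ttl
        self.clock = clock  # injectable for testing
        self._id = None
        self._created = 0.0

    def get(self, create_session):
        """Return the cached session id, calling create_session() when stale."""
        if self._id is None or self.clock() - self._created > self.ttl:
            self._id = create_session()
            self._created = self.clock()
        return self._id
```

In practice `create_session` would wrap the obtain_session coroutine above; the cache simply avoids paying the browser-rendering cost on every search.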

search_id Generation

TikTok also validates the search_id parameter, which must be a 32-character hex string that includes a current timestamp.

import datetime
import secrets

def generate_search_id():
    """TikTok validates this format."""
    timestamp = datetime.datetime.now().strftime('%Y%m%d%H%M%S')  # 14 chars
    random_hex = secrets.token_hex(9).upper()                   # 18 hex chars
    return timestamp + random_hex                               # Total: 32 chars

Scraping Search Code

import datetime
import secrets
import asyncio
import json
import jmespath
from typing import Dict, List
from urllib.parse import urlencode, quote
from loguru import logger as log
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

SCRAPFLY = ScrapflyClient(key="Your ScrapFly API key")

def parse_search(response: ScrapeApiResponse) -> List[Dict]:
    """parse search data from the API response"""
    data = json.loads(response.scrape_result["content"])
    search_data = data["data"]
    parsed_search = []
    for item in search_data:
        if item["type"] == 1:  # type 1 = video result
            result = jmespath.search(
                """{
                id: id,
                desc: desc,
                createTime: createTime,
                video: video,
                author: author,
                stats: stats,
                authorStats: authorStats
                }""",
                item["item"]
            )
            result["type"] = item["type"]
            parsed_search.append(result)

    # whether there are more search results (0 or 1); there is no explicit maximum
    has_more = data["has_more"]
    return parsed_search


async def obtain_session(url: str) -> str:
    """create a session to save the cookies and authorize the search API"""
    session_id="tiktok_search_session"
    await SCRAPFLY.async_scrape(ScrapeConfig(
        url, asp=True, country="US", render_js=True, session=session_id
    ))
    return session_id


async def scrape_search(keyword: str, max_search: int, search_count: int = 12) -> List[Dict]:
    """scrape tiktok search data from the search API"""

    def generate_search_id():
        # get the current datetime and format it as YYYYMMDDHHMMSS
        timestamp = datetime.datetime.now().strftime('%Y%m%d%H%M%S')
        # calculate the length of the random hex required for the total length (32)
        random_hex_length = (32 - len(timestamp)) // 2  # calculate bytes needed
        random_hex = secrets.token_hex(random_hex_length).upper()
        random_id = timestamp + random_hex
        return random_id

    def form_api_url(cursor: int):
        """form the reviews API URL and its pagination values"""
        base_url = "https://www.tiktok.com/api/search/general/full/?"
        params = {
            "keyword": quote(keyword),
            "offset": cursor, # the index to start from
            "search_id": generate_search_id()
        }
        return base_url + urlencode(params)

    log.info("obtaining a session for the search API")
    session_id = await obtain_session(url="https://www.tiktok.com/search?q=" + quote(keyword))

    log.info("scraping the first search batch")
    first_page = await SCRAPFLY.async_scrape(ScrapeConfig(
        form_api_url(cursor=0), asp=True, country="US", session=session_id
    ))
    search_data = parse_search(first_page)

    # scrape the remaining search results concurrently
    log.info(f"scraping search pagination, remaining {max_search // search_count} more pages")
    _other_pages = [
        ScrapeConfig(form_api_url(cursor=cursor), asp=True, country="US", session=session_id)
        for cursor in range(search_count, max_search + search_count, search_count)
    ]
    async for response in SCRAPFLY.concurrent_scrape(_other_pages):
        data = parse_search(response)
        search_data.extend(data)

    log.success(f"scraped {len(search_data)} from the search API from the keyword {keyword}")
    return search_data

Run the code

async def run():
    search_data = await scrape_search(
        keyword="whales",
        max_search=18
    )
    # save the result to a JSON file
    with open("search_data.json", "w", encoding="utf-8") as file:
        json.dump(search_data, file, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    asyncio.run(run())

The execution flow works as follows:

  • A request is sent to the regular search page through the obtain_session function to obtain the cookie values.
  • A random search ID is created with generate_search_id for use with the requests sent to the search API.
  • The first search API URL is built with the form_api_url function.
  • A request is sent to the search API using the session containing the cookies.
  • The JSON response of the search API is parsed by parse_search, which also filters the response to include only video data.

🙋‍ The above code requests the /search/general/full/ endpoint, which retrieves search results for both profile and video data. This endpoint is limited to a low cursor value. To effectively manage its pagination, you can use filters to narrow down the results.
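Since parse_search above discards the has_more flag, here is a hedged sketch of sequential pagination that honors it (fetch_page is a placeholder for whatever client requests one offset; the helper names are illustrative):

```python
import json

def extract_page(body: str):
    """Pull the result items and the has_more flag from one search response."""
    data = json.loads(body)
    return data.get("data", []), bool(data.get("has_more", 0))

def collect_search_pages(fetch_page, search_count: int = 12, max_pages: int = 10):
    """Page sequentially through the search API, stopping when has_more is 0."""
    results, cursor = [], 0
    for _ in range(max_pages):
        items, has_more = extract_page(fetch_page(cursor))
        results.extend(items)
        if not has_more:
            break
        cursor += search_count
    return results
```

Sequential paging trades speed for safety here: it never requests offsets past the endpoint's effective cap, which the concurrent approach can't know in advance.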

Sample output of results:

Sample output
[
  {
    "id": "7192262480066825515",
    "desc": "Replying to @julsss1324 their songs are described as hauntingly beautiful. Do you find them scary or beautiful? For me it's peaceful. They remind me of elephants. 🐋🎶💙 @kaimanaoceansafari #whalesounds #whalesong #hawaii #ocean #deepwater #deepsea #thalassophobia #whales #humpbackwhales ",
    "createTime": 1674579130,
    "video": {
      "id": "7192262480066825515",
      "height": 1024,
      "width": 576,
      "duration": 25,
      "ratio": "540p",
      "cover": "https://p16-sign.tiktokcdn-us.com/tos-useast5-p-0068-tx/e438558728954c74a761132383865d97_1674579131~tplv-dmt-logom:tos-useast5-i-0068-tx/0bb4cf51c9f445c9a46dc8d5aab20545.image?x-expires=1709215200&x-signature=Xl1W9ELtZ5%2FP4oTEpjqOYsGQcx8%3D",
      "originCover": "https://p19-sign.tiktokcdn-us.com/obj/tos-useast5-p-0068-tx/2061429a4535477686769d5f2faeb4f0_1674579131?x-expires=1709215200&x-signature=OJW%2BJnqnYt4L2G2pCryrfh52URI%3D",
      "dynamicCover": "https://p19-sign.tiktokcdn-us.com/obj/tos-useast5-p-0068-tx/88b455ffcbc6421999f47ebeb31b962b_1674579131?x-expires=1709215200&x-signature=hDBbwIe0Z8HRVFxLe%2F2JZoeHopU%3D",
      "playAddr": "https://v16-webapp-prime.us.tiktok.com/video/tos/useast5/tos-useast5-pve-0068-tx/809fca40201048c78299afef3b627627/?a=1988&ch=0&cr=3&dr=0&lr=unwatermarked&cd=0%7C0%7C0%7C&cv=1&br=3412&bt=1706&bti=NDU3ZjAwOg%3D%3D&cs=0&ds=6&ft=4KJMyMzm8Zmo0apOi94jV94rdpWrKsd.&mime_type=video_mp4&qs=0&rc=NDU3PDc0PDw7ZGg7ODg0O0BpM2xycGk6ZnYzaTMzZzczNEBgNl4tLjFiNjMxNTVgYjReYSNucGwzcjQwajVgLS1kMS9zcw%3D%3D&btag=e00088000&expire=1709216449&l=202402271420230081AD419FAC9913AB63&ply_type=2&policy=2&signature=1d44696fa49eb5fa609f6b6871445f77&tk=tt_chain_token",
      "downloadAddr": "https://v16-webapp-prime.us.tiktok.com/video/tos/useast5/tos-useast5-pve-0068-tx/c7196f98798e4520834a64666d253cb6/?a=1988&ch=0&cr=3&dr=0&lr=tiktok_m&cd=0%7C0%7C1%7C&cv=1&br=3514&bt=1757&bti=NDU3ZjAwOg%3D%3D&cs=0&ds=3&ft=4KJMyMzm8Zmo0apOi94jV94rdpWrKsd.&mime_type=video_mp4&qs=0&rc=ZTw5Njg0NDo3Njo7PGllOkBpM2xycGk6ZnYzaTMzZzczNEBhYjFiLjA1NmAxMS8uMDIuYSNucGwzcjQwajVgLS1kMS9zcw%3D%3D&btag=e00088000&expire=1709216449&l=202402271420230081AD419FAC9913AB63&ply_type=2&policy=2&signature=1443d976720e418204704f43af4ff0f5&tk=tt_chain_token",
      "shareCover": [
        "",
        "https://p16-sign.tiktokcdn-us.com/tos-useast5-p-0068-tx/2061429a4535477686769d5f2faeb4f0_1674579131~tplv-photomode-tiktok-play.jpeg?x-expires=1709647200&x-signature=%2B4dufwEEFxPJU0NX4K4Mm%2FPET6E%3D",
        "https://p16-sign.tiktokcdn-us.com/tos-useast5-p-0068-tx/2061429a4535477686769d5f2faeb4f0_1674579131~tplv-photomode-share-play.jpeg?x-expires=1709647200&x-signature=XCorhFJUTCahS8crANfC%2BDSrTbU%3D"
      ],
      "reflowCover": "https://p19-sign.tiktokcdn-us.com/tos-useast5-p-0068-tx/e438558728954c74a761132383865d97_1674579131~tplv-photomode-video-cover:480:480.jpeg?x-expires=1709215200&x-signature=%2BFN9Vq7TxNLLCtJCsMxZIrgjMis%3D",
      "bitrate": 1747435,
      "encodedType": "normal",
      "format": "mp4",
      "videoQuality": "normal",
      "encodeUserTag": ""
    },
    "author": {
      "id": "6763395919847523333",
      "uniqueId": "mermaid.kayleigh",
      "nickname": "mermaid.kayleigh",
      "avatarThumb": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7310953622576037894~c5_100x100.jpeg?lk3s=a5d48078&x-expires=1709215200&x-signature=0tw66iTdRDhPA4pTHM8e4gjIsNo%3D",
      "avatarMedium": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7310953622576037894~c5_720x720.jpeg?lk3s=a5d48078&x-expires=1709215200&x-signature=IkaoB24EJoHdsHCinXmaazAWDYo%3D",
      "avatarLarger": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7310953622576037894~c5_1080x1080.jpeg?lk3s=a5d48078&x-expires=1709215200&x-signature=38KCawETqF%2FdyMX%2FAZg32edHnc4%3D",
      "signature": "Love the ocean with me 💙\nOwner @KaimanaOceanSafari 🤿\nCome dive with me👇🏼",
      "verified": true,
      "secUid": "MS4wLjABAAAAhIICwHiwEKwUg07akDeU_cnM0uE1LAGO-kEQdw3AZ_Rd-zcb-qOR0-1SeZ5D2Che",
      "secret": false,
      "ftc": false,
      "relation": 0,
      "openFavorite": false,
      "commentSetting": 0,
      "duetSetting": 0,
      "stitchSetting": 0,
      "privateAccount": false,
      "downloadSetting": 0
    },
    "stats": {
      "diggCount": 10000000,
      "shareCount": 390800,
      "commentCount": 72100,
      "playCount": 89100000,
      "collectCount": 663400
    },
    "authorStats": {
      "followingCount": 313,
      "followerCount": 2000000,
      "heartCount": 105400000,
      "videoCount": 1283,
      "diggCount": 40800,
      "heart": 105400000
    },
    "type": 1
  },
  ....
]

With this last feature, our TikTok scraper is complete. It can scrape profiles, channels, posts, comments and search data!

Production Deployment with ScrapFly

Reality check: While the httpx examples above work for learning, they fail in production. Here's why:

Why httpx Fails in Production

httpx Limitation            Production Impact                         ScrapFly Solution
Missing X-Gorgon Header     403 Forbidden on all requests             Auto-generated & updated
Missing X-Khronos Header    Timestamp validation failure              Auto-synchronized timestamps
Single IP Address           Blocked after ~100 requests               Rotating residential proxies
Static User-Agent           Easily detected by fingerprinting         Randomized browser profiles
No CAPTCHA Handling         Scraper gets stuck                        Built-in automatic solver
Manual Maintenance          8-15 hours/month of reverse-engineering   Zero (we update our systems)

Real-World Success Rates:

  • httpx alone: ~60% (and declining)
  • httpx + proxies: ~75% (but with high maintenance)
  • ScrapFly: 98% (with zero maintenance)

Bypass TikTok Scraping Blocking With ScrapFly

We can successfully scrape TikTok data from various pages. However, scaling our scraping rate will lead TikTok to block the IP address used. Moreover, it can challenge the requests with CAPTCHAs if the traffic is suspected:

TikTok scraping blocked by a CAPTCHA challenge

Scrapfly can help resolve TikTok scraper blocking.

scrapfly middleware

ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

Here's how to avoid TikTok web scraping blocking using ScrapFly. Replace the HTTP client with the ScrapFly client and enable the asp parameter:

# standard web scraping code
import httpx
from parsel import Selector

response = httpx.get("some tiktok.com URL")
selector = Selector(response.text)

# in ScrapFly becomes this 👇
from scrapfly import ScrapeConfig, ScrapflyClient

# replaces your HTTP client (httpx in this case)
scrapfly = ScrapflyClient(key="Your ScrapFly API key")

response = scrapfly.scrape(ScrapeConfig(
    url="website URL",
    asp=True, # enable the anti scraping protection to bypass blocking
    country="US", # set the proxy location to a specfic country
    render_js=True # enable rendering JavaScript (like headless browsers) to scrape dynamic content if needed
))

# use the built in Parsel selector
selector = response.selector
# access the HTML content
html = response.scrape_result['content']

Troubleshooting Guide

403 Forbidden on First Request

Cause: Missing X-Gorgon or X-Khronos headers (new in 2025)
Solution: Use ScrapFly with asp=True or implement header generation

Comments Return Empty After cursor=5000

Cause: TikTok capped comment pagination at 5,000 comments (Feb 2025)
Solution: Hard limit - you cannot scrape more than 5,000 comments per post

Search API Returns "Login Required"

Cause: Missing session cookies (required since Jan 15, 2025)
Solution: Use ScrapFly sessions or implement cookie management

Blocked After Exactly 100 Requests

Cause: Hit TikTok's rate limit (100 requests/hour per IP)
Solution: Use residential proxies or ScrapFly's automatic rotation
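The retry logic mentioned in the takeaways can be sketched as a small wrapper with exponential backoff and jitter. This is a generic sketch, not ScrapFly API code; fetch and the retryable status codes (403/429) are assumptions to adapt to your client:

```python
import random
import time

def fetch_with_backoff(fetch, max_retries: int = 5, base_delay: float = 1.0,
                       sleep=time.sleep):
    """Call fetch() until it returns a non-blocked status code.

    fetch is any zero-argument callable returning an object with a
    .status_code attribute (e.g. lambda: httpx.get(url)).
    """
    for attempt in range(max_retries):
        response = fetch()
        if response.status_code not in (403, 429):
            return response
        # exponential backoff with jitter: ~1s, 2s, 4s, ... between retries
        sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError(f"still blocked after {max_retries} attempts")
```

The jitter matters: many clients retrying on identical 1s/2s/4s schedules produce synchronized bursts that are themselves a bot signal.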

Not in here? Submit an error for diagnosis on our GitHub

FAQs

Is it legal to scrape TikTok?

Scraping public data has strong legal precedents (see hiQ v. LinkedIn). However, TikTok's Terms of Service prohibit automation. The risk is generally low for responsible research and analysis but increases for aggressive commercial use. For commercial applications, consult legal counsel.

Why does my scraper break every month?

TikTok's engineering team is actively trying to stop scrapers. They update their defenses frequently. Expect 1-2 breaking changes per month.
Solutions: Use our maintained GitHub scraper (which we update within 24h of changes) or commit to 8-15 hours/month of DIY maintenance.

Can I scrape private accounts?

No. This requires authentication and would be a violation of both privacy and likely the Computer Fraud and Abuse Act (CFAA). Only scrape public data.

Latest TikTok Scraper Code
https://github.com/scrapfly/scrapfly-scrapers/

Summary & Next Steps

We've built a complete TikTok scraper that extracts profiles, videos, comments, and search data using TikTok's hidden JSON APIs. The key insight is that TikTok embeds full data in <script> tags, making JSON parsing much faster than HTML scraping. However, TikTok's 2025 anti-bot defenses (X-Gorgon headers, IP quality checks, CAPTCHAs) mean DIY scraping requires 8-15 hours of maintenance per month.

For production use, ScrapFly handles all the complexity automatically with a 98% success rate and zero maintenance. Get started with 1,000 free API credits to test the production solution, or clone our GitHub repo to learn the techniques yourself.
