# Human-in-the-Loop


  Human-in-the-Loop (HITL) lets you take manual control of a running Cloud Browser session from the dashboard or any embedded UI. Use it to debug failed scrapes, solve a CAPTCHA, complete a login, or sign off on a sensitive workflow before the script resumes.

## Attachment types

 Every Cloud Browser session has one **attachment** at a time. The attachment type tells you who is currently driving the browser and whether you can take over.

 | Attachment | Meaning | HITL takeover |
|---|---|---|
| `None` | Idle session, nothing connected. | Available |
| `ScrapflyAgent` | An automated script or AI agent is connected. | Blocked |
| `HumanAgent` | A human operator is connected. | Already attached |

 A script disconnects with `browser.disconnect()` (Playwright / Puppeteer) to drop its `ScrapflyAgent` attachment and free the session for a human. When the human disconnects, the session returns to `None` and the script can reconnect.
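
The table can be read as a simple lookup. The sketch below is illustrative only: `canHumanTakeOver` is a hypothetical helper, not part of any Scrapfly SDK, and it assumes the attachment type reaches your code as one of the three strings shown in the table.

```javascript
// Hypothetical lookup mirroring the attachment table above.
const TAKEOVER_STATUS = new Map([
    ['None', 'available'],               // idle session, nothing connected
    ['ScrapflyAgent', 'blocked'],        // a script or AI agent is connected
    ['HumanAgent', 'already-attached'],  // a human operator is connected
]);

// Returns true only when a human can attach right now.
function canHumanTakeOver(attachment) {
    if (!TAKEOVER_STATUS.has(attachment)) {
        throw new Error(`unknown attachment type: ${attachment}`);
    }
    return TAKEOVER_STATUS.get(attachment) === 'available';
}

console.log(canHumanTakeOver('None'));          // true
console.log(canHumanTakeOver('ScrapflyAgent')); // false
```

Throwing on an unknown value, rather than returning `false`, makes a renamed or newly added attachment type visible instead of silently blocking takeovers.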

## Transport options

 You attach over one of three transports. They are not exclusive: a session can have all three enabled at the same time, and operators pick the one their network or client supports best. The [HITL kit](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop/embed) speaks all three from one drop-in JavaScript file.

### Feature matrix

 | Capability | CDP Screencast | WebRTC | VNC |
|---|---|---|---|
| Live framebuffer view | JPEG frames | H.264 / VP8 video | JPEG tiles |
| Mouse + keyboard input | via CDP | via CDP | native RFB |
| File downloads | CDP Blob | CDP Blob | HTTP `/downloads` |
| Multi-viewer | single | up to 10 viewers | RFB `shared` |
| Hardware acceleration |  |  |  |
| In-browser viewer | HITL kit | HITL kit | VNC in browser |
| Native desktop client |  |  | TigerVNC, RealVNC, macOS, Windows |
| Allocation flag | always on | `enable_rtc=true` | `enable_vnc=true` |
| Credential | api\_key | `rtc_password` | `vnc_password` |
| IP allow-list bypass |  | `hitl_allowed_networks` | `hitl_allowed_networks` |
| Add-on fee | Included | 5 credits / session | 5 credits / session |
| HIPAA audit event |  |  |  |

### Quick setup by mode

 Each mode below lists its minimal allocation flags and how to connect. All three can be enabled on the same session.

  CDP Screencast attaches over the same WebSocket the scrape session uses. No extra allocation flag, no extra credential, no extra fee. The dashboard **Connect** button opens the in-browser CDP viewer directly.

##### Allocate

 ```
wss://browser.scrapfly.io?api_key=YOUR_API_KEY&session=my-session-id&auto_close=false
```

##### Connect

 Open [Cloud Browser Sessions](https://scrapfly.io/dashboard/cloud-browser/sessions) in the dashboard and click **Connect** next to your session. The viewer renders in an iframe; no additional client software is required.

 WebRTC delivers hardware-accelerated H.264 / VP8 video over UDP/SRTP with sub-second latency. It is best for demos, training, and any setup where motion quality matters. The encoder starts when the first viewer connects and is torn down when the last viewer disconnects.

##### Allocate

 ```
wss://browser.scrapfly.io?api_key=YOUR_API_KEY&session=my-session-id&auto_close=false&enable_rtc=true&rtc_username=operator&rtc_password={rtc_password}
```

##### Connect

 Use the in-dashboard HITL player (click **Connect** on the session row, then pick the WebRTC tab) or embed the [HITL kit](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop/embed) in your own UI. The kit handles the SDP offer/answer exchange against Scrapfly-managed signaling and TURN servers.

 VNC speaks RFB to any standard client. Use it when operators want their existing remote-desktop tool (macOS Screen Sharing, TigerVNC, RealVNC, Remmina, Windows built-in viewer) or when they need a native bidirectional input channel for multi-viewer sessions.

##### Allocate

 ```
wss://browser.scrapfly.io?api_key=YOUR_API_KEY&session=my-session-id&auto_close=false&enable_vnc=true&vnc_password={vnc_password}
```

##### Connect (VNC in browser)

 Open the session in the dashboard and pick the VNC tab. The viewer renders inside the page; nothing to install.

##### Connect (VNC endpoint, native client)

 Point any standard VNC client at `vnc://{run_id}@{public_cloud_browser_vnc_host}:5901` and provide the `vnc_password` you set at allocation. See the [VNC mode documentation](https://scrapfly.io/docs/cloud-browser-api/vnc) for per-OS client setup.
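
The three allocation URLs above differ only in their query flags, so they can be assembled by one small function. This is a minimal sketch, not an official helper: `buildSessionUrl` and its option names (`rtc`, `vnc`) are hypothetical, while the endpoint and query parameters are the ones shown in the allocation examples.

```javascript
// Hypothetical helper: builds the allocation URL from the flags documented above.
const ENDPOINT = 'wss://browser.scrapfly.io';

function buildSessionUrl({ apiKey, session, rtc, vnc }) {
    const params = new URLSearchParams({
        api_key: apiKey,
        session,
        auto_close: 'false', // keep the browser alive across disconnects
    });
    if (rtc) {
        params.set('enable_rtc', 'true');
        params.set('rtc_username', rtc.username);
        params.set('rtc_password', rtc.password);
    }
    if (vnc) {
        params.set('enable_vnc', 'true');
        params.set('vnc_password', vnc.password);
    }
    return `${ENDPOINT}?${params.toString()}`;
}

// CDP Screencast only: no extra flags.
const cdpUrl = buildSessionUrl({ apiKey: 'YOUR_API_KEY', session: 'my-session-id' });

// All three transports on the same session.
const allUrl = buildSessionUrl({
    apiKey: 'YOUR_API_KEY',
    session: 'my-session-id',
    rtc: { username: 'operator', password: 'rtc-secret' },
    vnc: { password: 'vnc-secret' },
});
console.log(cdpUrl);
console.log(allUrl);
```

Using `URLSearchParams` rather than string concatenation means passwords containing characters such as `&` or `=` are percent-encoded instead of corrupting the query string.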

## Take over from the dashboard

 Open [Cloud Browser Sessions](https://scrapfly.io/dashboard/cloud-browser/sessions), find a session with attachment `None`, and click **Connect**. The dashboard opens a live viewer for the transport the session was allocated with (CDP Screencast by default; WebRTC or VNC when those flags were set). When you're done, disconnect to release the `HumanAgent` attachment so the script can reconnect.


## Common use cases

 The integration pattern is always the same: connect with `auto_close=false`, drive automation until you need a human, call `browser.disconnect()`, wait for the session to return to `attached_by=""`, then reconnect with the same `session=` ID and continue. Here's the captcha variant:

 ```
const puppeteer = require('puppeteer-core');

const SESSION_ID = 'captcha-' + Date.now();
const WS = `wss://browser.scrapfly.io?api_key=YOUR_API_KEY&session=${SESSION_ID}&auto_close=false`;

async function loginWithCaptcha() {
    const browser = await puppeteer.connect({ browserWSEndpoint: WS });
    const page = await browser.newPage();

    await page.goto('https://web-scraping.dev/login');
    await page.type('#username', 'user@example.com');
    await page.type('#password', 'secret');

    if (await page.$('.captcha-widget')) {
        console.log('CAPTCHA detected. Session:', SESSION_ID);
        console.log('Solve it at: https://scrapfly.io/dashboard/cloud-browser/sessions');

        // Release the attachment so a human can connect.
        await browser.disconnect();

        // Wait for the human to finish and disconnect (poll the HITL API
        // for attached_by == '': see the API reference for a ready-to-use
        // helper).
        await waitForSessionAvailable(SESSION_ID);

        // Reconnect and finish.
        const browser2 = await puppeteer.connect({ browserWSEndpoint: WS });
        const page2 = (await browser2.pages())[0];
        // Start waiting before the click so a fast navigation is not missed.
        await Promise.all([page2.waitForNavigation(), page2.click('#login-btn')]);
        await browser2.close();
    } else {
        await Promise.all([page.waitForNavigation(), page.click('#login-btn')]);
        await browser.close();
    }
}

loginWithCaptcha().catch(console.error);
```
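
The example calls `waitForSessionAvailable` without defining it. One possible shape is sketched below, under stated assumptions: the session object exposes an `attached_by` string that is empty when nothing is attached (as described above), and the caller supplies `getSession`, a hypothetical function that fetches that object from the HITL API. Unlike the one-argument call in the example, this version takes the fetcher as a second argument so the polling logic stays independent of any HTTP details.

```javascript
// Hypothetical helper: resolves once someone has attached to the session and
// then released it. `getSession(sessionId)` is supplied by the caller and must
// resolve to an object with an `attached_by` string ('' when nothing is attached).
async function waitForSessionAvailable(sessionId, getSession, options = {}) {
    const { intervalMs = 5_000, maxWaitMs = 600_000 } = options;
    const deadline = Date.now() + maxWaitMs;
    let sawAttachment = false;

    while (Date.now() < deadline) {
        const session = await getSession(sessionId);
        if (session.attached_by) {
            sawAttachment = true; // someone is currently driving the browser
        } else if (sawAttachment) {
            return session;       // they attached earlier and have now left
        }
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    throw new Error(`session ${sessionId} was not released within ${maxWaitMs} ms`);
}
```

The `sawAttachment` flag matters because the session is already unattached the moment the script disconnects. Without it, the first poll would return before the operator has even opened the dashboard.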

Other common use cases follow the same pattern:

- **Debug a failed scrape.** Keep the session open after an error so an engineer can inspect the live page state. Log the `session=` ID at error time so the right session is easy to find in the dashboard.
- **Manual verification.** Pre-fill a form with the script, disconnect, have a human verify the values before clicking submit, then reconnect and finish.
- **Embed in your own product.** Use the [HITL kit](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop/embed) to expose the live view inside an internal tool, a support console, or a customer-facing workflow.

## Programmatic management

 Three REST endpoints list, fetch, and stop running sessions. Useful when building custom dashboards, workflow automation, or monitoring tools. Full reference with request and response samples lives at [HITL API reference](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop/api-reference).

- `GET /sessions`: list every running session for your account.
- `GET /session/{session_id}`: fetch one session by ID.
- `POST /session/{session_id}/stop`: terminate a session.
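
A thin wrapper over these three endpoints might look like the following sketch. The paths and methods come from the list above; everything else is an assumption. `createHitlClient` is a hypothetical name, `baseUrl` and any authentication `headers` must be taken from the API reference, and responses are assumed to be JSON (or empty).

```javascript
// Hypothetical client for the three endpoints listed above.
// baseUrl and headers are NOT specified on this page: take them from the API reference.
function createHitlClient({ baseUrl, headers = {}, fetchImpl = globalThis.fetch }) {
    async function request(method, path) {
        const response = await fetchImpl(`${baseUrl}${path}`, { method, headers });
        if (!response.ok) {
            throw new Error(`${method} ${path} failed with HTTP ${response.status}`);
        }
        const body = await response.text();
        return body ? JSON.parse(body) : null; // tolerate empty response bodies
    }

    const encode = (sessionId) => encodeURIComponent(sessionId);

    return {
        listSessions: () => request('GET', '/sessions'),
        getSession: (sessionId) => request('GET', `/session/${encode(sessionId)}`),
        stopSession: (sessionId) => request('POST', `/session/${encode(sessionId)}/stop`),
    };
}
```

Passing `fetchImpl` explicitly makes the client easy to exercise against a stub in tests. In production it defaults to the global `fetch` available in Node 18 and later.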

## Best practices

### Use descriptive session IDs

Pick IDs that make the session findable in the dashboard:

 ```
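// Good: the purpose and a unique key are visible in the dashboard
const debugSessionId = `debug-login-${Date.now()}`;
const captchaSessionId = `captcha-checkout-${userId}`;
const verifySessionId = `verify-form-${orderId}`;

// Avoid: the ID says nothing about what the session is for
const genericSessionId = 'session1';
const randomSessionId = Math.random().toString();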
```

### Log the session ID and the dashboard URL

 Print or log both on error so an engineer can jump straight to the live session:

 ```
logger.error('scrape failed, session available for HITL inspection', {
    sessionId,
    dashboardUrl: `https://scrapfly.io/dashboard/cloud-browser/sessions?session=${sessionId}`,
});
```

### Bound the wait for manual intervention

 When the script disconnects and waits for a human, always set an upper bound so a forgotten session does not run until the platform timeout:

 ```
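// checkInterventionComplete is your own check, for example polling the HITL API
// until attached_by is empty.
async function waitForManualIntervention(sessionId, maxWaitMs = 600_000) {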
    const start = Date.now();
    while (Date.now() - start < maxWaitMs) {
        if (await checkInterventionComplete(sessionId)) return true;
        await new Promise(r => setTimeout(r, 5_000));
    }
    throw new Error('manual intervention timeout');
}
```

## Troubleshooting

##### Cannot connect to session

**Cause:** session is already attached to an operator (`ScrapflyAgent` or `HumanAgent`).

**Fix:**

- Wait for the current operator to disconnect (attachment type is visible in the dashboard).
- If your own script is connected, call `browser.disconnect()` first.
- Confirm the session has not timed out or been terminated.

##### Live view is frozen

**Cause:** network connectivity issue, or the browser process is hung.

**Fix:**

- Refresh the dashboard page.
- Check your internet connection.
- If the browser itself is hung, terminate the session and start fresh.

##### Keyboard input not registering

**Cause:** the canvas does not have focus, or the target field has not been clicked.

**Fix:**

- Click on the live viewer canvas first to give it focus.
- Click on the target text field in the remote browser.
- Retry the keystroke.

##### Session disappeared after disconnect

**Cause:** `auto_close=true` (the default) terminated the browser when the script disconnected.

**Fix:** always pass `auto_close=false` when you intend to reconnect:

 ```
const BROWSER_WS = `wss://browser.scrapfly.io?api_key=YOUR_API_KEY&session=my-session&auto_close=false`;
```

See [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume) for the full reconnect contract.

## HIPAA audit trail

 Every operator attach is recorded in the session audit trail (event type `cloud_browser_agent_attach`) with source IP, transport, run id, and timestamp. The session itself can be recorded for replay when `debug=true` is set at allocation. Both are required artifacts for the HIPAA §164.312(b) audit-control evidence pack and are retained for the full audit-retention period of your subscription.

## Related documentation

- [Getting Started](https://scrapfly.io/docs/cloud-browser-api/getting-started): introduction to Cloud Browser API.
- [HITL embed kit](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop/embed): drop-in JavaScript module for your own UI.
- [HITL API reference](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop/api-reference): list, get, stop sessions over REST.
- [VNC mode](https://scrapfly.io/docs/cloud-browser-api/vnc): protocol details and native client setup.
- [File downloads](https://scrapfly.io/docs/cloud-browser-api/file-downloads): end-to-end download flow.
- [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume): reconnect to a stopped or idle browser.
- [Billing](https://scrapfly.io/docs/cloud-browser-api/billing): per-session and per-mode fees.
- [Error reference](https://scrapfly.io/docs/cloud-browser-api/errors): HITL-specific error codes.
