# Scrapfly Documentation

## Table of Contents

### Dashboard

- [Intro](https://scrapfly.io/docs)
- [Project](https://scrapfly.io/docs/project)
- [Account](https://scrapfly.io/docs/account)
- [Workspace & Team](https://scrapfly.io/docs/workspace-and-team)
- [Billing](https://scrapfly.io/docs/billing)

### Products

#### MCP Server

- [Getting Started](https://scrapfly.io/docs/mcp/getting-started)
- [Tools & API Spec](https://scrapfly.io/docs/mcp/tools)
- [Authentication](https://scrapfly.io/docs/mcp/authentication)
- [Examples & Use Cases](https://scrapfly.io/docs/mcp/examples)
- [FAQ](https://scrapfly.io/docs/mcp/faq)

##### Integrations

- [Overview](https://scrapfly.io/docs/mcp/integrations)
- [Claude Desktop](https://scrapfly.io/docs/mcp/integrations/claude-desktop)
- [Claude Code](https://scrapfly.io/docs/mcp/integrations/claude-code)
- [ChatGPT](https://scrapfly.io/docs/mcp/integrations/chatgpt)
- [Cursor](https://scrapfly.io/docs/mcp/integrations/cursor)
- [Cline](https://scrapfly.io/docs/mcp/integrations/cline)
- [Windsurf](https://scrapfly.io/docs/mcp/integrations/windsurf)
- [Zed](https://scrapfly.io/docs/mcp/integrations/zed)
- [Roo Code](https://scrapfly.io/docs/mcp/integrations/roo-code)
- [VS Code](https://scrapfly.io/docs/mcp/integrations/vscode)
- [LangChain](https://scrapfly.io/docs/mcp/integrations/langchain)
- [LlamaIndex](https://scrapfly.io/docs/mcp/integrations/llamaindex)
- [CrewAI](https://scrapfly.io/docs/mcp/integrations/crewai)
- [OpenAI](https://scrapfly.io/docs/mcp/integrations/openai)
- [n8n](https://scrapfly.io/docs/mcp/integrations/n8n)
- [Make](https://scrapfly.io/docs/mcp/integrations/make)
- [Zapier](https://scrapfly.io/docs/mcp/integrations/zapier)
- [Vapi AI](https://scrapfly.io/docs/mcp/integrations/vapi)
- [Agent Builder](https://scrapfly.io/docs/mcp/integrations/agent-builder)
- [Custom Client](https://scrapfly.io/docs/mcp/integrations/custom-client)


#### Web Scraping API

- [Getting Started](https://scrapfly.io/docs/scrape-api/getting-started)
- [API Specification]()
- [Monitoring](https://scrapfly.io/docs/monitoring)
- [Customize Request](https://scrapfly.io/docs/scrape-api/custom)
- [Debug](https://scrapfly.io/docs/scrape-api/debug)
- [Anti Scraping Protection](https://scrapfly.io/docs/scrape-api/anti-scraping-protection)
- [Proxy](https://scrapfly.io/docs/scrape-api/proxy)
- [Proxy Mode](https://scrapfly.io/docs/scrape-api/proxy-mode)
- [Proxy Mode - Screaming Frog](https://scrapfly.io/docs/scrape-api/proxy-mode/screaming-frog)
- [Proxy Mode - Apify](https://scrapfly.io/docs/scrape-api/proxy-mode/apify)
- [(Auto) Data Extraction](https://scrapfly.io/docs/scrape-api/extraction)
- [Javascript Rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering)
- [Javascript Scenario](https://scrapfly.io/docs/scrape-api/javascript-scenario)
- [SSL](https://scrapfly.io/docs/scrape-api/ssl)
- [DNS](https://scrapfly.io/docs/scrape-api/dns)
- [Cache](https://scrapfly.io/docs/scrape-api/cache)
- [Batch (Multi-URL Scraping)](https://scrapfly.io/docs/scrape-api/batch)
- [Session](https://scrapfly.io/docs/scrape-api/session)
- [Webhook](https://scrapfly.io/docs/scrape-api/webhook)
- [Screenshot](https://scrapfly.io/docs/scrape-api/screenshot)
- [Errors](https://scrapfly.io/docs/scrape-api/errors)
- [Timeout](https://scrapfly.io/docs/scrape-api/understand-timeout)
- [Throttling](https://scrapfly.io/docs/throttling)
- [Troubleshoot](https://scrapfly.io/docs/scrape-api/troubleshoot)
- [Billing](https://scrapfly.io/docs/scrape-api/billing)
- [FAQ](https://scrapfly.io/docs/scrape-api/faq)

#### Crawler API

- [Getting Started](https://scrapfly.io/docs/crawler-api/getting-started)
- [API Specification]()
- [Retrieving Results](https://scrapfly.io/docs/crawler-api/results)
- [WARC Format](https://scrapfly.io/docs/crawler-api/warc-format)
- [Data Extraction](https://scrapfly.io/docs/crawler-api/extraction-rules)
- [Webhook](https://scrapfly.io/docs/crawler-api/webhook)
- [Billing](https://scrapfly.io/docs/crawler-api/billing)
- [Errors](https://scrapfly.io/docs/crawler-api/errors)
- [Troubleshoot](https://scrapfly.io/docs/crawler-api/troubleshoot)
- [FAQ](https://scrapfly.io/docs/crawler-api/faq)

#### Screenshot API

- [Getting Started](https://scrapfly.io/docs/screenshot-api/getting-started)
- [API Specification]()
- [Accessibility Testing](https://scrapfly.io/docs/screenshot-api/accessibility)
- [Webhook](https://scrapfly.io/docs/screenshot-api/webhook)
- [Billing](https://scrapfly.io/docs/screenshot-api/billing)
- [Errors](https://scrapfly.io/docs/screenshot-api/errors)

#### Extraction API

- [Getting Started](https://scrapfly.io/docs/extraction-api/getting-started)
- [API Specification]()
- [Rules Template](https://scrapfly.io/docs/extraction-api/rules-and-template)
- [LLM Extraction](https://scrapfly.io/docs/extraction-api/llm-prompt)
- [AI Auto Extraction](https://scrapfly.io/docs/extraction-api/automatic-ai)
- [Webhook](https://scrapfly.io/docs/extraction-api/webhook)
- [Billing](https://scrapfly.io/docs/extraction-api/billing)
- [Errors](https://scrapfly.io/docs/extraction-api/errors)
- [FAQ](https://scrapfly.io/docs/extraction-api/faq)

#### Proxy Saver

- [Getting Started](https://scrapfly.io/docs/proxy-saver/getting-started)
- [Fingerprints](https://scrapfly.io/docs/proxy-saver/fingerprints)
- [Optimizations](https://scrapfly.io/docs/proxy-saver/optimizations)
- [SSL Certificates](https://scrapfly.io/docs/proxy-saver/certificates)
- [Protocols](https://scrapfly.io/docs/proxy-saver/protocols)
- [Pacfile](https://scrapfly.io/docs/proxy-saver/pacfile)
- [Secure Credentials](https://scrapfly.io/docs/proxy-saver/security)
- [Billing](https://scrapfly.io/docs/proxy-saver/billing)

#### Cloud Browser API

- [Getting Started](https://scrapfly.io/docs/cloud-browser-api/getting-started)
- [Proxy & Geo-Targeting](https://scrapfly.io/docs/cloud-browser-api/proxy)
- [Unblock API](https://scrapfly.io/docs/cloud-browser-api/unblock)
- [File Downloads](https://scrapfly.io/docs/cloud-browser-api/file-downloads)
- [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume)
- [Human-in-the-Loop](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop)
- [Debug Mode](https://scrapfly.io/docs/cloud-browser-api/debug-mode)
- [Bring Your Own Proxy](https://scrapfly.io/docs/cloud-browser-api/bring-your-own-proxy)
- [Browser Extensions](https://scrapfly.io/docs/cloud-browser-api/extensions)
- [Native Browser MCP](https://scrapfly.io/docs/cloud-browser-api/mcp)
- [DevTools Protocol](https://scrapfly.io/docs/cloud-browser-api/cdp-reference)

##### Integrations

- [Puppeteer](https://scrapfly.io/docs/cloud-browser-api/puppeteer)
- [Playwright](https://scrapfly.io/docs/cloud-browser-api/playwright)
- [Selenium](https://scrapfly.io/docs/cloud-browser-api/selenium)
- [Vercel Agent Browser](https://scrapfly.io/docs/cloud-browser-api/agent-browser)
- [Browser Use](https://scrapfly.io/docs/cloud-browser-api/browser-use)
- [Stagehand](https://scrapfly.io/docs/cloud-browser-api/stagehand)

- [Billing](https://scrapfly.io/docs/cloud-browser-api/billing)
- [Errors](https://scrapfly.io/docs/cloud-browser-api/errors)


### Tools

- [Antibot Detector](https://scrapfly.io/docs/tools/antibot-detector)

### SDK

- [Golang](https://scrapfly.io/docs/sdk/golang)
- [Python](https://scrapfly.io/docs/sdk/python)
- [Rust](https://scrapfly.io/docs/sdk/rust)
- [TypeScript](https://scrapfly.io/docs/sdk/typescript)
- [Scrapy](https://scrapfly.io/docs/sdk/scrapy)

### Integrations

- [Getting Started](https://scrapfly.io/docs/integration/getting-started)
- [LangChain](https://scrapfly.io/docs/integration/langchain)
- [LlamaIndex](https://scrapfly.io/docs/integration/llamaindex)
- [CrewAI](https://scrapfly.io/docs/integration/crewai)
- [Zapier](https://scrapfly.io/docs/integration/zapier)
- [Make](https://scrapfly.io/docs/integration/make)
- [n8n](https://scrapfly.io/docs/integration/n8n)

### Academy

- [Overview](https://scrapfly.io/academy)
- [Web Scraping Overview](https://scrapfly.io/academy/scraping-overview)
- [Tools](https://scrapfly.io/academy/tools-overview)
- [Reverse Engineering](https://scrapfly.io/academy/reverse-engineering)
- [Static Scraping](https://scrapfly.io/academy/static-scraping)
- [HTML Parsing](https://scrapfly.io/academy/html-parsing)
- [Dynamic Scraping](https://scrapfly.io/academy/dynamic-scraping)
- [Hidden API Scraping](https://scrapfly.io/academy/hidden-api-scraping)
- [Headless Browsers](https://scrapfly.io/academy/headless-browsers)
- [Hidden Web Data](https://scrapfly.io/academy/hidden-web-data)
- [JSON Parsing](https://scrapfly.io/academy/json-parsing)
- [Data Processing](https://scrapfly.io/academy/data-processing)
- [Scaling](https://scrapfly.io/academy/scaling)
- [Walkthrough Summary](https://scrapfly.io/academy/walkthrough-summary)
- [Scraper Blocking](https://scrapfly.io/academy/scraper-blocking)
- [Proxies](https://scrapfly.io/academy/proxies)

---

# Commands Reference

Every command returns a stable JSON envelope. On success, `data` carries the command's payload; on failure, an `error` object carries a machine-readable `code` and a human-readable `message`:

```
{ "success": true|false, "product": "...", "data": ... | "error": { "code": "...", "message": "..." } }
```
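
Because the envelope is stable, scripts can branch on `success` before touching the payload. A minimal sketch using `jq` (the sample envelope below is illustrative; real `data` payloads vary per command):

```shell
# A sample envelope standing in for real command output.
result='{"success": true, "product": "scrape", "data": {"status": 200}}'

# Check the success flag before reading the payload.
if [ "$(printf '%s' "$result" | jq -r '.success')" = "true" ]; then
  printf '%s' "$result" | jq -c '.data'    # prints {"status":200}
else
  printf '%s' "$result" | jq -r '.error.code' >&2
fi
```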

---

## scrape

Fetch a URL using Scrapfly's scraping infrastructure. Supports anti-bot bypass, JavaScript rendering, proxy rotation, and output format conversion.

 ```
scrapfly scrape <url> [flags]
```

 | Flag | Description |
|---|---|
| `--format, -f` | Output format: `raw` (default), `markdown`, `clean_html`, `text` |
| `--asp` | Enable anti-scraping protection bypass |
| `--render-js` | Enable JavaScript rendering with headless browser |
| `--proxy-pool` | Proxy pool name (e.g. `public_datacenter_pool`, `public_residential_pool`) |
| `--country` | Proxy country code (e.g. `us`, `de`) |
| `--header` | Request header as `key=value` (repeatable) |
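
A hardened fetch combining several of these flags (the pool, country, and header values are examples):

```shell
# Bypass anti-bot protection, render JavaScript, and return the page
# as markdown through a US residential proxy with a custom header.
scrapfly scrape https://web-scraping.dev/product/1 \
  --asp \
  --render-js \
  --format markdown \
  --proxy-pool public_residential_pool \
  --country us \
  --header "accept-language=en-US"
```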

---

## screenshot

Capture a full-page or viewport screenshot of any URL.

 ```
scrapfly screenshot <url> [flags]
```

 | Flag | Description |
|---|---|
| `-o, --output` | Write screenshot to file (default: base64 in JSON) |
| `--format` | Image format: `jpg`, `png`, `webp`, `gif` |
| `--capture` | Capture mode: `fullpage` or a CSS selector |
| `--resolution` | Viewport size (e.g. `1920x1080`) |
| `--country` | Proxy country code |
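
A typical invocation capturing a full-page PNG to disk (the URL and filename are examples):

```shell
# Full-page PNG at a desktop viewport, written to products.png.
scrapfly screenshot https://web-scraping.dev/products \
  --capture fullpage \
  --format png \
  --resolution 1920x1080 \
  -o products.png
```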

---

## extract

Extract structured data from a document using an AI prompt or a named extraction model. The document body is read from `--file` or stdin; use `scrapfly scrape --proxified` to pipe content in. For end-to-end fetch-and-extract in a single call, use `scrapfly scrape --extraction-prompt` instead.

 ```
# Pipe from scrape (Scrapfly fetches, then extracts)
scrapfly scrape https://web-scraping.dev/product/1 --proxified \
  | scrapfly extract --content-type text/html \
      --url https://web-scraping.dev/product/1 \
      --prompt "product name, price, sku"

# Extract from a local file
scrapfly extract --file page.html --content-type text/html --model product
```

 | Flag | Description |
|---|---|
| `--content-type` | Document content type, e.g. `text/html` (required) |
| `--prompt` | Fields to extract in plain English |
| `--model` | Named extraction model: `product`, `article`, `job_posting`, ... |
| `--url` | Source URL of the document (for extraction context) |
| `--file` | Read document body from file (default: stdin) |
| `--data-only` | Print extracted data only, no JSON envelope |

---

## scrape classify

Classify an HTTP response to detect which anti-bot systems are active. Use `--fetch` to have the CLI fetch the URL over plain HTTP and classify its response automatically. This is a subcommand of `scrape`.

 ```
scrapfly scrape classify --url <url> --fetch
```

 | Flag | Description |
|---|---|
| `--url` | URL the response came from (required) |
| `--fetch` | Fetch the URL over plain HTTP and use its response as input |
| `--body` | Response body inline (text) |
| `--file` | Read response body from file |
| `--status-code` | HTTP status code of the response (when piping body) |
| `--header` | Response header in `Name:Value` form (repeatable) |
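
Besides `--fetch`, a previously captured response can be classified offline (the file name, status code, and header below are illustrative):

```shell
# Classify a saved response body; the status code and headers
# of the original response improve detection accuracy.
scrapfly scrape classify \
  --url https://web-scraping.dev/products \
  --file response.html \
  --status-code 403 \
  --header "Server:cloudflare"
```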

---

## scrape batch

Scrape up to 100 URLs in a single streaming batch request. Supply URLs via `--url` (repeatable) or a newline-delimited file with `--url-file`. Results stream back as each scrape completes. This is a subcommand of `scrape`.

 ```
scrapfly scrape batch --url-file urls.txt [flags]
scrapfly scrape batch --url https://web-scraping.dev/product/1 --url https://web-scraping.dev/product/2 [flags]
```

 | Flag | Description |
|---|---|
| `--url` | URL to include (repeatable) |
| `--url-file` | Path to a newline-delimited file of URLs |
| `--asp` | Enable anti-scraping protection bypass for all URLs |
| `--render-js` | Enable JavaScript rendering for all URLs |
| `--country` | Proxy country for all URLs |
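
One way to build the URL file is with plain `printf`, one URL per line:

```shell
# Write a newline-delimited URL file, then stream the batch through it.
printf '%s\n' \
  https://web-scraping.dev/product/1 \
  https://web-scraping.dev/product/2 \
  https://web-scraping.dev/product/3 > urls.txt

scrapfly scrape batch --url-file urls.txt --asp --country us
```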

---

## crawl

Manage Scrapfly Crawler jobs. Use `crawl run` for synchronous execution (waits until the crawl completes) or `crawl start` for asynchronous submission.

 ```
# Synchronous: submit and wait for completion
scrapfly crawl run https://web-scraping.dev --max-pages 20 --max-depth 2

# Async: submit and poll manually
scrapfly crawl start https://web-scraping.dev --max-pages 50
scrapfly crawl status <uuid>
scrapfly crawl urls <uuid> --status visited
scrapfly crawl contents <uuid>
```

 | Flag (run / start) | Description |
|---|---|
| `--max-pages` | Maximum pages to crawl |
| `--max-depth` | Maximum crawl depth |
| `--asp` | Enable anti-scraping protection bypass |
| `--country` | Proxy country for child scrapes |
| `--proxy-pool` | Proxy pool for child scrapes |
| `--content-format` | Content format for results: `html`, `markdown`, `text`, ... (repeatable) |

---

## browser

Control a remote Scrapfly Browser over CDP. Pass a URL with `--unblock` to navigate and apply anti-bot bypass. Without a URL, the command mints a raw CDP session URL.

 ```
# Unblock a URL and get an attach-ready browser session
scrapfly browser https://web-scraping.dev --unblock --country us

# Mint a fresh CDP URL (no target pre-navigated)
scrapfly browser --resolution 1920x1080 --pretty
```

---

## agent

Run an autonomous LLM agent that drives a Scrapfly Browser over CDP to accomplish a plain-English task. Requires an LLM provider API key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, or `OLLAMA_HOST`).

 ```
ANTHROPIC_API_KEY=sk-ant-... \
  scrapfly agent "collect the top 5 product names and prices" \
    --url https://web-scraping.dev/products
```

 | Flag | Description |
|---|---|
| `--url` | Navigate to this URL first (applies /unblock) |
| `--provider` | LLM provider: `anthropic`, `openai`, `gemini`, `ollama` (auto-detected from env) |
| `--model` | Model ID (provider-specific default) |
| `--max-steps` | Maximum tool-call loops (default 15) |
| `--schema` | JSON Schema for structured answer output |
| `--verbose` | Stream per-step traces to stderr |
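
The `--schema` flag takes an inline JSON Schema that constrains the agent's final answer. A sketch (the schema body is an example, not a required shape):

```shell
# Request structured output matching a products-array schema.
scrapfly agent "collect the top 5 product names and prices" \
  --url https://web-scraping.dev/products \
  --schema '{
    "type": "object",
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name":  { "type": "string" },
            "price": { "type": "string" }
          }
        }
      }
    }
  }'
```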

---

## selector

Find a robust CSS selector for an element described in natural language. Uses an LLM to generate and verify the selector against the actual page HTML.

 ```
# Scrape a page and find a selector in one step
scrapfly selector "the first product price" \
  --url https://web-scraping.dev/products \
  --save-html products.html

# Find a selector in a previously saved file
scrapfly selector "add to cart button" --file products.html
```

 | Flag | Description |
|---|---|
| `--url` | Scrape this URL and use the result as input |
| `--file` | HTML input file (`-` for stdin) |
| `--want-text` | Require this substring in the matched element's text |
| `--save-html` | Write the fetched HTML to this path for reuse |
| `--provider` | LLM provider: `anthropic`, `openai`, `gemini`, `ollama` |
| `--attempts` | Max LLM retries when selector is not unique (default 4) |

---

## status

Print CLI version, authentication status, and account usage in a single call. Useful as a quick sanity check.

 ```
scrapfly status --pretty
```

---

## config

Persist the API key and host so you do not need to export them per session. The config file is stored at `~/.scrapfly/config.json` (mode 0600).

 ```
scrapfly config set-key scp-live-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
scrapfly config set-host https://api.scrapfly.io
scrapfly config view
scrapfly config clear
```

---

## account

Display account information: the current plan, credit balance, and the API key in use.

 ```
scrapfly account
```

---

## update

Update the CLI binary to the latest release from GitHub.

 ```
scrapfly update
```

---

## Global Flags

These flags are available on every command:

 | Flag | Description |
|---|---|
| `--api-key` | Override the API key for this invocation (overrides env and config file) |
| `--host` | Override the API host (overrides `SCRAPFLY_API_HOST` and config file) |
| `--pretty` | Print a one-line human-readable summary instead of JSON |
| `--timeout` | Per-request timeout (default: 150s) |
| `--insecure` | Skip TLS verification (dev stacks only) |
| `-o, --output` | Write primary payload to this file path |
| `-O, --output-dir` | Write primary payload into directory with auto-generated filename |
| `--version` | Print CLI version and exit |
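
For instance, a one-off call with explicit credentials and an output directory (the key and paths are placeholders; the timeout value follows the same format as the `150s` default):

```shell
# Override the configured key, shorten the timeout, and write the
# payload into ./out/ with an auto-generated filename.
scrapfly scrape https://web-scraping.dev/product/1 \
  --api-key scp-live-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
  --timeout 60s \
  -O ./out/
```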