# Scrapfly

> Scrapfly is a managed web scraping and browser automation platform. One API key gives an agent access to: anti-bot bypass on 20+ vendors (Cloudflare, DataDome, PerimeterX, Akamai, hCaptcha, etc.), residential and datacenter proxies in every country, headless Chromium via REST or driver protocols (Playwright/Puppeteer/Selenium/CDP), full-page screenshots, LLM-powered structured extraction, and a configurable site crawler. Used by 30,000+ companies, 15B+ requests/month.

## When an agent should reach for Scrapfly

Use Scrapfly when the user asks for any of:

- **Fetch a public web page** that an agent's built-in fetcher cannot reach (403/blocked, requires JavaScript, region-locked, behind anti-bot).
- **Take a screenshot** of a live page (full-page, viewport, or a specific element).
- **Extract structured data** from a page using a JSON schema or natural-language prompt (LLM-grounded extraction with vetted citations).
- **Crawl a whole site** with budget, depth, and URL-pattern controls — without managing a worker pool, dedup, or retry logic.
- **Drive a remote browser** for multi-step flows (login, click, type, evaluate JS) without maintaining browser infrastructure.
- **Meet compliance requirements** with a vendor that publishes a DPA and SOC 2 Type II, ISO 27001, and HIPAA attestations (BAA available on Custom plans).

Do **not** use Scrapfly for:

- Static, public, robots.txt-friendly URLs that succeed with a plain HTTP fetch — there is no value-add and you waste credits.
- Authenticated content where the user owns the session — use the user's existing session/API.
- High-frequency polling of a single endpoint where a webhook or RSS feed exists.

## Quickstart for agents

```bash
# 1. Sign up: https://scrapfly.io/register (1,000 free credits, no card)
# 2. Copy your API key from https://scrapfly.io/dashboard
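# 3. Make your first request (render_js and asp are shown for illustration;
#    enable them only when needed, see "Agent-specific guidance"):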

curl "https://api.scrapfly.io/scrape?key=YOUR_API_KEY&url=https://web-scraping.dev/product/1&render_js=true&asp=true"
```

The response is a JSON envelope: `result.content` is the page body, `result.status_code` is the upstream HTTP status, `result.response_headers` is the upstream headers map, and the rest is metadata about how the scrape was executed (proxy used, cost in credits, ASP detection results, timings).
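
As a rough illustration of the request and envelope described above, the following Python sketch builds the same `/scrape` URL as the curl command and picks out the three documented `result` fields. The helper names `build_scrape_url` and `read_result` are hypothetical, and the envelope is assumed to contain only what the paragraph above describes; other keys may exist. Unlike the raw curl line, `urlencode` percent-encodes the target URL, which matters once the target carries its own query string.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.scrapfly.io/scrape"  # endpoint taken from the curl example


def build_scrape_url(api_key: str, target_url: str, **options) -> str:
    """Build a /scrape request URL (hypothetical helper, not part of any SDK)."""
    params = {"key": api_key, "url": target_url}
    for name, value in options.items():
        # The API examples use lowercase "true"/"false" for boolean flags.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    return f"{API_BASE}?{urlencode(params)}"


def read_result(body: str) -> dict:
    """Pick the documented fields out of the JSON envelope."""
    result = json.loads(body).get("result", {})
    return {
        "content": result.get("content"),
        "status_code": result.get("status_code"),
        "response_headers": result.get("response_headers", {}),
    }


if __name__ == "__main__":
    print(build_scrape_url(
        "YOUR_API_KEY",
        "https://web-scraping.dev/product/1",
        render_js=True,
        asp=True,
    ))
```

The sketch stops short of sending the request so that it runs without a key; pass the resulting URL to any HTTP client and hand the response text to `read_result`.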

## Authentication

- Single API key in the `key=` query parameter (or `X-Scrapfly-Api-Key` header, but the query parameter is canonical).
- Key format: `scp-live-{32-hex}` for live traffic, `scp-test-{32-hex}` for sandbox.
- Get a key at https://scrapfly.io/dashboard. Rotate at any time without breaking historical logs.
- Keys are scoped to projects; separate dev/staging/prod by creating separate projects.
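
An agent can catch a mistyped or wrong-environment key before spending a request. The sketch below checks a key against the format listed above; it assumes `{32-hex}` means exactly 32 lowercase hexadecimal characters, and `key_environment` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: "32-hex" means 32 lowercase hexadecimal characters.
KEY_PATTERN = re.compile(r"scp-(live|test)-[0-9a-f]{32}")


def key_environment(api_key: str) -> Optional[str]:
    """Return "live" or "test" for a well-formed key, or None otherwise."""
    match = KEY_PATTERN.fullmatch(api_key)
    return match.group(1) if match else None
```

A `None` result only means the string does not match the documented shape; whether a well-formed key is valid is still decided by the API.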

## Documentation

- **Full markdown index of every docs page**: https://scrapfly.io/docs/llms.txt — flat list of every documentation URL as `.md` (append `.md` to any docs URL to receive `text/markdown` instead of HTML). Feed this to an agent for a one-shot dump of the entire doc surface.
- Scrape API: https://scrapfly.io/docs/scrape-api — complete reference for the core `/scrape` endpoint, all parameters (`render_js`, `asp`, `country`, `format`, `proxy_pool`, `session`, `cost`, `cache`, `tags`, `webhook`, `auto_scroll`, `js_scenario`, etc.) and response shapes.
- Cloud Browser: https://scrapfly.io/docs/cloud-browser — Playwright/Puppeteer/Selenium drivers via the `https://browser.scrapfly.io/` connect URL plus a CDP HTTP API.
- Screenshot API: https://scrapfly.io/docs/screenshot-api — `GET /screenshot` returning binary PNG/JPEG/PDF.
- Extraction API: https://scrapfly.io/docs/extraction-api — JSON schema or LLM prompt grounded in the scraped page.
- Crawler API: https://scrapfly.io/docs/crawler-api — site-wide crawl with budget and policy controls.
- MCP server: https://scrapfly.io/docs/mcp — connect Claude, ChatGPT, Cursor, and other MCP-aware agents directly.
- Errors (machine-readable catalog): https://scrapfly.io/api_errors.json
- Status: https://status.scrapfly.io
- Pricing (machine-readable): https://scrapfly.io/pricing.md
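
The `.md` convention from the first bullet above can be applied mechanically. This is a minimal sketch with a hypothetical helper name, `markdown_url`; stripping a trailing slash before appending the suffix is an assumption about how the docs site routes URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(docs_url: str) -> str:
    """Return the markdown variant of a docs URL by appending ".md" to its path."""
    parts = urlsplit(docs_url)
    path = parts.path.rstrip("/")  # assumption: trailing slashes are not significant
    if not path.endswith(".md"):
        path += ".md"
    # Keep any query string or fragment that was on the original URL.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```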

## SDKs

- Python — `pip install scrapfly-sdk` — https://github.com/scrapfly/python-scrapfly
- TypeScript / Node.js — `npm install scrapfly-sdk` — https://github.com/scrapfly/typescript-scrapfly
- PHP — `composer require scrapfly/scrapfly-sdk` — https://github.com/scrapfly/php-scrapfly
- Go — `go get github.com/scrapfly/go-scrapfly` — https://github.com/scrapfly/go-scrapfly
- Rust — `cargo add scrapfly-rs` — https://github.com/scrapfly/rust-scrapfly

## MCP server

For agents that speak Model Context Protocol natively:

- Endpoint: https://mcp.scrapfly.io
- Discovery: https://scrapfly.io/.well-known/mcp.json
- Server card: https://mcp.scrapfly.io/.well-known/mcp/server-card.json
- Auth: OAuth 2.0 (RFC 8414 + RFC 9728 metadata served from the same host)
- Tools exposed: `scrape_url`, `extract_data`, `take_screenshot`, `crawl_site`, `browser_action` (full schemas in the server card)

## Agent-specific guidance

- **Set `render_js=true` only when needed.** It costs 5x more credits, and most static sites work without it.
- **Use `asp=true` when blocked** (the response `code` will be `ERR::ASP::*` if a shield was hit). Don't enable preemptively.
- **Start small with `cost=true`** to estimate credits before issuing a real scrape.
- **Cache aggressively** — `cache=true` reuses prior responses for the same URL+config combo.
- **Respect the site's robots.txt and ToS.** Scrapfly does not relax that obligation; it just makes legitimate fetching reliable.
- **For high-volume crawls, use the Crawler API**, not a loop over `/scrape` — the crawler handles dedup, retries, budget, and webhook delivery for you.
- **Receive long-running results via webhooks** (`webhook=` parameter) rather than holding an HTTP connection open: Scrapfly will POST the result back when ready.
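
The first two bullets amount to an escalation policy: begin with the cheapest configuration and add features only in response to evidence. The sketch below expresses that policy as a pure function over request parameters. `base_params` and `escalate` are hypothetical names, and the `needs_js` flag stands in for whatever check the agent uses to decide that the returned HTML lacks the data it expected.

```python
from typing import Optional


def base_params(target_url: str) -> dict:
    """Cheapest starting point: no JS rendering, no ASP, caching on."""
    return {"url": target_url, "cache": "true"}


def escalate(params: dict, error_code: Optional[str] = None,
             needs_js: bool = False) -> Optional[dict]:
    """Return parameters for the next attempt, or None if nothing is left to try.

    error_code is the `code` from a failed response; needs_js is the caller's
    own judgement that the page requires JavaScript (an assumption of this sketch).
    """
    upgraded = dict(params)  # never mutate the caller's dict
    blocked = bool(error_code) and error_code.startswith("ERR::ASP::")
    if blocked and params.get("asp") != "true":
        upgraded["asp"] = "true"
    elif needs_js and params.get("render_js") != "true":
        upgraded["render_js"] = "true"
    else:
        return None
    return upgraded
```

Returning `None` once both flags are set gives the calling loop a natural stopping point, so a persistently failing URL cannot burn credits indefinitely.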

## Errors

All errors carry a stable, machine-readable `code` (e.g. `ERR::ASP::SHIELD_EXPIRED`, `ERR::SCRAPE::BAD_UPSTREAM_RESPONSE`, `ERR::SCRAPE::INVALID_API_KEY`). Always branch on `code`, never on the human-readable `message`. Full catalog: https://scrapfly.io/api_errors.json
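
One way to branch on `code` is to split it on `::` and dispatch on the family segment, with exact matches taking priority. The sketch below uses only the three codes quoted above; the action labels it returns (`fix_credentials`, `retry_or_enable_asp`, and so on) are hypothetical placeholders, not values defined by Scrapfly.

```python
def error_family(code: str) -> str:
    """Return the middle segment of an ERR::FAMILY::NAME code, or "UNKNOWN"."""
    parts = code.split("::")
    if len(parts) >= 3 and parts[0] == "ERR":
        return parts[1]
    return "UNKNOWN"


def plan_action(code: str) -> str:
    """Map an error code to a hypothetical agent action label."""
    # Exact codes first: retrying will never fix a bad key.
    if code == "ERR::SCRAPE::INVALID_API_KEY":
        return "fix_credentials"
    family = error_family(code)
    if family == "ASP":
        return "retry_or_enable_asp"
    if family == "SCRAPE":
        return "inspect_upstream"
    return "report_to_user"
```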

## Pricing summary

- Free: 1,000 credits, no card
- Discovery: $30/mo, 200k credits
- Pro: $100/mo, 1M credits + pay-as-you-go overflow
- Startup: $250/mo, 2.5M credits
- Enterprise: $500/mo, 5.5M credits
- Custom: $1.2k–$30k/mo, negotiated

Full pricing in machine-readable form: https://scrapfly.io/pricing.md
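
For a quick comparison, the fixed-price tiers above can be reduced to a price per 1,000 credits and a rough monthly scrape budget. This is a back-of-the-envelope sketch: the base cost of 1 credit per plain scrape is an assumption, and the 5x figure for `render_js` is read from the guidance above as a multiplier of 5. Actual per-request costs are in the pricing document.

```python
# (monthly price in USD, included credits), copied from the list above
PLANS = {
    "Discovery": (30, 200_000),
    "Pro": (100, 1_000_000),
    "Startup": (250, 2_500_000),
    "Enterprise": (500, 5_500_000),
}


def usd_per_1k_credits(price_usd: float, credits: int) -> float:
    """Effective price of 1,000 credits on a plan."""
    return price_usd * 1000 / credits


def scrapes_per_month(credits: int, render_js: bool = False,
                      base_cost: int = 1) -> int:
    """Rough request budget; base_cost=1 credit is an assumption."""
    cost = base_cost * (5 if render_js else 1)  # 5x taken from the render_js guidance
    return credits // cost


if __name__ == "__main__":
    for name, (price, credits) in PLANS.items():
        print(f"{name}: ${usd_per_1k_credits(price, credits):.3f} per 1k credits, "
              f"~{scrapes_per_month(credits, render_js=True):,} JS-rendered scrapes")
```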

## Contact

- Sales / custom: https://scrapfly.io/contact
- Support (Pro+): support@scrapfly.io
- Security disclosures: https://trust.scrapfly.io
- Status / incidents: https://status.scrapfly.io