# How to Build a Web Scraping Agent with Claude

 by [Ziad Shamndy](https://scrapfly.io/blog/author/ziad) Jun 16, 2026 22 min read [\#ai](https://scrapfly.io/blog/tag/ai) [\#python](https://scrapfly.io/blog/tag/python) 


Ask Claude to scrape a competitor page or documentation site and WebFetch often returns a Cloudflare challenge or just the navigation. You are not prompting it wrong. Roughly 4 in 10 SaaS sites block ClaudeBot at the edge by default.

This guide covers the three Claude surfaces for web scraping: Claude Code, the Anthropic API, and Claude Cowork. It explains where the native tools hit their ceiling and shows the fastest reliable recipe for getting data out of real sites.

## Key Takeaways

Learn how to build a web scraping agent with Claude that handles anti-bot protection, JavaScript rendering, and structured JSON output at any scale.

- Claude exposes three scraping surfaces: Claude Code (terminal agent), the Anthropic API (programmatic pipelines), and Claude Cowork (no-code browser extension aimed at non-developers)
- Claude's built-in WebFetch tool does not execute JavaScript and is blocked by Cloudflare on a large share of SaaS sites by default, making it unreliable for production scraping
- The fastest reliable setup is the Scrapfly CLI installed as a Claude Code skill: one curl install, one scaffold command, and then natural-language prompts handle the rest
- The `scrapfly agent` command runs a fully autonomous scraping loop with no skill setup required, which makes it useful for one-off tasks and rapid prototyping
- Adding `--asp` and `--render-js` to any Scrapfly CLI call handles Cloudflare, DataDome, and JavaScript-rendered SPAs without touching the prompt
- The Scrapfly CLI ships a built-in MCP mode via `scrapfly mcp serve` for teams already running Claude Desktop, Cursor, Cline, or Windsurf








## Why Use Claude for Web Scraping?

Claude turns natural-language descriptions into structured data extraction pipelines. Instead of writing XPath selectors or CSS patterns, you describe the fields you want and Claude infers the structure from the page.

Claude does not fetch pages. Claude interprets them. A reliable scraping pipeline splits the work into two distinct layers:

- **Fetch layer** handles JavaScript rendering, anti-bot challenges, and proxy routing, then returns clean markdown
- **Claude layer** reads that markdown, reasons over the structure, and returns structured JSON

Neither layer has to do the other's job. That separation is what makes the pattern reliable at scale.
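The two-layer split can be sketched in a few lines of Python. This is a minimal illustration, not an SDK: `run_pipeline`, `fake_fetch`, and `fake_extract` are hypothetical names, and the two stand-in functions replace the real fetch tool and the real Claude call so the example runs offline.

```python
from typing import Callable

# Hypothetical type aliases: a fetch layer maps a URL to markdown,
# and an extraction layer maps markdown to a dict of fields.
FetchLayer = Callable[[str], str]
ExtractLayer = Callable[[str], dict]


def run_pipeline(url: str, fetch: FetchLayer, extract: ExtractLayer) -> dict:
    """Run the two layers in order: fetch first, then extract."""
    markdown = fetch(url)
    if not markdown.strip():
        # An empty page is a fetch-layer failure; do not ask the model to guess.
        raise ValueError(f"fetch layer returned no content for {url}")
    return extract(markdown)


# Stand-ins for demonstration only. In a real pipeline the fetch layer would
# call a scraping tool and the extract layer would call Claude.
def fake_fetch(url: str) -> str:
    return "# Products\n\n## Red Potion\n**Price:** $4.99\n"


def fake_extract(markdown: str) -> dict:
    # Take the first second-level heading as the product name.
    name = next(line[3:] for line in markdown.splitlines() if line.startswith("## "))
    return {"name": name}


print(run_pipeline("https://web-scraping.dev/products", fake_fetch, fake_extract))
```

Because each layer is passed in as a plain callable, either one can be swapped or tested without touching the other.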

### What Claude Specifically Brings

Three capabilities make Claude well-suited for the extraction layer:

- **HTML-to-JSON extraction** driven by natural-language field descriptions, no selectors needed
- **Multi-step browsing agents** through Claude Code for workflows that span multiple pages
- **Structured output via function calling** through the Anthropic API for deterministic Python pipelines

Claude's long context window also handles large HTML payloads well, which matters when passing an entire product listing as a single prompt.

[Finding Hidden Web Data with ChatGPT Web Scraping](https://scrapfly.io/blog/posts/finding-hidden-web-data-with-chatgpt): In this article we take a look at how to get assistance from LLMs for hidden web data scraping.



## What Are the Best Ways to Web Scrape with Claude?

Claude exposes three scraping surfaces. Claude Code is a terminal agent you install locally. The Anthropic API is a programmatic interface for Python pipelines. Claude Cowork is a no-code browser extension for non-developers.

| Claude Surface | Best For | Setup Complexity |
|---|---|---|
| Claude Code | Developers who want natural-language ad-hoc scraping and repeatable skill-based workflows | Low (one CLI install) |
| Anthropic API | Backend engineers building Python pipelines and scheduled extraction jobs | Medium (SDK + code) |
| Claude Cowork | Non-developer users browsing sites in Chrome with connector flows | None (out of scope for this guide) |

### Claude Code

Claude Code is Anthropic's CLI agent. Claude Code reads files, runs commands, and triggers Claude Code Skills based on semantic matching of your prompt against each skill's description.

The fastest scraping path is installing the Scrapfly CLI and scaffolding a skill that tells Claude Code when and how to use the CLI. This guide covers that setup in the next section.

### Anthropic API

For backend services and scheduled extraction jobs, call the Anthropic API directly. Shell out to the Scrapfly CLI for the fetch, pipe the markdown back into `client.messages.create()`, and use Claude's tool use feature for parsing.

This produces schema-driven JSON extraction inside any Python pipeline without requiring Claude Code or a running terminal agent. The section "How to Extract Structured JSON with Claude via API" covers this path in full.

### Claude Cowork

Claude Cowork is the in-Chrome connector flow where Claude reads pages you are currently browsing and optionally calls third-party connectors to extract structured data.

Claude Cowork is a marketer or researcher tool, not a developer recipe. If that connector flow fits your job, the [Anthropic connector documentation](https://claude.com/docs/connectors/overview) is the right starting point.



## Why Do Claude's WebFetch and Web Search Tools Fail on Real Sites?

Claude's WebFetch tool performs an HTTP GET request and retrieves the initial HTML response. WebFetch does not execute JavaScript, does not solve anti-bot challenges, and does not rotate proxies.

Any site protected by Cloudflare or DataDome, or built on a client-side rendering framework, returns either a challenge page or an empty HTML shell. The web search tool inherits the same limitation.

The four failure modes most developers run into are:

- **Cloudflare blocks ClaudeBot at the edge.** Many SaaS sites block ClaudeBot by default. The symptom is a silent 403 with no obvious cause in the model's output.
- **JavaScript SPAs return an empty shell.** WebFetch does an HTTP GET. If content loads client-side, WebFetch never sees it.
- **Silent partial extraction on SSR pages.** WebFetch sometimes returns only nav and sidebar and presents it as the full page, with no error.
- **Anti-bot systems beyond Cloudflare.** DataDome, Akamai, PerimeterX, and Kasada all return challenge HTML. Native Claude tools never solve those challenges.

Here is what a request that relies on a native tool looks like through the Anthropic API. The example enables the server-side web search tool, which is subject to the same limits:

```python
import anthropic

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{
        "role": "user",
        "content": "Fetch https://web-scraping.dev/products and list every product name on the page."
    }]
)
# Result: challenge page HTML or empty body - no product data is returned
print(response.content)
```



The model calls the tool and returns whatever the server sends back, including challenge pages. There is no indication the page was blocked. The response appears complete even when the fetch failed.
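Since a blocked fetch can look like a successful one, a pipeline can check responses before passing them to the model. The sketch below is a heuristic only: `looks_blocked` is a hypothetical helper, and the marker strings and the 200-character threshold are assumptions based on commonly seen challenge pages, not an official or exhaustive list.

```python
# Heuristic markers (assumptions, not an official list).
CHALLENGE_MARKERS = (
    "just a moment...",       # typical Cloudflare interstitial title
    "attention required!",    # typical Cloudflare block page title
    "checking your browser",  # wording seen on older challenge pages
)


def looks_blocked(status_code: int, body: str, min_length: int = 200) -> bool:
    """Return True when a response looks like a block page or an empty shell."""
    if status_code in (401, 403, 429, 503):
        return True
    text = body.lower()
    if any(marker in text for marker in CHALLENGE_MARKERS):
        return True
    # Very short bodies often mean an unrendered client-side app shell.
    return len(text.strip()) < min_length
```

A check like this only detects the problem. It does not solve the challenge or render the page.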

The fix is not a better prompt. The fix is a fetch layer that handles JavaScript rendering and anti-bot challenges before the page reaches Claude. That is what the Scrapfly CLI does in the next section.

[How to Bypass Cloudflare When Web Scraping in 2026](https://scrapfly.io/blog/posts/how-to-bypass-cloudflare-anti-scraping): Cloudflare offers one of the most popular anti-scraping services, so in this article we'll take a look at how it works and how to bypass it.



## How to Set Up Scrapfly as a Claude Code Skill

Scrapfly's [Web Scraping API](https://scrapfly.io/web-scraping-api) is a single HTTP endpoint for collecting web data at scale, with a **99.99% success rate** across **130M+ proxies in 120+ countries**.

- [Anti-Scraping Protection bypass](https://scrapfly.io/docs/scrape-api/anti-scraping-protection) - automatically defeats Cloudflare, DataDome, PerimeterX, Akamai, and 90+ other bot systems.
- [Smart proxy rotation](https://scrapfly.io/docs/scrape-api/proxy) - residential and datacenter pools with country and ASN level geo-targeting.
- [JavaScript rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering) - render SPAs and dynamic pages through real cloud browsers.
- [Browser automation scenarios](https://scrapfly.io/docs/scrape-api/javascript-scenario) - scroll, click, fill forms, and wait for elements without managing a browser fleet.
- [Format conversion](https://scrapfly.io/docs/scrape-api/getting-started#api_param_format) - return pages as HTML, JSON, clean text, or LLM ready Markdown.
- [Session management](https://scrapfly.io/docs/scrape-api/session) - keep cookies, headers, and IPs consistent across multi step flows.
- [Smart caching](https://scrapfly.io/docs/scrape-api/getting-started#api_param_cache) - cache successful responses to cut cost on repeat scraping jobs.
- [Python](https://scrapfly.io/docs/sdk/python), [TypeScript](https://scrapfly.io/docs/sdk/typescript), [Scrapy](https://scrapfly.io/docs/sdk/scrapy), and [no-code integrations](https://scrapfly.io/docs/integration/getting-started) including Make, n8n, Zapier, LangChain, and LlamaIndex.

Install the Scrapfly CLI with one curl command and scaffold a Claude Code skill that tells Claude Code when and how to use the CLI. After that, natural-language prompts trigger the CLI automatically.

### Install the CLI in one command

```shell
curl -fsSL https://scrapfly.io/scrapfly-cli/install | sh
scrapfly config set-key YOUR_SCRAPFLY_API_KEY
```



The installer works on macOS and Linux. Windows users can download release artifacts from the GitHub repository or install via npm with `npm install -D scrapfly-cli`.

A free API key is available at [scrapfly.io/register](https://scrapfly.io/register) with 1,000 free credits and no credit card required. For auth configuration and platform-specific install paths, see the [CLI documentation](https://scrapfly.io/docs/cli).

Every CLI command returns a stable JSON envelope with the shape `{success, product, data | error}`. Run a quick test against the Scrapfly demo site to see the envelope:

```shell
scrapfly scrape https://web-scraping.dev/products --format markdown --pretty
```



The response looks like this (truncated for readability):

```json
{
  "success": true,
  "product": "scrape",
  "data": {
    "content": "# Products\n\n## Box of Chocolate Candy\n**Price:** $24.99\n\n## Red Potion\n**Price:** $4.99\n...",
    "status_code": 200,
    "url": "https://web-scraping.dev/products"
  }
}
```



The `--content-only` flag strips the envelope and returns just the page markdown, which is cleaner for piping into other tools. The `--data-only` flag returns just the envelope metadata without page content.
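When a script consumes the full envelope rather than `--content-only` output, a small helper can check `success` before reading `data.content`. This is a sketch based only on the `{success, product, data | error}` shape shown above. `unwrap_envelope` and `ScrapeError` are hypothetical names, and the structure of the `error` value is not documented here, so it is passed through untouched.

```python
import json


class ScrapeError(RuntimeError):
    """Raised when the envelope reports a failure (hypothetical helper)."""


def unwrap_envelope(raw: str) -> str:
    """Return data.content from a CLI envelope, or raise ScrapeError."""
    envelope = json.loads(raw)
    if not envelope.get("success"):
        # The shape of the error value is an assumption; report it as-is.
        raise ScrapeError(f"scrape failed: {envelope.get('error')!r}")
    return envelope["data"]["content"]


# Sample envelope mirroring the truncated response shown above.
sample = json.dumps({
    "success": True,
    "product": "scrape",
    "data": {
        "content": "# Products\n\n## Red Potion\n**Price:** $4.99\n",
        "status_code": 200,
        "url": "https://web-scraping.dev/products",
    },
})
print(unwrap_envelope(sample).splitlines()[0])
```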

### Scaffold the skill with skill-creator

Claude Code's official `skill-creator` skill scaffolds the skill folder and a SKILL.md file with the correct YAML frontmatter. Run `skill-creator` inside Claude Code once and edit the generated SKILL.md to point at the Scrapfly CLI commands.

Skills live in `~/.claude/skills/scrapfly/` for user-level availability across all projects. Project-scoped skills go in `.claude/skills/scrapfly/` at the project root and can be committed to git.

The SKILL.md needs YAML frontmatter with a `name` and a `description`. The description is what Claude Code uses for semantic-match triggering.

A description like "Scrape a URL, take a screenshot, extract structured data, or run an autonomous scraping agent using the Scrapfly CLI" tells Claude Code exactly when to activate the skill.

The skill folder structure looks like this:

```
~/.claude/skills/scrapfly/
├── SKILL.md          # frontmatter + instructions for Claude Code
└── README.md         # optional human-readable notes
```



A minimal SKILL.md frontmatter:

```yaml
---
name: scrapfly
description: >
  Scrape a URL, take a screenshot, extract structured data,
  or run an autonomous scraping agent using the Scrapfly CLI.
  Use this skill whenever the task involves fetching web content,
  handling Cloudflare or anti-bot protection, or rendering JavaScript-heavy pages.
---
```



The body of the SKILL.md describes when to use `scrapfly scrape`, `scrapfly browser`, and `scrapfly agent`, along with how to parse the JSON envelope.
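If you prefer to generate the file from a script, the frontmatter-plus-body layout can be assembled as plain text. The sketch below assumes only the two frontmatter keys described above. `render_skill_md` is a hypothetical helper, and the body text in the example is illustrative, not the official skill content.

```python
import textwrap


def render_skill_md(name: str, description: str, body: str) -> str:
    """Build SKILL.md text: YAML frontmatter followed by a markdown body."""
    # Fold the description into an indented YAML block scalar (">").
    folded = textwrap.indent(textwrap.fill(description, width=72), "  ")
    return f"---\nname: {name}\ndescription: >\n{folded}\n---\n\n{body.strip()}\n"


skill_text = render_skill_md(
    name="scrapfly",
    description=(
        "Scrape a URL, take a screenshot, extract structured data, "
        "or run an autonomous scraping agent using the Scrapfly CLI."
    ),
    body="Use `scrapfly scrape <url> --format markdown` for single pages.",
)
print(skill_text)
```

Writing `skill_text` to `~/.claude/skills/scrapfly/SKILL.md` gives a user-level skill. Writing it to `.claude/skills/scrapfly/SKILL.md` scopes it to the project.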

The official `scrapfly-cli/SKILL.md` in the Scrapfly skills GitHub repository covers all of this in full. Fetch it once and use it as the starting point for your skill:

```shell
curl -fsSL \
  https://raw.githubusercontent.com/scrapfly/skills/main/scrapfly-cli/SKILL.md \
  > ~/.claude/skills/scrapfly/SKILL.md
```



The full Scrapfly CLI repository is at [github.com/scrapfly/scrapfly-cli](https://github.com/scrapfly/scrapfly-cli).

### Trigger the skill from a natural-language prompt

Open Claude Code and prompt naturally:

```
Scrape the product listings at web-scraping.dev/products
and return the top 5 product names and prices as JSON.
```



Claude Code reads the skill description, semantically matches the task to the Scrapfly skill, calls the CLI, parses the JSON envelope, and returns structured data. The round trip looks like this:

1. Claude Code matches the prompt to the scrapfly skill based on the description.
2. Claude Code calls `scrapfly scrape https://web-scraping.dev/products --format markdown`.
3. The CLI returns the stable JSON envelope with the rendered markdown in `data.content`.
4. Claude Code reads the content field, extracts the five products, and formats the result as JSON.

The whole loop runs without writing a single line of scraping code.

That is the full setup. One install, one skill file, and Claude Code handles the rest through natural language. The next sections show what you can do once the skill is in place.






## How to Run an Autonomous Claude Scraping Agent with the Scrapfly CLI

The `scrapfly agent` command runs a fully autonomous scraping loop. Claude plans the scrape, calls built-in browser-agent tools, and returns a structured result. No skill setup is required.

The available tools include `open`, `snapshot`, `click`, `type`, `scroll`, `screenshot`, `eval`, `unblock`, `find_selector`, and `done`. The only dependency is an Anthropic API key in the environment.

```shell
ANTHROPIC_API_KEY=your-key scrapfly agent \
  "Name and price of the first product on the page" \
  --url https://web-scraping.dev/products \
  --schema '{"name": "string", "price": "string"}'
```



The agent auto-detects the LLM provider from environment variables in this order: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY` or `GOOGLE_API_KEY`, then `OLLAMA_HOST`. To pin a specific provider, pass `--provider anthropic`.
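The detection order can be mirrored in a few lines, which is handy for checking in advance which provider a given environment would select. This is a sketch of the order described above, not the CLI's actual implementation. `detect_provider` is a hypothetical helper, and only `anthropic` is confirmed as a `--provider` value; the other labels are assumed.

```python
from typing import Mapping, Optional

# Order taken from the text above. Only "anthropic" is a confirmed
# --provider value; the other labels are assumptions for illustration.
PROVIDER_ENV_ORDER = (
    ("anthropic", ("ANTHROPIC_API_KEY",)),
    ("openai", ("OPENAI_API_KEY",)),
    ("gemini", ("GEMINI_API_KEY", "GOOGLE_API_KEY")),
    ("ollama", ("OLLAMA_HOST",)),
)


def detect_provider(env: Mapping[str, str]) -> Optional[str]:
    """Return the first provider whose environment variable is set and non-empty."""
    for provider, variables in PROVIDER_ENV_ORDER:
        if any(env.get(var) for var in variables):
            return provider
    return None


print(detect_provider({"OPENAI_API_KEY": "sk-test", "OLLAMA_HOST": "localhost:11434"}))
```

Passing `os.environ` as `env` checks the current shell.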

The agent uses an AXTree-based observation format, an accessibility-tree view of the page rather than raw HTML. This keeps the token budget small across multi-step planning.

A typical run produces output like this:

```
[scrapfly agent] Opening https://web-scraping.dev/products
[scrapfly agent] Taking page snapshot
[scrapfly agent] Identifying first product listing in AXTree
[scrapfly agent] Extracting: name = "Box of Chocolate Candy"
[scrapfly agent] Extracting: price = "$24.99"
[scrapfly agent] Task complete

{
  "name": "Box of Chocolate Candy",
  "price": "$24.99"
}
```



Agent mode fits brand monitoring, competitor price tracking, AI-visibility checks, and lightweight competitive research. It is useful when setting up a full skill is not worth the overhead.

Developers on r/ClaudeCode also reach for this pattern when scraping Google Maps listings, YouTube channel data, job boards, and hotel pricing pages for one-time exports.

For repeatable workflows, prefer the skill setup from "How to Set Up Scrapfly as a Claude Code Skill" so Claude Code can be triggered consistently without re-specifying the schema and URL each time.



## How to Scrape Cloudflare-Protected Sites with Claude and Scrapfly ASP

If you have seen Claude Code retry a URL three times, suggest a different URL, and then give up with no data, you have almost certainly hit the ClaudeBot block.

Cloudflare's "AI Scrapers and Crawlers" policy blocks ClaudeBot at the edge before any content reaches the fetch tool. The Scrapfly CLI's `--asp` flag bypasses these challenges server-side so Claude Code only ever receives clean markdown.

The canonical flag combination for Cloudflare-protected and anti-bot-protected sites:

```shell
scrapfly scrape https://web-scraping.dev/antibot \
  --asp \
  --render-js \
  --proxy-pool public_residential_pool \
  --country us \
  --format markdown \
  --pretty
```



Running this against the Scrapfly demo anti-bot endpoint returns a successful response:

```json
{
  "success": true,
  "product": "scrape",
  "data": {
    "content": "# Anti-Bot Demo\n\n## Products\n\n### Box of Chocolate Candy\n**Price:** $24.99\n\n### Red Potion\n**Price:** $4.99\n...",
    "status_code": 200,
    "url": "https://web-scraping.dev/antibot"
  }
}
```



What each flag does in practice:

- `--asp` activates Scrapfly's full Anti-Scraping Protection stack. This includes TLS fingerprinting, HTTP/2 SETTINGS frames, behavioral signals, and server-side challenge solving across Cloudflare, DataDome, Akamai, Kasada, and PerimeterX. Failed challenges retry server-side without consuming API credits. This is the layer that resolves ClaudeBot blocks.
- `--proxy-pool public_residential_pool` routes the request through real ISP IP addresses across 120+ countries. Residential IPs survive detection patterns that flag datacenter IP ranges and known Anthropic IP blocks.
- `--render-js` launches Scrapfly's stealth Chromium browser with 30,000+ spoofed browser signals. Scrapfly renders React, Vue, and Angular SPAs fully before returning the page. This resolves the empty SPA shell failure mode described in the WebFetch section, with a single flag.

All three flags engage from a single CLI call. Claude Code, triggered through the Scrapfly skill, never sees the anti-bot complexity. Claude Code receives clean markdown every time.

Start each scrape without anti-bot flags and add them only when the response signals a problem. Plain scrapes cost fewer credits. The `--asp` and `--render-js` flags carry a credit multiplier.
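The "start plain, escalate on failure" advice can be expressed as a small retry ladder. The sketch below uses only flags shown in this section. `scrape_with_escalation` and `build_command` are hypothetical helpers, the order of the middle steps is an assumption, and the command runner is injected so the logic can be tried without network access.

```python
from typing import Callable, Sequence

# Flag sets in escalating order. The flags come from the command above;
# the ordering of the middle steps is an assumption.
ESCALATION_STEPS = (
    (),
    ("--render-js",),
    ("--asp",),
    ("--asp", "--render-js"),
)


def build_command(url: str, flags: Sequence[str]) -> list:
    """Build the argv list for one scrape attempt."""
    return ["scrapfly", "scrape", url, *flags, "--format", "markdown", "--content-only"]


def scrape_with_escalation(
    url: str,
    run: Callable[[list], str],
    is_usable: Callable[[str], bool],
):
    """Try each flag set in order; return (flags, content) for the first usable result."""
    for flags in ESCALATION_STEPS:
        content = run(build_command(url, flags))
        if is_usable(content):
            return flags, content
    raise RuntimeError(f"all flag combinations failed for {url}")


# Offline demonstration: a fake runner that only "succeeds" once --asp is present.
attempts = []


def fake_run(command: list) -> str:
    attempts.append(command)
    return "# Products\n" if "--asp" in command else ""


flags, content = scrape_with_escalation(
    "https://web-scraping.dev/antibot", fake_run, lambda text: bool(text.strip())
)
print(flags, len(attempts))
```

In a real script, `run` could wrap `subprocess.run(command, capture_output=True, text=True).stdout`, and `is_usable` could be any check that the returned markdown contains real content.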




## How to Extract Structured JSON with Claude via API

For Python pipelines and backend jobs, shell out to the Scrapfly CLI for the fetch, then pipe the markdown into `client.messages.create()` with a structured-output schema.

This pattern combines Scrapfly's fetch layer with Claude's tool use feature without requiring Claude Code or any skill setup. The result is deterministic JSON extraction inside any Python script.

```python
import subprocess
import json
import anthropic

# Step 1: Fetch the page as clean markdown using the Scrapfly CLI
result = subprocess.run(
    [
        "scrapfly", "scrape",
        "https://web-scraping.dev/product/1",
        "--asp", "--render-js",
        "--format", "markdown",
        "--content-only"
    ],
    capture_output=True,
    text=True,
    check=True  # raise CalledProcessError if the CLI exits with a non-zero status
)
markdown = result.stdout

# Step 2: Define the extraction schema and call Claude
client = anthropic.Anthropic(api_key="your_api_key_here")
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "name": "extract_product",
        "description": "Extract product data from the page markdown",
        "input_schema": {
            "type": "object",
            "properties": {
                "name":     {"type": "string",  "description": "Product name"},
                "price":    {"type": "string",  "description": "Product price with currency symbol"},
                "in_stock": {"type": "boolean", "description": "Whether the product is in stock"}
            },
            "required": ["name", "price", "in_stock"]
        }
    }],
    tool_choice={"type": "tool", "name": "extract_product"},
    messages=[{
        "role": "user",
        "content": f"Extract the product details from this page:\n\n{markdown}"
    }]
)

print(json.dumps(response.content[0].input, indent=2))

```





Output: `{ "name": "Box of Chocolate Candy", "price": "$24.99", "in_stock": true }`





The `--content-only` flag returns raw markdown without the JSON envelope, which is cleaner for direct pipe into a Python string.

Scrapfly returns LLM-ready markdown, not raw HTML, so no BeautifulSoup step is needed between the fetch and the Claude call.
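Forcing the tool call shapes the output, but checking the returned fields before they enter a database or downstream job is still a sensible precaution. The sketch below validates a dict against the three fields of the `extract_product` schema above. `validate_product` is a hypothetical helper, not part of the Anthropic SDK.

```python
# Field names and types mirror the extract_product schema above.
REQUIRED_FIELDS = {"name": str, "price": str, "in_stock": bool}


def validate_product(data: dict) -> dict:
    """Check that extracted data has every required field with the expected type."""
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    return data


product = validate_product(
    {"name": "Box of Chocolate Candy", "price": "$24.99", "in_stock": True}
)
print(product["name"])
```

In the script above, the argument would be `response.content[0].input`.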

**Handling large pages and Claude's context window:**

- Start with `--format markdown --content-only` to return compact markdown instead of raw HTML. Markdown is dramatically more compact.
- If the markdown is still too large, add `--only-main-content` to strip navigation, footer, and sidebar before the content reaches Claude.
- For very large documents, use Scrapfly's `--extraction-prompt` flag to push parsing server-side. Claude then receives only the structured result rather than the full page.
- For high-throughput pipelines, replace the subprocess shell-out with the [Scrapfly Python SDK](https://scrapfly.io/docs/sdk). The SDK uses the same flags and returns the same JSON envelope but with lower overhead and built-in retry logic.
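The first two steps of the list above can be sketched as a size check that adds `--only-main-content` only when the first fetch is too large. `fetch_within_budget` is a hypothetical helper, the character budget is an arbitrary stand-in for a real token limit, and the runner is injected so the example runs offline.

```python
from typing import Callable

BASE_FLAGS = ("--format", "markdown", "--content-only")

# Flag sets to try in order, following the list above.
SIZE_STEPS = (
    BASE_FLAGS,
    BASE_FLAGS + ("--only-main-content",),
)


def fetch_within_budget(url: str, run: Callable[[list], str], max_chars: int) -> str:
    """Return the first fetch result that fits within max_chars."""
    for flags in SIZE_STEPS:
        markdown = run(["scrapfly", "scrape", url, *flags])
        if len(markdown) <= max_chars:
            return markdown
    raise ValueError(
        f"{url} still exceeds {max_chars} characters; consider server-side extraction"
    )


# Offline demonstration: the fake runner returns a smaller page
# once --only-main-content is present.
def fake_run(command: list) -> str:
    return "x" * 100 if "--only-main-content" in command else "x" * 1000


print(len(fetch_within_budget("https://web-scraping.dev/products", fake_run, 500)))
```

Character count is only a rough proxy for tokens, so leave headroom when choosing `max_chars`.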

[LangChain Web Scraping: Build AI Agents & RAG Applications](https://scrapfly.io/blog/posts/langchain-web-scraping-complete-guide-scrapfly): Learn to integrate LangChain with Scrapfly for web scraping. Build AI agents and RAG applications that extract, process, and understand web data at scale.



## How to Scrape Multi-Step Sites with a Persistent Browser Session

For sites with public multi-step flows such as pagination, filter dropdowns, or multi-page form submissions, the Scrapfly CLI's persistent browser daemon keeps a real Chromium session alive across CLI calls.

Cookies and browser state survive between commands. Claude Code can navigate, fill, and click across multiple steps without losing session context.

Start the daemon with a named session, navigate to the target, interact with the page, and dump the rendered HTML:

```shell
# Start the persistent browser session
scrapfly browser --session scraping-demo start

# Navigate to the first page of a paginated product listing
scrapfly browser --session scraping-demo open "https://web-scraping.dev/products?page=1"

# Scroll down to trigger lazy-loaded content
scrapfly browser --session scraping-demo scroll

# Navigate to the next page while preserving session state
scrapfly browser --session scraping-demo open "https://web-scraping.dev/products?page=2"

# Dump the fully rendered HTML with iframes inlined
scrapfly browser --session scraping-demo content --raw

# Close the session when the workflow is complete
scrapfly browser --session scraping-demo close
```



The daemon uses an anti-bot CDP domain with human-like timing on Scrapfly's custom browser. The `--wpm` flag controls typing speed when using the `fill` command for form inputs.

The `browser content --raw` command dumps the fully rendered HTML with all iframes inlined. That is the format you pass to Claude Code for extraction.

When extending your SKILL.md to support browser sessions, describe when Claude Code should use `scrapfly browser` versus `scrapfly scrape`. Claude Code picks the right path based on the task description.

A prompt like "paginate through all products across every page and collect every name and price" naturally triggers the browser session path rather than a one-shot scrape.
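For a scripted version of the same pagination flow, the command sequence can be generated rather than typed by hand. The sketch below only builds argument lists from the subcommands shown above (`start`, `open`, `scroll`, `content --raw`, `close`). `pagination_commands` is a hypothetical helper, and the `?page=N` URL pattern is taken from the demo site used in this section.

```python
def pagination_commands(session: str, base_url: str, pages: int) -> list:
    """Build the CLI calls for walking `pages` pages in one browser session."""
    prefix = ["scrapfly", "browser", "--session", session]
    commands = [prefix + ["start"]]
    for page in range(1, pages + 1):
        commands.append(prefix + ["open", f"{base_url}?page={page}"])
        commands.append(prefix + ["scroll"])          # trigger lazy-loaded content
        commands.append(prefix + ["content", "--raw"])  # dump the rendered HTML
    commands.append(prefix + ["close"])
    return commands


for command in pagination_commands("scraping-demo", "https://web-scraping.dev/products", 2):
    print(" ".join(command))
```

Passing each list to `subprocess.run` as-is, without a shell, avoids quoting problems with the `?` in the URL.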



## How to Add Scrapfly to Claude Desktop with MCP

If your team is already on the MCP ecosystem through Claude Desktop, Cursor, Cline, or Windsurf, the Scrapfly CLI ships a built-in MCP mode.

Running `scrapfly mcp serve` exposes `scrape`, `screenshot`, `extract`, `crawl_run`, and `selector` as MCP tools over stdio. The install and auth steps are the same as in the skill setup section, so no additional setup is needed beyond the config block below.

Add this to your Claude Desktop MCP configuration file:

```json
{
  "mcpServers": {
    "scrapfly": {
      "command": "scrapfly",
      "args": ["mcp", "serve"],
      "env": {
        "SCRAPFLY_API_KEY": "YOUR_SCRAPFLY_API_KEY"
      }
    }
  }
}
```
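If the configuration file already lists other MCP servers, merging the entry programmatically avoids overwriting them. The sketch below works on an in-memory dict and leaves file handling to the caller, since the file location differs by client and platform. `add_scrapfly_server` is a hypothetical helper.

```python
import json


def add_scrapfly_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP client config with the scrapfly server added."""
    merged = json.loads(json.dumps(config))  # simple deep copy for JSON data
    servers = merged.setdefault("mcpServers", {})
    servers["scrapfly"] = {
        "command": "scrapfly",
        "args": ["mcp", "serve"],
        "env": {"SCRAPFLY_API_KEY": api_key},
    }
    return merged


# Example: an existing config with one unrelated (made-up) server entry.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_scrapfly_server(existing, "YOUR_SCRAPFLY_API_KEY")
print(json.dumps(updated, indent=2))
```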



MCP loads all tool definitions upfront at client startup. Skills load only the name and description at startup, then fetch the full SKILL.md content on demand when a task matches.

Typical multi-server MCP setups can push around 50,000 tokens of tool definitions into the context before any user message arrives. Skills sidestep that cost.

Both approaches are valid. Choose based on which delivery mode is already in your client config.

For teams who want hosted MCP without managing a local binary, Scrapfly's cloud-hosted option handles the server side entirely. See [Scrapfly MCP Cloud](https://scrapfly.io/products/mcp-cloud).

[For an introduction to the Model Context Protocol before configuring the MCP server, see the MCP explainer.](https://scrapfly.io/blog/posts/what-is-mcp-understanding-the-model-context-protocol)



## Which Claude Scraping Method Should You Choose?

Match the method to the job. Use Claude Code with the Scrapfly skill for natural-language, ad-hoc scraping and small repeatable workflows.

Use the Anthropic API with the Scrapfly CLI for deterministic Python pipelines with fixed schemas. Use `scrapfly agent` for one-off autonomous tasks. Use `scrapfly mcp serve` if your stack is already MCP-native.

| Method | Best For | Handles JS | Handles Cloudflare | Public Multi-Step Flows |
|---|---|---|---|---|
| Claude Code + Scrapfly Skill | Ad-hoc scraping, repeatable natural-language workflows | Yes, via `--render-js` | Yes, via `--asp` | Yes, with `scrapfly browser` |
| Anthropic API + Scrapfly CLI | Python pipelines, scheduled jobs, fixed extraction schemas | Yes, via `--render-js` | Yes, via `--asp` | Yes, via subprocess |
| `scrapfly agent` one-liner | Prototyping, one-off autonomous tasks | Yes, built-in | Yes, built-in | Yes, via agent planning |
| Scrapfly MCP | Teams already on Claude Desktop, Cursor, or Cline | Yes | Yes | Yes |
| Claude Cowork | Non-developer browsing flows, marketer use cases | No | No | No |

All methods that use Scrapfly under the hood handle JavaScript rendering, Cloudflare, and DataDome. Claude Cowork and the raw Anthropic API without a Scrapfly fetch layer do not.

The `scrapfly agent` one-liner is the lowest barrier to entry for a first scrape. The Claude Code skill is the most flexible for ongoing work.

The Anthropic API pattern is the right call when the extraction logic belongs in a Python codebase and not in a terminal session.



## FAQ

### What can I actually scrape with Claude and Scrapfly?

Most public web targets are reachable. Common targets from r/ClaudeCode include Google Maps, job boards, hotel pricing, product catalogs, and documentation sites. If the page is publicly accessible, the Scrapfly CLI with `--asp` handles it and Claude returns structured JSON.







### What is a Claude Code skill?

A Claude Code skill is a SKILL.md file with YAML frontmatter. Claude Code matches your prompt against every installed skill's description and triggers the right one. The SKILL.md body tells Claude Code which commands to run.









### When should I use MCP instead of the Scrapfly CLI skill for Claude web scraping?

Use MCP if you already run a multi-server MCP config in Claude Desktop or Cursor. Use the skill to keep context overhead low, since MCP loads all tool definitions upfront. Both provide the same Scrapfly fetch layer.







### How do I handle large HTML pages that exceed Claude's context window?

Start with `--format markdown --content-only` to return compact markdown instead of raw HTML. Add `--only-main-content` if the result is still too large. For very large documents, use `--extraction-prompt` to push extraction server-side so Claude receives only the structured result.









### Can I swap Claude for Gemini, GPT-4, or a local model?

Yes, for the `scrapfly agent` command. The agent auto-detects the LLM provider from environment variables: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, or `OLLAMA_HOST`. Pass `--provider` to pin a specific one. The Scrapfly fetch layer works with any downstream LLM.







### How much does it cost to run Claude with Scrapfly per scrape?

Anthropic charges per token for input and output. Scrapfly charges per credit: plain scrapes cost one credit, while `--asp` and `--render-js` carry a multiplier. See [scrapfly.io/pricing](https://scrapfly.io/pricing) for the breakdown. Most single-page tasks stay under a few cents.









## Conclusion

The core recipe is straightforward. Install the Scrapfly CLI, scaffold a Claude Code skill, and prompt Claude in natural language. Scrapfly handles the fetch layer and Claude handles extraction.

For one-off tasks, `scrapfly agent` runs the whole loop autonomously with no skill required. For Python pipelines with a fixed schema, the Anthropic API combined with a CLI subprocess call gives you deterministic extraction inside any backend service.

If your stack is already MCP-native, `scrapfly mcp serve` wires the same tools into Claude Desktop, Cursor, or Cline without any skill scaffolding.

If you are a non-developer user who wants to extract data while browsing, Claude Cowork's connector flow is the better-suited path for that job.

The Scrapfly CLI is open source on GitHub at [github.com/scrapfly/scrapfly-cli](https://github.com/scrapfly/scrapfly-cli). The platform offers 1,000 free credits at signup with no credit card required at [scrapfly.io/register](https://scrapfly.io/register).



 
