MCP Tools & API Specification

The Scrapfly MCP Server provides 5 powerful tools covering 99% of web scraping use cases, from quick page fetches to advanced browser automation.

Pro Tip: Always call scraping_instruction_enhanced first to get best practices and understand the pow (proof of work) parameter required by scraping tools.

Tools Overview

scraping_instruction_enhanced

Returns critical instructions and best practices for using Scrapfly scraping tools. Helps AI models make intelligent decisions about which parameters to use.

Important: Call this tool before using web_get_page or web_scrape; it returns the required pow value along with guidance on choosing parameters.
Provides:
  • Parameter guidance - When to use which options
  • Best practices - Optimize for success rate and cost
  • Error handling - What to do when things fail
  • POW value - Required proof of work parameter

Example Usage
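The sketch below shows one way to call this tool from a TypeScript MCP client using the official @modelcontextprotocol/sdk. The endpoint URL and client name are illustrative assumptions; substitute your actual Scrapfly MCP endpoint and authentication.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Connect to the Scrapfly MCP server. The URL is illustrative, and
// authentication (your Scrapfly API key) is omitted; configure both
// per your setup guide.
const client = new Client({ name: "scrapfly-example", version: "1.0.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://mcp.scrapfly.io/mcp"))
);

// Fetch best practices plus the `pow` value the scraping tools require.
const instructions = await client.callTool({
  name: "scraping_instruction_enhanced",
  arguments: {},
});
console.log(instructions);
```

The later example snippets on this page reuse this connected `client`.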

web_get_page

Quick page fetch with sane defaults. Perfect for when you just need the content fast without complex configuration.

What it does automatically:
  • Renders JavaScript by default
  • Returns clean markdown or text content
  • Handles anti-scraping protection
  • Uses optimal defaults for most websites
Parameters

Required Parameters
  • url (string) - Target URL to scrape (must start with http:// or https://)
  • pow (string) - Proof of work value from scraping_instruction_enhanced

Optional Parameters
  • format (string) - Output format: markdown (default), text, json, clean_html, raw
  • format_options (array) - Format modifiers: no_links, no_images, only_content
  • country (string) - ISO country code for the proxy (e.g., us, gb, de)
  • proxy_pool (string) - Proxy type: public_datacenter_pool (default) or public_residential_pool
  • rendering_wait (integer) - Wait time in milliseconds before capturing content
  • capture_page (boolean) - Also capture a screenshot of the page
  • capture_flags (array) - Screenshot options: load_images, dark_mode, block_banners, print_media_format, high_quality
  • extraction_model (string) - Auto-extract structured data: article, product, job_posting, etc.

Example Usage
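A minimal sketch, reusing the connected `client` from the example above; the URL and parameter values are illustrative:

```typescript
// Fetch a page as clean markdown through a US proxy. The `pow` value
// must come from a prior scraping_instruction_enhanced call.
const page = await client.callTool({
  name: "web_get_page",
  arguments: {
    url: "https://example.com/blog",
    pow: "<pow-from-scraping_instruction_enhanced>",
    format: "markdown",
    format_options: ["no_images"],
    country: "us",
  },
});
```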

web_scrape

Advanced scraping tool with full control over every aspect: JavaScript rendering, custom headers, cookies, POST requests, and sophisticated browser automation scenarios.

Enterprise-grade control:
  • Browser automation - Multi-step interactions
  • Authentication - Login flows with cookies
  • Custom requests - POST/PUT/PATCH
  • LLM extraction - AI-powered data extraction
Parameters

Required Parameters
  • url (string) - Target URL to scrape
  • pow (string) - Proof of work value from scraping_instruction_enhanced

Optional Parameters
  • render_js (boolean) - Enable JavaScript rendering with a headless browser (default: true); see the JavaScript rendering guide
  • js_scenario (array) - Browser automation steps: click, fill, scroll, wait, execute, condition; see the complete JS scenario reference
  • asp (boolean) - Enable Anti-Scraping Protection (default: true); see the ASP documentation
  • extraction_prompt (string) - LLM prompt for AI-powered data extraction; see the LLM extraction guide
  • extraction_model (string) - Pre-trained extraction model (product, article, etc.); see the available models
  • format (string) - Output format: markdown (default), text, json, clean_html, raw
  • format_options (array) - Format modifiers: no_links, no_images, only_content
  • method (string) - HTTP method: GET (default), POST, PUT, PATCH, OPTIONS
  • headers (object) - Custom HTTP headers
  • cookies (array) - Cookies to send with the request; see session management
  • body (string) - Request body for POST/PUT/PATCH
  • screenshots (array) - Capture multiple screenshots (fullpage or a CSS selector); see the Screenshot API reference
  • screenshot_flags (array) - Screenshot options: load_images, dark_mode, block_banners, print_media_format, high_quality
  • cache (boolean) - Enable response caching; see the caching guide
  • cache_ttl (integer) - Cache TTL in seconds when cache is true
  • cache_clear (boolean) - If true, bypass and clear the cache for this URL
  • retry (boolean) - Enable automatic retry on transient errors (default: true)
  • timeout (integer) - Server-side timeout in milliseconds
  • rendering_wait (integer) - Wait time in milliseconds before returning the response; see the JavaScript rendering docs
  • wait_for_selector (string) - Wait for this CSS selector to appear in the page when rendering JS
  • js (string) - JavaScript to execute on the page
  • lang (array) - Languages to use for the request (Accept-Language header)
  • country (string) - Proxy location (ISO 3166-1 alpha-2 code, e.g., "us", "gb", "de"); see geo-targeting options
  • proxy_pool (string) - public_datacenter_pool (default) or public_residential_pool; see the proxy pool comparison

Example: Login Flow
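A hedged sketch of a login flow, reusing the connected `client` from above. The step names (fill, click, wait) come from the parameter list, but the exact js_scenario step schema is an assumption here; confirm it against the JS scenario reference. Selectors and credentials are placeholders.

```typescript
// Fill a login form, submit it, then return the authenticated page.
const result = await client.callTool({
  name: "web_scrape",
  arguments: {
    url: "https://example.com/login",
    pow: "<pow-value>",
    render_js: true,
    js_scenario: [
      { fill: { selector: "#username", value: "my-user" } },
      { fill: { selector: "#password", value: "my-secret" } },
      { click: { selector: "button[type=submit]" } },
      { wait: 2000 }, // give the post-login page time to render
    ],
    format: "markdown",
  },
});
```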

Example: LLM Extraction
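A minimal sketch of prompt-based extraction with the same connected `client`; the URL and prompt are illustrative:

```typescript
// Ask the LLM extractor for structured data instead of raw content.
const extracted = await client.callTool({
  name: "web_scrape",
  arguments: {
    url: "https://example.com/product/123",
    pow: "<pow-value>",
    extraction_prompt:
      "Extract the product name, price, and availability as JSON",
  },
});
```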

screenshot

Capture high-quality screenshots of any webpage. Full page or specific elements using CSS selectors.

See the complete Screenshot API documentation for full details.

Parameters

Required Parameters
  • url (string) - Target URL to capture

Optional Parameters
  • capture (string) - fullpage (default) or a CSS selector
  • format (string) - Image format: jpg (default), png, webp, gif
  • resolution (string) - Screen resolution (e.g., "1920x1080", the default)
  • options (array) - Options: load_images, dark_mode, block_banners, print_media_format; see the options reference
  • auto_scroll (boolean) - Automatically scroll to load lazy content
  • wait_for_selector (string) - CSS selector to wait for before capturing
  • rendering_wait (integer) - Wait time in milliseconds before capturing
  • js (string) - JavaScript to execute before capturing
  • country (string) - Proxy location (ISO 3166-1 alpha-2 code); see geo-targeting
  • cache (boolean) - Enable response caching
  • cache_ttl (integer) - Cache time-to-live in seconds
  • cache_clear (boolean) - Bypass and clear the cache for this request
  • webhook (string) - Webhook to call after the request completes

Example Usage
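A minimal sketch using the connected `client` from the first example; values are illustrative:

```typescript
// Full-page PNG capture in dark mode with cookie banners blocked.
const shot = await client.callTool({
  name: "screenshot",
  arguments: {
    url: "https://example.com",
    capture: "fullpage",
    format: "png",
    resolution: "1920x1080",
    options: ["dark_mode", "block_banners"],
    auto_scroll: true, // pull in lazy-loaded content first
  },
});
```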

info_account

Get real-time information about your Scrapfly account, including subscription details, usage statistics, rate limits, and billing information.

Returns:
  • Account - ID, currency, timezone, status
  • Project - Name, quota, budget, networks
  • Subscription - Plan, billing period, concurrency
  • Usage - Credits, remaining quota, concurrent requests
No Parameters Required: This tool uses your authenticated API key and requires no additional parameters.

Example Usage
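A minimal sketch with the connected `client` from the first example; as noted above, no arguments are needed:

```typescript
// The server resolves the account from your authenticated API key.
const account = await client.callTool({
  name: "info_account",
  arguments: {},
});
console.log(account);
```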


Response Format

All scraping tools (web_get_page and web_scrape) return a common response structure: the scraped content in the requested format plus request metadata, with failures reported in an errors field.
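As a rough illustration only (the field names below are inferred from this page, not an exact schema), the result can be pictured as:

```typescript
// Illustrative shape only; consult the Scrapfly docs for the real schema.
interface ScrapeResult {
  content: string;     // scraped content in the requested `format`
  errors?: unknown[];  // detailed failure info (see Error Handling below)
}
```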

Error Handling

When a request fails, the errors field contains detailed information; see the complete error reference.

Common Error Scenarios

  • Retryable errors (transient failures) - retried automatically
  • Non-retryable errors (invalid parameters, quota exceeded) - require configuration changes
  • Rate limits (concurrency limits) - check info_account for current usage

Billing & Cost Optimization

Each scraping request consumes credits based on the features used; see the complete billing guide.

  • Base cost (simple requests): 1-3 credits
  • JavaScript rendering (headless browser): +5 credits
  • ASP (anti-scraping protection): +10-30 credits
  • Residential proxies (high success rate): +25 credits
  • Screenshots (each capture): +5 credits
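For example, a JavaScript-rendered request routed through residential proxies would run roughly 1 + 5 + 25 = 31 credits, before any ASP surcharge.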

Cost Optimization Tips:
  • Use web_get_page for simple requests instead of web_scrape
  • Start with datacenter proxies, escalate to residential only if needed
  • Disable render_js for static pages
  • Use caching for frequently accessed pages
  • Check scraping_instruction_enhanced for optimal configurations

(Diagram: MCP cost flow)
