# Scrapfly Documentation

## Table of Contents

### Dashboard

- [Intro](https://scrapfly.io/docs)
- [Project](https://scrapfly.io/docs/project)
- [Account](https://scrapfly.io/docs/account)
- [Workspace & Team](https://scrapfly.io/docs/workspace-and-team)
- [Billing](https://scrapfly.io/docs/billing)

### Products

#### MCP Server

- [Getting Started](https://scrapfly.io/docs/mcp/getting-started)
- [Tools & API Spec](https://scrapfly.io/docs/mcp/tools)
- [Authentication](https://scrapfly.io/docs/mcp/authentication)
- [Examples & Use Cases](https://scrapfly.io/docs/mcp/examples)
- [FAQ](https://scrapfly.io/docs/mcp/faq)
##### Integrations

- [Overview](https://scrapfly.io/docs/mcp/integrations)
- [Claude Desktop](https://scrapfly.io/docs/mcp/integrations/claude-desktop)
- [Claude Code](https://scrapfly.io/docs/mcp/integrations/claude-code)
- [ChatGPT](https://scrapfly.io/docs/mcp/integrations/chatgpt)
- [Cursor](https://scrapfly.io/docs/mcp/integrations/cursor)
- [Cline](https://scrapfly.io/docs/mcp/integrations/cline)
- [Windsurf](https://scrapfly.io/docs/mcp/integrations/windsurf)
- [Zed](https://scrapfly.io/docs/mcp/integrations/zed)
- [Roo Code](https://scrapfly.io/docs/mcp/integrations/roo-code)
- [VS Code](https://scrapfly.io/docs/mcp/integrations/vscode)
- [LangChain](https://scrapfly.io/docs/mcp/integrations/langchain)
- [LlamaIndex](https://scrapfly.io/docs/mcp/integrations/llamaindex)
- [CrewAI](https://scrapfly.io/docs/mcp/integrations/crewai)
- [OpenAI](https://scrapfly.io/docs/mcp/integrations/openai)
- [n8n](https://scrapfly.io/docs/mcp/integrations/n8n)
- [Make](https://scrapfly.io/docs/mcp/integrations/make)
- [Zapier](https://scrapfly.io/docs/mcp/integrations/zapier)
- [Vapi AI](https://scrapfly.io/docs/mcp/integrations/vapi)
- [Agent Builder](https://scrapfly.io/docs/mcp/integrations/agent-builder)
- [Custom Client](https://scrapfly.io/docs/mcp/integrations/custom-client)


#### Web Scraping API

- [Getting Started](https://scrapfly.io/docs/scrape-api/getting-started)
- [API Specification](https://scrapfly.io/docs/scrape-api/specification)
- [Monitoring](https://scrapfly.io/docs/monitoring)
- [Customize Request](https://scrapfly.io/docs/scrape-api/custom)
- [Debug](https://scrapfly.io/docs/scrape-api/debug)
- [Anti Scraping Protection](https://scrapfly.io/docs/scrape-api/anti-scraping-protection)
- [Proxy](https://scrapfly.io/docs/scrape-api/proxy)
- [Proxy Mode](https://scrapfly.io/docs/scrape-api/proxy-mode)
- [Proxy Mode - Screaming Frog](https://scrapfly.io/docs/scrape-api/proxy-mode/screaming-frog)
- [Proxy Mode - Apify](https://scrapfly.io/docs/scrape-api/proxy-mode/apify)
- [(Auto) Data Extraction](https://scrapfly.io/docs/scrape-api/extraction)
- [Javascript Rendering](https://scrapfly.io/docs/scrape-api/javascript-rendering)
- [Javascript Scenario](https://scrapfly.io/docs/scrape-api/javascript-scenario)
- [SSL](https://scrapfly.io/docs/scrape-api/ssl)
- [DNS](https://scrapfly.io/docs/scrape-api/dns)
- [Cache](https://scrapfly.io/docs/scrape-api/cache)
- [Session](https://scrapfly.io/docs/scrape-api/session)
- [Webhook](https://scrapfly.io/docs/scrape-api/webhook)
- [Screenshot](https://scrapfly.io/docs/scrape-api/screenshot)
- [Errors](https://scrapfly.io/docs/scrape-api/errors)
- [Timeout](https://scrapfly.io/docs/scrape-api/understand-timeout)
- [Throttling](https://scrapfly.io/docs/throttling)
- [Troubleshoot](https://scrapfly.io/docs/scrape-api/troubleshoot)
- [Billing](https://scrapfly.io/docs/scrape-api/billing)
- [FAQ](https://scrapfly.io/docs/scrape-api/faq)

#### Crawler API

- [Getting Started](https://scrapfly.io/docs/crawler-api/getting-started)
- [API Specification](https://scrapfly.io/docs/crawler-api/specification)
- [Retrieving Results](https://scrapfly.io/docs/crawler-api/results)
- [WARC Format](https://scrapfly.io/docs/crawler-api/warc-format)
- [Data Extraction](https://scrapfly.io/docs/crawler-api/extraction-rules)
- [Webhook](https://scrapfly.io/docs/crawler-api/webhook)
- [Billing](https://scrapfly.io/docs/crawler-api/billing)
- [Errors](https://scrapfly.io/docs/crawler-api/errors)
- [Troubleshoot](https://scrapfly.io/docs/crawler-api/troubleshoot)
- [FAQ](https://scrapfly.io/docs/crawler-api/faq)

#### Screenshot API

- [Getting Started](https://scrapfly.io/docs/screenshot-api/getting-started)
- [API Specification](https://scrapfly.io/docs/screenshot-api/specification)
- [Accessibility Testing](https://scrapfly.io/docs/screenshot-api/accessibility)
- [Webhook](https://scrapfly.io/docs/screenshot-api/webhook)
- [Billing](https://scrapfly.io/docs/screenshot-api/billing)
- [Errors](https://scrapfly.io/docs/screenshot-api/errors)

#### Extraction API

- [Getting Started](https://scrapfly.io/docs/extraction-api/getting-started)
- [API Specification](https://scrapfly.io/docs/extraction-api/specification)
- [Rules Template](https://scrapfly.io/docs/extraction-api/rules-and-template)
- [LLM Extraction](https://scrapfly.io/docs/extraction-api/llm-prompt)
- [AI Auto Extraction](https://scrapfly.io/docs/extraction-api/automatic-ai)
- [Webhook](https://scrapfly.io/docs/extraction-api/webhook)
- [Billing](https://scrapfly.io/docs/extraction-api/billing)
- [Errors](https://scrapfly.io/docs/extraction-api/errors)
- [FAQ](https://scrapfly.io/docs/extraction-api/faq)

#### Proxy Saver

- [Getting Started](https://scrapfly.io/docs/proxy-saver/getting-started)
- [Fingerprints](https://scrapfly.io/docs/proxy-saver/fingerprints)
- [Optimizations](https://scrapfly.io/docs/proxy-saver/optimizations)
- [SSL Certificates](https://scrapfly.io/docs/proxy-saver/certificates)
- [Protocols](https://scrapfly.io/docs/proxy-saver/protocols)
- [Pacfile](https://scrapfly.io/docs/proxy-saver/pacfile)
- [Secure Credentials](https://scrapfly.io/docs/proxy-saver/security)
- [Billing](https://scrapfly.io/docs/proxy-saver/billing)

#### Cloud Browser API

- [Getting Started](https://scrapfly.io/docs/cloud-browser-api/getting-started)
- [Proxy & Geo-Targeting](https://scrapfly.io/docs/cloud-browser-api/proxy)
- [Unblock API](https://scrapfly.io/docs/cloud-browser-api/unblock)
- [File Downloads](https://scrapfly.io/docs/cloud-browser-api/file-downloads)
- [Session Resume](https://scrapfly.io/docs/cloud-browser-api/session-resume)
- [Human-in-the-Loop](https://scrapfly.io/docs/cloud-browser-api/human-in-the-loop)
- [Debug Mode](https://scrapfly.io/docs/cloud-browser-api/debug-mode)
- [Bring Your Own Proxy](https://scrapfly.io/docs/cloud-browser-api/bring-your-own-proxy)
- [Browser Extensions](https://scrapfly.io/docs/cloud-browser-api/extensions)
##### Integrations

- [Puppeteer](https://scrapfly.io/docs/cloud-browser-api/puppeteer)
- [Playwright](https://scrapfly.io/docs/cloud-browser-api/playwright)
- [Selenium](https://scrapfly.io/docs/cloud-browser-api/selenium)
- [Vercel Agent Browser](https://scrapfly.io/docs/cloud-browser-api/agent-browser)
- [Browser Use](https://scrapfly.io/docs/cloud-browser-api/browser-use)
- [Stagehand](https://scrapfly.io/docs/cloud-browser-api/stagehand)
- [Vibium](https://scrapfly.io/docs/cloud-browser-api/vibium)

- [Billing](https://scrapfly.io/docs/cloud-browser-api/billing)
- [Errors](https://scrapfly.io/docs/cloud-browser-api/errors)


### Tools

- [Antibot Detector](https://scrapfly.io/docs/tools/antibot-detector)

### SDK

- [Golang](https://scrapfly.io/docs/sdk/golang)
- [Python](https://scrapfly.io/docs/sdk/python)
- [TypeScript](https://scrapfly.io/docs/sdk/typescript)
- [Scrapy](https://scrapfly.io/docs/sdk/scrapy)

### Integrations

- [Getting Started](https://scrapfly.io/docs/integration/getting-started)
- [LangChain](https://scrapfly.io/docs/integration/langchain)
- [LlamaIndex](https://scrapfly.io/docs/integration/llamaindex)
- [CrewAI](https://scrapfly.io/docs/integration/crewai)
- [Zapier](https://scrapfly.io/docs/integration/zapier)
- [Make](https://scrapfly.io/docs/integration/make)
- [n8n](https://scrapfly.io/docs/integration/n8n)

### Academy

- [Overview](https://scrapfly.io/academy)
- [Web Scraping Overview](https://scrapfly.io/academy/scraping-overview)
- [Tools](https://scrapfly.io/academy/tools-overview)
- [Reverse Engineering](https://scrapfly.io/academy/reverse-engineering)
- [Static Scraping](https://scrapfly.io/academy/static-scraping)
- [HTML Parsing](https://scrapfly.io/academy/html-parsing)
- [Dynamic Scraping](https://scrapfly.io/academy/dynamic-scraping)
- [Hidden API Scraping](https://scrapfly.io/academy/hidden-api-scraping)
- [Headless Browsers](https://scrapfly.io/academy/headless-browsers)
- [Hidden Web Data](https://scrapfly.io/academy/hidden-web-data)
- [JSON Parsing](https://scrapfly.io/academy/json-parsing)
- [Data Processing](https://scrapfly.io/academy/data-processing)
- [Scaling](https://scrapfly.io/academy/scaling)
- [Walkthrough Summary](https://scrapfly.io/academy/walkthrough-summary)
- [Scraper Blocking](https://scrapfly.io/academy/scraper-blocking)
- [Proxies](https://scrapfly.io/academy/proxies)

---

# LlamaIndex

Data framework for LLM applications. Connect Scrapfly to LlamaIndex agents and workflows for intelligent web data ingestion and RAG applications.

[Official Website](https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/mcp/)

 

 ## Prerequisites

Before getting started, make sure you have the following:

- Python 3.8+ installed
- `llama-index-tools-mcp` package installed
- Your Scrapfly API key
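A quick sanity check before starting can surface missing pieces early. The sketch below uses only the standard library; the `check_environment` helper name is ours, not part of LlamaIndex:

```python
import importlib.util
import sys

def check_environment():
    """Return a list of setup problems (illustrative helper, stdlib only)."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    if importlib.util.find_spec("llama_index") is None:
        problems.append("missing package: run `pip install llama-index-tools-mcp`")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

An empty list means the prerequisites above are satisfied.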
 
## Setup Instructions

LlamaIndex supports MCP servers through the tools integration. Follow these steps to connect Scrapfly for web data ingestion.

1. **Install Required Packages** Install LlamaIndex MCP integration tools:
    
     ```
    pip install llama-index-tools-mcp
    ```
    
     
    
       
    
     
    
    **Tip (Development Environment):** use a virtual environment for Python projects:
    
     ```
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install llama-index-tools-mcp
    ```
2. **Initialize Scrapfly MCP Tools** Connect to the Scrapfly MCP server and load tools into LlamaIndex:
    
    ### Python Example with API Key
    
     ```
    import asyncio
    
    from llama_index.tools.mcp import BasicMCPClient, aget_tools_from_mcp_url
    
    mcp_client = BasicMCPClient("https://mcp.scrapfly.io/mcp?key=YOUR_API_KEY")
    
    async def main():
        tools = await aget_tools_from_mcp_url(
            "https://mcp.scrapfly.io/mcp?key=YOUR_API_KEY",
            client=mcp_client
        )
    
        for tool in tools:
            print(f"  - {tool.metadata.name}")
    
    asyncio.run(main())
    ```
3. **Create a LlamaIndex Agent with Scrapfly** Build an agent that can scrape web data and integrate it into your LlamaIndex workflow:
    
     ```
    import asyncio

    from llama_index.core.agent import ReActAgent
    from llama_index.llms.anthropic import Anthropic
    from llama_index.tools.mcp import BasicMCPClient, aget_tools_from_mcp_url

    MCP_URL = "https://mcp.scrapfly.io/mcp?key=YOUR_API_KEY"

    async def main():
        # Load the Scrapfly MCP tools
        mcp_client = BasicMCPClient(MCP_URL)
        scrapfly_tools = await aget_tools_from_mcp_url(MCP_URL, client=mcp_client)

        # Initialize LLM
        llm = Anthropic(model="claude-3-5-sonnet-20241022")

        # Create agent with Scrapfly tools
        agent = ReActAgent.from_tools(
            tools=scrapfly_tools,
            llm=llm,
            verbose=True,
        )

        # Use the agent to scrape and process data
        response = await agent.achat(
            "Scrape the top posts from Hacker News and create a summary"
        )
        print(response)

    asyncio.run(main())
    ```
4. **Build a RAG Pipeline with Web Data** Use Scrapfly to ingest live web content into a LlamaIndex RAG application:
    
     ```
    import asyncio

    from llama_index.core import VectorStoreIndex, Document
    from llama_index.core.agent import ReActAgent
    from llama_index.llms.anthropic import Anthropic
    from llama_index.tools.mcp import BasicMCPClient, aget_tools_from_mcp_url

    MCP_URL = "https://mcp.scrapfly.io/mcp?key=YOUR_API_KEY"

    async def main():
        # Load the Scrapfly MCP tools
        mcp_client = BasicMCPClient(MCP_URL)
        scrapfly_tools = await aget_tools_from_mcp_url(MCP_URL, client=mcp_client)

        llm = Anthropic(model="claude-3-5-sonnet-20241022")

        # Create agent that can scrape web data
        agent = ReActAgent.from_tools(
            tools=scrapfly_tools,
            llm=llm,
            verbose=True,
        )

        # Scrape web content
        response = await agent.achat(
            "Scrape the documentation from https://scrapfly.io/docs and return the content"
        )

        # Convert scraped content to documents
        documents = [Document(text=response.response)]

        # Build vector index from scraped data
        index = VectorStoreIndex.from_documents(documents)

        # Query the index
        query_engine = index.as_query_engine()
        result = query_engine.query("What is web scraping?")
        print(result)

    asyncio.run(main())
    ```

## Example Prompts

###### RAG with Live Web Data

Scrape documentation from https://web-scraping.dev and answer questions about it

###### Knowledge Base Construction

Scrape blog posts from multiple sources and build a searchable knowledge base

###### Research Agent with Web Access

Research the latest AI trends by scraping tech news sites and summarize findings

###### Document Ingestion Pipeline

Scrape product documentation from competitor sites and compare features


## Troubleshooting

##### Import Error: llama-index-tools-mcp not found

 

**Problem:** `ModuleNotFoundError: No module named 'llama_index.tools.mcp'`

**Solution:**

- Install the package: `pip install llama-index-tools-mcp`
- Verify Python environment: `which python`
- Try upgrading: `pip install --upgrade llama-index-tools-mcp`
- Check that LlamaIndex is version 0.10.0+: `pip show llama-index`
 
 

 

 

##### npx Command Not Found

 

**Problem:** The MCP client cannot execute the `npx` command when connecting through the `mcp-remote` stdio bridge

**Solution:**

- Ensure Node.js 18+ is installed: `node --version`
- Verify `npx` is in PATH: `npx --version`
- Restart terminal after installing Node.js
- Try specifying full path: `command="/usr/local/bin/npx"`
 
 

 

 

##### OAuth2 in Production/CI Environments

 

**Problem:** OAuth2 cannot open browser in headless environment

**Solution:**

- Use API key authentication for production deployments
- Store API key in environment variable: `SCRAPFLY_API_KEY`
- Load from environment: `args=["mcp-remote", f"https://mcp.scrapfly.io/mcp?key={os.getenv('SCRAPFLY_API_KEY')}"]`
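Reading the key from the environment can be wrapped in a small helper that fails fast when the key is missing. A minimal sketch; the `scrapfly_mcp_url` function name is illustrative, not part of any SDK:

```python
import os

def scrapfly_mcp_url():
    """Build the Scrapfly MCP URL from the SCRAPFLY_API_KEY environment
    variable (illustrative helper; raises when the key is not set)."""
    key = os.getenv("SCRAPFLY_API_KEY")
    if not key:
        raise RuntimeError("SCRAPFLY_API_KEY is not set")
    return f"https://mcp.scrapfly.io/mcp?key={key}"
```

The returned URL can then be passed to `BasicMCPClient` or `aget_tools_from_mcp_url` instead of hard-coding the key.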
 
 

 

 

##### Agent Not Using Scrapfly Tools

 

**Problem:** Agent does not call Scrapfly tools when asked to scrape

**Solution:**

- Verify tools loaded: `print([tool.metadata.name for tool in scrapfly_tools])`
- Check that the LLM supports function calling (Claude 3+, GPT-4+)
- Use explicit prompts mentioning "scrape" or "web data"
- Enable verbose mode: `verbose=True` in agent creation
 
 

 

 

##### Document Ingestion Errors

 

**Problem:** Scraped content cannot be converted to Document objects

**Solution:**

- Ensure scraped content is text/markdown format
- Check response format from Scrapfly MCP tools
- Parse response before creating Document: `Document(text=str(response))`
- Handle empty or malformed responses with error checking
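A defensive conversion step along these lines can normalize agent responses before building `Document` objects; the `to_document_text` helper name is hypothetical:

```python
def to_document_text(response):
    """Normalize an agent or tool response to plain text (hypothetical helper).

    Handles objects exposing a `.response` attribute (e.g. ReActAgent chat
    results) as well as plain strings; rejects empty content so malformed
    scrapes fail loudly instead of producing empty Documents.
    """
    text = getattr(response, "response", response)
    text = str(text).strip()
    if not text:
        raise ValueError("scraped content is empty; check the MCP tool response")
    return text

# Usage (with llama_index installed):
# documents = [Document(text=to_document_text(response))]
```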
 
 

 

 



## Next Steps

- [Explore available MCP tools](https://scrapfly.io/docs/mcp/tools) and their capabilities
- [See real-world examples](https://scrapfly.io/docs/mcp/examples) of what you can build
- [Learn about authentication methods](https://scrapfly.io/docs/mcp/authentication) in detail
- [Read the FAQ](https://scrapfly.io/docs/mcp/faq) for common questions
 
 [  Back to All Integrations ](https://scrapfly.io/docs/mcp/integrations)