LangChain

LangChain is a popular framework for building LLM applications. Integrating Scrapfly web scraping into your LangChain agents and chains lets them collect web data as part of their workflows.

Categories: AI Framework, Python, JavaScript, TypeScript

Prerequisites

Before getting started, make sure you have the following:

  • Python 3.8+ or Node.js 18+ installed
  • langchain and langchain-mcp packages installed
  • Your Scrapfly API key (only if not using OAuth2)

Setup Instructions

LangChain supports MCP servers through the langchain-mcp integration. Follow these steps to connect Scrapfly.

  1. Install Required Packages

    Install LangChain and the MCP integration package:

    Python: pip install langchain langchain-mcp

    JavaScript/TypeScript:

    Tip: Virtual Environments

    For Python, use a virtual environment to avoid package conflicts, for example one created with python -m venv .venv and activated before you install the packages.

  2. Initialize MCP Client in Your Code

    Connect to the Scrapfly MCP server from your LangChain application:

    Python Example with OAuth2 (Recommended)

    Why OAuth2?
    • No API keys in your code repository
    • Automatic token rotation for enhanced security
    • Instant revocation if needed
    • Full audit trail of authentication events

    Team collaboration: See project-scoped setup to share configuration with your team via version control.

    Python Example with API Key

    Important: Replace YOUR_API_KEY with your actual Scrapfly API key. Sign up for free to get your API key.
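The code samples for this step are not reproduced here. As a rough sketch, and assuming the Scrapfly server is launched through npx with the mcp-remote bridge (the command and arguments that appear in the Troubleshooting section), the launch settings for both authentication modes could be assembled as shown. The helper name scrapfly_server_config and the dictionary layout are illustrative and not part of any library; adapt them to whatever constructor your MCP client exposes.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint taken from this guide; the "key" query parameter is the one
# shown in the Troubleshooting section.
SCRAPFLY_MCP_URL = "https://mcp.scrapfly.io/mcp"


def scrapfly_server_config(api_key: Optional[str] = None, npx_path: str = "npx") -> dict:
    """Return launch settings for the Scrapfly MCP server (hypothetical helper).

    Without an API key the bare URL is used, which is ASSUMED here to
    start the OAuth2 flow. With a key, it is appended as ?key=...
    """
    url = SCRAPFLY_MCP_URL
    if api_key:
        # urlencode escapes characters that are not safe in a query string.
        url = f"{url}?{urlencode({'key': api_key})}"
    return {"command": npx_path, "args": ["mcp-remote", url]}


oauth_config = scrapfly_server_config()
key_config = scrapfly_server_config("YOUR_API_KEY")
print(oauth_config["args"])
print(key_config["args"])
```

Keeping the URL construction in one place means switching between OAuth2 and API key authentication is a one-argument change, and the key never has to be typed into the source file.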

    JavaScript/TypeScript Example

  3. Build a LangChain Agent with Scrapfly Tools

    Create an AI agent that can use Scrapfly for web scraping:

    Pro Tip: The agent will automatically call scraping_instruction_enhanced to get required parameters before scraping!

  4. Test Your Integration

    Run your LangChain application and verify Scrapfly MCP is working:

    Tip: Enable Verbose Mode

    Set verbose=True in AgentExecutor to see detailed logs of tool calls and agent reasoning.

Example Prompts

  • Research Assistant Agent: "Research the top AI news from today and create a summary report with sources"
  • Competitive Analysis Chain: "Scrape pricing data from competitor websites and create a comparison table"
  • Data Pipeline with LLM Processing: "Scrape product reviews, analyze sentiment, and generate insights"
  • Multi-Step Research Workflow: "Find the top 3 blog posts about web scraping, scrape their content, and summarize key takeaways"

Troubleshooting

Problem: ModuleNotFoundError: No module named 'langchain_mcp'

Solution:

  • Ensure you installed the package: pip install langchain-mcp
  • Verify you're using the correct Python environment (check with which python on macOS/Linux or where python on Windows)
  • Try upgrading: pip install --upgrade langchain-mcp
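A frequent cause of this error is installing the package into one interpreter and running the application with another. The sketch below, which uses only the standard library, reports the interpreter that is actually running and whether the module can be found from it. The function name describe_environment is made up for this example.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "langchain_mcp") -> dict:
    """Report which interpreter is running and whether a module is importable."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "module": module_name,
        "installed": spec is not None,
        # Location of the package, if found, to spot a wrong site-packages.
        "location": getattr(spec, "origin", None),
    }


for field, value in describe_environment().items():
    print(f"{field}: {value}")
```

Run it with the same command you use to start your application. If "installed" is False, install the package using the interpreter shown under "python" (for example by invoking that interpreter with -m pip install langchain-mcp).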

Problem: MCPClient cannot execute npx command

Solution:

  • Ensure Node.js 18+ is installed: node --version
  • Verify npx is in PATH: npx --version
  • On Windows, restart terminal after installing Node.js
  • Try specifying full path: command="/usr/local/bin/npx"
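The checks above can also be run from Python, which is useful because the result reflects the PATH your application process actually sees. This is a minimal sketch using the standard library; the helper names are illustrative.

```python
import shutil
import subprocess
from typing import Optional


def parse_node_major(version_text: str) -> int:
    """Extract the major version from output such as 'v18.19.0'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def find_npx() -> Optional[str]:
    """Return the absolute path to npx, or None if it is not on PATH."""
    return shutil.which("npx")


def node_is_supported(minimum_major: int = 18) -> bool:
    """Check that `node --version` reports at least the required major version."""
    node = shutil.which("node")
    if node is None:
        return False
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=False
    )
    try:
        return parse_node_major(result.stdout) >= minimum_major
    except ValueError:
        # Output was empty or not in the expected vMAJOR.MINOR.PATCH form.
        return False


print("npx path:", find_npx())
print("Node.js 18+ available:", node_is_supported())
```

If find_npx() returns a path, that string is what you would pass as the full command path mentioned in the last bullet. On Windows, shutil.which also resolves launcher files such as npx.cmd through PATHEXT.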

Problem: OAuth2 link cannot be opened in a server or CI environment

Solution:

  • Use API key authentication instead of OAuth2
  • Set API key as environment variable: SCRAPFLY_API_KEY
  • Load from environment: args=["mcp-remote", f"https://mcp.scrapfly.io/mcp?key={os.getenv('SCRAPFLY_API_KEY')}"]
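One caveat with the one-line form above: if SCRAPFLY_API_KEY is unset, os.getenv returns None and the f-string silently produces a URL ending in key=None. The sketch below fails early instead, and masks the key before a URL is written to logs. Both helper names are hypothetical.

```python
import os
import re
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read SCRAPFLY_API_KEY and fail loudly if it is missing or blank."""
    key = env.get("SCRAPFLY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCRAPFLY_API_KEY is not set; export it before starting the application."
        )
    return key


def redact_key(text: str) -> str:
    """Mask the value of the key query parameter so URLs can be logged safely."""
    return re.sub(r"([?&]key=)[^&\s]+", r"\1***", text)


# A plain dict stands in for the real environment in this demonstration.
demo_key = load_api_key({"SCRAPFLY_API_KEY": "demo-key"})
example_url = f"https://mcp.scrapfly.io/mcp?key={demo_key}"
print(redact_key(example_url))
```

In a CI pipeline, the variable would normally come from the platform's secret store rather than from a file in the repository.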

Problem: Agent does not use Scrapfly tools even when prompted

Solution:

  • Verify tools are loaded: print([tool.name for tool in scrapfly_tools])
  • Check that your LLM supports tool calling (for example Claude 3+ or GPT-4+)
  • Use more explicit prompts mentioning "scrape" or "web data"
  • Enable verbose mode to see agent's reasoning
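The first check can be turned into a small startup assertion, so a missing tool is reported before the agent runs. The sketch below relies only on each tool having a .name attribute, as in the print statement above. FakeTool is a stand-in for real tool objects, and example_scrape_tool is a placeholder name, not an actual Scrapfly tool.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FakeTool:
    """Stand-in for a LangChain tool; only the .name attribute matters here."""
    name: str


def missing_tools(tools: Iterable, expected: Iterable[str]) -> List[str]:
    """Return the expected tool names that are absent from the loaded tools."""
    loaded = {tool.name for tool in tools}
    return [name for name in expected if name not in loaded]


# In a real application, scrapfly_tools would come from your MCP client.
scrapfly_tools = [FakeTool("scraping_instruction_enhanced")]
absent = missing_tools(
    scrapfly_tools,
    ["scraping_instruction_enhanced", "example_scrape_tool"],  # placeholder name
)
print(absent)  # → ['example_scrape_tool']
```

An empty result means every expected tool was loaded; if the agent still ignores the tools, the cause is more likely the model or the prompt.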

Problem: Connection to Scrapfly MCP server times out

Solution:

  • Check internet connection
  • Verify https://mcp.scrapfly.io/mcp is accessible
  • If behind a proxy, configure proxy settings
  • Increase timeout in MCPClient initialization if needed
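To separate network problems from client configuration problems, the endpoint can be probed directly. This is a minimal sketch with the standard library; is_reachable is an illustrative name, and it only tests that the server answers at all, not that authentication succeeds.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 401 or 405),
        # so the network path itself works.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False
```

Calling is_reachable("https://mcp.scrapfly.io/mcp") from the machine that runs your agent shows whether the endpoint can be contacted from there. urllib picks up proxy settings from environment variables such as HTTPS_PROXY by default, so the same call also exercises your proxy configuration.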
