Integrate Scrapfly web scraping as custom functions for GPT-4 assistants via OpenAI's Assistants API and function calling, giving your assistant autonomous data-collection capabilities.
Note: The OpenAI Assistants API does not natively support MCP servers. Instead, we integrate Scrapfly through function calling, where your code acts as the bridge between the assistant and Scrapfly's API.
Setup Instructions
Integrate Scrapfly with OpenAI Assistants by creating function definitions that call Scrapfly's API. This takes about 10 minutes to set up.
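First you need the function definition itself: a tool schema the assistant can see, plus a bridge function that actually calls Scrapfly. The sketch below assumes Scrapfly's GET scrape endpoint at https://api.scrapfly.io/scrape with key and url query parameters and the page body under result.content; verify these details against Scrapfly's current API docs before relying on them.

```python
import json
import os
import urllib.parse
import urllib.request

# Tool schema the assistant sees; its name must match the bridge function below
scrapfly_function = {
    "type": "function",
    "function": {
        "name": "scrape_webpage",
        "description": "Scrape a webpage and return its content.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The URL to scrape"},
            },
            "required": ["url"],
        },
    },
}

def scrape_webpage(url: str) -> str:
    """Bridge function: call Scrapfly's scrape endpoint and return the page content.

    Endpoint path and response shape are assumptions based on Scrapfly's
    documented GET /scrape API; check the current docs if requests fail.
    """
    params = urllib.parse.urlencode({
        "key": os.environ["SCRAPFLY_API_KEY"],  # set this in your environment
        "url": url,
    })
    with urllib.request.urlopen(f"https://api.scrapfly.io/scrape?{params}") as resp:
        data = json.load(resp)
    return data["result"]["content"]
```

The schema's `name` field is what the model emits in tool calls, so it must match the branch you check for in your run-polling loop.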
Create an assistant that can call your Scrapfly function:
Python Example
import json
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create assistant with the Scrapfly function
# (scrapfly_function is the tool schema and scrape_webpage the bridge
# function from your setup)
assistant = client.beta.assistants.create(
    name="Web Research Assistant",
    instructions="You are a helpful assistant that can scrape and analyze web content. Use the scrape_webpage function to fetch data from URLs.",
    model="gpt-4-turbo",
    tools=[scrapfly_function],
)

# Create a thread and add the user message
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Scrape the top posts from https://news.ycombinator.com and summarize them",
)

# Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll the run and handle function calls until it reaches a terminal state
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        tool_outputs = []
        for tool_call in run.required_action.submit_tool_outputs.tool_calls:
            if tool_call.function.name == "scrape_webpage":
                args = json.loads(tool_call.function.arguments)
                result = scrape_webpage(**args)
                tool_outputs.append({
                    "tool_call_id": tool_call.id,
                    "output": result,
                })
        # Submit function outputs so the run can continue
        run = client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id,
            run_id=run.id,
            tool_outputs=tool_outputs,
        )

# Get the final response (messages are listed most recent first)
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
Important: Replace API keys with your actual keys.
Sign up for free to get your Scrapfly API key.
Example Prompts
Research Assistant
Scrape the documentation from https://web-scraping.dev and explain how their API works
Price Monitoring Bot
Check the current price of the product at https://web-scraping.dev/product
News Aggregation
Scrape the latest tech news from Hacker News and TechCrunch, then summarize
Competitive Analysis
Compare features listed on these competitor websites: [url1, url2, url3]
Troubleshooting
Problem: Assistant does not call the scrape_webpage function
Solution:
Ensure the function schema is properly defined in the tools parameter
Use explicit prompts mentioning "scrape" or "fetch from URL"
Check that the assistant's instructions direct it to use the function
Verify you're using a model that supports function calling (GPT-4, GPT-3.5-turbo)
Problem: Function returns error messages from Scrapfly API
Solution:
Verify Scrapfly API key is correct and active
Check URL is properly formatted (must start with http:// or https://)
Ensure you have sufficient Scrapfly credits
Review Scrapfly API response for specific error details
Problem: Run never completes, stuck waiting for tool outputs
Solution:
Ensure you're calling submit_tool_outputs for all tool calls
Check that tool_call_id matches the requested tool call
Verify function output is a string (not None or empty)
Add error handling in function to always return a result
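The last two points above can be combined into one wrapper: if the bridge function raises or returns None, submit_tool_outputs never gets a valid string and the run stalls. A minimal sketch (make_safe is a hypothetical helper name) that guarantees a non-empty string output:

```python
from typing import Callable

def make_safe(scrape: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a scrape function so its output is always a non-empty string,
    which keeps submit_tool_outputs from stalling on None or an exception."""
    def safe_scrape(url: str) -> str:
        try:
            content = scrape(url)
            return content if content else "Scrape succeeded but the page was empty."
        except Exception as exc:
            # Return the error as text so the assistant can report it to the user
            return f"Scraping failed: {exc}"
    return safe_scrape
```

Wrap your bridge function once (`scrape_webpage = make_safe(scrape_webpage)`) and every tool output becomes a string the run loop can submit.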
Problem: Hitting rate limits on OpenAI or Scrapfly APIs
Solution:
Add exponential backoff retry logic in function handler
Check OpenAI API rate limits for your tier
Monitor Scrapfly usage and upgrade plan if needed
Implement caching for frequently scraped URLs
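For the backoff point above, a generic retry helper is enough; the sketch below (with_backoff is an illustrative name, not from either API's SDK) retries any zero-argument callable with exponential delays plus jitter:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_backoff(fn: Callable[[], T], retries: int = 5, base: float = 1.0) -> T:
    """Retry fn with exponential backoff (base * 2**attempt seconds, with up
    to 100% jitter); re-raise the exception after the final attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError("unreachable")
```

Use it around the network call, e.g. `with_backoff(lambda: scrape_webpage(url))`. For production use, narrow the `except` clause to rate-limit errors so genuine failures surface immediately.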
Problem: Scraped content too large for function output
Solution:
Truncate large responses before returning
Use format="text" instead of "markdown" for smaller output
Implement pagination or chunking for large pages
Summarize content in the function before returning
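The truncation option is the simplest of these. A sketch, assuming a character budget (the 8000 default here is an arbitrary placeholder; tune it to your model's context window):

```python
def truncate_output(content: str, limit: int = 8000) -> str:
    """Trim scraped content to a character budget before returning it as a
    tool output, marking the cut so the model knows content is missing."""
    if len(content) <= limit:
        return content
    return content[:limit] + "\n[truncated]"
```

Apply it as the last step of your bridge function, e.g. `return truncate_output(data["result"]["content"])`.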