OpenAI Assistants API

Integrate Scrapfly web scraping into OpenAI's Assistants API through function calling. Scrapfly is exposed as a custom function, so GPT-4 assistants can decide on their own when to collect data from the web.

Prerequisites

Before getting started, make sure you have the following:

  • OpenAI API key (created in your OpenAI account dashboard)
  • Your Scrapfly API key
  • Python 3.8+, Node.js 18+, or any HTTP client
Note: OpenAI Assistants API does not natively support MCP servers. Instead, we integrate Scrapfly through function calling, where your code bridges between the assistant and Scrapfly's API.

Setup Instructions

Integrate Scrapfly with OpenAI Assistants by creating function definitions that call Scrapfly's API. This takes about 10 minutes to set up.

  1. Install OpenAI SDK

    Install the OpenAI SDK for your platform:

    Python:

    JavaScript/TypeScript:

    Tip: Environment Variables

    Store your API keys in environment variables:
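
A minimal Python sketch of reading both keys from the environment and failing early when one is missing. The variable names are assumptions: `OPENAI_API_KEY` is the name the OpenAI SDK looks for by default, while `SCRAPFLY_API_KEY` is only a convention chosen for this example, and `load_api_keys` is a hypothetical helper.

```python
import os


def load_api_keys(env=os.environ):
    """Return (openai_key, scrapfly_key), raising if either one is missing."""
    keys = {}
    # Both variable names are assumptions made for this sketch.
    for name in ("OPENAI_API_KEY", "SCRAPFLY_API_KEY"):
        value = env.get(name, "").strip()
        if not value:
            raise RuntimeError(f"Environment variable {name} is not set")
        keys[name] = value
    return keys["OPENAI_API_KEY"], keys["SCRAPFLY_API_KEY"]
```

Accepting the mapping as a parameter keeps the helper easy to test without touching the real environment.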

  2. Define Scrapfly Function Schemas

    Create function definitions that describe Scrapfly's capabilities to the OpenAI assistant:

    Python Example
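
One way such a definition might look, as a sketch rather than the official schema. The function name `scrape_webpage` and the `url` and `format` parameters are taken from this guide; the `render_js` flag, the descriptions, and the constant name `SCRAPE_WEBPAGE_TOOL` are illustrative assumptions.

```python
# Tool definition intended for the assistant's `tools` parameter.
# `render_js` and all description strings are assumptions for illustration.
SCRAPE_WEBPAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "scrape_webpage",
        "description": "Fetch a web page through Scrapfly and return its content.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "Absolute URL starting with http:// or https://",
                },
                "format": {
                    "type": "string",
                    "enum": ["markdown", "text"],
                    "description": "Output format; text is usually smaller.",
                },
                "render_js": {
                    "type": "boolean",
                    "description": "Render JavaScript before extracting content.",
                },
            },
            "required": ["url"],
        },
    },
}
```

Keeping only `url` required lets the assistant make a valid call from a bare link, while the optional fields give it room to ask for a smaller or rendered page.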

    JavaScript/TypeScript Example

  3. Implement Function Handler

    Create a function that executes Scrapfly API calls when the assistant requests web scraping:

    Python Handler
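
A hedged sketch of a handler using only the standard library. It assumes the Scrapfly scrape endpoint at `https://api.scrapfly.io/scrape` with `key`, `url`, `format`, and `render_js` query parameters, and a JSON response carrying the page under `result.content`; check these against the current Scrapfly API reference. The `fetch` parameter and the `SCRAPFLY_API_KEY` variable name are hypothetical conveniences added for this example.

```python
import json
import os
import urllib.parse
import urllib.request

SCRAPFLY_ENDPOINT = "https://api.scrapfly.io/scrape"  # assumed endpoint


def fetch_json(url, timeout=60):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def scrape_webpage(arguments_json, api_key=None, fetch=fetch_json):
    """Run one scrape_webpage call and always return a string for the assistant."""
    try:
        args = json.loads(arguments_json)
        target = args["url"]
        if not target.startswith(("http://", "https://")):
            return "Error: url must start with http:// or https://"
        query = {
            "key": api_key or os.environ["SCRAPFLY_API_KEY"],
            "url": target,
            "format": args.get("format", "markdown"),
        }
        if args.get("render_js"):
            query["render_js"] = "true"
        payload = fetch(SCRAPFLY_ENDPOINT + "?" + urllib.parse.urlencode(query))
        # Assumed response shape: {"result": {"content": "..."}}
        content = payload.get("result", {}).get("content")
        return content if content else "Error: Scrapfly returned no content"
    except Exception as exc:  # report failures as text instead of raising
        return f"Error: {type(exc).__name__}: {exc}"
```

Returning error text instead of raising means the assistant always receives some output and can explain the failure to the user.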

    JavaScript/TypeScript Handler

  4. Create OpenAI Assistant with Scrapfly Function

    Create an assistant that can call your Scrapfly function:

    Python Example
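
A sketch of the full round trip, assuming a recent 1.x release of the `openai` Python package, where `create_and_poll` and `submit_tool_outputs_and_poll` are available under `client.beta.threads.runs`. The helper name `ask_with_scraping`, the assistant name, the instructions text, and the default model are assumptions for illustration.

```python
def ask_with_scraping(client, prompt, tool, handler, model="gpt-4o"):
    """Create an assistant, send one prompt, and answer tool calls until the run ends."""
    assistant = client.beta.assistants.create(
        name="Web Scraping Assistant",  # hypothetical name
        instructions="Use scrape_webpage whenever the user asks about a URL.",
        model=model,
        tools=[tool],
    )
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=prompt
    )
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant.id
    )
    # The run pauses with status "requires_action" each time it wants a scrape.
    while run.status == "requires_action":
        calls = run.required_action.submit_tool_outputs.tool_calls
        outputs = [
            {"tool_call_id": call.id, "output": handler(call.function.arguments)}
            for call in calls
        ]
        run = client.beta.threads.runs.submit_tool_outputs_and_poll(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )
    if run.status != "completed":
        return f"Run ended with status: {run.status}"
    # Messages are listed newest first, so the first item is the reply.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value
```

To try it, pass `OpenAI()` from the `openai` package as `client`, the function schema from step 2 as `tool`, and the handler from step 3 as `handler`.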

    Important: Replace API keys with your actual keys. Sign up for free to get your Scrapfly API key.

Example Prompts

  • Research Assistant: "Scrape the documentation from https://web-scraping.dev and explain how their API works"
  • Price Monitoring Bot: "Check the current price of the product at https://web-scraping.dev/product"
  • News Aggregation: "Scrape the latest tech news from Hacker News and TechCrunch, then summarize"
  • Competitive Analysis: "Compare features listed on these competitor websites: [url1, url2, url3]"

Troubleshooting

Problem: Assistant does not call the scrape_webpage function

Solution:

  • Ensure the function schema is properly defined in the tools parameter
  • Use explicit prompts that mention "scrape" or "fetch from URL"
  • Check that the assistant's instructions direct it to use the function
  • Verify you're using a model that supports function calling (GPT-4, GPT-3.5-turbo)

Problem: Function returns error messages from Scrapfly API

Solution:

  • Verify Scrapfly API key is correct and active
  • Check URL is properly formatted (must start with http:// or https://)
  • Ensure you have sufficient Scrapfly credits
  • Review Scrapfly API response for specific error details

Problem: Run never completes, stuck waiting for tool outputs

Solution:

  • Ensure you're calling submit_tool_outputs for all tool calls
  • Check that tool_call_id matches the requested tool call
  • Verify function output is a string (not None or empty)
  • Add error handling in function to always return a result
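
The checks above can be sketched as a small helper that produces exactly one output per requested tool call and never returns `None` or an empty string. The names `build_tool_outputs` and `handlers` are hypothetical; the `id`, `function.name`, and `function.arguments` attributes follow the tool call objects returned by the OpenAI Python SDK.

```python
def build_tool_outputs(tool_calls, handlers):
    """Return one {"tool_call_id", "output"} entry for every requested call."""
    outputs = []
    for call in tool_calls:
        handler = handlers.get(call.function.name)
        try:
            if handler is None:
                result = f"Error: unknown function {call.function.name}"
            else:
                result = handler(call.function.arguments)
        except Exception as exc:
            # A failed handler still yields an output, so the run can continue.
            result = f"Error: {type(exc).__name__}: {exc}"
        if result is None or result == "":
            result = "(no content)"
        outputs.append({"tool_call_id": call.id, "output": str(result)})
    return outputs
```

The returned list can be passed as `tool_outputs` when submitting. Because every call ID appears in it, the run is not left waiting on a missing entry.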

Problem: Hitting rate limits on OpenAI or Scrapfly APIs

Solution:

  • Add exponential backoff retry logic in function handler
  • Check OpenAI API rate limits for your tier
  • Monitor Scrapfly usage and upgrade plan if needed
  • Implement caching for frequently scraped URLs
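
A minimal sketch of the retry and caching ideas above, using only the standard library. The helper names, retry count, delays, and cache lifetime are all assumptions. The sketch retries on any exception for brevity; a real handler would more likely retry only on rate-limit or transient server errors.

```python
import random
import time


def with_backoff(func, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


class TTLCache:
    """Tiny in-memory cache so repeated URLs do not trigger new scrapes."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (timestamp, value)

    def get_or_fetch(self, key, fetch):
        hit = self.store.get(key)
        if hit and self.clock() - hit[0] < self.ttl:
            return hit[1]
        value = fetch(key)
        self.store[key] = (self.clock(), value)
        return value
```

The two pieces compose: the cache's `fetch` argument can be a function that wraps the actual scrape in `with_backoff`, so only cache misses ever reach the API.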

Problem: Scraped content too large for function output

Solution:

  • Truncate large responses before returning
  • Use format="text" instead of "markdown" for smaller output
  • Implement pagination or chunking for large pages
  • Summarize content in the function before returning
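
Truncation, the first option above, might look like the sketch below. The function name and the 20,000-character default are assumptions chosen for illustration, not documented limits; pick a value that fits your model's context window.

```python
def truncate_for_tool_output(content, max_chars=20000):
    """Keep at most max_chars of content and append a truncation notice."""
    if len(content) <= max_chars:
        return content
    # Prefer cutting at a line break so the text does not end mid-sentence.
    cut = content.rfind("\n", 0, max_chars)
    if cut < max_chars // 2:  # no line break in the second half: cut hard
        cut = max_chars
    notice = f"\n\n[truncated: {cut} of {len(content)} characters shown]"
    return content[:cut] + notice
```

The notice tells the assistant that the page was cut, so it can say so in its answer or request a narrower scrape.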

Next Steps

Summary