Integrate Scrapfly web scraping as custom functions for GPT-4 assistants via OpenAI's Assistants API and function calling, giving your assistant autonomous data-collection capabilities.
Note: The OpenAI Assistants API does not natively support MCP servers. Instead, we integrate Scrapfly through function calling, where your code acts as the bridge between the assistant and Scrapfly's API.
Setup Instructions
Integrate Scrapfly with OpenAI Assistants by creating function definitions that call Scrapfly's API. This takes about 10 minutes to set up.
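First you need the function definition itself: a tool schema the assistant can see, plus a bridge function that actually calls Scrapfly. The sketch below assumes Scrapfly's GET scrape endpoint at https://api.scrapfly.io/scrape with key and url query parameters and the page body under result.content; verify these details against Scrapfly's current API docs before relying on them.

```python
import json
import os
import urllib.parse
import urllib.request

# Tool schema the assistant sees; its name must match the bridge function below
scrapfly_function = {
    "type": "function",
    "function": {
        "name": "scrape_webpage",
        "description": "Scrape a webpage and return its content.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The URL to scrape"},
            },
            "required": ["url"],
        },
    },
}

def scrape_webpage(url: str) -> str:
    """Bridge function: call Scrapfly's scrape endpoint and return the page content.

    Endpoint path and response shape are assumptions based on Scrapfly's
    documented GET /scrape API; check the current docs if requests fail.
    """
    params = urllib.parse.urlencode({
        "key": os.environ["SCRAPFLY_API_KEY"],  # set this in your environment
        "url": url,
    })
    with urllib.request.urlopen(f"https://api.scrapfly.io/scrape?{params}") as resp:
        data = json.load(resp)
    return data["result"]["content"]
```

The schema's `name` field is what the model emits in tool calls, so it must match the branch you check for in your run-polling loop.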
Create an assistant that can call your Scrapfly function:
Python Example
import json
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create assistant with the Scrapfly function
# (scrapfly_function is the tool schema and scrape_webpage the bridge
# function from your setup)
assistant = client.beta.assistants.create(
    name="Web Research Assistant",
    instructions="You are a helpful assistant that can scrape and analyze web content. Use the scrape_webpage function to fetch data from URLs.",
    model="gpt-4-turbo",
    tools=[scrapfly_function],
)

# Create a thread and add the user message
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Scrape the top posts from https://news.ycombinator.com and summarize them",
)

# Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll the run and handle function calls until it reaches a terminal state
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        tool_outputs = []
        for tool_call in run.required_action.submit_tool_outputs.tool_calls:
            if tool_call.function.name == "scrape_webpage":
                args = json.loads(tool_call.function.arguments)
                result = scrape_webpage(**args)
                tool_outputs.append({
                    "tool_call_id": tool_call.id,
                    "output": result,
                })
        # Submit function outputs so the run can continue
        run = client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id,
            run_id=run.id,
            tool_outputs=tool_outputs,
        )

# Get the final response (messages are listed most recent first)
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
Important: Replace API keys with your actual keys.
Sign up for free to get your Scrapfly API key.
Example Prompts
Research Assistant
Scrape the documentation from https://web-scraping.dev and explain how their API works
Price Monitoring Bot
Check the current price of the product at https://web-scraping.dev/product
News Aggregation
Scrape the latest tech news from Hacker News and TechCrunch, then summarize
Competitive Analysis
Compare features listed on these competitor websites: [url1, url2, url3]
Troubleshooting
Problem: Assistant does not call the scrape_webpage function
Solution:
Ensure the function schema is properly defined in the tools parameter
Use explicit prompts mentioning "scrape" or "fetch from URL"
Check that the assistant's instructions direct it to use the function
Verify you're using a model that supports function calling (GPT-4, GPT-3.5-turbo)
Problem: Function returns error messages from Scrapfly API
Solution:
Verify Scrapfly API key is correct and active
Check URL is properly formatted (must start with http:// or https://)
Ensure you have sufficient Scrapfly credits
Review Scrapfly API response for specific error details
Problem: Run never completes, stuck waiting for tool outputs
Solution:
Ensure you're calling submit_tool_outputs for all tool calls
Check that tool_call_id matches the requested tool call
Verify function output is a string (not None or empty)
Add error handling in function to always return a result
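The last two points above can be combined into one wrapper: if the bridge function raises or returns None, submit_tool_outputs never gets a valid string and the run stalls. A minimal sketch (make_safe is a hypothetical helper name) that guarantees a non-empty string output:

```python
from typing import Callable

def make_safe(scrape: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a scrape function so its output is always a non-empty string,
    which keeps submit_tool_outputs from stalling on None or an exception."""
    def safe_scrape(url: str) -> str:
        try:
            content = scrape(url)
            return content if content else "Scrape succeeded but the page was empty."
        except Exception as exc:
            # Return the error as text so the assistant can report it to the user
            return f"Scraping failed: {exc}"
    return safe_scrape
```

Wrap your bridge function once (`scrape_webpage = make_safe(scrape_webpage)`) and every tool output becomes a string the run loop can submit.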
Problem: Hitting rate limits on OpenAI or Scrapfly APIs
Solution:
Add exponential backoff retry logic in function handler
Check OpenAI API rate limits for your tier
Monitor Scrapfly usage and upgrade plan if needed
Implement caching for frequently scraped URLs
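For the backoff point above, a generic retry helper is enough; the sketch below (with_backoff is an illustrative name, not from either API's SDK) retries any zero-argument callable with exponential delays plus jitter:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_backoff(fn: Callable[[], T], retries: int = 5, base: float = 1.0) -> T:
    """Retry fn with exponential backoff (base * 2**attempt seconds, with up
    to 100% jitter); re-raise the exception after the final attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError("unreachable")
```

Use it around the network call, e.g. `with_backoff(lambda: scrape_webpage(url))`. For production use, narrow the `except` clause to rate-limit errors so genuine failures surface immediately.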
Problem: Scraped content too large for function output
Solution:
Truncate large responses before returning
Use format="text" instead of "markdown" for smaller output
Implement pagination or chunking for large pages
Summarize content in the function before returning
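The truncation option is the simplest of these. A sketch, assuming a character budget (the 8000 default here is an arbitrary placeholder; tune it to your model's context window):

```python
def truncate_output(content: str, limit: int = 8000) -> str:
    """Trim scraped content to a character budget before returning it as a
    tool output, marking the cut so the model knows content is missing."""
    if len(content) <= limit:
        return content
    return content[:limit] + "\n[truncated]"
```

Apply it as the last step of your bridge function, e.g. `return truncate_output(data["result"]["content"])`.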