Apify to Scrapfly Migration Guide
Migrate from Apify's Actor marketplace to Scrapfly's unified API. One SDK for all sites, predictable credit-based pricing, no compute unit surprises. Most teams complete migration in a few hours.
Understanding the Difference
Apify and Scrapfly have fundamentally different approaches. Understanding this shift makes migration easier.
Apify: Marketplace Model
- 10,000+ pre-built Actors (scrapers)
- Each Actor has its own API and parameters
- Different maintainers, update schedules
- Pricing: subscription + compute units + Actor fees
- Learning curve for each new Actor
Example: Amazon scraping uses "Amazon Product Scraper" Actor with its own input schema
Scrapfly: Unified API Model
- One API handles any website
- Same parameters for all targets
- Full stack ownership, consistent updates
- Pricing: simple credits per request
- Learn once, use everywhere
Example: Amazon scraping uses the same Scrapfly API as any other site
You're not migrating Actor-to-Actor. You're migrating from a marketplace of specialized tools to a single, flexible API. Your existing parsing logic (CSS/XPath selectors, data transformations) will remain largely the same.
Common Actor Migrations
Here's how to replace popular Apify Actors with Scrapfly's unified API.
apify/web-scraper → Scrapfly Scrape API
The generic Web Scraper Actor maps directly to Scrapfly's Scrape API:
Apify Web Scraper
from apify_client import ApifyClient
client = ApifyClient("apify_api_YOUR_TOKEN")
run_input = {
"startUrls": [{"url": "https://example.com"}],
"pageFunction": """
async function pageFunction(context) {
const { request, page } = context;
const title = await page.$eval('h1', el => el.textContent);
return { title, url: request.url };
}
""",
"proxyConfiguration": {"useApifyProxy": True}
}
run = client.actor("apify/web-scraper").call(run_input)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
print(item)
Scrapfly
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
result = client.scrape(ScrapeConfig(
url="https://example.com",
render_js=True,
asp=True, # Anti-bot bypass
country="us"
))
# Parse with your preferred method
from parsel import Selector
sel = Selector(result.content)
title = sel.css("h1::text").get()
print({"title": title, "url": result.context["url"]})
Google Maps Scrapers → Scrapfly + Extraction API
Replace Google Maps Actors with Scrapfly's API and auto-extraction:
Apify Google Maps Actor
# Apify: Using a Google Maps Actor
run_input = {
"searchStringsArray": ["restaurants in NYC"],
"maxCrawledPlaces": 100,
"language": "en",
"proxyConfig": {"useApifyProxy": True}
}
run = client.actor("compass/crawler-google-places").call(run_input)
# Results come pre-structured from Actor
Scrapfly
# Scrapfly: Scrape + Auto-extraction
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
result = client.scrape(ScrapeConfig(
url="https://www.google.com/maps/search/restaurants+in+NYC",
render_js=True,
asp=True,
rendering_wait=3000, # Wait for results to load
country="us",
# Use AI extraction for structured data
extraction_prompt="Extract business names, ratings, addresses, and phone numbers"
))
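The structured output comes back with the response; a minimal follow-up, assuming the same extraction_result accessor used in the product example in the next section (the exact shape depends on your prompt):
# Structured data produced by the prompt-based extraction
print(result.extraction_result)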
E-commerce Actors → Scrapfly + Product Extraction
Replace Amazon, Walmart, eBay Actors with unified scraping + auto-extraction:
Apify Amazon Actor
# Different Actor for each e-commerce site
run_input = {
"productUrls": [
{"url": "https://www.amazon.com/dp/B08N5WRWNW"}
],
"maxRequestsPerCrawl": 100
}
run = client.actor("junglee/amazon-crawler").call(run_input)
# Pre-structured product data from Actor
Scrapfly
# Same API for Amazon, Walmart, eBay, etc.
result = client.scrape(ScrapeConfig(
url="https://www.amazon.com/dp/B08N5WRWNW",
render_js=True,
asp=True,
country="us",
# Auto-detect and extract product data
extraction_model="product"
))
# Structured product data
print(result.extraction_result)
Social Media Actors → Scrapfly Unified API
Replace Instagram, Twitter, TikTok Actors with consistent API calls:
Apify Instagram Actor
# Actor-specific input schema
run_input = {
"directUrls": ["https://www.instagram.com/p/ABC123/"],
"resultsType": "posts",
"resultsLimit": 100,
"proxy": {"useApifyProxy": True}
}
run = client.actor("apify/instagram-scraper").call(run_input)
Scrapfly
# Same approach for any social platform
result = client.scrape(ScrapeConfig(
url="https://www.instagram.com/p/ABC123/",
render_js=True,
asp=True,
rendering_wait=2000,
# Extract post data with AI
extraction_model="social_media_post"
))
# Or use custom prompt extraction
result = client.scrape(ScrapeConfig(
url="https://www.instagram.com/p/ABC123/",
render_js=True,
asp=True,
extraction_prompt="Extract post author, caption, likes count, comments count, and image URLs"
))
Common Parameter Mapping
While each Apify Actor defines its own input schema, the underlying concepts recur. Here are the common ones and their Scrapfly equivalents.
| Apify Concept / Parameter | Scrapfly Equivalent | Notes |
|---|---|---|
| `apify_api_token` | `key` | API authentication |
| `startUrls` / `directUrls` | `url` | Single URL per request (batch via loop or async) |
| `proxyConfiguration.useApifyProxy` | `asp=true` | Enable anti-bot bypass with residential proxies |
| `proxyConfiguration.apifyProxyGroups` | `proxy_pool` | `public_datacenter_pool` or `public_residential_pool` |
| `proxyConfiguration.apifyProxyCountry` | `country` | 2-letter ISO country code |
| Puppeteer/Playwright page functions | `js_scenario` | Browser automation actions (click, fill, scroll, wait) |
| `maxRequestsPerCrawl` | Application logic | Control in your code or use Crawler API limits |
| Datasets (structured output) | Extraction API | Auto-extraction models or LLM prompts for structured data |
| Request queues | Crawler API | Managed crawling with automatic URL discovery |
| Actor storage (key-value, datasets) | Your own storage | Scrapfly returns data directly; store in your preferred DB |
| Webhooks (Actor run completion) | Webhooks | Async notification when scrape completes |
| `maxConcurrency` | Account limit + async client | Manage concurrency in your application code |
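As a worked example of the mapping above, here is how a typical Apify proxy configuration translates into a single ScrapeConfig call (a minimal sketch; the Apify input shown is illustrative):
from scrapfly import ScrapflyClient, ScrapeConfig
# Apify-style input (illustrative):
#   "proxyConfiguration": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"], "apifyProxyCountry": "US"}
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
result = client.scrape(ScrapeConfig(
    url="https://example.com",
    asp=True,                              # replaces proxyConfiguration.useApifyProxy
    proxy_pool="public_residential_pool",  # replaces apifyProxyGroups
    country="us",                          # replaces apifyProxyCountry (lowercase ISO code)
))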
Complete Code Examples
Side-by-side examples showing the shift from Apify Actor orchestration to Scrapfly's unified API.
Basic Scraping Comparison
Apify
from apify_client import ApifyClient
client = ApifyClient("apify_api_YOUR_TOKEN")
# Run an Actor and wait for completion
run = client.actor("apify/web-scraper").call({
"startUrls": [{"url": "https://example.com"}],
"pageFunction": """
async function pageFunction(context) {
return { html: context.body };
}
"""
})
# Fetch results from dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
print(item["html"])
Scrapfly
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
# Direct scrape - no Actor management
result = client.scrape(ScrapeConfig(
url="https://example.com",
render_js=True,
asp=True
))
print(result.content)
Batch Scraping with Async
# Apify batch via Actor run
run = client.actor("apify/web-scraper").call({
"startUrls": [
{"url": "https://example.com/page1"},
{"url": "https://example.com/page2"},
{"url": "https://example.com/page3"}
],
"pageFunction": "..."
})
# Wait for Actor to finish, then fetch dataset
# Scrapfly async batch scraping
import asyncio
from scrapfly import ScrapflyClient, ScrapeConfig
async def scrape_batch():
    async with ScrapflyClient(key="YOUR_KEY") as client:
        urls = [
            "https://example.com/page1",
            "https://example.com/page2",
            "https://example.com/page3"
        ]
        configs = [ScrapeConfig(url=url, render_js=True, asp=True) for url in urls]
        results = await asyncio.gather(*[client.async_scrape(c) for c in configs])
        return results
results = asyncio.run(scrape_batch())
JavaScript SDK Comparison
Apify
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({
token: 'apify_api_YOUR_TOKEN'
});
const run = await client.actor('apify/web-scraper').call({
startUrls: [{ url: 'https://example.com' }],
pageFunction: `
async function pageFunction(context) {
return { html: context.body };
}
`
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
Scrapfly
import { ScrapflyClient } from 'scrapfly-sdk';
const client = new ScrapflyClient({
key: 'YOUR_SCRAPFLY_KEY'
});
const result = await client.scrape({
url: 'https://example.com',
render_js: true,
asp: true
});
console.log(result.result.content);
cURL Comparison
Apify (start Actor run)
# Start Actor run (async)
curl -X POST \
"https://api.apify.com/v2/acts/apify~web-scraper/runs?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"startUrls": [{"url": "https://example.com"}],
"pageFunction": "async function pageFunction(context) { return { html: context.body }; }"
}'
# Then poll for completion and fetch dataset...
Scrapfly (synchronous)
# Direct synchronous scrape
curl "https://api.scrapfly.io/scrape\
?key=YOUR_SCRAPFLY_KEY\
&url=https%3A%2F%2Fexample.com\
&render_js=true\
&asp=true"
# Response contains HTML directly - no polling needed
Replacing Crawling Workflows
Apify's Request Queue and Actor orchestration map to Scrapfly's Crawler API.
Apify Crawling Pattern
from apify_client import ApifyClient
client = ApifyClient("YOUR_TOKEN")
run = client.actor("apify/cheerio-scraper").call({
"startUrls": [{"url": "https://example.com"}],
"maxRequestsPerCrawl": 100,
"linkSelector": "a[href]",
"pseudoUrls": [
{"purl": "https://example.com/products/[.*]"}
],
"pageFunction": """
async function pageFunction(context) {
const { request, $ } = context;
return {
url: request.url,
title: $('h1').text()
};
}
"""
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
print(item)
Scrapfly Crawler API
from scrapfly import ScrapflyClient
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
# Create and start crawler
crawler = client.crawler.create({
"name": "example-crawler",
"start_urls": ["https://example.com"],
"max_depth": 3,
"max_pages": 100,
"allowed_domains": ["example.com"],
"url_patterns": ["/products/*"],
"scrape_config": {
"render_js": True,
"asp": True
},
"extraction_rules": {
"title": "h1::text"
}
})
# Results are stored and accessible via webhook or polling
# See Crawler API docs for result retrieval
For complex crawling workflows with link discovery, sitemap parsing, and per-URL extraction rules, see the Crawler API documentation.
🤖 AI Migration Assistant
Use Claude or ChatGPT to convert your Apify Actor code to Scrapfly. Copy this prompt with your existing code.
Copy This Prompt
I'm migrating from Apify Actors to Scrapfly's unified API. Here's my current code using Apify.
Please convert it to use Scrapfly's Python SDK (or JavaScript SDK if my code is in JavaScript).
Key differences to understand:
1. Apify uses Actors (specialized scrapers per site) → Scrapfly uses one unified API for all sites
2. Apify runs are async (start, poll, get dataset) → Scrapfly can be sync or async (direct response)
3. Apify has built-in storage (datasets) → Scrapfly returns data directly (store in your own DB)
4. Apify pageFunction (Puppeteer/Playwright) → Scrapfly js_scenario for browser actions
Common parameter mappings:
- `proxyConfiguration.useApifyProxy` → `asp=True` (anti-bot bypass with proxies)
- `proxyConfiguration.apifyProxyCountry` → `country` (lowercase, e.g., "us")
- Puppeteer page actions → `js_scenario` (click, fill, scroll, wait actions)
- `maxRequestsPerCrawl` → Control in your code, or use Crawler API with `max_pages`
- Dataset output → Use `extraction_model` or `extraction_prompt` for structured data
For structured data extraction:
- Instead of writing parsing logic in pageFunction, use Scrapfly's Extraction API
- `extraction_model="product"` for auto-extraction of product data
- `extraction_prompt="Extract title, price, description"` for custom fields
Scrapfly Python SDK Docs: https://scrapfly.io/docs/sdk/python?view=markdown
Scrapfly Scrape API Docs: https://scrapfly.io/docs/scrape-api/getting-started?view=markdown
Scrapfly Extraction API: https://scrapfly.io/docs/extraction-api/getting-started?view=markdown
Scrapfly Crawler API: https://scrapfly.io/docs/crawler-api/getting-started?view=markdown
My current Apify code:
[PASTE YOUR CODE HERE]
Pricing: Predictable vs Complex
One of the main reasons developers switch from Apify is unpredictable costs. Here's how pricing compares.
| Cost Component | Apify | Scrapfly |
|---|---|---|
| Base subscription | $39-$999/month (Starter to Business) | $30/month (Scale plan) |
| Compute units | $0.20-$0.30 per compute unit | None |
| Actor fees | Some Actors charge per result | None (unified API) |
| Proxy costs | Included in compute, but usage varies | Built into credit cost |
| Pricing model | Complex (subscription + CU + Actor fees) | Simple (fixed credits per request) |
Example: scraping 10,000 product pages with JavaScript rendering and anti-bot bypass:
- Apify: ~$39 subscription + 10,000 × 0.5 CU × $0.25 = $39 + $1,250 = ~$1,289/month (compute usage varies!)
- Scrapfly: 10,000 × 30 credits = 300,000 credits = ~$300/month (predictable)
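To estimate your own volume, the arithmetic is simple enough to script; a minimal sketch using the figures above (your actual per-request credit cost depends on which features you enable):
# Rough monthly cost comparison for N pages, using the example figures above
pages = 10_000
# Apify: subscription + compute units (usage varies with page weight)
apify_total = 39 + pages * 0.5 * 0.25          # $39 base + 0.5 CU/page at $0.25/CU
# Scrapfly: flat credits per request (JS rendering + anti-bot bypass)
scrapfly_total = pages * 30 * (300 / 300_000)  # 30 credits/page at ~$300 per 300,000 credits
print(f"Apify: ~${apify_total:,.0f}/month, Scrapfly: ~${scrapfly_total:,.0f}/month")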
Frequently Asked Questions
Do I need to rewrite all my parsing logic?
No. Your CSS selectors, XPath expressions, and data transformation code remain the same. You're only changing how you fetch the HTML. Scrapfly returns the same HTML that Apify Actors would return after JavaScript rendering.
# Your existing parsing code works as-is
from parsel import Selector
# Apify: Get HTML from Actor dataset item
# Scrapfly: Get HTML from result.content
sel = Selector(result.content) # Same selector library
title = sel.css("h1::text").get() # Same selectors
How do I handle Actor-specific features like datasets?
Scrapfly returns data directly in the API response. Store results in your preferred database:
- MongoDB: Store JSON responses directly
- PostgreSQL: Parse and insert structured data
- S3/Cloud Storage: Save responses as files
- Data pipelines: Stream to Kafka, Airflow, etc.
This approach gives you more flexibility and avoids vendor lock-in to Apify's storage.
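For example, a minimal sketch persisting results with Python's built-in sqlite3 (any of the stores above works the same way; the table schema is illustrative):
import sqlite3
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
db = sqlite3.connect("scrapes.db")
db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)")
result = client.scrape(ScrapeConfig(url="https://example.com", render_js=True, asp=True))
# Scrapfly returns the page directly in the response; store it wherever you like
db.execute("INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", ("https://example.com", result.content))
db.commit()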
What about Actor scheduling and orchestration?
Scrapfly focuses on scraping excellence. For scheduling and orchestration:
- Cron jobs: Simple scheduling for recurring scrapes
- Airflow/Dagster: Complex DAG-based workflows
- Cloud schedulers: AWS Lambda + EventBridge, GCP Cloud Scheduler
- Scrapfly Crawler API: For multi-page crawling with automatic link discovery
Using standard tools instead of proprietary Actor scheduling makes your infrastructure more portable.
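For instance, a recurring scrape needs nothing more than a cron entry or a small loop; a minimal sketch using only the standard library (swap the body for your own scrape-and-store logic):
import time
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
# Naive daily schedule; in production prefer cron, Airflow, or a cloud scheduler
while True:
    result = client.scrape(ScrapeConfig(url="https://example.com", render_js=True, asp=True))
    print(f"scraped {len(result.content)} bytes")
    time.sleep(24 * 60 * 60)  # run once per day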
Can I still use Puppeteer/Playwright-style page functions?
Scrapfly's js_scenario parameter replaces Puppeteer/Playwright page functions with a declarative JSON format:
# Instead of:
# await page.click('button.submit')
# await page.waitForSelector('.results')
# Use Scrapfly js_scenario:
js_scenario=[
{"click": {"selector": "button.submit"}},
{"wait_for_selector": {"selector": ".results"}}
]
For complex JavaScript execution, use the execute action to run custom JavaScript code.
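Wired into a full request, that scenario looks something like the sketch below (a minimal example; action names follow the snippet above, see the js_scenario documentation for the complete action set):
from scrapfly import ScrapflyClient, ScrapeConfig
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
result = client.scrape(ScrapeConfig(
    url="https://example.com/search",
    render_js=True,  # js_scenario runs in the headless browser, so rendering must be enabled
    asp=True,
    js_scenario=[
        {"click": {"selector": "button.submit"}},
        {"wait_for_selector": {"selector": ".results"}}
    ]
))
print(result.content)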
How do I test my migration?
- Sign up for free: Get 1,000 API credits (no credit card)
- Test key URLs: Run Scrapfly on the same URLs as your Apify Actors
- Compare output: Verify HTML content matches what Actors return
- Test parsing: Confirm your selectors work on Scrapfly's output
- Gradual migration: Run both in parallel, then switch traffic progressively
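A minimal sketch of the compare-and-test steps: run the same URL through Scrapfly and confirm your existing selectors return the same fields (the Apify HTML is whatever your current Actor run produced):
from parsel import Selector
from scrapfly import ScrapflyClient, ScrapeConfig
def extract(html: str) -> dict:
    # your existing parsing logic, unchanged
    sel = Selector(text=html)
    return {"title": sel.css("h1::text").get()}
client = ScrapflyClient(key="YOUR_SCRAPFLY_KEY")
scrapfly_html = client.scrape(ScrapeConfig(url="https://example.com", render_js=True, asp=True)).content
apify_html = "..."  # paste HTML from your existing Apify Actor dataset item
print("Scrapfly:", extract(scrapfly_html))
print("Apify:   ", extract(apify_html))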
Start Your Migration Today
One API for all websites. Predictable pricing. No compute unit surprises.
- 1,000 free API credits
- Unified API for any website
- 98% success on protected sites
- No Actor fees or compute charges
Need help with migration? Contact our team