Octoparse to Scrapfly Migration Guide
Migrate from Octoparse's desktop visual scraper to Scrapfly's cloud-native API. No software installation, real anti-bot bypass, and scraping from anywhere. Most teams complete migration in under 2 hours.
Understanding the Difference
Octoparse and Scrapfly have fundamentally different approaches. Understanding this shift makes migration easier.
Octoparse: Desktop Visual Scraper
- Install software on Windows/Mac
- Point-and-click visual builder
- Run scrapers on your local machine
- Limited cloud extraction credits
- Basic proxy rotation (no anti-bot bypass)
- Templates for popular sites
Limitation: Desktop browsers are easily detected by Cloudflare, DataDome, etc.
Scrapfly: Cloud-Native API
- No installation, just API calls
- SDK for Python, TypeScript, Go, Scrapy
- Run from anywhere (cloud, serverless, CI/CD)
- Unlimited concurrent requests
- Real anti-bot bypass (ASP technology)
- AI-powered data extraction
Advantage: Stealth browser fingerprints bypass protections that desktop scrapers can't
You're not recreating visual workflows. You're replacing point-and-click with simple API parameters. Your CSS selectors still work - they just move from Octoparse's UI to your code or Scrapfly's Extraction API.
Step-by-Step Migration
Step 1: Install the Scrapfly SDK
Replace Octoparse desktop software with a simple pip/npm install:
Python
pip install scrapfly-sdk
TypeScript/JavaScript
npm install scrapfly-sdk
No desktop app, no Windows/Mac requirements - runs anywhere Python or Node.js runs.
Step 2: Replace Visual Workflow with API Calls
Your Octoparse point-and-click becomes a simple API request:
Octoparse Workflow
# Octoparse Desktop Workflow:
# 1. Open Octoparse app
# 2. Enter URL
# 3. Click "Auto-detect web page data"
# 4. Adjust selectors in visual builder
# 5. Configure pagination
# 6. Set schedule (optional)
# 7. Run task locally or in cloud
# 8. Export to CSV/Excel
Scrapfly
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# One API call replaces the entire workflow
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/products",
    render_js=True,   # JavaScript rendering
    asp=True,         # Anti-bot bypass
    country="us"      # Geo-targeting
))

# Parse with CSS selectors (same as Octoparse)
from parsel import Selector
sel = Selector(result.content)
titles = sel.css("h2.product-title::text").getall()
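Octoparse's final step was exporting to CSV. With Scrapfly the export is just regular Python; here's a minimal sketch using the standard csv module (the output filename is an example):

import csv

# Write the scraped titles to CSV (replaces Octoparse's export step)
with open("products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title"])  # header row
    writer.writerows([title] for title in titles)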
Step 3: Use AI Extraction Instead of Manual Selectors
Replace Octoparse's point-and-click with Scrapfly's AI-powered extraction:
from scrapfly import ScrapflyClient, ScrapeConfig, ExtractionConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# Scrape the page
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/product/123",
    render_js=True,
    asp=True
))

# Auto-extract product data (no CSS selectors needed!)
extracted = client.extract(ExtractionConfig(
    content=result.content,
    content_type="text/html",
    extraction_model="product"  # AI detects product data automatically
))

# Structured data ready to use
print(extracted.data)
# {
#   "name": "Product Name",
#   "price": "$29.99",
#   "description": "...",
#   "images": [...],
#   "availability": "In Stock"
# }
Available extraction models: product, article, review_list, job_posting, real_estate, and more. See the Extraction API documentation for the full list.
Step 4: Replace Click Actions with JS Scenarios
Octoparse's click/scroll actions become declarative JS Scenarios:
Octoparse Actions
# Octoparse Visual Actions:
# - Click "Load More" button
# - Wait for content to load
# - Scroll to bottom
# - Fill search form
# - Submit form
Scrapfly JS Scenario
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev",
    render_js=True,
    asp=True,
    js_scenario=[
        {"click": {"selector": ".load-more-btn"}},
        {"wait": 2000},
        {"scroll": {"selector": "bottom"}},
        {"fill": {"selector": "#search", "value": "keyword"}},
        {"click": {"selector": "#submit"}}
    ]
))
See complete JS Scenario documentation for all available actions.
Step 5: Replace Octoparse Scheduling
Octoparse's built-in scheduler becomes standard cron jobs or serverless functions:
Cron Job (Linux/Mac)
# Run every day at 9 AM
0 9 * * * python /path/to/scraper.py
GitHub Actions
on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 9 AM
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - run: python scraper.py
Works with any scheduler: cron, GitHub Actions, AWS Lambda, Google Cloud Functions, etc. No desktop required.
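If you want to see what the scheduled script itself could look like, here is a minimal scraper.py sketch that a cron job or GitHub Actions run might invoke; the target URL, output filename, and SCRAPFLY_API_KEY environment variable are assumptions:

# scraper.py - minimal scheduled scraper sketch
import os
from datetime import datetime, timezone
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key=os.environ["SCRAPFLY_API_KEY"])  # key from the environment (assumed variable name)

result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/products",
    render_js=True,
    asp=True
))

# Save a timestamped snapshot so each scheduled run keeps its own output
stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H%M")
with open(f"snapshot_{stamp}.html", "w") as f:
    f.write(result.content)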
Octoparse Feature Mapping
| Octoparse Feature | Scrapfly Equivalent |
|---|---|
| Auto-detect page data | extraction_model="product" (or article, job_posting, etc.) |
| Point-and-click selectors | CSS/XPath selectors in your code, or AI extraction |
| Pagination handling | Loop in your code, or use Crawler API |
| AJAX / Infinite scroll | js_scenario=[{"scroll": {"selector": "bottom", "infinite": 5}}] |
| Click actions | js_scenario=[{"click": {"selector": ".button"}}] |
| Form filling | js_scenario=[{"fill": {"selector": "#input", "value": "text"}}] |
| Wait for element | wait_for_selector=".element" |
| Cloud extraction | All requests are cloud-native (default) |
| Proxy rotation | Built-in: proxy_pool="public_residential_pool" |
| Anti-block / IP rotation | asp=True (real anti-bot bypass, not just proxies) |
| Scheduled runs | Cron, GitHub Actions, AWS Lambda, or any scheduler |
| Export to CSV/Excel | pandas, csv module, or direct database write |
| Pre-built templates | AI extraction models (adapt automatically) |
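As a rough illustration of how several of the mapped parameters combine in one request, here is a sketch (the URL and selector are placeholders):

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# Combines several rows from the table above in a single ScrapeConfig
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/products",
    render_js=True,                         # JavaScript rendering
    asp=True,                               # anti-bot bypass
    proxy_pool="public_residential_pool",   # built-in proxy rotation
    wait_for_selector=".product"            # wait for element before returning
))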
Why Scrapfly Works Where Octoparse Fails
Octoparse users frequently report failures on Cloudflare-protected sites. Here's why:
Octoparse Approach
- Runs in local desktop browser
- Detectable browser fingerprint
- Basic proxy rotation (IP only)
- No challenge solving
- No TLS/HTTP2 fingerprint spoofing
Result: Cloudflare, DataDome, and PerimeterX block desktop scrapers easily
Scrapfly ASP
- Stealth browser in cloud
- Real browser fingerprints
- TLS/JA3 fingerprint matching
- Automatic challenge solving
- HTTP/2 fingerprint alignment
Result: 98% success rate on protected sites
Sites that fail in Octoparse often work instantly with Scrapfly. Get 1,000 free credits and test your targets.
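A quick way to run that test is a short script that sends each of your blocked URLs through ASP and reports the size of the HTML returned; the URL list below is a placeholder:

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# Replace with the URLs that Octoparse could not scrape
blocked_urls = [
    "https://web-scraping.dev/products",
]

for url in blocked_urls:
    result = client.scrape(ScrapeConfig(url=url, render_js=True, asp=True))
    # A full HTML body back means the request made it past the anti-bot wall
    print(url, "->", len(result.content), "bytes of HTML")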
Pricing Comparison
| Aspect | Octoparse | Scrapfly |
|---|---|---|
| Starting price | $99/month (Standard) | $30/month |
| Free tier | 10 tasks, 2 concurrent, limited features | 1,000 API credits |
| Cloud scraping | Limited credits (extra cost) | Included in all plans |
| Concurrent tasks | 2-20 depending on plan | 100-10,000 req/s |
| Anti-bot bypass | Proxy add-ons (extra cost) | Built into credit cost |
| Pricing model | Monthly + cloud credits + proxy add-ons | Simple credits per request |
Complete Migration Example
Here's a complete example migrating an Octoparse product scraping workflow to Scrapfly:
Complete Product Scraper (Python)
"""
Scrapfly Product Scraper
Replaces Octoparse point-and-click workflow with cloud-native API
"""
from scrapfly import ScrapflyClient, ScrapeConfig, ExtractionConfig
import json
# Initialize client (no desktop app needed!)
client = ScrapflyClient(key="YOUR_API_KEY")
def scrape_product_page(url: str) -> dict:
"""Scrape a single product page with anti-bot bypass"""
# Fetch the page with Cloudflare bypass
result = client.scrape(ScrapeConfig(
url=url,
render_js=True, # Render JavaScript (like Octoparse)
asp=True, # Anti-bot bypass (Octoparse can't do this)
country="us", # Geo-targeting
# Handle "Load More" button if needed:
js_scenario=[
{"wait_for_selector": {"selector": ".product-details"}},
{"click": {"selector": ".show-more", "ignore_if_not_visible": True}},
{"wait": 1000}
]
))
# Extract product data automatically (no manual selectors!)
extracted = client.extract(ExtractionConfig(
content=result.content,
content_type="text/html",
extraction_model="product"
))
return extracted.data
def scrape_product_listing(listing_url: str, max_pages: int = 5) -> list:
"""Scrape multiple products from listing page with pagination"""
products = []
for page in range(1, max_pages + 1):
url = f"{listing_url}?page={page}"
print(f"Scraping page {page}...")
result = client.scrape(ScrapeConfig(
url=url,
render_js=True,
asp=True
))
# Extract all products on this page
extracted = client.extract(ExtractionConfig(
content=result.content,
content_type="text/html",
extraction_model="product_listing"
))
if not extracted.data:
break # No more products
products.extend(extracted.data)
return products
# Run the scraper
if __name__ == "__main__":
# Single product
product = scrape_product_page("https://web-scraping.dev/product/123")
print(json.dumps(product, indent=2))
# Product listing (replaces Octoparse pagination)
products = scrape_product_listing("https://web-scraping.dev/products")
print(f"Scraped {len(products)} products")
# Export to JSON (replaces Octoparse CSV export)
with open("products.json", "w") as f:
json.dump(products, f, indent=2)
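If you still need the CSV or Excel output that Octoparse produced, the extracted records can be flattened with pandas; a minimal sketch, assuming pandas is installed and the records are flat dictionaries:

import pandas as pd

# Turn the extracted product dicts into a table and export
df = pd.DataFrame(products)
df.to_csv("products.csv", index=False)
# df.to_excel("products.xlsx", index=False)  # requires openpyxl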
Frequently Asked Questions
Do I need to learn a new visual workflow builder?
No. Scrapfly replaces point-and-click with simple API calls. Your CSS selectors from Octoparse translate directly to Scrapfly:
# Octoparse: Point-and-click to select ".product-title"
# Scrapfly: Same selector in code
from parsel import Selector
sel = Selector(result.content)
title = sel.css(".product-title::text").get() # Same selector!
Or use Scrapfly's AI extraction models to skip selectors entirely.
What about Octoparse's auto-detect feature?
Scrapfly's Extraction API is the cloud-native equivalent of auto-detect:
- extraction_model="product" - Auto-extract product data (name, price, images, etc.)
- extraction_model="article" - Auto-extract article content
- extraction_prompt - Use AI prompts for custom data extraction (see the sketch below)
Unlike Octoparse's auto-detect, Scrapfly's AI models work reliably across different sites without manual adjustment.
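For data that doesn't fit a predefined model, extraction_prompt lets you describe what to extract in plain language. A minimal sketch (the prompt text is an example):

from scrapfly import ScrapflyClient, ScrapeConfig, ExtractionConfig

client = ScrapflyClient(key="YOUR_API_KEY")

result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/product/123",
    render_js=True,
    asp=True
))

# Describe the data you want instead of pointing and clicking
extracted = client.extract(ExtractionConfig(
    content=result.content,
    content_type="text/html",
    extraction_prompt="Return the product name, price, and shipping options"  # example prompt
))
print(extracted.data)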
Can I run scrapers from my local machine?
Yes - and anywhere else. Scrapfly runs wherever Python or Node.js runs:
- Local development: Run scripts from your laptop (no desktop app needed)
- Cloud servers: Run from AWS, GCP, Azure, or any VPS
- Serverless: Run in Lambda, Cloud Functions, or Vercel (see the Lambda sketch below)
- CI/CD: Run in GitHub Actions, GitLab CI, etc.
Unlike Octoparse, you're not tied to a specific desktop application.
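As an example of the serverless option, a minimal AWS Lambda handler could look like the sketch below; the event shape and the SCRAPFLY_API_KEY environment variable are assumptions:

# lambda_function.py - minimal serverless sketch
import os
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key=os.environ["SCRAPFLY_API_KEY"])  # assumed env var name

def handler(event, context):
    # Expect the target URL in the invocation payload (assumed event shape)
    url = event["url"]
    result = client.scrape(ScrapeConfig(url=url, render_js=True, asp=True))
    return {"statusCode": 200, "body": result.content}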
How do I handle pagination like Octoparse?
Pagination is handled in your code or with JS Scenarios:
# Simple loop for pagination
for page in range(1, 11):
    result = client.scrape(ScrapeConfig(
        url=f"https://web-scraping.dev/products?page={page}",
        render_js=True, asp=True
    ))
    # Process results...

# Or use JS Scenario for infinite scroll
js_scenario=[{"scroll": {"selector": "bottom", "infinite": 5}}]
For complex crawling, use the Crawler API.
What about Octoparse cloud extraction credits?
Scrapfly replaces cloud extraction credits with a simpler model:
- Pay per request: Credits only consumed when scraping
- No desktop limits: Every request is cloud-native
- No proxy add-ons: Anti-bot bypass included in credit cost
- Transparent pricing: Know exactly what each request costs
Start with 1,000 free credits to test your existing targets.
Ready to Migrate from Octoparse?
Test Scrapfly on your blocked URLs with 1,000 free credits.
- No desktop installation
- 98% success on protected sites
- AI-powered data extraction
- In-house technical support