Octoparse to Scrapfly Migration Guide
Migrate from Octoparse's desktop visual scraper to Scrapfly's cloud-native API. No software installation, real anti-bot bypass, and scraping from anywhere. Most teams complete migration in under 2 hours.
Understanding the Difference
Octoparse and Scrapfly have fundamentally different approaches. Understanding this shift makes migration easier.
Octoparse: Desktop Visual Scraper
- Install software on Windows/Mac
- Point-and-click visual builder
- Run scrapers on your local machine
- Limited cloud extraction credits
- Basic proxy rotation (no anti-bot bypass)
- Templates for popular sites
Limitation: Desktop browsers are easily detected by Cloudflare, DataDome, etc.
Scrapfly: Cloud-Native API
- No installation, just API calls
- SDK for Python, TypeScript, Go, Scrapy
- Run from anywhere (cloud, serverless, CI/CD)
- Unlimited concurrent requests
- Real anti-bot bypass (ASP technology)
- AI-powered data extraction
Advantage: Stealth browser fingerprints bypass protections that desktop scrapers can't
You're not recreating visual workflows. You're replacing point-and-click with simple API parameters. Your CSS selectors still work - they just move from Octoparse's UI to your code or Scrapfly's Extraction API.
Step-by-Step Migration
Step 1: Install the Scrapfly SDK
Replace Octoparse desktop software with a simple pip/npm install:
Python
pip install scrapfly-sdk
TypeScript/JavaScript
npm install scrapfly-sdk
No desktop app, no Windows/Mac requirements - runs anywhere Python or Node.js runs.
Step 2: Replace Visual Workflow with API Calls
Your Octoparse point-and-click becomes a simple API request:
Octoparse Workflow
# Octoparse Desktop Workflow:
# 1. Open Octoparse app
# 2. Enter URL
# 3. Click "Auto-detect web page data"
# 4. Adjust selectors in visual builder
# 5. Configure pagination
# 6. Set schedule (optional)
# 7. Run task locally or in cloud
# 8. Export to CSV/Excel
Scrapfly
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# One API call replaces the entire workflow
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/products",
    render_js=True,   # JavaScript rendering
    asp=True,         # Anti-bot bypass
    country="us"      # Geo-targeting
))

# Parse with CSS selectors (same as Octoparse)
from parsel import Selector
sel = Selector(result.content)
titles = sel.css("h2.product-title::text").getall()
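Octoparse's final step was exporting to CSV. With Scrapfly the export is just regular Python; here's a minimal sketch using the standard csv module (the output filename is an example):

import csv

# Write the scraped titles to CSV (replaces Octoparse's export step)
with open("products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title"])  # header row
    writer.writerows([title] for title in titles)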
Step 3: Use AI Extraction Instead of Manual Selectors
Replace Octoparse's point-and-click with Scrapfly's AI-powered extraction:
from scrapfly import ScrapflyClient, ScrapeConfig, ExtractionConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# Scrape the page
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/product/123",
    render_js=True,
    asp=True
))

# Auto-extract product data (no CSS selectors needed!)
extracted = client.extract(ExtractionConfig(
    content=result.content,
    content_type="text/html",
    extraction_model="product"  # AI detects product data automatically
))

# Structured data ready to use
print(extracted.data)
# {
#   "name": "Product Name",
#   "price": "$29.99",
#   "description": "...",
#   "images": [...],
#   "availability": "In Stock"
# }
Available extraction models: product, article, review_list, job_posting, real_estate, and more. See the Extraction API documentation for the full list.
Step 4: Replace Click Actions with JS Scenarios
Octoparse's click/scroll actions become declarative JS Scenarios:
Octoparse Actions
# Octoparse Visual Actions:
# - Click "Load More" button
# - Wait for content to load
# - Scroll to bottom
# - Fill search form
# - Submit form
Scrapfly JS Scenario
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev",
    render_js=True,
    asp=True,
    js_scenario=[
        {"click": {"selector": ".load-more-btn"}},
        {"wait": 2000},
        {"scroll": {"selector": "bottom"}},
        {"fill": {"selector": "#search", "value": "keyword"}},
        {"click": {"selector": "#submit"}}
    ]
))
See complete JS Scenario documentation for all available actions.
Step 5: Replace Octoparse Scheduling
Octoparse's built-in scheduler becomes standard cron jobs or serverless functions:
Cron Job (Linux/Mac)
# Run every day at 9 AM
0 9 * * * python /path/to/scraper.py
GitHub Actions
on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 9 AM
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - run: python scraper.py
Works with any scheduler: cron, GitHub Actions, AWS Lambda, Google Cloud Functions, etc. No desktop required.
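If you want to see what the scheduled script itself could look like, here is a minimal scraper.py sketch that a cron job or GitHub Actions run might invoke; the target URL, output filename, and SCRAPFLY_API_KEY environment variable are assumptions:

# scraper.py - minimal scheduled scraper sketch
import os
from datetime import datetime, timezone
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key=os.environ["SCRAPFLY_API_KEY"])  # key from the environment (assumed variable name)

result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/products",
    render_js=True,
    asp=True
))

# Save a timestamped snapshot so each scheduled run keeps its own output
stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H%M")
with open(f"snapshot_{stamp}.html", "w") as f:
    f.write(result.content)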
Octoparse Feature Mapping
| Octoparse Feature | Scrapfly Equivalent |
|---|---|
| Auto-detect page data | extraction_model="product" (or article, job_posting, etc.) |
| Point-and-click selectors | CSS/XPath selectors in your code, or AI extraction |
| Pagination handling | Loop in your code, or use Crawler API |
| AJAX / Infinite scroll | js_scenario=[{"scroll": {"selector": "bottom", "infinite": 5}}] |
| Click actions | js_scenario=[{"click": {"selector": ".button"}}] |
| Form filling | js_scenario=[{"fill": {"selector": "#input", "value": "text"}}] |
| Wait for element | wait_for_selector=".element" |
| Cloud extraction | All requests are cloud-native (default) |
| Proxy rotation | Built-in: proxy_pool="public_residential_pool" |
| Anti-block / IP rotation | asp=True (real anti-bot bypass, not just proxies) |
| Scheduled runs | Cron, GitHub Actions, AWS Lambda, or any scheduler |
| Export to CSV/Excel | pandas, csv module, or direct database write |
| Pre-built templates | AI extraction models (adapt automatically) |
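As a rough illustration of how several of the mapped parameters combine in one request, here is a sketch (the URL and selector are placeholders):

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# Combines several rows from the table above in a single ScrapeConfig
result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/products",
    render_js=True,                         # JavaScript rendering
    asp=True,                               # anti-bot bypass
    proxy_pool="public_residential_pool",   # built-in proxy rotation
    wait_for_selector=".product"            # wait for element before returning
))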
Why Scrapfly Works Where Octoparse Fails
Octoparse users frequently report failures on Cloudflare-protected sites. Here's why:
Octoparse Approach
- Runs in local desktop browser
- Detectable browser fingerprint
- Basic proxy rotation (IP only)
- No challenge solving
- No TLS/HTTP2 fingerprint spoofing
Result: Cloudflare, DataDome, and PerimeterX block desktop scrapers easily
Scrapfly ASP
- Stealth browser in cloud
- Real browser fingerprints
- TLS/JA3 fingerprint matching
- Automatic challenge solving
- HTTP/2 fingerprint alignment
Result: 98% success rate on protected sites
Sites that fail in Octoparse often work instantly with Scrapfly. Get 1,000 free credits and test your targets.
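A quick way to run that test is a short script that sends each of your blocked URLs through ASP and reports the size of the HTML returned; the URL list below is a placeholder:

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key="YOUR_API_KEY")

# Replace with the URLs that Octoparse could not scrape
blocked_urls = [
    "https://web-scraping.dev/products",
]

for url in blocked_urls:
    result = client.scrape(ScrapeConfig(url=url, render_js=True, asp=True))
    # A full HTML body back means the request made it past the anti-bot wall
    print(url, "->", len(result.content), "bytes of HTML")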
Pricing Comparison
| Aspect | Octoparse | Scrapfly |
|---|---|---|
| Starting price | $99/month (Standard) | $30/month |
| Free tier | 10 tasks, 2 concurrent, limited features | 1,000 API credits |
| Cloud scraping | Limited credits (extra cost) | Included in all plans |
| Concurrent tasks | 2-20 depending on plan | 100-10,000 req/s |
| Anti-bot bypass | Proxy add-ons (extra cost) | Built into credit cost |
| Pricing model | Monthly + cloud credits + proxy add-ons | Simple credits per request |
Complete Migration Example
Here's a complete example migrating an Octoparse product scraping workflow to Scrapfly:
Complete Product Scraper (Python)
"""
Scrapfly Product Scraper
Replaces Octoparse point-and-click workflow with cloud-native API
"""
from scrapfly import ScrapflyClient, ScrapeConfig, ExtractionConfig
import json
# Initialize client (no desktop app needed!)
client = ScrapflyClient(key="YOUR_API_KEY")
def scrape_product_page(url: str) -> dict:
"""Scrape a single product page with anti-bot bypass"""
# Fetch the page with Cloudflare bypass
result = client.scrape(ScrapeConfig(
url=url,
render_js=True, # Render JavaScript (like Octoparse)
asp=True, # Anti-bot bypass (Octoparse can't do this)
country="us", # Geo-targeting
# Handle "Load More" button if needed:
js_scenario=[
{"wait_for_selector": {"selector": ".product-details"}},
{"click": {"selector": ".show-more", "ignore_if_not_visible": True}},
{"wait": 1000}
]
))
# Extract product data automatically (no manual selectors!)
extracted = client.extract(ExtractionConfig(
content=result.content,
content_type="text/html",
extraction_model="product"
))
return extracted.data
def scrape_product_listing(listing_url: str, max_pages: int = 5) -> list:
"""Scrape multiple products from listing page with pagination"""
products = []
for page in range(1, max_pages + 1):
url = f"{listing_url}?page={page}"
print(f"Scraping page {page}...")
result = client.scrape(ScrapeConfig(
url=url,
render_js=True,
asp=True
))
# Extract all products on this page
extracted = client.extract(ExtractionConfig(
content=result.content,
content_type="text/html",
extraction_model="product_listing"
))
if not extracted.data:
break # No more products
products.extend(extracted.data)
return products
# Run the scraper
if __name__ == "__main__":
# Single product
product = scrape_product_page("https://web-scraping.dev/product/123")
print(json.dumps(product, indent=2))
# Product listing (replaces Octoparse pagination)
products = scrape_product_listing("https://web-scraping.dev/products")
print(f"Scraped {len(products)} products")
# Export to JSON (replaces Octoparse CSV export)
with open("products.json", "w") as f:
json.dump(products, f, indent=2)
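If you still need the CSV or Excel output that Octoparse produced, the extracted records can be flattened with pandas; a minimal sketch, assuming pandas is installed and the records are flat dictionaries:

import pandas as pd

# Turn the extracted product dicts into a table and export
df = pd.DataFrame(products)
df.to_csv("products.csv", index=False)
# df.to_excel("products.xlsx", index=False)  # requires openpyxl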
Frequently Asked Questions
Do I need to learn a new visual workflow builder?
No. Scrapfly replaces point-and-click with simple API calls. Your CSS selectors from Octoparse translate directly to Scrapfly:
# Octoparse: Point-and-click to select ".product-title"
# Scrapfly: Same selector in code
from parsel import Selector
sel = Selector(result.content)
title = sel.css(".product-title::text").get() # Same selector!
Or use Scrapfly's AI extraction models to skip selectors entirely.
What about Octoparse's auto-detect feature?
Scrapfly's Extraction API is the cloud-native equivalent of auto-detect:
- extraction_model="product" - Auto-extract product data (name, price, images, etc.)
- extraction_model="article" - Auto-extract article content
- extraction_prompt - Use AI prompts for custom data extraction (see the sketch below)
Unlike Octoparse's auto-detect, Scrapfly's AI models work reliably across different sites without manual adjustment.
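For data that doesn't fit a predefined model, extraction_prompt lets you describe what to extract in plain language. A minimal sketch (the prompt text is an example):

from scrapfly import ScrapflyClient, ScrapeConfig, ExtractionConfig

client = ScrapflyClient(key="YOUR_API_KEY")

result = client.scrape(ScrapeConfig(
    url="https://web-scraping.dev/product/123",
    render_js=True,
    asp=True
))

# Describe the data you want instead of pointing and clicking
extracted = client.extract(ExtractionConfig(
    content=result.content,
    content_type="text/html",
    extraction_prompt="Return the product name, price, and shipping options"  # example prompt
))
print(extracted.data)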
Can I run scrapers from my local machine?
Yes - and anywhere else. Scrapfly runs wherever Python or Node.js runs:
- Local development: Run scripts from your laptop (no desktop app needed)
- Cloud servers: Run from AWS, GCP, Azure, or any VPS
- Serverless: Run in Lambda, Cloud Functions, or Vercel (see the Lambda sketch below)
- CI/CD: Run in GitHub Actions, GitLab CI, etc.
Unlike Octoparse, you're not tied to a specific desktop application.
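As an example of the serverless option, a minimal AWS Lambda handler could look like the sketch below; the event shape and the SCRAPFLY_API_KEY environment variable are assumptions:

# lambda_function.py - minimal serverless sketch
import os
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key=os.environ["SCRAPFLY_API_KEY"])  # assumed env var name

def handler(event, context):
    # Expect the target URL in the invocation payload (assumed event shape)
    url = event["url"]
    result = client.scrape(ScrapeConfig(url=url, render_js=True, asp=True))
    return {"statusCode": 200, "body": result.content}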
How do I handle pagination like Octoparse?
Pagination is handled in your code or with JS Scenarios:
# Simple loop for pagination
for page in range(1, 11):
    result = client.scrape(ScrapeConfig(
        url=f"https://web-scraping.dev/products?page={page}",
        render_js=True, asp=True
    ))
    # Process results...

# Or use JS Scenario for infinite scroll
js_scenario=[{"scroll": {"selector": "bottom", "infinite": 5}}]
For complex crawling, use the Crawler API.
What about Octoparse cloud extraction credits?
Scrapfly replaces cloud extraction credits with a simpler model:
- Pay per request: Credits only consumed when scraping
- No desktop limits: Every request is cloud-native
- No proxy add-ons: Anti-bot bypass included in credit cost
- Transparent pricing: Know exactly what each request costs
Start with 1,000 free credits to test your existing targets.
Ready to Migrate from Octoparse?
Test Scrapfly on your blocked URLs with 1,000 free credits.
- No desktop installation
- 98% success on protected sites
- AI-powered data extraction
- In-house technical support