Scrapfly Product Release Notes
Stay updated! Subscribe to our RSS feed to automatically receive notifications about new features, bug fixes, and product updates.
2026-04-27
Scheduler
Schedule any scrape, screenshot, or crawl to fire later. Run once at a given date, or on a recurring cron. Manage everything from a single Scheduler dashboard with execution history, pause/resume, and bulk actions. Scheduling is now a first-class part of every customer-facing API. From the Web Scraping API, Screenshot API, or Crawler API player, the new Schedule button captures the exact config you just tested and turns it into a recurring (or one-shot) job that fires through your webhook on the cadence you pick.
What's included:
- Unified Scheduler dashboard. Every scheduled job across the Web Scraping, Screenshot, and Crawler APIs in one filterable list, with per-job status, a reliability ribbon showing the last 25 fires, success rate, and a "next fire in N minutes" countdown.
- Tabbed recurrence picker. Pick a Simple cadence (every N minutes / hours / days, with optional day-of-week filter) or drop into Cron mode for arbitrary patterns. Both modes share the same end-condition controls (run forever, end on date, or stop after N fires).
- Execution flags. Concurrency control (skip overlapping fires or allow them), retry on failure with a configurable budget, and pause/resume at any time without losing history.
- Schedule from a player. Every API player has a "Schedule this run" action that pre-fills the form with the request you're previewing. No copy-pasting config between dashboards.
- Per-row triggers. Fire any active schedule manually from the list with one click. The run goes through the same pipeline as a natural cron fire.
- Bulk actions. Pause, resume, or cancel selected schedules in a single click.
- Execution history. The last 25 fires per schedule rendered as a status ribbon. Click into a schedule to see the full audit trail (fired at, status, error, dispatched config).
- Webhook lifecycle integration. Disabling a webhook automatically pauses every schedule bound to it. Re-enabling it offers to resume them in bulk.
2026-04-25
Extraction API
Extraction API Monitoring dashboard. A new dashboard surfaces your Extraction API analytics alongside the rest of your Scrapfly monitoring. See success rate, average duration, credits spent, and the size of documents processed at a glance, plus a paginated list of every recent extraction with its URL, model, origin, and status.
What's included:
- KPI cards for the selected window: total extractions with success/error split, average duration per request, API credits, and total document size processed (KB / MB / GB).
- Usage Over Time chart with toggles for Extractions and Credits, plus an Extraction Model donut so you can see at a glance which extraction strategy you're using most.
- Top Origins and Top Domains breakdowns. Clearly distinguish extractions triggered from the dedicated /extraction endpoint versus inline extraction_model=... parameters on Web Scraping API calls.
- Recent Extractions list with Time, URL, extraction model, origin, project/env, document size, credits, duration, and status. Use the Load More button to page through more rows.
- Time windows from Last 15 minutes through Last month, plus the current Subscription Period for billing-aligned views.
2026-04-23
Cloud Browser API
Automatic captcha solving. Add solve_captcha=true and Cloud Browser handles Turnstile,
DataDome, reCAPTCHA, GeeTest, PerimeterX, and puzzle captchas for you.
Cloud Browser now ships with a built-in captcha detector and solver as part of the Antibot
CDP domain. When enabled, every challenge that appears on the page is solved autonomously —
your code only observes the outcome through four structured CDP events
(captchaDetected, captchaSolvingStarted, captchaSolved, captchaError).
Supported captcha types and pricing:
- Interaction-based (5 credits/solve): turnstile_checkbox (Cloudflare Turnstile checkbox), datadome_slider (DataDome slider puzzle), perimeterx_hold (PerimeterX click-and-hold). Solved in-browser with human-like movement, no upstream fees.
- Token-based (20 credits/solve): turnstile (invisible Turnstile), recaptcha (Google reCAPTCHA v2/v3), geetest, and puzzle-click captchas. Scrapfly orchestrates the token flow; you just observe captchaSolved.
- Blocked (not billed): datadome (full-page interstitial). This is an IP ban, not a challenge; the only recovery is a fresh proxy. Detection events fire for telemetry, but no solve is attempted and no credits are charged.
Enabling the solver:
wss://browser.scrapfly.io?api_key=KEY&solve_captcha=true
Works the same way from every SDK, the dashboard playground, and the CLI
(scrapfly browser --solve-captcha). Failures are free — a captchaError costs nothing.
Per-run billing detail (count and credits by captcha type) is visible on each session's
Cost Breakdown in the monitoring dashboard.
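Since your code only observes outcomes through the four CDP events, the client side reduces to building the connection URL and dispatching on event names. A minimal sketch (the URL parameters and event names come from this note; the dispatch shape is illustrative, not a specific SDK API):

```python
from urllib.parse import urlencode

def browser_url(api_key: str, solve_captcha: bool = True) -> str:
    """Build the Cloud Browser WebSocket URL with the solver enabled."""
    query = {"api_key": api_key, "solve_captcha": str(solve_captcha).lower()}
    return "wss://browser.scrapfly.io?" + urlencode(query)

def on_captcha_event(event: dict) -> str:
    """React to the four structured Antibot CDP events."""
    handlers = {
        "captchaDetected": "challenge seen on page",
        "captchaSolvingStarted": "solver engaged",
        "captchaSolved": "challenge cleared, continue scraping",
        "captchaError": "solve failed (not billed), consider retrying",
    }
    return handlers.get(event["method"], "unrelated event")

print(browser_url("KEY"))  # wss://browser.scrapfly.io?api_key=KEY&solve_captcha=true
```

Because a captchaError is not billed, retrying on that event is cheap; the datadome interstitial case is the exception, where the right move is rotating to a fresh proxy rather than waiting for a solve.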
2026-04-21
Security & Compliance
SOC 3 report now publicly available, alongside ISO 27001, SOC 2 Type II, and GDPR. Scrapfly has completed its SOC 3 attestation, the public summary of our SOC 2 Type II audit. The report covers the same controls and period as the confidential SOC 2 Type II, and it is distributable without NDA.
What's available:
- SOC 3 report (public): downloadable as a PDF from the Scrapfly compliance page, no account or NDA required.
- SOC 2 Type II report (confidential): available on request via the Scrapfly trust portal.
- ISO 27001 certificate (public): downloadable PDF.
- GDPR compliance statement (public): downloadable PDF, with SCCs Module 2 embedded in the DPA for EEA/US transfers (Enterprise tier).
The badges are now visible on the Scrapfly footer and signup page as trust signals for security-conscious buyers.
2026-04-19
Web Scraping API
Batch Scraping API: scrape up to 100 URLs per request with streaming results.
A new POST /scrape/batch endpoint accepts an array of scrape configurations and streams each
result back as soon as it is ready using multipart/mixed framing. Clients can start consuming
results at the speed of the fastest scrape in the batch rather than waiting on the slowest.
Key properties:
- Up to 100 configs per batch (10 MiB max body). Each config is a flat dictionary of the same query parameters that /scrape accepts.
- Atomic concurrency reservation: if your account's remaining concurrency is less than the number of non-webhook configs in the batch, the whole batch is rejected up front (ERR::SCRAPE::BATCH_CONFIG, HTTP 429). No partial successes to reconcile.
- Per-part streaming: each scrape result is flushed to the wire as an individual multipart part the moment it completes, with its own X-Scrapfly-Correlation-Id and X-Scrapfly-Scrape-Status headers. Envelope-level gzip/zstd compression preserves the streaming invariant.
- Required per-entry correlation_id: parts arrive out of order, so every config must carry a unique correlation ID. This is how SDKs match streamed parts back to their originating input.
- Per-part errors don't fail the batch: a single upstream-403 or config error emits its own error-envelope part; other scrapes in the batch complete normally.
- Webhook configs in a batch: webhook scrapes are enqueued immediately and their part in the response is an HTTP 202 acknowledgement; the scrape result still arrives via your webhook endpoint.
- Same result shape as single /scrape: each part body is identical in shape to a single-scrape response, so your SDK reuses its existing decoder.
SDK support: every Scrapfly SDK (Python, TypeScript, Go, Rust) exposes scrape_batch()
(or scrapeBatch / ScrapeBatch) which handles the multipart streaming and yields
(correlation_id, result) tuples as parts arrive.
The Batch API is available on all paid plans. It is not available on the FREE plan.
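The correlation_id contract is the heart of the streaming design: every config carries a unique ID, and parts that arrive fastest-first are matched back to their inputs. A sketch under that contract (payload field names besides correlation_id are illustrative):

```python
import uuid

def build_batch(configs: list[dict]) -> list[dict]:
    """Attach a unique correlation_id to each flat /scrape config."""
    return [{**cfg, "correlation_id": str(uuid.uuid4())} for cfg in configs]

def match_parts(batch: list[dict], parts: list[dict]) -> dict:
    """Map each streamed multipart part back to its originating config."""
    by_id = {cfg["correlation_id"]: cfg for cfg in batch}
    return {p["correlation_id"]: (by_id[p["correlation_id"]], p) for p in parts}

batch = build_batch([{"url": "https://example.com/a"},
                     {"url": "https://example.com/b"}])
# Simulate parts arriving out of order (fastest scrape first):
parts = [{"correlation_id": batch[1]["correlation_id"], "status": "DONE"},
         {"correlation_id": batch[0]["correlation_id"], "status": "DONE"}]
matched = match_parts(batch, parts)
assert matched[batch[1]["correlation_id"]][0]["url"] == "https://example.com/b"
```

This is the bookkeeping the SDKs' scrape_batch() helpers do for you when they yield (correlation_id, result) tuples.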
Classify API: block detection without a live scrape. A new POST /classify endpoint
takes an HTTP response you already have (URL, status code, headers, body) and returns a
verdict on whether the target blocked the request.
The response is a minimal three-field payload: blocked (boolean), antibot
(the name of the anti-bot product if one matched: cloudflare, datadome, akamai,
perimeterx, kasada, imperva, aws_waf, f5_shape, ...), and cost (API credits
charged for the call).
Key properties:
- Authoritative block detection: runs the same detection pipeline that powers every live Scrapfly scrape. New anti-bot products light up here automatically without a client upgrade.
- Fast: no proxy allocation, no browser startup, no live fetch. A single classify call typically resolves in a few milliseconds.
- 1 API credit per call, billed under the Web Scraping API product (same bucket as /scrape). Errors (4xx/5xx) are not billed.
MCP support: the Scrapfly MCP Server's check_if_blocked tool now uses this
endpoint under the hood, so AI agents get a continuously-tuned answer instead of a
static local heuristic.
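A sketch of the classify round-trip: the three response fields (blocked, antibot, cost) come from this note; the request field names and the use of an HTTP client are assumptions for illustration:

```python
def classify_payload(url: str, status_code: int, headers: dict, body: str) -> dict:
    """Package an HTTP response you already have for POST /classify.
    Field names are assumed from the description, not a documented schema."""
    return {"url": url, "status_code": status_code,
            "headers": headers, "body": body}

def describe_verdict(verdict: dict) -> str:
    """Interpret the minimal three-field response."""
    if not verdict["blocked"]:
        return "not blocked"
    return f"blocked by {verdict['antibot'] or 'unknown anti-bot'}"

# e.g. verdict = requests.post("https://api.scrapfly.io/classify",
#                              json=classify_payload(...)).json()
print(describe_verdict({"blocked": True, "antibot": "cloudflare", "cost": 1}))
# blocked by cloudflare
```

Because the call never allocates a proxy or browser, it is cheap enough to run on every response in a pipeline as a post-hoc block check.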
2026-04-16
Cloud Browser API
Native WebMCP support. Cloud Browser now supports Chromium's built-in WebMCP standard. When enable_mcp=true, the browser exposes tools via the WebMCP CDP domain:
- 18 Antibot interaction tools: clickOn, fill, typeText, scroll, hover, pressKey, selectOption, dragAndDrop, etc. using human-like interaction patterns
- Website-registered tools: any tools a website exposes via navigator.modelContext (search, addToCart, getFilters, etc.) are automatically discovered and callable
AI agents connect to the streamable-HTTP MCP endpoint and call tools directly; no Playwright or Puppeteer needed.
Add enable_mcp=true to your WebSocket connection URL:
wss://browser.scrapfly.io?api_key=KEY&enable_mcp=true
MCP Server
Cloud Browser tools + dynamic WebMCP tool registration. The Scrapfly MCP Server now includes full Cloud Browser support with 8 new tools:
- cloud_browser_open: Open a real browser on any URL. Automatically discovers and registers the browser's WebMCP tools (clickOn, fill, typeText, scroll, etc.) as first-class MCP tools your AI agent can call directly
- cloud_browser_screenshot: Capture the current browser page as an image
- cloud_browser_snapshot: Read the page text content (title, URL, body)
- cloud_browser_eval: Execute JavaScript in the browser page
- cloud_browser_performance: Get detailed page performance metrics (TTFB, FCP, DOM timing, resource count)
- cloud_browser_navigate: Navigate to a new URL within the session
- cloud_browser_close: Close the session and release resources
- cloud_browser_sessions: List active browser sessions
How it works: When you call cloud_browser_open, the MCP server connects to a remote Chrome browser, discovers all available WebMCP tools via the CDP WebMCP domain, and dynamically registers them on the MCP server. Your AI agent sees them in tools/list and calls them like any other tool; the server proxies each call to Chrome via CDP.
2026-04-15
Scrapium Browser
Scrapium Browser v147 released (Chromium 147). This version tracks upstream Chromium 147, keeping our managed browser fleet aligned with the latest stealth, performance, and platform improvements shipped in real Chrome.
Key updates:
- Chromium 147 base: Latest engine, V8, and rendering pipeline matching real-world Chrome 147 fingerprints
- Updated CDP schema: The Chrome DevTools Protocol shipped with this build is published in our DevTools Protocol reference, including our custom Antibot domain
- Refreshed browser-channel data: Brave, Edge, and Opera spoof profiles regenerated from upstream releases for the new major version
Automatically available on all services, no configuration required:
- Web Scraping API
- Screenshot API
- Crawler API
- Cloud Browser Service
API Keys
You can now restrict an API key to a subset of Scrapfly products. From your project page, under the Security & Access tab, use the new Product authorization multi-select to pick which products each key can call. The LIVE and TEST keys are configured independently. If nothing is selected, the key is authorized to call all products (existing behaviour).
2026-04-08
Cloud Browser API
Cloud Browser API is now generally available. Connect your existing Playwright, Puppeteer or Selenium code to managed, stealth Chrome instances over CDP, no infrastructure to run, no fingerprints to maintain.
Key features:
- Drop-in CDP endpoint: Point your automation library at our WebSocket endpoint and keep your existing code
- Unblock mode: One HTTP POST to /unblock returns the fully-rendered page plus a reusable CDP session, perfect for HTTP clients
- Residential & datacenter proxies: Pick a proxy_pool and country per session with no extra setup
- Session resume: Reuse a successful bypass session across multiple requests to the same domain for faster follow-up calls
- Live observability: Inspect sessions, replay traces and review every CDP command from the dashboard
- SDK support: First-class helpers in the Python, TypeScript and Go SDKs
2026-02-20
Web Scraping API
New select JavaScript Scenario action. Select options from dropdown menus as part of your browser automation scenarios.
Supports both native HTML <select> elements and custom dropdown widgets (React Select, Material UI, etc.)
with real human-like interaction.
Supported selection methods:
- Native <select> by value: {"select": {"selector": "select#country", "value": "US"}}
- Custom dropdown by CSS attribute: {"select": {"selector": ".dropdown", "option_selector": ".option[value='US']"}}
- Custom dropdown by visible text: {"select": {"selector": ".dropdown", "option_selector": ".item", "text": "Germany"}}
- Custom dropdown by index: {"select": {"selector": ".dropdown", "option_selector": "[role='option']", "index": 2}}
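The select payloads above slot into an ordinary JavaScript-scenario step list alongside other actions. A sketch of assembling one in Python (the surrounding wait step is illustrative; only the select payloads are taken verbatim from this note):

```python
import json

# A scenario is an ordered list of action objects; each select payload
# from the examples above is one step.
scenario = [
    {"wait_for_selector": {"selector": "select#country"}},  # illustrative step
    {"select": {"selector": "select#country", "value": "US"}},
    {"select": {"selector": ".dropdown",
                "option_selector": "[role='option']", "index": 2}},
]
print(json.dumps(scenario, indent=2))
```

Because custom widgets (React Select, Material UI) don't use a native <select>, the option_selector variants drive them with real human-like clicks instead of setting a value programmatically.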
2026-02-18
Scrapium Browser
Scrapium Browser v145 released (Chromium 145). This version introduces a new Antibot CDP domain for browser automation and human simulation, delivering improved stealth and significantly fewer errors during automation operations.
Key improvements:
- New Antibot CDP Domain: A new way to automate and simulate human interactions directly within the browser process, enabling lower-latency and more reliable execution
- Cross-Frame Support: Automation commands now work seamlessly across iframes and nested frames, eliminating common errors when interacting with cross-origin content
- Better Humanization: Improved human simulation with more natural mouse movements, clicks, scrolling, and keyboard input
- Multi-Brand Browser Spoofing (Private Beta): Spoof browser identity as Brave, Microsoft Edge, or Opera in addition to Chrome for enhanced fingerprint diversity
Automatically available on all services, no configuration required:
- Web Scraping API
- Screenshot API
- Crawler API
- Cloud Browser Service
2026-02-14
Web Scraping API
Proxy Mode now available. Use Scrapfly as a standard HTTP/HTTPS forward proxy with any third-party tool that supports proxy configuration: SEO crawlers (Screaming Frog), automation platforms (Apify, Crawlee), monitoring tools, and data pipelines.
Key features:
- Drop-in proxy URL: Configure any tool's proxy settings with Scrapfly's endpoint, no SDK needed
- Options in username: Encode scraping options (country, ASP, JS rendering, etc.) as dash-separated key-value pairs in the proxy username
- Rich API options: JavaScript rendering, anti-bot protection, proxy rotation, caching, extraction, and more
- HTTP and HTTP/2 transport: Supports both HTTP/1.1 and HTTP/2 as proxy transport protocols
- Standard HTTP proxy protocol: Works with HTTP and HTTPS CONNECT methods
- Response metadata: X-Scrapfly-* headers provide cost, timing, and error details alongside the target response
- API Player integration: Generate proxy URLs directly from the dashboard
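The "options in username" convention encodes scraping options as dash-separated key-value pairs. A sketch of building such a username (the exact option names accepted, and the proxy host/port shown in the comment, are placeholders; consult the Proxy Mode docs for the real values):

```python
def proxy_username(api_key: str, **options) -> str:
    """Encode scraping options as dash-separated key-value pairs
    appended to the API key, per the Proxy Mode convention."""
    pairs = "-".join(f"{k}-{v}" for k, v in options.items())
    return f"{api_key}-{pairs}" if pairs else api_key

username = proxy_username("scp-live-XXXX", country="us", asp="true")
print(username)  # scp-live-XXXX-country-us-asp-true
# Then plug it into any tool's proxy settings, e.g. (placeholder endpoint):
# http://<username>:@<scrapfly-proxy-host>:<port>
```

This is what lets tools like Screaming Frog, which only expose a proxy configuration field, still drive per-request options such as country or anti-bot protection.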
2026-01-19
Scrapium Browser
Scrapium Browser v144 released with HTTP/3, QUIC, and UDP support. This version matches Chromium 144 and introduces next-generation protocol support for enhanced stealth and fingerprint accuracy.
Key improvements:
- HTTP/3 Protocol Support - Modern web protocol over QUIC for improved performance and stealth
- QUIC Transport Layer - Low-latency multiplexed connections with better congestion control
- UDP Proxy Support - Full support for HTTP/3/QUIC/UDP proxy connections
HTTP/3 and QUIC provide significant advantages for web scraping by matching real browser fingerprints more accurately. Modern browsers like Chrome, Firefox, and Safari all support HTTP/3, making it essential for avoiding detection on HTTP/3-enabled websites.
Automatically available on all services - No configuration required:
- Web Scraping API
- Screenshot API
- Crawler API
- Cloud Browser Service
Web Scraping API
HTTP/3, QUIC, and UDP protocols now automatically enabled. Our infrastructure has been upgraded to support next-generation web protocols for enhanced fingerprint accuracy and stealth.
What's new:
- Curlium HTTP/3 Support - HTTP client library upgraded with HTTP/3, QUIC, and UDP protocol support
- Proxy Network HTTP/3 - Residential, datacenter, and mobile proxies now support HTTP/3/QUIC/UDP
- Automatic Protocol Negotiation - Seamless fallback to HTTP/2 or HTTP/1.1 when needed
- Zero Configuration - Available automatically on Web Scraping API, Screenshot API, Crawler API, and Cloud Browser Service
Why HTTP/3 matters for stealth:
HTTP/3 and QUIC enable better fingerprint matching with real browsers and improved performance through reduced latency and connection overhead. Since HTTP/3 is increasingly adopted by major websites and CDNs, using these protocols helps your requests blend in with legitimate traffic, reducing the risk of detection on websites that analyze protocol-level fingerprints.
All proxy types now support these protocols without any additional configuration.
2026-01-07
Python SDK
Python SDK 0.8.24 released.
Update instructions:
- Install specific version: pip install scrapfly-sdk==0.8.24
- Upgrade to latest: pip install --upgrade scrapfly-sdk
2026-01-04
Antibot Detector
Antibot Detector 2.4 released.
- Fix badge/popup desync - Popup now correctly shows detection results when badge displays count
- Root cause: Badge updates early (in processDetectionData), but cache write happens later (in finalizeDetection)
- Fix: Check badge count and use state.mainData directly without waiting for state.expiry
- Fix "Extension context invalidated" errors - Logger now gracefully handles extension reload
- Fix memory leaks - Clean up finalizationDebounce and batchProcessingFlags Maps on tab close/URL change
The Chrome extension for detecting antibot protection has been updated.
2025-12-22
Screenshot API
Vision Deficiency Simulation now available for accessibility testing. Capture screenshots that simulate how web pages appear to users with various visual impairments, enabling WCAG, ADA, and Section 508 compliance testing.
Supported vision deficiency types:
- Deuteranopia - Red-green color blindness (green-blind), affects ~6% of males
- Protanopia - Red-green color blindness (red-blind), affects ~2% of males
- Tritanopia - Blue-yellow color blindness
- Achromatopsia - Complete color blindness (monochromacy)
- Blurred Vision - Simulates uncorrected refractive errors
Use the vision_deficiency parameter to test your pages for accessibility compliance.
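A sketch of sweeping the documented deficiency types across a Screenshot API request. The vision_deficiency parameter and the deficiency names come from this note; the value spellings, query shape, and other parameter names are assumptions for illustration:

```python
from urllib.parse import urlencode

# Assumed snake_case value names for the documented deficiency types.
DEFICIENCIES = ["deuteranopia", "protanopia", "tritanopia",
                "achromatopsia", "blurred_vision"]

def screenshot_query(api_key: str, url: str, deficiency: str) -> str:
    """Build a Screenshot API query string for one deficiency simulation."""
    assert deficiency in DEFICIENCIES, f"unknown deficiency: {deficiency}"
    return urlencode({"key": api_key, "url": url,
                      "vision_deficiency": deficiency})

q = screenshot_query("KEY", "https://example.com", "deuteranopia")
print(q)  # key=KEY&url=https%3A%2F%2Fexample.com&vision_deficiency=deuteranopia
```

Capturing the same page once per deficiency gives you a side-by-side set for WCAG/ADA review, e.g. checking that error states don't rely on red/green contrast alone.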
Dashboard
Screenshot API Visual Player now available. A dedicated interactive playground for the Screenshot API, allowing you to configure, test, and preview screenshot captures directly in your browser.
Key features:
- Live Preview - See screenshot results instantly with full-size image preview
- All Parameters - Configure capture mode (fullpage, viewport, element selector), resolution, format (JPG, PNG, WebP, GIF), and rendering options
- Visual Options - Dark mode, banner blocking, print media format, and image loading controls
- Cost Estimation - Real-time API credit cost estimation with bandwidth warnings
- Code Snippets - Auto-generated code snippets for Python, TypeScript, Go, and cURL
- Webhook Support - Test asynchronous screenshot requests with webhook integration
- Log Replay - Replay previous screenshot requests directly from the monitoring log
2025-12-19
Extraction API
Enhanced AI Extraction with Structured Output. Our AI-powered extraction engine has been significantly upgraded to deliver more accurate and reliable structured data.
Key improvements:
- Guaranteed Schema Compliance - Extracted data now strictly adheres to the defined schema structure, eliminating malformed outputs
- Smarter Computed Fields - Enhanced intelligence for derived fields like sentiment analysis, date formatting, and currency normalization
- Automatic Currency Detection - Currency symbols are now automatically converted to ISO3 codes (e.g., $ → USD, € → EUR, £ → GBP)
- Intelligent Date Parsing - Raw dates in any format are automatically normalized to standard YYYY-MM-DD format
- Improved Sentiment Analysis - More accurate sentiment detection with confidence scores for review extraction
These improvements apply to all extraction schemas including Product, Article, Review, Real Estate, Job Posting, and more.
2025-12-16
MCP Server
MCP Server 1.0.9 released.
The Scrapfly MCP (Model Context Protocol) server has been updated.
2025-12-13
Scrapium Browser
Scrapium Browser v143 released. This version matches Chromium 143 and continues to improve antibot detection and scraping capabilities.
The following services are automatically upgraded:
- Cloud Browser Service
- Web Scraping API
- Screenshot API
2025-12-10
MCP Server
Scrapfly MCP Server now available for self-hosting. You can now run the Scrapfly MCP (Model Context Protocol) server on your own infrastructure for enhanced privacy and control.
The self-hosted MCP server provides the same powerful web scraping capabilities through a standardized protocol, allowing seamless integration with AI assistants and automation tools.
Key features:
- Full control over your scraping infrastructure
- Compatible with Claude, GPT, and other MCP-enabled AI assistants
- Easy Docker deployment
- All Scrapfly API features available
Go SDK
Go SDK officially released. Scrapfly now offers a native Go SDK for seamless integration with your Go applications.
The Go SDK provides a clean, idiomatic API for accessing all Scrapfly services including Web Scraping API, Screenshot API, and Extraction API.
Key features:
- Full Web Scraping API support
- Screenshot API integration
- Extraction API support
- Async and concurrent scraping
- Comprehensive error handling
- Well-documented with examples
Install with go get github.com/scrapfly/go-scrapfly
2025-11-27
Web Scraping API
Proxified Response now transparently handles CLOB and BLOB content. When using proxified_response=true,
large content (over 5MB) that would normally be stored as CLOB (text) or BLOB (binary) is now streamed
directly to you without any additional API calls.
Previously, large responses required a separate request to retrieve the content. Now the API automatically streams the content transparently, making integration simpler for large files like PDFs, images, and datasets.
Learn more in the Web Scraping API documentation.
Dashboard
Proxified Response mode available in API Player. Test the proxified_response parameter directly in the dashboard.
2025-11-26
Go SDK
Go SDK 0.1.0 released.
Update instructions:
- Install:
go get github.com/scrapfly/go-scrapfly@v0.1.0
2025-11-17
Crawler API
Crawler API released in EARLY ACCESS. This powerful API allows you to crawl entire websites with advanced configuration options.
The Crawler API is perfect for discovering and scraping content at scale across multiple pages, with intelligent depth control and filtering capabilities.
Key features:
- Configurable crawl depth and page limits
- URL filtering with include/exclude path patterns
- Automatic sitemap and robots.txt processing
- Multiple content format support (HTML, Markdown, Text, Clean HTML)
- Built-in extraction rules for structured data
- Real-time webhook notifications
- Content caching and TTL configuration
- Batch content retrieval with multipart responses
- Export and import crawler configurations
Discover the Crawler API documentation and start crawling in your dashboard.
Dashboard
Extraction API Player added to Dashboard. Test and debug extraction rules on already-scraped content directly in your browser.
The Extraction API Player is perfect for replaying extraction on cached content, data lake documents, or previously scraped pages without re-scraping the target website.
Key features:
- Test extraction on custom HTML/text content
- Try all extraction methods (AI Model, LLM Prompt, Manual Template)
- Pre-configured examples for quick start
- Live preview of extracted data
- Code snippets in multiple languages
- Load content directly from crawler results
- Export and share extraction configurations
Access the Extraction API Player and learn more in the Extraction API documentation.
2025-11-05
Web Scraping API
Browser Downloads & File Attachments now supported. When Javascript Rendering is enabled, Scrapfly browsers now automatically capture files downloaded during browser interactions.
This powerful feature allows you to retrieve documents, PDFs, spreadsheets, and other file attachments that are triggered by button clicks, form submissions, or other browser interactions.
Downloaded files are automatically captured and stored on Scrapfly's servers, with metadata and download URLs
available in the API response under result.browser_data.attachments.
Key features:
- Automatic capture of all browser downloads
- Support for all file types (PDFs, spreadsheets, documents, archives, etc.)
- Authenticated download URLs with API key
- Automatic duplicate filename handling
- Visible in monitoring dashboard Attachments tab
Learn more in the Browser Downloads documentation.
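The attachments path in the response is easy to consume once you know where it lives. A sketch of reading it (the result.browser_data.attachments path is from this note; the per-attachment field names are assumptions for illustration):

```python
def list_attachments(api_result: dict) -> list[str]:
    """Collect filenames of files the browser downloaded during the scrape.
    Returns an empty list when no downloads were triggered."""
    attachments = (api_result.get("result", {})
                             .get("browser_data", {})
                             .get("attachments", []))
    return [a["filename"] for a in attachments]

sample = {"result": {"browser_data": {"attachments": [
    {"filename": "report.pdf",
     "url": "https://api.scrapfly.io/.../report.pdf"}]}}}
print(list_attachments(sample))  # ['report.pdf']
```

Since the download URLs are authenticated with your API key, the stored files can be fetched later without re-running the browser interaction that produced them.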
2025-11-04
N8N Integration
N8N Integration 0.1.7 released.
The Scrapfly nodes for N8N have been updated with new features and improvements.
2025-11-01
Scrapium Browser
Scrapium Browser v142 released. This version matches Chromium 142 and continues to improve antibot detection and scraping capabilities.
The following services are automatically upgraded:
- Cloud Browser Service
- Web Scraping API
- Screenshot API
2025-10-15
Scrapium Browser
Scrapium Browser v141 released. This version matches Chromium 141 and continues to improve antibot detection and scraping capabilities.
The following services are automatically upgraded:
- Cloud Browser Service
- Web Scraping API
- Screenshot API
2025-09-21
Dashboard
Multi-factor authentication (MFA) is now available for all users to improve the security of your account. It works for both individual and team accounts. You can enable it from your dashboard security settings.
2025-05-20
Scrapium Browser
Scrapium Browser v137 released. This version matches Chromium 137 and continues to improve antibot detection and scraping capabilities.
The following services are automatically upgraded:
- Cloud Browser Service
- Web Scraping API
- Screenshot API
2025-04-29
Python SDK
Python SDK 0.8.22 released.
Update instructions:
- Install specific version: pip install scrapfly-sdk==0.8.22
- Upgrade to latest: pip install --upgrade scrapfly-sdk
2025-04-20
Proxy Saver
Proxy Saver is now available in public beta. Proxy Saver allows you to cut costs on your proxy provider by reducing bandwidth usage, leveraging a distributed HTTP cache, reusing connections, and more. It is compatible with any proxy provider on the market, including private proxies.
2024-12-28
Dashboard
Team feature is now available in the dashboard. You can now invite your team members to collaborate on your projects and configure their access rights.
See the documentation and discover it in your dashboard.
2024-11-09
TypeScript SDK
TypeScript SDK 0.6.9 released.
- Changed Extraction API parameter names to match the API parameter names: ephemeral_template -> extraction_ephemeral_template, template -> extraction_template
- Old parameter names are still available with a deprecation warning
Update instructions:
- npm: npm install scrapfly-sdk@0.6.9
- JSR: jsr add @scrapfly/scrapfly-sdk@0.6.9
2024-06-10
Web Scraping API
The Web Scraping API now announces the debug replay URL. When you use the debug parameter, the response now contains a content_replay_url in context.debug to replay a scrape against the exact same content.
This URL must be authenticated with the same API key used to perform the scrape.
"context": {
  "debug": {
    "screenshot_url": "https://api.scrapfly.io/11cd6abe-5061-4dce-8d37-5d50e667a071/scrape/screenshot/ee8484c6-ee5f-4775-a665-0a2b57631c1c/debug",
    "response_url": "https://api.scrapfly.io/scrape/debug/ee8484c6-ee5f-4775-a665-0a2b57631c1c",
    "content_replay_url": "https://api.scrapfly.io/scrape/debug/ee8484c6-ee5f-4775-a665-0a2b57631c1c/replay"
  }
}
For more information, refer to the Web Scraping API documentation
2024-06-04
Screenshot API
Screenshot API released. This API allows you to take screenshots of web pages; it is much simpler than the Web Scraping API, with all presets pre-configured (image loading, high quality, rendering wait).
The Screenshot API provides some unique features:
- Multiple image formats (jpg, png, webp, gif)
- Multiple capture modes (custom viewport, fullpage, vertical, elements)
- Custom resolution
- Caching
- Page options (Dark mode, block banners, block ads, print format)
You can now discover the Screenshot API in the API documentation.
Extraction API
Extraction API released in BETA. This API allows you to extract structured data from web pages. It comes with 3 modes of extraction:
- Custom rules with extraction template: define your own extraction rules, formatters, and filters
- LLM Prompt Extraction: Extract or ask questions about the document using our pre-trained LLM model dedicated to web scraping
- Automatic extraction: Choose an extraction model based on the type of page (product, job, article, etc.) and retrieve the structured data and metadata to evaluate the extraction quality
You can now discover the Extraction API in the API documentation.
Web Scraping API
The Web Scraping API now integrates data extraction from the scraped pages. Refer to the documentation of these new parameters:
- extraction_template: Use your own extraction rules
- extraction_prompt: Use an LLM prompt to retrieve data
- extraction_model: Use automatic extraction mode
Fixed an issue where the Web Scraping API screenshot returned the image with an invalid IANA content type, image/jpg instead of image/jpeg.
The proxified_response parameter, when used with extraction_template, extraction_prompt, or extraction_model, now returns the content-type of the extracted data instead of the original response content-type.
More information about the proxified_response parameter is in the Web Scraping API documentation.
The format parameter now accepts options to configure the output format of the scraped page.
The markdown format now allows you to:
- Disable links with no_links and use the anchor text instead
- Disable images with no_images and use the alt text instead
Use the following notation: markdown:no_links,no_images, i.e. {format}:{option1},{optionN}
To learn more about these formats, refer to the Web Scraping API documentation.
2024-04-24
Python SDK
Python SDK 0.8.17 released. This version introduces support for:
- Web Scraping API format parameter
- Web Scraping API screenshot_flags parameter
You can install the new version with pip install scrapfly-sdk==0.8.17 or upgrade with pip install --upgrade scrapfly-sdk
Javascript SDK
Javascript SDK 0.5.0 released. This version introduces support for:
- Web Scraping API format parameter
- Web Scraping API screenshot_flags parameter
You can install the new version with npm install scrapfly-sdk@0.5.0 or upgrade with npm install scrapfly-sdk@latest
2024-04-22
Python SDK
Scrapfly now has an official integration with LlamaIndex to help you extract data.
Scrapfly now has an official integration with LangChain to help you extract data.
Web Scraping API
Introduced a new format parameter in the Web Scraping API that lets you convert the scraped page to a specific format.
With the rise of LLM usage, you can now convert pages into LLM-friendly formats and more.
You can now convert the scraped page to:
- markdown
- text
- json (auto parse)
- clean_html
If you are using proxified_response to directly retrieve the content, the announced content-type will follow the format you choose.
To learn more about these formats, refer to the Web Scraping API documentation.
You can now pass flags to configure screenshot options directly from the Web Scraping API.
Available flags:
- load_images
- dark_mode
- block_banners
- high_quality
- print_media_format
To learn more about these flags, refer to the Web Scraping API documentation.