Webhook
Scrapfly's webhook feature is ideal for managing crawler jobs asynchronously.
When a webhook is specified through the webhook_name parameter, Scrapfly notifies your HTTP endpoint about crawl events in real time,
eliminating the need for polling.
To start using webhooks, first create one in the webhook web interface.
The webhook will be called for each event you subscribe to during the crawl lifecycle. For reconciliation, you will receive the crawler_uuid and webhook_uuid in the response headers.
Webhook Queue Size
The webhook queue size is the maximum number of webhook deliveries that can be queued at once. After a crawler event is processed and your application is notified, the queue is reduced. This allows you to schedule additional crawler jobs beyond the concurrency limit of your subscription; the scheduler handles this and ensures your concurrency limit is respected.
| Plan | Price | Webhook Queue Size |
|---|---|---|
| FREE | $0.00/mo | 500 |
| DISCOVERY | $30.00/mo | 500 |
| PRO | $100.00/mo | 2,000 |
| STARTUP | $250.00/mo | 5,000 |
| ENTERPRISE | $500.00/mo | 10,000 |
Scope
Webhooks are scoped per Scrapfly project and environment. Make sure to create a webhook for each of your projects and environments (test/live).
Usage
Webhooks can be used for multiple purposes. In the context of the Crawler API, to ensure you received a crawler event, you must check the X-Scrapfly-Webhook-Resource-Type header and verify that its value is crawler.
To enable webhook callbacks, specify the webhook_name parameter in your crawler requests and optionally provide a list of webhook_events you want to be notified about.
Scrapfly will then call your webhook endpoint as crawl events occur.
Note that your webhook endpoint must respond with a 2xx status code for the webhook to be considered successful.
3xx redirect responses are followed, while 4xx and 5xx responses are considered failures and will be retried according to the retry policy.
The examples below assume you have a webhook named my-crawler-webhook registered. You can create webhooks via the web dashboard.
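A minimal sketch of starting a crawl with webhook callbacks enabled, using the Python requests library. Only the webhook_name and webhook_events parameters come from this page; the endpoint path, the key authentication parameter, and the url seed parameter are placeholder assumptions, so check the Crawler API reference for the exact request format.

```python
# Sketch: start a crawl with webhook callbacks enabled.
# Assumptions (not from this page): the "/crawler" endpoint path, the "key"
# auth parameter, and the "url" seed parameter. webhook_name and
# webhook_events are the parameters documented above.
import requests

response = requests.post(
    "https://api.scrapfly.io/crawler",         # assumed endpoint path
    params={"key": "YOUR_API_KEY"},            # assumed auth parameter
    json={
        "url": "https://example.com",          # assumed seed URL parameter
        "webhook_name": "my-crawler-webhook",  # webhook created in the dashboard
        "webhook_events": [                    # optional: restrict the events you receive
            "crawler_started",
            "crawler_url_failed",
            "crawler_finished",
        ],
    },
)
response.raise_for_status()
print(response.json())
```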
Webhook Events & Payloads
The Crawler API supports multiple webhook events that notify you about different stages of the crawl lifecycle. Each event sends a JSON payload with the crawler state and event-specific data.
Default Subscription
If you don't specify webhook_events, you'll receive: crawler_started, crawler_stopped, crawler_cancelled, and crawler_finished.
HTTP Headers
Every webhook request includes these HTTP headers for easy routing and verification:
| Header | Purpose | Example Value |
|---|---|---|
| X-Scrapfly-Crawl-Event-Name | Fast routing - use this to route events without parsing JSON | crawler_started |
| X-Scrapfly-Webhook-Resource-Type | Resource type (always crawler for crawler webhooks) | crawler |
| X-Scrapfly-Webhook-Job-Id | Crawler UUID for tracking and reconciliation | 550e8400-e29b... |
| X-Scrapfly-Webhook-Signature | HMAC-SHA256 signature for verification | a3f2b1c... |
Route webhook events using the X-Scrapfly-Crawl-Event-Name header instead of parsing the JSON body.
This is significantly faster for high-frequency events like crawler_url_visited.
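A minimal receiver sketch using Flask that routes on the X-Scrapfly-Crawl-Event-Name header, confirms the resource type is crawler, and answers with a 2xx status so the delivery counts as successful. The enqueue_visited and handle_terminal_event helpers are hypothetical placeholders for your own processing.

```python
# Sketch of a webhook receiver: route on X-Scrapfly-Crawl-Event-Name, confirm
# the resource type is "crawler", and respond 2xx so the delivery is considered
# successful. enqueue_visited and handle_terminal_event are placeholders.
from flask import Flask, request

app = Flask(__name__)

def enqueue_visited(crawler_uuid, raw_body):
    ...  # placeholder: push onto your own queue/worker for async processing

def handle_terminal_event(crawler_uuid, payload):
    ...  # placeholder: post-processing, alerting, result download, etc.

@app.route("/scrapfly-webhook", methods=["POST"])
def scrapfly_webhook():
    # Only crawler webhooks are relevant here
    if request.headers.get("X-Scrapfly-Webhook-Resource-Type") != "crawler":
        return "", 204

    event = request.headers.get("X-Scrapfly-Crawl-Event-Name", "")
    crawler_uuid = request.headers.get("X-Scrapfly-Webhook-Job-Id", "")

    if event == "crawler_url_visited":
        # High-frequency event: acknowledge fast, defer heavy work
        enqueue_visited(crawler_uuid, request.get_data())
    elif event in ("crawler_finished", "crawler_stopped", "crawler_cancelled"):
        handle_terminal_event(crawler_uuid, request.get_json())

    return "", 200  # 2xx = delivery acknowledged
```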
Event Types & Examples
Each event below includes a description of when it fires, its typical use cases, how often it occurs, and the key fields of its JSON payload:
crawler_started
When: Crawler execution begins
Use case: Track when crawls start, log crawler UUID, initialize tracking systems
Frequency: Once per crawl
Key payload fields: crawler_uuid, seed_url, links.status
crawler_url_visited
When: Each URL is successfully crawled
Use case: Real-time progress tracking, streaming results, monitoring performance
Frequency: High - Fires for every successfully crawled URL (can be thousands per crawl)
Tip: use the X-Scrapfly-Crawl-Event-Name header for fast routing without parsing the JSON body.
crawler_url_failed
When: A URL fails to crawl (network error, timeout, block, etc.)
Use case: Error monitoring, retry logic, debugging failed scrapes
Frequency: Per failed URL
Key payload fields:
- error - Error code for classification
- links.log - Direct link to the scrape log for debugging
- scrape_config - Complete configuration to replay the scrape
- links.scrape - Ready-to-use retry URL with the same configuration
crawler_url_skipped
When: URLs are skipped (already visited, filtered, depth limit, etc.)
Use case: Monitor filtering effectiveness, track duplicate discovery
Frequency: Per batch of skipped URLs
The urls field contains a map of each skipped URL to its skip reason.
crawler_url_discovered
When: New URLs are discovered from crawled pages
Use case: Track crawl expansion, monitor discovery patterns, sitemap building
Frequency: High - Fires for each batch of discovered URLs
Key payload fields: origin (the source URL where links were found) and discovered_urls (the list of new URLs)
crawler_finished
When: Crawler completes successfully (at least one URL visited)
Use case: Trigger post-processing, download results, send completion notifications
Frequency: Once per successful crawl
state.urls_visited > 0 confirms at least one URL was crawled. Check state.stop_reason to understand why the crawler completed (e.g., no_more_urls, page_limit).
crawler_stopped
When: Crawler stops due to failure (seed URL failed, errors, no URLs visited)
Use case: Error alerting, failure logging, retry automation
Frequency: Once per failed crawl
Check state.stop_reason for the exact cause:
- seed_url_failed - Initial URL couldn't be crawled
- crawler_error - Internal crawler error occurred
- no_api_credit_left - Account ran out of API credits mid-crawl
- max_api_credit - Configured credit limit reached
crawler_cancelled
When: User manually cancels the crawl via API or dashboard
Use case: Update tracking systems, release resources, log cancellations
Frequency: Once per user cancellation
state.stop_reason will be user_cancelled. Partial crawl results are available via the status endpoint and can be retrieved normally.
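Building on the receiver sketch above, here is one possible handle_terminal_event body that branches on the state.stop_reason and state.urls_visited fields described in these events. The exact JSON payload shape is an assumption based on those field names; inspect a real payload (for example via webhook.site) before relying on it.

```python
# One possible handle_terminal_event body, branching on state.stop_reason and
# state.urls_visited as described above. The exact payload shape is assumed.
def handle_terminal_event(crawler_uuid: str, payload: dict) -> None:
    state = payload.get("state", {})
    stop_reason = state.get("stop_reason")

    if stop_reason == "user_cancelled":
        # crawler_cancelled: partial results remain available via the status endpoint
        print(f"{crawler_uuid}: cancelled by user")
    elif stop_reason in ("seed_url_failed", "crawler_error",
                         "no_api_credit_left", "max_api_credit"):
        # crawler_stopped: alert, log, or trigger retry automation
        print(f"{crawler_uuid}: stopped early ({stop_reason})")
    elif state.get("urls_visited", 0) > 0:
        # crawler_finished: trigger post-processing / result download
        print(f"{crawler_uuid}: finished ({stop_reason})")
```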
Development
Useful tools for local webhook development:
- https://webhook.site - Collect and display webhook notifications
- https://ngrok.com - Expose your local application through a secured tunnel to the internet
Security
Webhooks are signed using HMAC (Hash-based Message Authentication Code) with the SHA-256 algorithm to ensure the integrity of the webhook content and verify its authenticity. This mechanism helps prevent tampering and ensures that webhook payloads are from trusted sources.
HMAC Overview
HMAC is a cryptographic technique that combines a secret key with a hash function (in this case, SHA-256) to produce a fixed-size hash value known as the HMAC digest. This digest is unique to both the original message and the secret key, providing a secure way to verify the integrity and authenticity of the message.
Signature in HTTP Header
When Scrapfly sends a webhook notification, it includes an HMAC signature in the X-Scrapfly-Webhook-Signature HTTP header.
This signature is generated by applying the HMAC-SHA256 algorithm to the entire request body using your webhook's secret key (configured in the webhook settings).
Verification Example
To verify the authenticity of a webhook notification, compute the HMAC-SHA256 signature of the request body using your secret key
and compare it with the signature provided in the X-Scrapfly-Webhook-Signature header:
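A verification sketch in Python: recompute the HMAC-SHA256 digest of the raw request body with your webhook secret and compare it to the header value in constant time. The hex encoding of the signature is an assumption; adjust if your signatures arrive in another encoding such as base64.

```python
# Verification sketch: recompute HMAC-SHA256 over the raw request body with
# your webhook secret and compare it to X-Scrapfly-Webhook-Signature.
# The hex digest encoding is an assumption.
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking timing information
    return hmac.compare_digest(expected, signature_header)

# Usage inside a webhook handler (e.g. with Flask's request object):
#   if not verify_signature(request.get_data(),
#                           request.headers.get("X-Scrapfly-Webhook-Signature", ""),
#                           WEBHOOK_SECRET):
#       return "", 401  # reject unverified payloads
```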
Security Best Practices
- Always verify the HMAC signature before processing webhook payloads
- Keep your webhook secret key confidential and rotate it periodically
- Use HTTPS endpoints for webhook URLs to encrypt data in transit
- Implement rate limiting on your webhook endpoint to handle high-frequency events
Next Steps
- Create your first webhook in the webhook dashboard
- Learn about crawler configuration options
- Review error handling for webhook failures