Webhook
Webhooks allow you to run scrapes asynchronously. When you call the API with webhook_name=__your_webhook_name__, your call is queued. As soon as our system has processed it and the scrape is done, we notify you through the webhook as you configured it.
You can manage webhooks through the web interface.
You can configure the Content-Type and Content-Encoding of notifications to optimize transfer.
By default, the body is serialized as JSON and sent without compression.
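A receiver has to undo whatever transfer options were configured before it can read the payload. The sketch below assumes the body is JSON (the default) and treats `gzip` as one possible Content-Encoding. Which encodings and serializations Scrapfly actually offers is not listed here, so check the options in the web interface. The function name `decode_webhook_body` is hypothetical.

```python
import gzip
import json
from typing import Any, Optional


def decode_webhook_body(body: bytes, content_encoding: Optional[str] = None) -> Any:
    """Decode a webhook payload into Python objects.

    Assumptions (not confirmed by the docs): the payload is JSON-serialized,
    and "gzip" is one of the available Content-Encoding options.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        body = gzip.decompress(body)
    elif encoding not in ("", "identity"):
        # Anything else is outside what this sketch knows how to handle
        raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
    return json.loads(body.decode("utf-8"))
```

With the default settings, you can call it with only the raw body. Pass the request's Content-Encoding header value when compression is enabled.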
The body sent to your endpoint is exactly the same as a regular API scrape response, with webhook information added to the context section.
Scope
Webhooks are scoped per project and per environment. If the API rejects your call when you pass webhook_name, the webhook is most likely not in the same project/environment as your API key.
Retry Policy
If we can't notify the endpoint specified in your webhook settings, we retry the notification on the following schedule:
- 30 seconds
- 1 minute
- 5 minutes
- 30 minutes
- 1 hour
- 1 day
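To see how the schedule above plays out, the following sketch converts the listed delays to seconds and computes how long after the first failed attempt each retry would fire. It assumes each delay is counted from the previous attempt, which the policy does not spell out.

```python
from datetime import timedelta
from itertools import accumulate

# Retry delays as listed in the retry policy
RETRY_DELAYS = [
    timedelta(seconds=30),
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=1),
    timedelta(days=1),
]

# Assumption: each delay counts from the previous attempt,
# so the offset of retry n is the sum of the first n delays.
RETRY_OFFSETS = list(accumulate(RETRY_DELAYS))

for attempt, offset in enumerate(RETRY_OFFSETS, start=1):
    print(f"retry {attempt}: {offset} after the initial failure")
```

Under that assumption, the last retry lands 1 day, 1:36:30 (about 25.6 hours) after the first failed delivery.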
If we fail to reach your application more than 100 times in a row, the system automatically disables the webhook and notifies you. You can re-enable it from the UI at any time.
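The auto-disable rule can be modeled as a counter of consecutive failures that resets on success, which matches the `consecutive_failed_count` field shown in the example response. This is a hypothetical model for reasoning about the threshold, not Scrapfly's implementation. The class name `WebhookHealth`, and the idea that re-enabling resets the counter, are assumptions.

```python
from dataclasses import dataclass

# Threshold from the retry policy: disabled after more than 100 failures in a row
MAX_CONSECUTIVE_FAILURES = 100


@dataclass
class WebhookHealth:
    """Hypothetical model of the auto-disable rule (not Scrapfly's code)."""

    consecutive_failed_count: int = 0
    enabled: bool = True

    def record(self, delivered: bool) -> None:
        if delivered:
            # Any successful delivery breaks the failure streak
            self.consecutive_failed_count = 0
            return
        self.consecutive_failed_count += 1
        if self.consecutive_failed_count > MAX_CONSECUTIVE_FAILURES:
            self.enabled = False

    def re_enable(self) -> None:
        # Mirrors re-enabling from the UI; resetting the counter is an assumption
        self.consecutive_failed_count = 0
        self.enabled = True
```

Under this model, the 100th consecutive failure still leaves the webhook enabled. The 101st failure disables it.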
Development
Useful tools for local development:
- https://webhook.site: collect and display webhook calls
- https://ngrok.com: expose your local application to the internet through a secure tunnel
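To use a tunnel like ngrok, you need something listening locally. The sketch below uses Python's standard `http.server` to accept webhook calls on port 8000. It assumes Scrapfly delivers the payload as a POST request with a JSON body. The method is not stated here, so adjust if needed. The names `webhook_name_from_body`, `WebhookHandler` and `serve` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def webhook_name_from_body(raw: bytes) -> str:
    """Extract the webhook name from context.webhook in a JSON payload."""
    payload = json.loads(raw or b"{}")
    return payload.get("context", {}).get("webhook", {}).get("name", "")


class WebhookHandler(BaseHTTPRequestHandler):
    # Assumption: the scrape result arrives as a POST with a JSON body
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        name = webhook_name_from_body(self.rfile.read(length))
        print(f"received call for webhook {name!r}")
        # A 2xx status tells the sender the delivery succeeded
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block and serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Call `serve()`, then run something like `ngrok http 8000` in another terminal. Use the public URL ngrok prints as the endpoint in your webhook settings.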
Security
For security reasons, you can define a secret. Scrapfly will then call your endpoint with an X-Scrapfly-Webhook-Secret header containing that secret.
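On the receiving side, the check comes down to comparing the header value with the secret you defined. The sketch below uses `hmac.compare_digest` for a constant-time comparison. The function name `is_authentic` is hypothetical, and how you store the expected secret is up to you.

```python
import hmac
from typing import Mapping, Optional


def is_authentic(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Return True if the X-Scrapfly-Webhook-Secret header matches our secret."""
    # HTTP header names are case-insensitive, so normalize before lookup
    normalized = {k.lower(): v for k, v in headers.items()}
    received: Optional[str] = normalized.get("x-scrapfly-webhook-secret")
    if not received or not expected_secret:
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```

If the check fails, respond with an error status. Keep in mind that a 4xx response to a genuine Scrapfly call is retried.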
Headers
The following headers are added:
- X-Scrapfly-Webhook-Env: environment in which the webhook is triggered
- X-Scrapfly-Webhook-Project: related project name
- X-Scrapfly-Webhook-Secret: secret used to authenticate the origin of a call
- X-Scrapfly-Webhook-Name: name of the webhook
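Because webhooks are scoped per project and environment, a receiver that serves several webhooks can use these headers to route or sanity-check each call. The sketch gathers them into a small dataclass. The class and method names, and the placeholder values `my-project` and `my-env`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ScrapflyWebhookHeaders:
    env: Optional[str]
    project: Optional[str]
    secret: Optional[str]
    name: Optional[str]

    @classmethod
    def from_headers(cls, headers: Mapping[str, str]) -> "ScrapflyWebhookHeaders":
        # Header names are case-insensitive; normalize them first
        h = {k.lower(): v for k, v in headers.items()}
        return cls(
            env=h.get("x-scrapfly-webhook-env"),
            project=h.get("x-scrapfly-webhook-project"),
            secret=h.get("x-scrapfly-webhook-secret"),
            name=h.get("x-scrapfly-webhook-name"),
        )

    def matches_scope(self, project: str, env: str) -> bool:
        """True if the call comes from the expected project/environment."""
        return self.project == project and self.env == env


meta = ScrapflyWebhookHeaders.from_headers({
    "X-Scrapfly-Webhook-Env": "my-env",
    "X-Scrapfly-Webhook-Project": "my-project",
    "X-Scrapfly-Webhook-Name": "example",
})
print(meta.name, meta.matches_scope("my-project", "my-env"))
```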
Usage
You must create and configure a webhook named example to run this example.
To have the scrape response delivered to your webhook, simply pass the webhook name to the API, as in the example below.
- Your webhook must respond with a 2xx status for success; 3xx responses are followed as redirects
- Any 4xx or 5xx response will be retried
- Webhook notifications time out after 30s and will be retried
import requests

# Queue an asynchronous scrape; the result is delivered to the webhook named "example"
params = {"key": "__API_KEY__", "url": "https://httpbin.org/anything", "webhook_name": "example"}
# requests URL-encodes the query parameters, including the target url
response = requests.get("https://api.scrapfly.io/scrape", params=params)
print(response.text)
# import json
# print(json.loads(response.text)['result']['content'])
# print(json.loads(response.text)['result']['status_code'])
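Since a notification times out after 30s and is then retried, a common pattern is to acknowledge right away and do the heavy processing elsewhere. The sketch below hands each payload to a background worker through a `queue.Queue` and returns 200 immediately. The names are hypothetical, and the pattern is a general suggestion, not something Scrapfly requires.

```python
import queue
import threading
from typing import Callable

# Payloads waiting to be processed outside the request/response cycle
jobs: "queue.Queue[bytes]" = queue.Queue()


def acknowledge(raw_body: bytes) -> int:
    """Enqueue the payload and return the HTTP status to send back at once."""
    jobs.put(raw_body)
    return 200  # any 2xx counts as success; slow handling risks the 30s timeout


def start_worker(process: Callable[[bytes], None]) -> threading.Thread:
    """Run `process` on each queued payload in a daemon thread."""

    def run() -> None:
        while True:
            body = jobs.get()
            try:
                process(body)
            except Exception as exc:  # keep the worker alive on bad payloads
                print(f"processing failed: {exc!r}")
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The trade-off is that a failure during processing no longer triggers a Scrapfly retry, so your own code has to handle it.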
Example Response
{
  ...
  "context": {
    ...
    "webhook": {
      "name": "example",
      "secret": "1a8c1001-5bc9-4b23-a830-31ac864cd18c",
      "consecutive_failed_count": 0
    }
    ...
  }
  ...
}
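Reading the webhook block back out of a delivered body is plain JSON navigation. The snippet below uses a trimmed-down copy of the example response, with the `...` placeholders dropped so it parses. The helper name `webhook_info` is hypothetical.

```python
import json
from typing import Any, Dict

# Trimmed version of the example response (the "..." parts removed)
EXAMPLE_BODY = """
{
  "context": {
    "webhook": {
      "name": "example",
      "secret": "1a8c1001-5bc9-4b23-a830-31ac864cd18c",
      "consecutive_failed_count": 0
    }
  }
}
"""


def webhook_info(body: str) -> Dict[str, Any]:
    """Return the webhook section from the context of a delivered scrape response."""
    return json.loads(body)["context"]["webhook"]


info = webhook_info(EXAMPLE_BODY)
print(info["name"], info["consecutive_failed_count"])  # prints "example 0"
```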
Related Errors
All related errors are listed below. You can find the full description and an example error response in the Errors section.
- ERR::WEBHOOK::DISABLED - The given webhook is disabled; check your webhook configuration for the current project / env
- ERR::WEBHOOK::ENDPOINT_UNREACHABLE - We were not able to contact your endpoint
- ERR::WEBHOOK::MAX_CONCURRENCY_REACHED - You have reached the maximum concurrency limit
- ERR::WEBHOOK::MAX_RETRY - Maximum number of retries exceeded on your webhook
- ERR::WEBHOOK::NOT_FOUND - Unable to find the given webhook for the current project / env
- ERR::WEBHOOK::QUEUE_FULL - You have reached the limit of scheduled webhooks; wait until pending webhooks are processed
Pricing
No additional fee is applied for webhook usage.