Python SDK
The Python SDK gives you a handy abstraction for interacting with the Scrapfly API. It covers all Scrapfly features and adds many convenient shortcuts:
- Automatic base64 encoding of JS snippets
- Error handling
- JSON encoding of the request body if `Content-Type: application/json` is set
- URL encoding of the request body and setting of `Content-Type: application/x-www-form-urlencoded` if no content type is specified
- Conversion of binary responses into a Python `BytesIO` object
Step by Step Introduction
For a hands-on introduction, see our Scrapfly SDK introduction page!
The full Python API specification is available here: https://scrapfly.github.io/python-scrapfly/docs/scrapfly
For more on using the Python SDK with Scrapfly, select the "Python SDK" option in the Scrapfly docs top bar.
Installation
The source code of the Python SDK is available on Github, and the scrapfly-sdk package is available through PyPI.
You can also install the extra package `scrapfly[speedups]` to get the benefits of brotli compression and msgpack serialization.
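For example, with pip:

```shell
# install the SDK from PyPI
pip install scrapfly-sdk

# optionally, install the speedups extra for brotli compression
# and msgpack serialization support
pip install "scrapfly-sdk[speedups]"
```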
Scrape
If you plan to scrape a protected website, make sure to enable Anti Scraping Protection.
Discover the full Python specification:
- Client : https://scrapfly.github.io/python-scrapfly/scrapfly/client.html
- ScrapeConfig : https://scrapfly.github.io/python-scrapfly/scrapfly/scrape_config.html
- API response : https://scrapfly.github.io/python-scrapfly/scrapfly/api_response.html
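As a minimal sketch of a basic scrape (assuming the `ScrapflyClient` and `ScrapeConfig` entry points from the references above, a placeholder API key, and an illustrative target URL):

```python
from scrapfly import ScrapflyClient, ScrapeConfig

# replace with your own API key
scrapfly = ScrapflyClient(key='YOUR-API-KEY')

api_response = scrapfly.scrape(ScrapeConfig(
    url='https://httpbin.org/html',
    # enable Anti Scraping Protection for protected websites
    asp=True,
))

# the scraped page body
print(api_response.scrape_result['content'])
```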
Using Context
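The client can be used as a context manager so the underlying HTTP session is closed cleanly when the block exits. A sketch, with the same placeholder key and URL assumptions as above:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

# the context manager takes care of releasing the client's
# HTTP session when the block exits
with ScrapflyClient(key='YOUR-API-KEY') as scrapfly:
    api_response = scrapfly.scrape(ScrapeConfig(url='https://httpbin.org/html'))
    print(api_response.scrape_result['content'])
```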
How to configure Scrape Query
You can check the ScrapeConfig implementation (linked above) to see all available options.
All parameters listed in this documentation can be used when constructing the scrape config object.
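For instance, a config combining a few common options (the parameters shown are illustrative; check the ScrapeConfig reference linked above for the full list):

```python
from scrapfly import ScrapeConfig

# every documented API parameter maps to a ScrapeConfig argument
config = ScrapeConfig(
    url='https://httpbin.org/html',
    country='us',          # proxy geolocation
    render_js=True,        # render the page with a headless browser
    asp=True,              # enable Anti Scraping Protection
    session='my-session',  # reuse the same session across scrapes
)
```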
Download Binary Response
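A sketch of downloading a binary file, assuming (as described in the feature list above) that binary bodies are exposed as a Python `BytesIO` object; the URL and output filename are placeholders:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

scrapfly = ScrapflyClient(key='YOUR-API-KEY')
api_response = scrapfly.scrape(ScrapeConfig(
    url='https://httpbin.org/image/png',
))

# binary responses are converted into a BytesIO object,
# so read the bytes out and write them to disk
data = api_response.scrape_result['content']
with open('image.png', 'wb') as f:
    f.write(data.read())
```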
Error Handling
Error handling is a big part of any scraper, so we designed a system that reflects what went wrong so you can handle failures properly in your scraper code.
Errors, with their related codes and explanations, are documented and available here if you want to know more.
- scrapfly.UpstreamHttpClientError: the upstream website you scrape responded with an HTTP code >= 400 and < 500
- scrapfly.UpstreamHttpServerError: the upstream website you scrape responded with an HTTP code >= 500 and < 600
- scrapfly.ApiHttpClientError: the Scrapfly API responded with an HTTP code >= 400 and < 500
- scrapfly.ApiHttpServerError: the Scrapfly API responded with an HTTP code >= 500 and < 600
- scrapfly.ScrapflyProxyError: error related to the proxy
- scrapfly.ScrapflyThrottleError: error related to throttling
- scrapfly.ScrapflyAspError: error related to ASP
- scrapfly.ScrapflyScheduleError: error related to the schedule
- scrapfly.ScrapflyWebhookError: error related to the webhook
- scrapfly.ScrapflySessionError: error related to the session
- scrapfly.TooManyConcurrentRequest: the maximum number of concurrent requests allowed by your plan has been reached
- scrapfly.QuotaLimitReached: the quota limit of your plan or project has been reached
By default, if the upstream website you scrape responds with a bad HTTP code, the SDK raises UpstreamHttpClientError or UpstreamHttpServerError depending on the HTTP status code.
You can disable this behavior by setting the raise_on_upstream_error attribute to false: ScrapeConfig(raise_on_upstream_error=False)
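Putting it together, a minimal error-handling sketch, assuming the exception classes listed above are importable from the `scrapfly` package and that they share a common `ScrapflyError` base class (an assumption, not confirmed on this page):

```python
import scrapfly
from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key='YOUR-API-KEY')

try:
    api_response = client.scrape(ScrapeConfig(url='https://httpbin.org/status/404'))
except scrapfly.UpstreamHttpClientError as e:
    # the scraped website answered with a 4xx status
    print('upstream client error:', e)
except scrapfly.ApiHttpClientError as e:
    # the Scrapfly API itself rejected the request
    print('api error:', e)
except scrapfly.ScrapflyError as e:
    # assumed catch-all base for the errors listed above
    print('scrapfly error:', e)
```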
If you want to report errors to your own application for monitoring or tracking purposes, check out the reporter feature.
Account
You can retrieve your account information.
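A sketch, assuming the client exposes an `account()` method (the method name is an assumption; check the client reference linked above):

```python
from scrapfly import ScrapflyClient

scrapfly = ScrapflyClient(key='YOUR-API-KEY')

# fetch subscription, usage and project information for your account
account_info = scrapfly.account()
print(account_info)
```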
Keep Alive HTTP Session
Take advantage of a Keep-Alive connection.
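Reusing a single client for consecutive scrapes lets them share one keep-alive HTTP connection instead of reconnecting each time. A sketch, with placeholder key and URLs:

```python
from scrapfly import ScrapflyClient, ScrapeConfig

# one client, many scrapes: the underlying HTTP session
# (and its keep-alive connection) is reused across calls
with ScrapflyClient(key='YOUR-API-KEY') as scrapfly:
    for path in ['html', 'json', 'xml']:
        api_response = scrapfly.scrape(ScrapeConfig(url=f'https://httpbin.org/{path}'))
        print(api_response.scrape_result['content'][:100])
```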
Concurrency out of the box
You can run scrapes concurrently out of the box. We use asyncio for that.
In Python, there are many ways to achieve concurrency. You can also check:
First of all, ensure you have installed the concurrency module.
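A sketch of concurrent scraping, assuming the client exposes an async `concurrent_scrape` helper and a `max_concurrency` option (both names are assumptions not shown on this page; check the client reference linked above):

```python
import asyncio

from scrapfly import ScrapflyClient, ScrapeConfig

async def main():
    # max_concurrency caps how many scrapes run at once
    scrapfly = ScrapflyClient(key='YOUR-API-KEY', max_concurrency=4)
    configs = [
        ScrapeConfig(url=f'https://httpbin.org/anything?page={i}')
        for i in range(8)
    ]
    # results are yielded as each scrape completes,
    # not in the order the configs were submitted
    async for api_response in scrapfly.concurrent_scrape(scrape_configs=configs):
        print(api_response.scrape_result['content'][:80])

asyncio.run(main())
```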