Python SDK

The Python SDK gives you a handy abstraction for interacting with the Scrapfly API. It includes all Scrapfly features plus many convenient shortcuts:

  • Automatic base64 encoding of JavaScript snippets
  • Error handling
  • JSON encoding of the body when Content-Type: application/json is set
  • URL encoding of the body and setting Content-Type: application/x-www-form-urlencoded when no content type is specified
  • Conversion of binary responses into a Python BytesIO object

Step by Step Introduction

For a hands-on introduction, see our Scrapfly SDK introduction page!

The full Python API specification is available here: https://scrapfly.github.io/python-scrapfly/docs/scrapfly

For more on using the Python SDK with Scrapfly, select the "Python SDK" option in the top bar of the Scrapfly docs.

Installation

The source code of the Python SDK is available on GitHub, and the scrapfly-sdk package is available through PyPI.

You can also install the extra scrapfly-sdk[speedups] to get the benefits of brotli compression and msgpack serialization.
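A minimal install with pip might look like this (the speedups extras name follows the package naming above; verify it against the package metadata):

    pip install scrapfly-sdk
    pip install "scrapfly-sdk[speedups]"   # optional: brotli + msgpack speedups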

Scrape

If you plan to scrape a protected website, make sure to enable Anti Scraping Protection.

Discover the full Python specification in the API reference linked above.
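Here is a minimal scrape as a sketch: ScrapflyClient, ScrapeConfig, and the asp flag follow the SDK's public API, while the key, target URL, and the scrape_result['content'] accessor are illustrative assumptions to verify against the reference:

    from scrapfly import ScrapflyClient, ScrapeConfig

    scrapfly = ScrapflyClient(key='YOUR-API-KEY')

    # asp=True enables Anti Scraping Protection for protected websites
    api_response = scrapfly.scrape(ScrapeConfig(
        url='https://httpbin.dev/html',
        asp=True,
    ))

    print(api_response.scrape_result['content'])  # scraped page body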

Using Context
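The client can be used as a context manager so the underlying HTTP session is cleaned up on exit. A sketch, assuming ScrapflyClient implements the context-manager protocol:

    from scrapfly import ScrapflyClient, ScrapeConfig

    scrapfly = ScrapflyClient(key='YOUR-API-KEY')

    with scrapfly as client:
        api_response = client.scrape(ScrapeConfig(url='https://httpbin.dev/html'))
        print(api_response.scrape_result['content'])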

How to configure Scrape Query

You can check the ScrapeConfig implementation to see all available options.

All parameters listed in this documentation can be used when you construct the scrape config object.
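For illustration, here is a ScrapeConfig combining a few common options; the parameter names shown (country, render_js, cache, headers) are assumptions to verify against the implementation linked above:

    from scrapfly import ScrapeConfig

    config = ScrapeConfig(
        url='https://httpbin.dev/html',
        country='us',                      # proxy geolocation
        render_js=True,                    # render the page in a headless browser
        cache=True,                        # allow serving from Scrapfly's cache
        headers={'X-Example': 'value'},    # extra request headers
    )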

Download Binary Response
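As noted in the feature list above, binary responses are converted into a Python BytesIO object. A sketch of saving one to disk; the scrape_result['content'] accessor is an assumption to verify:

    from scrapfly import ScrapflyClient, ScrapeConfig

    scrapfly = ScrapflyClient(key='YOUR-API-KEY')
    api_response = scrapfly.scrape(ScrapeConfig(url='https://httpbin.dev/image/png'))

    data = api_response.scrape_result['content']  # BytesIO for binary bodies
    with open('image.png', 'wb') as f:
        f.write(data.getvalue())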

Error Handling

Error handling is a big part of scraping, so we designed a system that reflects what went wrong so you can handle it properly from your scraper. Here is a simple snippet to handle errors on your own:
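(A sketch: UpstreamHttpClientError and UpstreamHttpServerError are named below; the scrapfly.errors import path and the ScrapflyError base class are assumptions to verify.)

    from scrapfly import ScrapflyClient, ScrapeConfig
    from scrapfly.errors import ScrapflyError, UpstreamHttpClientError, UpstreamHttpServerError

    scrapfly = ScrapflyClient(key='YOUR-API-KEY')

    try:
        api_response = scrapfly.scrape(ScrapeConfig(url='https://httpbin.dev/status/404'))
    except UpstreamHttpClientError as e:   # upstream website answered with a 4xx status
        print('client error from upstream:', e)
    except UpstreamHttpServerError as e:   # upstream website answered with a 5xx status
        print('server error from upstream:', e)
    except ScrapflyError as e:             # assumed common base class for SDK errors
        print('scrapfly error:', e)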

Errors with their related codes and explanations are documented and available here, if you want to know more.

By default, if the upstream website that you scrape responds with a bad HTTP code, the SDK raises UpstreamHttpClientError or UpstreamHttpServerError depending on the HTTP status code. You can disable this behavior by setting the raise_on_upstream_error attribute to false: ScrapeConfig(raise_on_upstream_error=False).

If you want to report errors to your own application for monitoring or tracking purposes, check out the reporter feature.

Account

You can retrieve account information:
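A sketch, assuming the client exposes an account() helper that returns the account details:

    from scrapfly import ScrapflyClient

    scrapfly = ScrapflyClient(key='YOUR-API-KEY')
    account_info = scrapfly.account()  # assumed helper returning account details
    print(account_info)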

Keep Alive HTTP Session

Take advantage of a Keep-Alive connection:
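A sketch, assuming (as in the context-manager example above) that the client keeps a single HTTP session open so consecutive calls reuse the same connection:

    from scrapfly import ScrapflyClient, ScrapeConfig

    scrapfly = ScrapflyClient(key='YOUR-API-KEY')

    with scrapfly as client:
        # both calls go through the same keep-alive HTTP session
        first = client.scrape(ScrapeConfig(url='https://httpbin.dev/html'))
        second = client.scrape(ScrapeConfig(url='https://httpbin.dev/anything'))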

Concurrency out of the box

You can run scrapes concurrently out of the box. We use asyncio for that.

In Python, there are many ways to achieve concurrency; asyncio is the one the SDK ships with.

First of all, ensure you have installed the concurrency module, then run your scrapes through the async API as sketched below.
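A sketch, assuming the concurrency extra is installed (e.g. pip install "scrapfly-sdk[concurrency]", name to verify) and that the client exposes an async concurrent_scrape generator that yields results as they complete:

    import asyncio

    from scrapfly import ScrapflyClient, ScrapeConfig

    scrapfly = ScrapflyClient(key='YOUR-API-KEY')

    async def main():
        configs = [ScrapeConfig(url='https://httpbin.dev/anything/%d' % i) for i in range(5)]
        # assumed async generator on the client, yielding results as they complete
        async for api_response in scrapfly.concurrent_scrape(scrape_configs=configs):
            print(len(api_response.scrape_result['content']))

    asyncio.run(main())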


Summary

In short: install scrapfly-sdk from PyPI, describe each request with a ScrapeConfig, run it through ScrapflyClient, handle the documented errors, and reach for the asyncio-based concurrency when you need throughput.