Scrapy - Web Scraping Framework

Introduction

Scrapy is a well-known web scraping framework written in Python and widely adopted by the community. The integration replaces the entire network layer of Scrapy so that requests go through our API. Scrapy documentation is available here

The Scrapy integration is part of our Python SDK. The source code is available on GitHub, and the scrapfly-sdk package is available through PyPI.

What's Changed?

The Python API is available to get details of the objects

Objects

Middlewares

The following middlewares are disabled because they are not relevant when using Scrapfly:

  • scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware
  • scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware
  • scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware
  • scrapy.downloadermiddlewares.useragent.UserAgentMiddleware
  • scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware
  • scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware
  • scrapy.downloadermiddlewares.redirect.RedirectMiddleware
  • scrapy.downloadermiddlewares.cookies.CookiesMiddleware
  • scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware
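
For reference, plain Scrapy switches off a built-in downloader middleware by mapping its path to None in the DOWNLOADER_MIDDLEWARES setting. The sketch below shows what an equivalent settings fragment for the list above might look like. The integration is described as disabling these itself, so this is an illustration rather than something you need to add, and the name DISABLED_BY_SCRAPFLY is hypothetical.

```python
# Illustrative only: the Scrapfly integration is described as disabling these itself.
# DISABLED_BY_SCRAPFLY is a hypothetical name, not part of the SDK.
DISABLED_BY_SCRAPFLY = [
    "scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware",
    "scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware",
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware",
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware",
    "scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware",
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware",
    "scrapy.downloadermiddlewares.redirect.RedirectMiddleware",
    "scrapy.downloadermiddlewares.cookies.CookiesMiddleware",
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware",
]

# In Scrapy settings, assigning None to a middleware path disables it.
DOWNLOADER_MIDDLEWARES = {path: None for path in DISABLED_BY_SCRAPFLY}
```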

The internal HTTP / HTTPS downloader is replaced:

  • scrapy.core.downloader.handlers.http11.HTTP11DownloadHandler -> scrapfly.scrapy.downloader.ScrapflyHTTPDownloader
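
In plain Scrapy, download handlers are selected per URL scheme through the DOWNLOAD_HANDLERS setting. A minimal sketch of what the replacement above could look like if it were written out by hand follows. It assumes the handler is registered for both schemes; the integration may wire this up internally, and SCRAPFLY_DOWNLOADER is a hypothetical name.

```python
# Sketch under assumptions: the integration may set this up for you.
# SCRAPFLY_DOWNLOADER is a hypothetical name; the class path comes from the text above.
SCRAPFLY_DOWNLOADER = "scrapfly.scrapy.downloader.ScrapflyHTTPDownloader"

# Scrapy maps each URL scheme to the import path of its download handler.
DOWNLOAD_HANDLERS = {
    "http": SCRAPFLY_DOWNLOADER,
    "https": SCRAPFLY_DOWNLOADER,
}
```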

Stats Collector

All Scrapfly metrics are prefixed with Scrapfly. The following Scrapfly metrics are available:

  • Scrapfly/api_call_cost - (int) Sum of API credits billed against your quota

Complete documentation about the stats collector is available here: https://docs.scrapy.org/en/latest/topics/stats.html
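
Scrapy exposes collected stats as a dictionary (for example through crawler.stats.get_stats()), so the Scrapfly metrics can be picked out by their prefix. The sketch below works on a plain dictionary so that it runs without a crawl. The helper name scrapfly_metrics and the sample values are hypothetical.

```python
def scrapfly_metrics(stats: dict, prefix: str = "Scrapfly/") -> dict:
    """Return only the stats whose key starts with the Scrapfly prefix."""
    return {key: value for key, value in stats.items() if key.startswith(prefix)}


# Made-up sample data standing in for the dictionary returned by the stats collector.
sample_stats = {
    "downloader/request_count": 3,
    "Scrapfly/api_call_cost": 12,
}

metrics = scrapfly_metrics(sample_stats)
print(metrics)
```

In a real spider, the same helper could be called from a closed() callback with self.crawler.stats.get_stats() as its argument.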

Settings Configuration

How to use equivalent of API parameters?

You can check out this section of the Python SDK to see how to configure your calls.

Troubleshooting

Scrapy Checkup

Check API Key setting

tls_process_server_certificate - certificate verify failed

Example: Scrapy Spider Demo

The full example is available in our GitHub repository.

Summary