Best Proxy Providers for Web Scraping

One of the key challenges in web scraping is content accessibility, and it can be solved by using proxies to disguise a web scraper's identity. Proxy services can be used either to change the geographic location a connection appears to come from or to make a single connection source appear as many.

In this article we'll take a look at and compare a few popular proxy providers. We'll also briefly cover how to pick the right provider for your web scraper and which common challenges and issues to look out for.

Introduction To Proxies in Web Scraping

For more on proxy usage in web scraping see our extensive introduction article, which covers proxy protocols, types and common challenges.

Quality Evaluation

Not all proxies are made equal, even if they are of the same proxy type (be it datacenter, residential or mobile). Besides raw tests, there are a few key points worth keeping an eye on when evaluating proxy quality for web scraping.

The first thing to note is proxy pool sharing. Private proxies yield much better results than shared proxy pools, where several users often use the same IPs for the same targets. If your target is likely to be a popular web scraping target, shared pools should be avoided.

Another thing to note is the geographic location of proxies. US-based proxies tend to have the best quality rating when it comes to web scraper blocking. So while some services claim to have thousands of proxies in their pool, most of them might be from low-quality regions that have lower success rates.

For peer-to-peer rotating residential and mobile proxies, a common issue is that the proxies received are not always actually residential or mobile. In our experience the share of mislabeled proxies can vary from 1% to 40%, so for optimal results it's important to confirm the IP type (for example, see "Connection type" in ipleak.com results) before using a proxy in your web scraper.

The concurrency limit (aka thread limit) is a common source of stability issues. Fast web scrapers can reach this limit quickly, as it's often lower than advertised and really hard to measure. It's something worth keeping an eye on.
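One way to stay under a provider's concurrency limit is to cap the number of in-flight requests on the scraper side. Below is a minimal sketch using Python's asyncio.Semaphore. The `run_with_limit` helper and the `fake_request` stand-in are hypothetical names for illustration only; in a real scraper the stand-in would be replaced with an HTTP request sent through the proxy.

```python
import asyncio


async def run_with_limit(jobs, limit):
    """Run coroutine factories with at most `limit` in flight.

    Returns (results, peak), where peak is the highest number of
    jobs that were running at the same time.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` jobs are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_request(i):
    # Hypothetical stand-in for a real HTTP request sent through a proxy.
    await asyncio.sleep(0.01)
    return i


def demo():
    jobs = [lambda i=i: fake_request(i) for i in range(20)]
    return asyncio.run(run_with_limit(jobs, limit=5))
```

Setting the cap somewhat below the advertised limit leaves headroom for the cases where the real limit turns out to be lower.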

Finally, since proxy providers usually offer proxies through a single backconnect proxy (a server that distributes proxies to clients), quality, speed and stability can vary greatly with each implementation. On top of that, a backconnect proxy makes it harder for web scraper developers to implement custom, smarter proxy rotation logic, which can further reduce the chances of successful connections.
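To illustrate what custom rotation logic can look like when we do have direct access to a list of proxies, here is a minimal sketch of a round-robin rotator that retires proxies after repeated failures. The `ProxyRotator` class, its `max_failures` parameter and the proxy names are all hypothetical; this is one simple strategy under those assumptions, not a recommendation of a specific algorithm.

```python
from collections import deque


class ProxyRotator:
    """Round-robin proxy rotation that drops repeatedly failing proxies."""

    def __init__(self, proxies, max_failures=3):
        self.pool = deque(proxies)
        self.failures = {proxy: 0 for proxy in proxies}
        self.max_failures = max_failures

    def next(self):
        """Return the next healthy proxy and move it to the back of the queue."""
        if not self.pool:
            raise RuntimeError("no healthy proxies left")
        proxy = self.pool[0]
        self.pool.rotate(-1)
        return proxy

    def report(self, proxy, ok):
        """Record a request outcome; retire the proxy after too many failures in a row."""
        if ok:
            self.failures[proxy] = 0
            return
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures and proxy in self.pool:
            self.pool.remove(proxy)


# Hypothetical proxy names, for illustration only.
rotator = ProxyRotator(["proxy-a", "proxy-b", "proxy-c"], max_failures=2)
first = rotator.next()
rotator.report(first, ok=False)
```

With a backconnect proxy the provider makes these decisions for us, so logic like this can only be applied if the service exposes individual proxies or sessions.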

Pricing Evaluation

Proxy services offer very different pricing options. Some charge by proxy count, some by bandwidth usage and some by a combination of both.

Generally speaking, in web scraping bandwidth-priced proxies can grow the bill really quickly and should be avoided if possible. Let's take a look at some usage scenarios and how bandwidth-priced proxies would scale:

| Target | Avg document size | Documents per 1 GB | Avg browser page size | Browser pages per 1 GB |
|---|---|---|---|---|
| Walmart.com | 16 KB | 1k - 60k | 1 - 4 MB | 200 - 2,000 |
| Indeed.com | 20 KB | 1k - 50k | 0.5 - 1 MB | 1,000 - 2,000 |
| LinkedIn.com | 35 KB | 300 - 30k | 1 - 2 MB | 500 - 1,000 |
| Airbnb.com | 35 KB | 30k | 0.5 - 4 MB | 250 - 2,000 |
| Target.com | 50 KB | 20k | 0.5 - 1 MB | 1,000 - 2,000 |
| Crunchbase.com | 50 KB | 20k | 0.5 - 1 MB | 1,000 - 2,000 |
| G2.com | 100 KB | 10k | 1 - 2 MB | 500 - 2,000 |
| Amazon.com | 200 KB | 5k | 2 - 4 MB | 250 - 500 |

The table above shows example average page sizes for some popular web scraping targets.
Bandwidth use by web scrapers varies wildly with the scraped target and the web scraping technique. For example, reverse engineering a website's behavior and grabbing only the data documents uses significantly less bandwidth than using automated browser solutions like Puppeteer, Selenium or Playwright. So, for browser-based scraping, bandwidth-based pricing is prohibitively expensive.

Finally, all estimates should be at least doubled to account for retry logic and other usage overhead (like session warm-up and request headers). Let's say we have a $400/Mo plan that gives us 20GB of data. That would net us only ~50k Amazon product scrapes at best, and only a few thousand (at 2 - 4 MB per browser page) if we use a web browser with no special caching or optimization techniques.

On the opposite end, bandwidth-priced proxies can work well with web scrapers that take advantage of AJAX/XHR requests. For example, the same $400/Mo plan with 20GB of data would yield ~600k walmart.com product scrapes if we can reverse engineer Walmart's web page behavior, which is a much more reasonable proposition!
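The estimates above can be reproduced with a few lines of Python. This is a rough sketch: the `estimated_scrapes` function name is ours, sizes use decimal units (1 GB = 1,000,000 KB), and the default overhead factor of 2 reflects the "at least doubled" rule of thumb mentioned earlier.

```python
def estimated_scrapes(plan_gb, avg_page_kb, overhead_factor=2.0):
    """Estimate how many pages a bandwidth plan covers.

    plan_gb: bandwidth included in the plan, in GB
    avg_page_kb: average size of one scraped page, in KB
    overhead_factor: multiplier for retries, warm-up and headers (assumed 2x)
    """
    total_kb = plan_gb * 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    return int(total_kb / (avg_page_kb * overhead_factor))


amazon = estimated_scrapes(20, 200)  # 200 KB Amazon documents -> 50,000
walmart = estimated_scrapes(20, 16)  # 16 KB Walmart documents -> 625,000
```

Running the same function with your own measured page sizes gives a quick sanity check before committing to a bandwidth-based plan.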

Bandwidth-based plans usually give access to big proxy pools, but it's very rare for a web scraping project to need more than 100-1,000 proxies. For example, if each proxy handles 30 requests/minute and we scrape a website at 5,000 requests/minute, we only need 167 rotating proxies!
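The pool size calculation is a simple division rounded up. A minimal sketch, where the `proxies_needed` helper is a hypothetical name of ours:

```python
import math


def proxies_needed(target_rpm, per_proxy_rpm):
    """Smallest pool size that sustains target_rpm when each proxy
    is limited to per_proxy_rpm requests per minute."""
    return math.ceil(target_rpm / per_proxy_rpm)


pool_size = proxies_needed(5000, 30)  # 5000 / 30 = 166.67, rounded up to 167
```

In practice it is worth adding some headroom on top of this number, since some proxies in the pool will be blocked or temporarily unavailable.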

Proxy-count-based pricing is often a much safer and easier model to work with. Buying a starter pool of private proxies (accessible only to a single client or a very small pool of clients) is a lower-risk commitment for web scraping projects.

Evaluation Methodology

In this article we'll be evaluating proxy providers from the point of view of ScrapFly's very own web scraping proxy-like service. We'll cover the most important features used in web scraping, so our full evaluation table will look like this:

Feature Example Service
Datacenter Proxies
Residential Proxies
Mobile Proxies
Geo Targeting
Anti Bot Bypass
Javascript Rendering
Log Monitoring
Price per GB $1-25
50GB Project Estimated cost $350/Mo

Here we're evaluating proxy types (datacenter, residential and mobile), proxy features such as geo targeting and anti bot bypass, and some analytical figures like price per gigabyte of bandwidth and the estimated cost of an average 50GB web scraper.

Since we're evaluating from the point of view of a ScrapFly user, let's take a look at what makes ScrapFly so special!

ScrapFly

At ScrapFly we realize how complicated proxies are in web scraping, so we made it our goal to simplify the process while also keeping the service accessible.

ScrapFly feels like a proxy but does much more!

ScrapFly offers a request middleware service, which ensures that outgoing requests result in successful responses. This is done by a combination of unique ScrapFly features such as a smart proxy selection algorithm, anti web scraping protection solver and browser based rendering.

ScrapFly uses a credit-based pricing model, which is much easier to predict and scale than bandwidth or proxy-count-based pricing. This allows flexible pricing based on the features used rather than arbitrary measurements such as bandwidth, meaning our users aren't locked in to a single solution and can adjust their scrapers on the fly!

For example, the most popular $100/Mo tier can yield up to 1,000,000 target responses, depending on the enabled features:

  • ScrapFly provides a choice of either datacenter or residential proxies and geolocation (over 50 locations) for each request.
  • All ScrapFly HTTP1 requests are automatically converted to HTTP2 requests, which are significantly less likely to be blocked.
  • ScrapFly offers a smart Anti Scraping Protection (ASP) solution, which solves various captchas and scraping protection blockers if they appear during the scraping process. What's great about the ASP service is that the user is only charged 5 credits for successful solutions, meaning it can be applied to every request worry free!
  • ScrapFly offers browser-based rendering, which further reduces the chances of being blocked, as real web browsers are much less likely to be blocked than HTTP clients. Browser-based rendering also greatly simplifies the web scraping process, as it reduces the engineering effort needed to understand the scraped website - your requests will return the same data users see in their web browsers!

To explore these and other offered features see our full documentation!


Let's see how ScrapFly would look on our evaluation table:

Feature ScrapFly
Datacenter Proxies
Residential Proxies
Mobile Proxies
Geo Targeting 54 countries
Anti Bot Bypass
Javascript Rendering
Log Monitoring
Price per GB per request
50GB Project Estimated cost $100/Mo

Webshare

Webshare.io is one of the biggest general proxy providers. They offer a variety of proxy types: datacenter, private datacenter, residential and ISP (static residential) proxies, along with loads of bandwidth (from 250GB/month to unlimited).

The most interesting thing about Webshare is their generous bandwidth allowance, ranging from 250GB/Mo to unlimited. However, this does mean pricing is proxy-count based, which requires extra diligence in web scraper implementation.

Since Webshare provides loads of bandwidth, our project price evaluation is mostly based on proxy type and quality. However, an average 50GB web scraper won't do well enough with datacenter proxies, and that's one of the biggest weaknesses of Webshare proxies - they're mostly good for smaller, low-risk web scraping projects.

Let's see how this would look on our evaluation table:

Feature Webshare
Datacenter Proxies
Residential Proxies
Mobile Proxies
Geo Targeting 1-25 countries
Anti Bot Bypass
Javascript Rendering
Log Monitoring
Price per GB $1-25
50GB Project Estimated cost $250

Netnut

Netnut.io is another big proxy provider, which offers bandwidth-based proxy plans for datacenter, residential and ISP proxies.

There are some good and bad things to say about Netnut. It's surprisingly accessible, with a low entry point of $20 for 20GB/Mo, and offers unlimited* concurrency from over 150 countries, which is great for high-concurrency scrapers. However, bandwidth-based pricing can get expensive quite quickly and there are noticeable issues with residential proxy quality.

Let's see how Netnut looks on our evaluation table:

Feature Netnut
Datacenter Proxies (50k)
Residential Proxies (10-20M)
Mobile Proxies
Geo Targeting 150 countries
Anti Bot Bypass
Javascript Rendering
Log Monitoring
Price per GB $1-17.5
Small (1-2GB) Project Estimated Cost $20
Med (10-15GB) Project Estimated cost $20-300
Big (100-150GB) Project Estimated cost $800 - 2,000

For our average 50GB web scraping project we need residential proxies, which are significantly more expensive at the bandwidth we need.

Soax

Soax.com is another big name in the proxy world. Just like Netnut, this service uses bandwidth-based tier pricing, offering residential and mobile proxies.

Soax seems to offer a premium proxy experience, so let's take a look at how it appears on our evaluation table:

Feature Soax
Datacenter Proxies
Residential Proxies (5M)
Mobile Proxies
Geo Targeting 100 countries
Anti Bot Bypass
Javascript Rendering
Log Monitoring
Price per GB $12 - 33
Small (1-2GB) Project Estimated Cost $99
Med (10-15GB) Project Estimated cost $300
Big (100-150GB) Project Estimated cost $1050+

As you can see, the Soax pricing model is quite simple. However, it doesn't leave us much room for customization, and bandwidth-based pricing is quite painful. Soax is working on providing datacenter proxies in the future, which might alleviate some of the pricing pains.

Geosurf

Geosurf.com is another residential proxy provider with bandwidth-based tier pricing, and it has been in the proxy industry for over 10 years.

It's a very similar offering to that of Soax.com. However, it seems to be aimed more at enterprise-level users, with a higher minimum commitment but slightly better value. Let's see how it looks on our evaluation table:

Feature Geosurf
Datacenter Proxies
Residential Proxies (2.5M)
Mobile Proxies
Geo Targeting 135 countries + 1700 cities
Anti Bot Bypass
Javascript Rendering
Log Monitoring
Price per GB $8 - 12
Small (1-2GB) Project Estimated Cost $450
Med (10-15GB) Project Estimated cost $450
Big (100-150GB) Project Estimated cost $1500

Unfortunately, Geosurf suffers from the same issues as Soax.com, making it a difficult choice for low and mid tier projects. However, Geosurf does offer unlimited* concurrency and proxy selection by city, which can come in handy for some niche web scrapers.

Summary

Feature ScrapFly Webshare Netnut Soax Geosurf
Datacenter Proxies 3.4M on demand 50k shared
Residential Proxies 190M on demand 10-20M 5M 2.5M
Mobile Proxies 7M on demand 3.5M
Geo Targeting (Countries) 54 1-25 150 100 135
Anti Bot Bypass
Javascript Rendering
Log Monitoring
Price per GB per request $1-25 $1-17.5 $12 - 33 $8 - 12
Minimum Commitment (Monthly) $15 $15 $20 $99 $450
50GB Project Estimated cost $100 $250 $800 $700 $900

When it comes to web scraping, a classic proxy service is a tough sell. Even with the recent advances in proxy quality, these services still fall short compared to dedicated web scraping APIs, which can apply additional, smart connection strategies to prevent captchas, blocking or throttling.

ScrapFly's combination of smart connection strategies and extra features like Javascript Rendering and Anti Bot Bypass can make even the hardest targets easily accessible while also simplifying the web scraping process!
