Everything to Know to Start Web Scraping in Python Today
Complete introduction to web scraping using Python: HTTP, parsing, AI, scaling, and deployment.
Python's httpx HTTP client package supports both HTTP and SOCKS5 proxies. Here's how to use proxies with httpx:
import httpx
from urllib.parse import quote
# proxy pattern is:
# scheme://username:password@IP:PORT
# For example:
# no auth HTTP proxy:
my_proxy = "http://160.11.12.13:1020"
# or socks5
my_proxy = "socks5://160.11.12.13:1020"  # requires the socks extra: pip install "httpx[socks]"
# proxy with authentication
my_proxy = "http://my_username:my_password@160.11.12.13:1020"
# note that the username and password should be URL-quoted if they contain reserved characters like "@":
my_proxy = f"http://{quote('foo@bar.com')}:{quote('password@123')}@160.11.12.13:1020"
proxies = {
# this proxy will be applied to all http:// urls
'http://': 'http://160.11.12.13:1020',
# this proxy will be applied to all https:// urls (note the S)
'https://': 'http://160.11.12.13:1020',
# we can also use proxy only for specific pages
'https://httpbin.dev': 'http://160.11.12.13:1020',
}
with httpx.Client(proxies=proxies) as client:
    r = client.get("https://httpbin.dev/ip")
# or async (must be awaited inside an async function)
async with httpx.AsyncClient(proxies=proxies) as client:
    r = await client.get("https://httpbin.dev/ip")
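As a quick sanity check, the URL quoting used for the credentials above can be verified in isolation:

```python
from urllib.parse import quote

# "@" and other reserved characters are percent-encoded
print(quote("foo@bar.com"))  # foo%40bar.com

# quoted credentials embed cleanly into the proxy URL
proxy = f"http://{quote('foo@bar.com')}:{quote('password@123')}@160.11.12.13:1020"
print(proxy)  # http://foo%40bar.com:password%40123@160.11.12.13:1020
```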
Note that proxies can also be set through the standard *_PROXY environment variables:
$ export HTTP_PROXY="http://160.11.12.13:1020"
$ export HTTPS_PROXY="http://160.11.12.13:1020"
$ export ALL_PROXY="socks5://160.11.12.13:1020"
$ python
import httpx
# this will use the proxies we set
with httpx.Client() as client:
    r = client.get("https://httpbin.dev/ip")
When web scraping, it's best to rotate proxies for each request. For that see our article: How to Rotate Proxies in Web Scraping
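As a minimal sketch, a round-robin rotator can be built with the standard library alone. The pool addresses below are placeholders, and the httpx call is shown as a comment since it would require live proxy endpoints:

```python
import itertools

# placeholder proxy pool - substitute your own proxy endpoints
PROXY_POOL = [
    "http://160.11.12.13:1020",
    "http://160.11.12.14:1020",
    "http://160.11.12.15:1020",
]

# itertools.cycle loops over the pool endlessly, round-robin style
proxy_cycle = itertools.cycle(PROXY_POOL)

def next_proxy() -> str:
    # each call returns the next proxy in the rotation
    return next(proxy_cycle)

# usage with httpx would then look like:
# with httpx.Client(proxies=next_proxy()) as client:
#     r = client.get("https://httpbin.dev/ip")
```

Round-robin spreads requests evenly across the pool; swapping `itertools.cycle` for `random.choice` gives randomized rotation instead.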
This knowledgebase is provided by Scrapfly data APIs, check us out! 👇