Python's httpx HTTP client package supports both HTTP and SOCKS5 proxies. Here's how to use proxies with httpx:
import httpx
from urllib.parse import quote
# proxy pattern is:
# scheme://username:password@IP:PORT
# For example:
# no auth HTTP proxy:
my_proxy = "http://160.11.12.13:1020"
# or SOCKS5 (requires the optional extra: pip install "httpx[socks]")
my_proxy = "socks5://160.11.12.13:1020"
# proxy with authentication
my_proxy = "http://my_username:my_password@160.11.12.13:1020"
# note: the username and password must be URL-quoted if they contain URL-sensitive characters like "@":
my_proxy = f"http://{quote('foo@bar.com')}:{quote('password@123')}@160.11.12.13:1020"
proxies = {
# this proxy will be applied to all http:// urls
'http://': 'http://160.11.12.13:1020',
# this proxy will be applied to all https:// urls (note the S)
'https://': 'http://160.11.12.13:1020',
# we can also use a proxy only for a specific host
'https://httpbin.dev': 'http://160.11.12.13:1020',
}
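# note: the `proxies` argument was removed in httpx 0.28; newer versions use `proxy=` or `mounts=` instead
with httpx.Client(proxies=proxies) as client:
    r = client.get("https://httpbin.dev/ip")
# or async (must run inside a coroutine)
import asyncio

async def main():
    async with httpx.AsyncClient(proxies=proxies) as client:
        r = await client.get("https://httpbin.dev/ip")

asyncio.run(main())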
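The quoting rule for credentials is easy to forget, so it can help to wrap it in a small helper. The following is a minimal sketch, not part of httpx: `build_proxy_url` and its parameters are hypothetical names chosen for illustration, and the addresses are the same placeholder values used above.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
    scheme: str = "http",
) -> str:
    """Assemble scheme://username:password@host:port, quoting the credentials."""
    auth = ""
    if username is not None:
        # safe="" also percent-encodes "/", which quote() leaves alone by default
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


print(build_proxy_url("160.11.12.13", 1020))
print(build_proxy_url("160.11.12.13", 1020, "foo@bar.com", "password@123"))
```

The second call prints `http://foo%40bar.com:password%40123@160.11.12.13:1020`, which matches the manually quoted URL shown earlier.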
Note that proxies can also be set through the standard *_PROXY environment variables:
$ export HTTP_PROXY="http://160.11.12.13:1020"
$ export HTTPS_PROXY="http://160.11.12.13:1020"
$ export ALL_PROXY="socks5://160.11.12.13:1020"
$ python
import httpx
# this will use the proxies we set
with httpx.Client() as client:
    r = client.get("https://httpbin.dev/ip")
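When debugging, it can be useful to check which proxy settings a given environment would supply. The sketch below is an illustration only and is not how httpx reads the environment internally: `proxies_from_env` is a hypothetical helper that maps the three variables above onto httpx-style URL patterns.

```python
import os
from typing import Dict, Mapping


def proxies_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Map *_PROXY variables to httpx-style URL patterns (hypothetical helper)."""
    patterns = {"HTTP_PROXY": "http://", "HTTPS_PROXY": "https://", "ALL_PROXY": "all://"}
    found = {}
    for variable, pattern in patterns.items():
        # Accept both spellings; in this sketch the lowercase one wins if both are set.
        value = env.get(variable.lower()) or env.get(variable)
        if value:
            found[pattern] = value
    return found


example_env = {
    "HTTP_PROXY": "http://160.11.12.13:1020",
    "HTTPS_PROXY": "http://160.11.12.13:1020",
}
print(proxies_from_env(example_env))
```

To make a client ignore these variables entirely, httpx accepts `trust_env=False` when creating the client.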
When web scraping, it's best to rotate proxies for each request. For that see our article: How to Rotate Proxies in Web Scraping
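As a rough illustration of the idea, the simplest rotation strategy is round-robin over a pool of proxies. This is a minimal sketch under stated assumptions: `rotate_proxies` is a hypothetical helper, the pool contains placeholder addresses, and the loop only prints the pairing instead of sending requests.

```python
import itertools
from typing import Iterator, List


def rotate_proxies(proxy_pool: List[str]) -> Iterator[str]:
    """Return an endless round-robin iterator over the proxy pool."""
    if not proxy_pool:
        raise ValueError("proxy_pool must contain at least one proxy URL")
    return itertools.cycle(proxy_pool)


# placeholder addresses, not real proxies
proxy_pool = [
    "http://160.11.12.13:1020",
    "http://160.11.12.14:1020",
    "http://160.11.12.15:1020",
]
rotation = rotate_proxies(proxy_pool)
urls = ["https://httpbin.dev/ip"] * 4
for url in urls:
    proxy = next(rotation)
    # a real scraper would open an httpx client configured with `proxy` here
    print(f"{url} via {proxy}")
```

Round-robin spreads requests evenly but is predictable; random or weighted selection are common alternatives when some proxies are slower or more likely to be blocked.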