Guzzle is a popular PHP HTTP client that is often used for web scraping.
Proxies are an integral part of web scraping, so here's a quick introduction to using proxies with Guzzle:
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;
// Proxy pattern is:
// scheme://username:password@IP:PORT
// For example:
// no auth HTTP proxy:
$my_proxy = "http://160.11.12.13:1020";
// proxy with authentication
$my_proxy = "http://my_username:my_password@160.11.12.13:1020";
// Note: the username and password must be URL-encoded if they contain URL-sensitive characters such as "@":
$my_proxy = 'http://'.urlencode('foo@bar.com').':'.urlencode('password@123').'@160.11.12.13:1020';
$client = new Client([
    // Base URI is used with relative requests
    'base_uri' => 'https://httpbin.dev',
    // You can set any number of default request options.
    'timeout' => 2.0,
    'proxy' => [
        'http' => $my_proxy, // This proxy will be applied to all 'http' URLs
        'https' => $my_proxy, // This proxy will be applied to all 'https' URLs
        'no' => ['localhost'], // Hosts listed under 'no' bypass the proxy
    ]
]);
$response = $client->request('GET', '/ip');
$body = $response->getBody();
print($body);
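The credential-encoding note in the example above can be wrapped in a small helper. This is a minimal sketch, not part of Guzzle: `build_proxy_url` is a hypothetical function name, and it uses `rawurlencode` (which encodes spaces as `%20`) rather than `urlencode`, on the assumption that the proxy expects RFC 3986 percent-encoding.

```php
<?php
// Hypothetical helper (not a Guzzle API): builds a proxy URL of the form
// scheme://username:password@IP:PORT and percent-encodes the credentials.
function build_proxy_url(
    string $host,
    int $port,
    ?string $user = null,
    ?string $pass = null,
    string $scheme = 'http'
): string {
    $auth = '';
    if ($user !== null) {
        $auth = rawurlencode($user);
        if ($pass !== null) {
            $auth .= ':' . rawurlencode($pass);
        }
        $auth .= '@';
    }
    return sprintf('%s://%s%s:%d', $scheme, $auth, $host, $port);
}

// Prints: http://foo%40bar.com:password%40123@160.11.12.13:1020
echo build_proxy_url('160.11.12.13', 1020, 'foo@bar.com', 'password@123'), PHP_EOL;
```

The returned string can be used anywhere `$my_proxy` appears in the example above.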
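Guzzle has no SOCKS implementation of its own. SOCKS proxies only work when Guzzle uses its cURL handler, which hands the proxy URL (for example socks5://160.11.12.13:1020) to libcurl; otherwise the available options are PHP's curl extension directly or the Buzz HTTP client.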
Note that a Guzzle proxy can also be set through the standard *_PROXY
environment variables (Guzzle reads HTTP_PROXY only when PHP runs from the command line):
$ export HTTP_PROXY="http://160.11.12.13:1020"
$ export HTTPS_PROXY="http://160.11.12.13:1020"
$ export ALL_PROXY="socks://160.11.12.13:1020"
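If the environment variables are not picked up automatically (for example outside the command line), one option is to read them explicitly and pass the result as the 'proxy' option. The sketch below assumes the variables are visible to the PHP process; `proxy_option_from_env` is a hypothetical helper, not a Guzzle function, and it only handles HTTP_PROXY, HTTPS_PROXY and NO_PROXY.

```php
<?php
// Hypothetical helper: maps environment-style variables onto the array
// shape used by Guzzle's 'proxy' request option ('http', 'https', 'no').
function proxy_option_from_env(array $env): array
{
    $proxy = [];
    if (!empty($env['HTTP_PROXY'])) {
        $proxy['http'] = $env['HTTP_PROXY'];
    }
    if (!empty($env['HTTPS_PROXY'])) {
        $proxy['https'] = $env['HTTPS_PROXY'];
    }
    if (!empty($env['NO_PROXY'])) {
        // NO_PROXY is a comma-separated list of hosts that bypass the proxy
        $proxy['no'] = array_map('trim', explode(',', $env['NO_PROXY']));
    }
    return $proxy;
}

// getenv() without arguments returns all environment variables as an array
print_r(proxy_option_from_env(getenv()));
```

The resulting array could then be passed as `'proxy' => proxy_option_from_env(getenv())` when constructing the client.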
When web scraping, it's best to rotate proxies on every request. For details, see our article How to Rotate Proxies in Web Scraping.
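As a rough illustration of per-request rotation, the sketch below picks a random proxy from a pool and returns it as Guzzle request options. The pool addresses and the `random_proxy_options` helper are hypothetical, and random selection is only one simple rotation strategy.

```php
<?php
// Hypothetical proxy pool; replace these with real proxy URLs.
$proxy_pool = [
    'http://160.11.12.13:1020',
    'http://160.11.12.14:1020',
    'http://160.11.12.15:1020',
];

// Returns request options that route a single request through a
// randomly chosen proxy from the pool.
function random_proxy_options(array $pool): array
{
    if ($pool === []) {
        throw new InvalidArgumentException('Proxy pool is empty');
    }
    $proxy = $pool[array_rand($pool)];
    return ['proxy' => ['http' => $proxy, 'https' => $proxy]];
}

// With a Guzzle client, the options are passed on each request:
// $response = $client->request('GET', '/ip', random_proxy_options($proxy_pool));
$options = random_proxy_options($proxy_pool);
echo $options['proxy']['https'], PHP_EOL;
```

Because 'proxy' is passed as a request option, it overrides any client-level default for that one request.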