Customize Requests
All Scrapfly HTTP requests can be customized with custom headers, methods, cookies and other HTTP parameters. Let's take a look at the available options.
Method
The scrape request's method is equivalent to the HTTP method used to call the API. For example, calling Scrapfly through POST will forward the request as a POST request to the upstream website.
Available methods are: GET, PUT, POST, PATCH, HEAD
GET
GET requests are the most common request type used in web scraping. They are used for retrieving data from a server without providing any data in the body of the request.
import requests
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fhtml"
response = requests.request("GET", url)
data = response.json()
print(data)
print(data['result'])
POST
POST requests are most commonly used to submit forms or documents. This HTTP method usually requires a body parameter to be sent with the request, which stores the posted data.
To indicate the type of posted data, the content-type header is used. If it is not set explicitly, it defaults to application/x-www-form-urlencoded, which stands for urlencoded data. Another popular alternative is JSON; for posting JSON data, the content-type header has to be set to application/json.
import requests
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fpost"
payload = "{\"example\":\"value\"}"
response = requests.request("POST", url, data=payload)
data = response.json()
print(data)
print(data['result'])
And here's a full example for posting JSON with the content-type header configured:
import requests
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fpost&headers[content-type]=application%2Fjson"
payload = "{\"example\":\"value\"}"
response = requests.request("POST", url, data=payload)
data = response.json()
print(data)
print(data['result'])
PUT
PUT requests are used to submit forms and upload user-created content. When using this method, if the content-type header is not set explicitly, it defaults to application/x-www-form-urlencoded, as we assume you are sending urlencoded data. For putting JSON data, specify the content-type: application/json header.
import requests
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fput&headers[content-type]=application%2Fjson"
payload = "{\"example\":\"value\"}"
response = requests.request("PUT", url, data=payload)
data = response.json()
print(data)
print(data['result'])
PATCH
PATCH requests are used to submit forms and update user-created content. When using this method, if the content-type header is not set explicitly, it defaults to application/x-www-form-urlencoded, as we assume you are sending urlencoded data. For patching JSON data, specify the content-type: application/json header.
import requests
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fpatch&headers[content-type]=application%2Fjson"
payload = "{\"example\":\"value\"}"
response = requests.request("PATCH", url, data=payload)
data = response.json()
print(data)
print(data['result'])
HEAD
HEAD requests are used to retrieve page metadata like response headers and status codes without fetching the content. When the HEAD method is used, the headers of the upstream website are directly forwarded to the API response. This means that Scrapfly response headers match the headers of the scraped website.
import requests
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fhead"
response = requests.request("HEAD", url)
# HEAD responses carry no body, so inspect the status and forwarded headers
print(response.status_code)
print(response.headers)
Headers
Request headers sent by Scrapfly can be customized through the headers parameter. Note that header values must be urlencoded to prevent any side effects. When in doubt, use Scrapfly's URL encoding web tool.
import requests
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fheaders&headers[foo]=bar"
response = requests.request("GET", url)
data = response.json()
print(data)
print(data['result'])
Multiple values for the same header can be passed using indexed notation, for example headers[X-foo][0]=bar&headers[X-foo][1]=baz, and the order and structure will be replicated.
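As a sketch of the indexed notation above, the snippet below builds the scrape URL with Python's urllib.parse.urlencode so both the target URL and the header values are urlencoded automatically (__API_KEY__ is a placeholder for your key; X-foo is an illustrative header name):

```python
from urllib.parse import urlencode

# Indexed headers[X-foo][0]/[1] keys preserve the order of repeated headers
params = {
    "key": "__API_KEY__",
    "url": "https://httpbin.dev/headers",
    "headers[X-foo][0]": "bar",
    "headers[X-foo][1]": "baz",
}
api_url = "https://api.scrapfly.io/scrape?" + urlencode(params)
print(api_url)
```

Letting urlencode handle the escaping avoids hand-encoding mistakes like a stray & or = leaking into the URL structure.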
By default, Scrapfly has you covered and automatically sets default headers to replicate a browser. You can learn more about headers in this dedicated article.
When Anti Scraping Protection is enabled, headers are also fine-tuned for the target you scrape to maximize your success rate.
Important headers to keep in mind in web scraping context:
Content-Type
Specifies the media type of the resource being sent in the HTTP message body. It tells the recipient what kind of data to expect and how to interpret it. The Content-Type header is typically used in HTTP requests and responses, particularly in responses from servers to clients.
For example, when sending a POST request with a JSON body, you must specify application/json. By default, if you send a POST request without a Content-Type header, application/x-www-form-urlencoded will be set.
If this header is not correctly configured, the target website may respond with a 400 or 406 status, or block you.
Accept
Indicates the media types that the client is willing to receive in the response. Helps servers determine the appropriate representation of the requested resource.
By default, we set what a browser would send.
Example for a JSON API expecting a response in JSON:
Accept: application/json
If this header is not correctly configured, the target website may respond with a 400 status, or block you.
Referer
Indicates the URL of the web page from which the request originated. Often used by servers to track the source of incoming requests.
By default, this header is not sent unless specified.
Example:
Referer: https://www.example.com/page1.html
This header can be mutated when using the Anti-Scraping Protection feature.
Behavior And Interaction With Other Features
When ASP is activated or a specific os is set, the following headers become immutable or limited:
- user-agent: if you set a custom Chrome user agent, it will be ignored to keep our actual version
- sec-ch-ua: ignored
- sec-ch-ua-arch: ignored
- sec-ch-ua-platform: ignored
- sec-ch-ua-platform-version: ignored
- sec-ch-ua-full-version: ignored
- sec-ch-ua-bitness: ignored
With ASP activated, the referer header is handled automatically if no header is set. However, for specific targets you might want to disable this: setting the referer header to none will prevent it.
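The snippet below sketches disabling the auto-handled referer under ASP, assuming the asp boolean parameter enables Anti Scraping Protection and following the headers[...] parameter shape used throughout this page (__API_KEY__ is a placeholder):

```python
from urllib.parse import urlencode

# Setting the referer header to "none" prevents ASP from auto-filling it
params = {
    "key": "__API_KEY__",
    "url": "https://httpbin.dev/headers",
    "asp": "true",
    "headers[referer]": "none",
}
api_url = "https://api.scrapfly.io/scrape?" + urlencode(params)
print(api_url)
```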
Cookies
Cookies are regular HTTP headers and shouldn't be treated in a special way. While most HTTP clients and libraries have a dedicated API to manage cookies, to manage cookies with the Scrapfly API simply set the appropriate headers.
Set-Cookie
This header should never be sent from the client's side. It's a response header sent when the upstream website wants to register a cookie with the client.
Cookie
This header contains the cookie values held by the client. So, when scraping this should be used to include cookie data.
The Cookie
header contains key-to-value pairs of data separated by semicolons. For example:
- Single cookie:
Cookie: test=1
- Multiple cookies:
Cookie: test=1;lang=fr;currency=USD
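To illustrate the format above, here is a minimal sketch that assembles a Cookie header value from a dict of cookies and then urlencodes it for use in the headers[cookie] API parameter:

```python
from urllib.parse import quote

# Join key=value pairs with semicolons, as in Cookie: test=1;lang=fr;currency=USD
cookies = {"test": "1", "lang": "fr", "currency": "USD"}
cookie_header = ";".join(f"{k}={v}" for k, v in cookies.items())
print(cookie_header)         # test=1;lang=fr;currency=USD
# urlencode before embedding in the scrape URL so "=" and ";" don't
# get interpreted as part of the URL's own query structure
print(quote(cookie_header))  # test%3D1%3Blang%3Dfr%3Bcurrency%3DUSD
```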
You can also pass multiple Cookie headers using indexed notation, for example: headers[Cookie][0]=foo%3Dbar&headers[Cookie][1]=bar%3Dbaz
Note: %3D is the urlencoded version of =. Do not forget to urlencode the header value so it does not conflict with the actual URL structure; otherwise, the = inside the cookie value would be interpreted as a query parameter of the URL.
import requests
url = "https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies&headers[cookie]=lang%3Dfr%3Bcurrency%3DUSD%3Btest%3D1"
response = requests.request("GET", url)
data = response.json()
print(data)
print(data['result'])
Geo Targeting
Each Scrapfly request can be sent from a specific country. This is called Geo-Targeting and is managed by Scrapfly's proxy network.
The desired country can be specified with 2-letter country codes (ISO 3166-1 alpha-2). Available countries are defined on the proxy pool dashboard. If a country is not available in the Public Pool, a personal private pool can be created with the desired countries. Note that restricting countries also restricts the available proxy IP pool.
To specify geo targeting, the country parameter can be used:
- Single country selection:
country=us
- Multi country selection with random selection:
country=us,ca,mx
- Multi country selection with weighted random selection (higher weights have higher probability):
country=us:1,ca:5,mx:3
- Country exclusion:
country=-gb
For example, to send requests through the United States, country=us would be used:
import requests
url = "https://api.scrapfly.io/scrape?country=us&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything"
response = requests.request("GET", url)
data = response.json()
print(data)
print(data['result'])
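The weighted multi-country syntax from the list above can be sketched the same way; here requests would be routed through us/ca/mx with relative weights 1/5/3, higher weight meaning higher selection probability (__API_KEY__ is a placeholder):

```python
from urllib.parse import urlencode

# country=us:1,ca:5,mx:3 -> weighted random proxy country selection
params = {
    "key": "__API_KEY__",
    "country": "us:1,ca:5,mx:3",
    "url": "https://httpbin.dev/anything",
}
api_url = "https://api.scrapfly.io/scrape?" + urlencode(params)
print(api_url)
```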
For more on proxies, see the proxy documentation page.
To spoof the latitude and longitude of the web browser's location services, the geolocation parameter can be used, for example geolocation=48.856614,2.3522219 (latitude, longitude):
import requests
url = "https://api.scrapfly.io/scrape?geolocation=48.856614%2C2.3522219&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything"
response = requests.request("GET", url)
data = response.json()
print(data)
print(data['result'])
The available country options depend on the selected proxy_pool. See the proxy pool dashboard for the options available to your account.
Language
Content language can be configured through the lang parameter. By default, the language is inferred from the proxy location. So, if a French proxy is used, the scrape request will be configured with French language preferences.
Behind the scenes, this is done by configuring the Accept-Language
HTTP header.
If the website supports this header and the requested language, the content will be returned in that language.
Multiple language options can be passed as comma-separated values. Country locales are also supported in {lang iso2}-{country iso2} format. Note that the order matters, as the website will negotiate the content language based on this order.
For example, lang=fr,en-US,en will result in the final header Accept-Language: fr-{proxy country iso2},fr;q=0.9,en-US;q=0.8,en;q=0.7
Most users prefer English regardless of the proxy location. For that, use lang=en-US,en
import requests
url = "https://api.scrapfly.io/scrape?lang=en-us%2Cen&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything"
response = requests.request("GET", url)
data = response.json()
print(data)
print(data['result'])
Operating System
We do not recommend using this feature unless it's absolutely necessary as it can impact scraper blocking rates.
By default, Scrapfly automatically selects the most suitable operating system for all outgoing requests. To configure the operating system explicitly, the os parameter can be used.
The supported values are: win,win10,win11,mac,linux,chromeos
Because of potential conflicts, the os parameter and the User-Agent header cannot be set at the same time.
For example, to set the operating system to Windows 11, the os=win11 parameter would be used:
import requests
url = "https://api.scrapfly.io/scrape?os=win11&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything"
response = requests.request("GET", url)
data = response.json()
print(data)
print(data['result'])