Customize Requests

All Scrapfly HTTP requests can be customized with custom headers, methods, cookies and other HTTP parameters. Let's take a look at the available options.

Method

The scrape request's method is equivalent to the HTTP method used to call the API. For example, calling Scrapfly through POST will forward the request as a POST request to the upstream website.
Available methods are: GET, PUT, POST, PATCH, HEAD

GET is the most common request type used in web scraping. GET requests retrieve data from a server without providing any data in the body of the request.

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fhtml")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body

POST requests are most commonly used to submit forms or documents. This HTTP method usually requires a body parameter to be sent with the request which stores the posted data. To indicate the type of posted data the content-type header is used and if it is not set explicitly, it'll default to application/x-www-form-urlencoded which stands for urlencoded data. Another popular alternative is JSON and for posting JSON data, the content-type header has to be specified as application/json.

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fpost")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Post.new(url)
request.body = "{\"example\":\"value\"}"

response = https.request(request)
puts response.read_body

And here's a full example for posting JSON and configuring the content-type header:

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fpost&headers[content-type]=application%2Fjson")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Post.new(url)
request.body = "{\"example\":\"value\"}"

response = https.request(request)
puts response.read_body

PUT requests are used to submit forms and upload user-created content. When using this method, if content-type header is not set explicitly, it'll default to application/x-www-form-urlencoded as we assume you send urlencoded data. For putting JSON data, specify content-type: application/json header.

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fput&headers[content-type]=application%2Fjson")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Put.new(url)
request.body = "{\"example\":\"value\"}"

response = https.request(request)
puts response.read_body

PATCH requests are used to submit forms and update user-created content. When using this method, if content-type header is not set explicitly, it'll default to application/x-www-form-urlencoded as we assume you send urlencoded data. For patching JSON data, specify content-type: application/json header.

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fpatch&headers[content-type]=application%2Fjson")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Patch.new(url)
request.body = "{\"example\":\"value\"}"

response = https.request(request)
puts response.read_body

Headers

Request headers sent by Scrapfly can be customized through the headers parameter. Note that the value of headers must be urlencoded to prevent any side effects. When in doubt, use Scrapfly's url encoding web tool.

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fheaders&headers[foo]=bar")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body
You can also pass the same header multiple times, e.g. headers[X-foo][0]=bar&headers[X-foo][1]=baz, and the order and structure will be replicated.
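Since both the header values and the bracketed parameter names must survive urlencoding, it can be easier to build the API URL programmatically. Below is a minimal sketch using Ruby's URI.encode_www_form; the X-foo header is just an illustration, and it assumes the API decodes percent-encoded bracket names as usual:

```ruby
require "uri"

# Build the scrape API URL from [name, value] pairs.
# URI.encode_www_form percent-encodes keys and values, so header
# values and the bracketed parameter names are safe to pass.
params = [
  ["key", "__API_KEY__"],
  ["url", "https://httpbin.dev/headers"],
  ["headers[X-foo][0]", "bar"],
  ["headers[X-foo][1]", "baz"],
]
query = URI.encode_www_form(params)
api_url = "https://api.scrapfly.io/scrape?#{query}"
```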

By default, Scrapfly has you covered and automatically handles default headers to replicate a browser. You can learn more about headers in this dedicated article.

When Anti Scraping Protection is enabled, headers are also fine-tuned for the target you scrape to maximize your success rate.

Important headers to keep in mind in web scraping context:

Content-Type

Specifies the media type of the resource being sent in the HTTP message body. It tells the recipient what kind of data to expect and how to interpret it. The Content-Type header is typically used in HTTP requests and responses, particularly in responses from servers to clients.

For example, when sending a POST request with a JSON body, you must specify application/json.

By default, if you send a POST request without a Content-Type header, application/x-www-form-urlencoded will be set.

If this header is not correctly configured, the target website may respond with a 400 or 406 status, or block you.
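To see the difference between the two body formats, here is a small generic Ruby sketch (not Scrapfly-specific) encoding the same data as form-urlencoded and as JSON:

```ruby
require "uri"
require "json"

data = { "example" => "value", "page" => 2 }

# application/x-www-form-urlencoded: key=value pairs joined by "&"
form_body = URI.encode_www_form(data)

# application/json: the same data serialized as a JSON document
json_body = JSON.generate(data)
```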

Accept

Indicates the media types that the client is willing to receive in the response. Helps servers determine the appropriate representation of the requested resource.

By default, we set what a browser would send.

Example for a JSON API expecting a response in JSON: Accept: application/json

If this header is not correctly configured, the target website may respond with a 400 status or block you.

Referer

Indicates the URL of the web page from which the request originated. Often used by servers to track the source of incoming requests.

By default, this header is not sent unless specified.

Example: Referer: https://www.example.com/page1.html

This header can be mutated when using the Anti-Scraping Protection feature.

Behavior And Interaction With Other Features

When ASP is activated or a specific os is set, the following headers become immutable or limited:

  • user-agent: If you set a custom Chrome user agent, it will be ignored to keep our actual version
  • sec-ch-ua: Ignored
  • sec-ch-ua-arch: Ignored
  • sec-ch-ua-platform: Ignored
  • sec-ch-ua-platform-version: Ignored
  • sec-ch-ua-full-version: Ignored
  • sec-ch-ua-bitness: Ignored

With ASP activated, the referer header is handled automatically if no header is set. However, on specific targets you might want to disable this; setting the referer header to none will prevent it.

Cookies

Cookies are regular HTTP headers and shouldn't be treated in a special way. While most HTTP clients and libraries have a dedicated API to manage cookies, to manage cookies with the Scrapfly API simply set the appropriate headers.

The Set-Cookie header should never be sent from the client's side. It's a response header sent when the upstream wants to register a cookie with the client.

The Cookie header contains the cookie values held by the client, so when scraping this is the header used to include cookie data. It contains key-to-value pairs separated by semicolons. For example:

  • Single cookie: Cookie: test=1
  • Multiple cookies: Cookie: test=1;lang=fr;currency=USD
You can also pass multiple Cookie headers to send the cookie header multiple times, for example: headers[Cookie][0]=foo=1&headers[Cookie][1]=bar

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fcookies&headers[cookie]=lang%3Dfr%3Bcurrency%3DUSD%3Btest%3D1")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body
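The cookie string used in the example can be assembled from a plain Ruby hash and percent-encoded for the headers[cookie] parameter; a small sketch:

```ruby
require "uri"

cookies = { "lang" => "fr", "currency" => "USD", "test" => "1" }

# Join key=value pairs with semicolons, as the Cookie header expects
cookie_header = cookies.map { |k, v| "#{k}=#{v}" }.join(";")

# Percent-encode the value for use in the headers[cookie] query parameter
encoded = URI.encode_www_form_component(cookie_header)
```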

Geo Targeting

Each Scrapfly request can be sent from a specific country. This is called Geo-Targeting and is managed by Scrapfly's proxy network.

The desired country can be specified with 2-letter country codes (ISO 3166-1 alpha-2). Available countries are defined on the proxy pool dashboard. If the country is not available in the Public Pool, a personal private pool can be created with the desired countries. Note that restricting countries also restricts the available proxy IP pool.

To specify geo targeting, the country parameter can be used:

  • Single country selection: country=us
  • Multi country selection with random selection: country=us,ca,mx
  • Multi country selection with weighted random selection (higher weights have higher probability): country=us:1,ca:5,mx:3
  • Country exclusion: country=-gb
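The weighted selection string can be assembled from a hash of weights (an illustrative helper, not part of the API):

```ruby
# Build a weighted country parameter value such as "us:1,ca:5,mx:3"
weights = { "us" => 1, "ca" => 5, "mx" => 3 }
country_param = weights.map { |code, weight| "#{code}:#{weight}" }.join(",")
```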

For example, to send requests through the United States, the country=us parameter would be used:

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?country=us&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body
For more on proxies, see the proxy documentation page.

For spoofing the latitude and longitude of the web browser's location services, the geolocation parameter can be used, for example geolocation=48.856614,2.3522219 (latitude, longitude):

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?geolocation=48.856614%2C2.3522219&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body

The available country options depend on the selected proxy_pool. See this table for the available options for your account.

Language

Content language can be configured through the lang parameter. By default, the language is inferred from the proxy location. So, if a French proxy is used, the scrape request will be configured with French language preferences.

Behind the scenes, this is done by configuring the Accept-Language HTTP header. If the website supports this header and the requested language, the content will be returned in that language.

Multiple language options can be passed as well by providing multiple comma-separated values. Country locale is also supported in {lang iso2}-{country iso2} format. Note that the order matters as the website will negotiate the content language based on this order.

For example, lang=fr,en-US,en will result in the final header Accept-Language: fr-{proxy country iso2},fr;q=0.9,en-US;q=0.8,en;q=0.7
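The resulting header can be approximated with the sketch below; the decreasing q-values and the localization of the first plain language code to the proxy country are assumptions based on the example above:

```ruby
# Approximate the Accept-Language header derived from lang=fr,en-US,en
# for a proxy located in France ("FR")
langs = ["fr", "en-US", "en"]
proxy_country = "FR"

parts = []
langs.each_with_index do |lang, i|
  q = (0.9 - 0.1 * i).round(1)
  # The first plain language code is localized to the proxy country first
  parts << "#{lang}-#{proxy_country}" if i == 0 && !lang.include?("-")
  parts << "#{lang};q=#{q}"
end
accept_language = parts.join(",")
```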

Most users prefer English regardless of the proxy location. For that, use lang=en-US,en

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?lang=en-us%2Cen&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body

Operating System

We do not recommend using this feature unless it's absolutely necessary as it can impact scraper blocking rates.

By default, Scrapfly automatically selects the most suitable operating system for all outgoing requests. To configure the operating system explicitly, the os parameter can be used.

The supported values are: win,win10,win11,mac,linux,chromeos

Because of potential conflicts, the os parameter and User-Agent header cannot be set at the same time.

For example, to set Operating System to Windows 11 the os=win11 parameter would be used:

require "uri"
require "net/http"

url = URI("https://api.scrapfly.io/scrape?os=win11&key=__API_KEY__&url=https%3A%2F%2Fhttpbin.dev%2Fanything")

https = Net::HTTP.new(url.host, url.port);
https.use_ssl = true

request = Net::HTTP::Get.new(url)

response = https.request(request)
puts response.read_body
