Sending HTTP Requests With Curlie: A better cURL

cURL is a great command-line tool for sending HTTP requests and a valuable asset in the web scraping toolbox. However, its syntax and output can be confusing. What about a better alternative?

In this guide, we'll explore Curlie, a better cURL version. We'll start by defining what Curlie is and how it compares to cURL. We'll also go over a step-by-step guide on using and configuring Curlie to send HTTP requests. Let's get started!

What is Curlie?

Curlie is a frontend to the regular cURL. Its interface is modeled after HTTPie, a CLI HTTP client known for its simple syntax and colorful, formatted output.

Curlie combines cURL's full feature set with the easy syntax and output formatting of HTTPie, and it accepts commands written in either the cURL or the HTTPie syntax.
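
For a quick taste, here is the same request written in both syntaxes (the X-Example header is just a placeholder for illustration):

# HTTPie-style syntax: method before the URL, headers as Name:value
curlie POST https://httpbin.dev/anything X-Example:hello
# cURL-style syntax: the same request using cURL flags
curlie -X POST -H "X-Example: hello" https://httpbin.dev/anything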

Using API Clients For Web Scraping: Postman

Learn how to use API clients for web scraping. We'll explain how to locate hidden API requests on websites and how to import, manipulate, and export them with Postman for efficient API-based web scrapers.

How To Install Curlie?

Curlie can be installed on all major operating systems from the command line using different package managers.

Mac

brew install curlie
# or
curl -sS https://webinstall.dev/curlie | bash

Linux

curl -sS https://webinstall.dev/curlie | bash
# or
eget rs/curlie -a deb --to=curlie.deb
sudo dpkg -i curlie.deb

Windows

curl.exe -A "MS" https://webinstall.dev/curlie | powershell

How To Use Curlie?

In the following sections, we'll explain how to use Curlie to send and configure HTTP requests. Curlie accepts both cURL and HTTPie syntax, and since we covered cURL in a previous guide, we'll use the HTTPie syntax in this one.

That being said, all the cURL options Curlie uses under the hood can be viewed by adding the --curl option.
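
For example, the following command prints the cURL options Curlie generates for a request (the exact output can vary between Curlie versions):

curlie --curl https://httpbin.dev/get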

How to Use cURL For Web Scraping?

Learn how to use cURL through a step-by-step guide. You will also learn about common web scraping tips and tricks with cURL.

Configuring HTTP Method

All Curlie requests start with the curlie command. By default, requests are sent using the GET HTTP method:

curlie https://httpbin.dev/get

Running the above command sends a GET request and prints the formatted response. It also prints the response headers, which plain cURL doesn't do by default:

{
    "args": {

    },
    "headers": {
        "Accept": [
            "application/json, */*"
        ],
        "Accept-Encoding": [
            "gzip"
        ],
        "Host": [
            "httpbin.dev"
        ],
        "User-Agent": [
            "curl/8.4.0"
        ]
    },
    "origin": "156.192.187.116",
    "url": "https://httpbin.dev/get"
}
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Content-Length: 288
Content-Security-Policy: frame-ancestors 'self' *.httpbin.dev; font-src 'self' *.httpbin.dev; default-src 'self' *.httpbin.dev; img-src 'self' *.httpbin.dev https://cdn.scrapfly.io; media-src 'self' *.httpbin.dev; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.httpbin.dev; style-src 'self' 'unsafe-inline' *.httpbin.dev https://unpkg.com; frame-src 'self' *.httpbin.dev; worker-src 'self' *.httpbin.dev; connect-src 'self' *.httpbin.dev
Content-Type: application/json; encoding=utf-8
Date: Wed, 06 Mar 2024 17:47:12 GMT
Permissions-Policy: fullscreen=(self), autoplay=*, geolocation=(), camera=()
Referrer-Policy: strict-origin-when-cross-origin
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block

To change the HTTP method, we simply specify it before the URL (the -v option additionally prints the request being sent). For example, here is how to send a POST request with Curlie. The same approach works for other HTTP methods (HEAD, PUT, DELETE, etc.):

curlie -v POST https://httpbin.dev/anything
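
For illustration, here are the same requests with a few other methods (httpbin.dev/anything echoes back any request):

# HEAD request: fetch response headers only
curlie HEAD https://httpbin.dev/anything
# PUT request
curlie PUT https://httpbin.dev/anything
# DELETE request
curlie DELETE https://httpbin.dev/anything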

Adding Headers, Cookies and Body

Headers

Adding headers with Curlie is pretty straightforward. All we have to do is specify the header name and its value separated by a colon:

curlie https://httpbin.dev/headers Content-Type:"application/json" User-Agent:"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0"

In the above command, we add User-Agent and Content-Type headers. The response will include the modified headers:

{
    "headers": {
        ....
        "Content-Type": [
            "application/json"
        ],
        "User-Agent": [
            "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0"
        ]
    }
}
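
Since Curlie also understands cURL syntax, the same headers can alternatively be passed with cURL's -H flag:

curlie -H "Content-Type: application/json" https://httpbin.dev/headers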

Cookies

Cookies in Curlie follow the same approach as headers. They can be added by appending the name=value pairs to a Cookie header:

curlie https://httpbin.dev/cookies Cookie:some_cookie=foo
{
    "some_cookie": "foo"
}
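
Multiple cookies can be packed into a single Cookie header by separating the name=value pairs with semicolons, as in standard HTTP:

curlie https://httpbin.dev/cookies Cookie:"cookie1=foo; cookie2=bar"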

Request Body

Lastly, let's explore adding a request body for POST requests. For this, we can simply add the data as key-value pairs, which will be converted to JSON by Curlie:

curlie https://httpbin.dev/anything key1=value1 key2=value2
{
    ....
    "json": {
        "key1": "value1",
        "key2": "value2"
    }
}
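
If a form-encoded body is needed instead of JSON, we can fall back on cURL's -d flag, which also implies the POST method:

curlie -d "key1=value1" -d "key2=value2" https://httpbin.dev/anything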

Downloading Files

Sending HTTP requests to download binary data is a common use case. Just like the regular cURL, Curlie allows for downloading binary data using the -O option:

curlie -O https://web-scraping.dev/assets/pdf/tos.pdf

The above Curlie command will download a PDF file from web-scraping.dev into the current directory. To change the download directory, we can use the --output-dir option, paired with --create-dirs to create the directory if it doesn't exist:

curlie -O --create-dirs --output-dir /eula/pdfs https://web-scraping.dev/assets/pdf/tos.pdf
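
To also control the saved filename, cURL's -o flag can be used instead of -O (terms-of-service.pdf is an arbitrary name here):

curlie -o terms-of-service.pdf https://web-scraping.dev/assets/pdf/tos.pdf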

Following Redirects

Just like cURL, Curlie doesn't follow HTTP redirects by default. To follow redirects with Curlie requests, we can use the --location or -L options:

curlie --location https://httpbin.dev/absolute-redirect/6

The above endpoint redirects the request six times. The --location option will allow Curlie to follow them until the final resource:

{
    "args": {

    },
    "headers": {
        "Accept": [
            "application/json, */*"
        ],
        "Accept-Encoding": [
            "gzip"
        ],
        "Host": [
            "httpbin.dev"
        ],
        "User-Agent": [
            "curl/8.4.0"
        ]
    },
    "url": "https://httpbin.dev/get"
}

By default, Curlie follows a maximum of 50 redirects. This limit can be overridden using the --max-redirs option:

curlie -L https://httpbin.dev/absolute-redirect/51 --max-redirs 51
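
Conversely, running the request without -L returns the first redirect response itself, which is handy for inspecting the Location header:

curlie https://httpbin.dev/absolute-redirect/1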

Basic Authentication

Basic authentication requires simple credentials: a username and password. For example, requesting https://httpbin.dev/basic-auth/user/passwd from the browser will prompt for the credentials before proceeding with the request:

Basic auth example: browser credentials prompt (screenshot)

To set basic authentication with Curlie, we can use the --user or -u options:

curlie --user user:passwd --basic https://httpbin.dev/basic-auth/user/passwd

From the response, we can see that the request was authenticated:

{
    "authorized": true,
    "user": "user"
}
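
Under the hood, basic auth is simply a base64-encoded Authorization header, so the same request can also be written explicitly (the token below encodes user:passwd):

curlie https://httpbin.dev/basic-auth/user/passwd Authorization:"Basic dXNlcjpwYXNzd2Q="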

Curlie can also handle different types of authentication, such as cookie and bearer token authentication. For the detailed instructions, refer to our guide on managing authentication with cURL.

Adding Proxies

Websites track IP addresses to identify potential traffic abuse, and IP addresses that send suspiciously many requests get blocked.

Hence, using proxies, especially for web scraping, distributes the traffic load across multiple IP addresses. This makes it harder for websites to single out any one IP address, preventing blocks.

How to Avoid Web Scraper IP Blocking?

In this article, we'll look at Internet Protocol addresses and how IP tracking technologies are used to block web scrapers.

To use proxies with Curlie, we can use the -x or --proxy options, followed by the proxy type, domain, and port:

# HTTP
curlie -x http://proxy_domain.com:8080 https://httpbin.dev/ip
# HTTPS
curlie -x https://proxy_domain.com:8080 https://httpbin.dev/ip
# Proxies with credentials
curlie -x https://username:password@proxy.proxy_domain.com:8080 https://httpbin.dev/ip
# SOCKS5
curlie -x socks5://proxy_domain.com:8080 https://httpbin.dev/ip
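
Since Curlie runs cURL under the hood, it should also honor cURL's standard proxy environment variables, letting us set a proxy once per shell session (proxy_domain.com is a placeholder):

export http_proxy="http://proxy_domain.com:8080"
export https_proxy="http://proxy_domain.com:8080"
curlie https://httpbin.dev/ip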

For further details on proxies, including their types, differences, and how they compare, refer to our dedicated guide.

Introduction To Proxies in Web Scraping

Learn about the various types of proxies, their common challenges, and best practices for using them in web scraping.

FAQ

To wrap up this guide on Curlie, let's have a look at some frequently asked questions.

Can I web scrape with Curlie?

Yes, but using Curlie for web scraping is best limited to extracting small amounts of data or to development and debugging purposes. In a previous guide, we covered using cURL for web scraping, which applies to Curlie as well.

Are there alternatives for Curlie?

Yes. Curl Impersonate is a modified cURL version that prevents cURL blocking by mimicking Chrome and Firefox configurations. Another alternative HTTP client is Postman. We have covered both Curl Impersonate and Postman in previous guides.

Summary

In this article, we explained Curlie, what it is, and how it differs from the regular cURL. We went through a step-by-step guide on using it to configure and send HTTP requests. We have covered:

  • Sending HTTP requests with different HTTP methods.
  • Configuring headers, cookies, and the request body.
  • Downloading binary data.
  • Following redirects and handling basic authentication.
  • Adding proxies.

Related Posts

How to Use cURL GET Requests

Here's everything you need to know about cURL GET requests and some common pitfalls you should avoid.

How to Use cURL For Web Scraping

In this article, we'll go over a step-by-step guide on sending and configuring HTTP requests with cURL. We'll also explore advanced usages of cURL for web scraping, such as scraping dynamic pages and avoiding getting blocked.

Use Curl Impersonate to scrape as Chrome or Firefox

Learn how to prevent TLS fingerprinting by impersonating normal web browser configurations. We'll start by explaining what Curl Impersonate is, how it works, and how to install and use it. Finally, we'll explore using it with Python to avoid web scraping blocking.