Python has many HTTP clients that can be used for web scraping. However, not all of them support HTTP2, which can be vital for avoiding web scraper blocking.
Here are the most popular HTTP clients that support HTTP2:
HTTPX is one of the most popular newer HTTP libraries for Python. It supports HTTP2 as well as asyncio, making it a great fit for web scraping:
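import httpx

# HTTP2 support is an optional extra: pip install "httpx[http2]"
with httpx.Client(http2=True) as client:
    response = client.get("https://httpbin.dev/anything")
    print(response.http_version)  # "HTTP/2" when the server negotiates it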
h2 is a low-level implementation of the HTTP2 protocol. Using it directly for web scraping is not recommended, but it can be the only way to implement complex HTTP2 interactions in niche web scrapers.