Python has many HTTP clients that can be used for web scraping. However, not all of them support HTTP2, which can be vital in avoiding web scraper blocking.
The most popular Python HTTP clients that support HTTP2 are httpx and h2:
# httpx enables HTTP2 with a single flag (needs the optional extra: pip install "httpx[http2]")
import httpx
with httpx.Client(http2=True) as client:
    response = client.get("https://httpbin.dev/anything")
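Since HTTP2 matters here for avoiding blocks, it can help to confirm that a request really went over HTTP2. The sketch below is one possible way to check this; the helper names `is_http2` and `fetch_with_http2` are hypothetical and not part of httpx, while `http2=True` and `response.http_version` are httpx's own.

```python
def is_http2(response) -> bool:
    """Return True if the response was delivered over HTTP2.

    Works with any object exposing `http_version`, such as httpx.Response.
    """
    return response.http_version == "HTTP/2"


def fetch_with_http2(url: str):
    """Fetch `url` with an HTTP2-enabled httpx client (hypothetical helper)."""
    # Imported here so is_http2() stays usable without httpx installed.
    # HTTP2 support needs the optional extra: pip install "httpx[http2]"
    import httpx

    with httpx.Client(http2=True) as client:
        return client.get(url)
```

Passing the result of `fetch_with_http2("https://httpbin.dev/anything")` to `is_http2` shows whether HTTP2 was actually negotiated. When a server does not support HTTP2, httpx falls back to HTTP/1.1, so the flag alone is not a guarantee.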
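# h2 only implements the HTTP2 protocol; opening the connection is up to you.
# stream_id, headers, data, sock and socket_data are placeholders you must provide.
import h2.connection
import h2.config
config = h2.config.H2Configuration()
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()  # queue the HTTP2 preface and SETTINGS frame
conn.send_headers(stream_id=stream_id, headers=headers)
conn.send_data(stream_id, data, end_stream=True)
sock.sendall(conn.data_to_send())
events = conn.receive_data(socket_data)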
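To show what the h2 outline above leaves out, here is a minimal sketch of a complete GET request. It assumes a TLS server that negotiates HTTP2 through ALPN; the function names `build_request_headers` and `http2_get` are hypothetical helpers, not part of h2, and error handling is kept to a bare minimum.

```python
import socket
import ssl
from urllib.parse import urlsplit


def build_request_headers(method: str, url: str) -> list[tuple[str, str]]:
    """Build the four pseudo-headers an HTTP2 request carries."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return [
        (":method", method.upper()),
        (":scheme", parts.scheme),
        (":authority", parts.netloc),
        (":path", path),
    ]


def http2_get(url: str):
    """Send one GET request over HTTP2 and return (headers, body)."""
    # Imported here so build_request_headers() works without h2 installed.
    import h2.config
    import h2.connection
    import h2.events

    parts = urlsplit(url)
    # HTTP2 over TLS is negotiated through the ALPN extension.
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2"])
    raw = socket.create_connection((parts.hostname, parts.port or 443))
    with context.wrap_socket(raw, server_hostname=parts.hostname) as sock:
        if sock.selected_alpn_protocol() != "h2":
            raise RuntimeError("server did not negotiate HTTP2")

        config = h2.config.H2Configuration(client_side=True)
        conn = h2.connection.H2Connection(config=config)
        conn.initiate_connection()
        stream_id = conn.get_next_available_stream_id()
        # A GET has no body, so the stream is closed with the headers.
        conn.send_headers(stream_id, build_request_headers("GET", url), end_stream=True)
        sock.sendall(conn.data_to_send())

        headers, body, done = [], b"", False
        while not done:
            data = sock.recv(65535)
            if not data:
                break
            for event in conn.receive_data(data):
                if isinstance(event, h2.events.ResponseReceived):
                    headers = event.headers
                elif isinstance(event, h2.events.DataReceived):
                    body += event.data
                    # Return flow-control credit so the server keeps sending.
                    conn.acknowledge_received_data(
                        event.flow_controlled_length, event.stream_id
                    )
                elif isinstance(
                    event,
                    (
                        h2.events.StreamEnded,
                        h2.events.StreamReset,
                        h2.events.ConnectionTerminated,
                    ),
                ):
                    done = True
            # Flush frames h2 queued in reply (SETTINGS ACK, WINDOW_UPDATE).
            sock.sendall(conn.data_to_send())
        return headers, body
```

Calling `http2_get("https://httpbin.dev/anything")` would perform the whole exchange by hand. The amount of socket, TLS and event handling needed for a single request is the main reason a high-level client is usually the more practical choice.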
Because h2 is a low-level protocol implementation, it's best to stick with httpx for HTTP2. For complex use cases, though, h2 can be integrated into extensible libraries like twisted.
For more on HTTPX in web scraping, see our hands-on introduction article, which covers everything you need to know.
This knowledgebase is provided by Scrapfly data APIs.