How to Find All URLs on a Domain
Learn how to efficiently find all URLs on a domain using Python and web crawling. A guide on crawling an entire domain to collect all website data.
Python has many different HTTP clients that can be used for web scraping. However, not all of them support HTTP2, which can be vital for avoiding web scraper blocking.
Here are the most popular HTTP clients that support HTTP2:
httpx - a fully featured HTTP client that supports HTTP2 out of the box:

import httpx

# http2=True enables HTTP2 support (installed via the httpx[http2] extra)
with httpx.Client(http2=True) as client:
    response = client.get("https://httpbin.dev/anything")
    print(response.http_version)  # "HTTP/2"
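For scraping many pages concurrently, the async client is usually a better fit and takes the same flag; a minimal sketch:

import asyncio
import httpx

async def main():
    # http2=True works identically in the async client
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get("https://httpbin.dev/anything")
        print(response.http_version)  # "HTTP/2" when negotiation succeeds

asyncio.run(main())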
h2 - a low-level HTTP2 protocol library that requires managing the connection manually:

import socket
import ssl
import h2.connection
import h2.config

# h2 only implements the protocol state machine: the TLS socket is managed manually
ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["h2"])  # negotiate HTTP2 during the TLS handshake
sock = ctx.wrap_socket(socket.create_connection(("httpbin.dev", 443)), server_hostname="httpbin.dev")

config = h2.config.H2Configuration(client_side=True)
conn = h2.connection.H2Connection(config=config)
conn.initiate_connection()
headers = [(":method", "GET"), (":path", "/anything"), (":scheme", "https"), (":authority", "httpbin.dev")]
conn.send_headers(stream_id=1, headers=headers, end_stream=True)
sock.sendall(conn.data_to_send())
events = conn.receive_data(sock.recv(65536))  # parsed HTTP2 events (response headers, data, etc.)
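Since h2 only parses protocol events, reading the full response means looping over socket data until the stream ends. A minimal sketch of that receive loop, reusing the sock, conn, and events objects from the example above:

import h2.events

# process already-parsed events, then keep reading until the stream ends
body = b""
ended = False
while not ended:
    for event in events:
        if isinstance(event, h2.events.ResponseReceived):
            print(event.headers)  # response headers as (name, value) tuples
        elif isinstance(event, h2.events.DataReceived):
            body += event.data
        elif isinstance(event, h2.events.StreamEnded):
            ended = True
    sock.sendall(conn.data_to_send())  # flush any frames h2 queued back (e.g. SETTINGS ACK)
    if not ended:
        data = sock.recv(65536)
        if not data:
            break
        events = conn.receive_data(data)
print(body.decode())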
So, it's best to stick to httpx for HTTP2, though if you have a complex use case, h2 can be adapted to extensible frameworks like Twisted.
For more on HTTPX in web scraping, see our hands-on introduction article, which covers everything you need to know.