How to Web Scrape with HTTPX and Python
Intro to using Python's httpx library for web scraping. Proxy and user agent rotation and common web scraping challenges, tips and tricks.
Modern web scraping often involves a lot of JSON parsing, particularly when scraping hidden web data or backend APIs. There are several ways to parse JSON data in Python.
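As a baseline before reaching for a query language, here is a minimal sketch of hidden web data parsing using only Python's standard library. The HTML snippet, the `__DATA__` script id, and the product fields are all made up for illustration; real pages will differ.

```python
import json
import re

# Hypothetical page snippet: the markup and the "__DATA__" id are assumptions,
# not taken from any real website.
html = """
<html><body>
<script id="__DATA__" type="application/json">
{"product": {"name": "Example Mug", "price": {"amount": 9.99, "currency": "USD"}}}
</script>
</body></html>
"""

# Pull the JSON text out of the script tag. A regex keeps this sketch
# dependency-free; a real scraper would normally use an HTML parser.
match = re.search(r'<script id="__DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
data = json.loads(match.group(1))

# Plain dictionary access works, but gets verbose as nesting grows.
product_name = data["product"]["name"]
price = data["product"]["price"]["amount"]
print(product_name, price)
```

Chained key lookups like these are fine for small documents, but they become brittle on deeply nested datasets, which is where JSON query languages help.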
JMESPath is a popular JSON query language and library available in many languages:
Complete introduction to using JMESPath in Python for JSON parsing and an example web scraping project.
JSONPath is another popular JSON query language and library available in many languages:
Complete introduction to JSONPath and how to use it in Python through an example web scraping project.
Both of these tools are great ways to parse JSON datasets in Python. As for which one is better: generally, JSONPath is more powerful because it offers recursive selectors (e.g. $..book will select the key book anywhere in the dataset), while JMESPath has a more intuitive syntax and better data reshaping capabilities (e.g. renaming keys and flattening nested data structures).
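To make the two strengths concrete, here is a plain-Python sketch of what each idea does. It deliberately uses neither library: `find_key` is a hypothetical helper that mimics the spirit of a recursive selector such as $..book, and the list comprehension mimics the kind of key renaming and flattening that JMESPath expressions are good at. The sample dataset is invented for illustration.

```python
def find_key(node, key):
    """Yield every value stored under `key` at any depth of a JSON-like object.

    Hypothetical helper: similar in spirit to a recursive selector like
    $..book, not the JSONPath library's actual implementation.
    """
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, since the key may also appear deeper down.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Made-up dataset where "book" appears at two different depths.
data = {
    "store": {
        "book": [
            {"title": "A", "price": {"amount": 10}},
            {"title": "B", "price": {"amount": 12}},
        ],
        "archive": {"book": [{"title": "C", "price": {"amount": 5}}]},
    }
}

# Recursive selection: collect every "book" value, wherever it lives.
books = list(find_key(data, "book"))

# Reshaping: rename keys and flatten the nested price object.
reshaped = [
    {"name": book["title"], "cost": book["price"]["amount"]}
    for group in books
    for book in group
]
print(reshaped)
```

In practice a query library expresses each step as a single short expression, so the choice between the two mostly comes down to whether the dataset needs deep searching or heavy reshaping.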