JSON Parsing
JSON (JavaScript Object Notation) grew out of JavaScript's object syntax, but it's widely used in other programming languages as well. Its key feature is simplicity and ease of use. Like HTML, it's a tree structure, but it's built from key-value pairs rather than elements, and each key holds exactly one value (though that value can itself be an array or another object).
Just like with HTML, there are several powerful tools for parsing JSON, query languages in particular. Let's take a look at some of them.
JSON Parsing Tools
JSON is a very simple data structure and can be parsed in almost any programming language, often with a built-in library. However, in web scraping we often deal with large and complex JSON datasets, so additional query tools can be very useful.
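To illustrate native parsing, here is a minimal Python sketch that uses only the standard `json` module. The payload and its field names are hypothetical, made up for this example. Every level of the tree has to be walked by hand, and on bigger documents this is where query languages start to pay off.

```python
import json

# Hypothetical JSON payload, e.g. embedded in a page's <script> tag or returned by an API.
raw = """
{
  "product": {
    "name": "Lamp",
    "price": {"amount": 24.0, "currency": "USD"},
    "variants": [
      {"color": "black", "in_stock": true},
      {"color": "white", "in_stock": false}
    ]
  }
}
"""

# json.loads maps JSON objects to dicts, arrays to lists,
# true/false to True/False and null to None.
data = json.loads(raw)

# Navigate the tree manually, one key at a time.
price = data["product"]["price"]["amount"]
in_stock = [v["color"] for v in data["product"]["variants"] if v["in_stock"]]
print(price, in_stock)  # → 24.0 ['black']
```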
The two most popular JSON query languages in web scraping are JMESPath and JSONPath.
JMESPath
With client libraries for almost every popular programming language, JMESPath can reshape, clean and parse JSON datasets.
JSONPath
Inspired by XPath, JSONPath mirrors many of the same features, but for JSON instead of HTML.
Generally speaking, JMESPath is great for reshaping and filtering datasets, while JSONPath is the more feature-rich option for advanced parsing such as recursive key selection.
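To show what recursive key selection means, here is a plain-Python sketch of the idea behind JSONPath's `$..price` descendant query. The idea is to collect every value stored under a given key at any depth. This is not a JSONPath library. `find_key` is a hypothetical helper written for illustration, and the data is made up. Results here follow dict and list traversal order, which may differ from the ordering a given JSONPath implementation returns.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Hypothetical helper: yield every value stored under `key`, at any depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the matched value may itself contain the key.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical data where "price" appears at several nesting levels.
data = {
    "product": {
        "price": 24.0,
        "bundle": [{"name": "Bulb", "price": 3.5}, {"name": "Shade", "price": 9.0}],
    },
    "related": [{"name": "Desk", "price": 120.0}],
}

prices = list(find_key(data, "price"))
print(prices)  # → [24.0, 3.5, 9.0, 120.0]
```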
Next up - Data Processing
Now that we know how to extract data from HTML and JSON documents, we can move on to the next step: data processing and validation. In the next section, we'll take a look at popular data validation and cleanup techniques.