
Data Parsing Knowledgebase

Data parsing is a fundamental part of web scraping and data programming: it extracts data from various formats and transforms it into structured, usable forms. It involves interpreting raw data such as scraped HTML pages, complex backend JSON responses, or XML sitemaps. Because these sources can change often, the parsing has to be reliable enough to keep producing output that is easy to analyze or store.

In web scraping, data parsing covers many tasks, such as:

  • Parsing scraped HTML documents to extract relevant information using techniques like CSS Selectors or XPath.
  • Looking for hidden data in HTML elements like <script> tags or comments.
  • Extracting secret values from obfuscated or encoded responses.
  • Parsing complex JSON trees from GraphQL and difficult backend API responses.

Fortunately, there are many brilliant libraries and tools to assist with any of these data parsing tasks in the context of web scraping.

See below for more on data parsing in the context of web scraping and data programming 👇

How to scrape HTML table to Excel Spreadsheet (.xlsx)?

To scrape tables to an Excel spreadsheet we can use the bs4, requests and xlsxwriter packages for Python. Here's how.
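As a rough illustration, the sketch below parses a table with bs4 and writes it out with xlsxwriter. The inline `HTML` sample, the `table#prices` selector, the helper names `table_rows` and `write_xlsx`, and the output path are all assumptions made for this example. In a real scraper the markup would typically come from something like `requests.get(url).text`.

```python
import os
import tempfile

import xlsxwriter
from bs4 import BeautifulSoup

# Inline sample standing in for a page fetched with requests (assumption).
HTML = """
<table id="prices">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
"""


def table_rows(html, selector="table"):
    """Return the first table matching `selector` as rows of cell text."""
    table = BeautifulSoup(html, "html.parser").select_one(selector)
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


def write_xlsx(rows, path):
    """Write rows to the first worksheet of a new .xlsx file."""
    workbook = xlsxwriter.Workbook(path)
    sheet = workbook.add_worksheet()
    for row_index, row in enumerate(rows):
        # write_row fills consecutive cells starting at (row_index, column 0)
        sheet.write_row(row_index, 0, row)
    workbook.close()


rows = table_rows(HTML, "table#prices")
output_path = os.path.join(tempfile.gettempdir(), "prices.xlsx")
write_xlsx(rows, output_path)
print(rows)
```

All cells are written as text here. Converting prices to numbers before writing would be a reasonable next step if the spreadsheet is meant for calculations.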

#data-parsing
#python

How to select last element in XPath?

To select the last element in XPath we cannot use a -1 index, as negative indexing is not supported. Instead, the last() function can be used. Here's how.
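A minimal sketch of `last()` in practice, using the limited XPath subset in Python's standard `xml.etree.ElementTree` module. The sample markup is an assumption, and it is well-formed on purpose. Real HTML pages usually need an HTML-aware parser such as lxml or parsel, where the same `last()` expression applies.

```python
import xml.etree.ElementTree as ET

# Well-formed sample markup (assumption for this example).
doc = ET.fromstring("<ul><li>first</li><li>second</li><li>third</li></ul>")

# XPath positions start at 1, so the final item is addressed with last().
last_item = doc.find("./li[last()]")

# last()-1 steps back one position from the end.
second_to_last = doc.find("./li[last()-1]")

print(last_item.text, second_to_last.text)
```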

#xpath
#data-parsing

What are some ways to parse JSON datasets in Python?

There are several popular options for parsing JSON datasets in Python. The most popular packages are JMESPath and JSONPath.

#python
#data-parsing

How to select all elements between two elements in XPath?

To select all elements between two different elements, the preceding-sibling or following-sibling axis selectors can be used. Here's how.
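One way to combine the two axes is sketched below with lxml. The sample markup and the `intro` and `details` ids are assumptions. The expression keeps every element that has the first marker somewhere before it and the second marker somewhere after it among its siblings.

```python
from lxml import html

# Sample markup with two marker headings (assumption for this example).
HTML = """
<div>
  <h2 id="intro">Intro</h2>
  <p>one</p>
  <p>two</p>
  <h2 id="details">Details</h2>
  <p>three</p>
</div>
"""
tree = html.fromstring(HTML)

# Keep elements that sit after #intro and before #details at the same level.
between = tree.xpath(
    '//*[preceding-sibling::h2[@id="intro"]'
    ' and following-sibling::h2[@id="details"]]'
)
texts = [element.text_content() for element in between]
print(texts)
```

The marker headings themselves are excluded, since an element is never its own sibling.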

#xpath
#data-parsing

How to select dictionary key recursively in Python?

To select dictionary keys recursively in Python, the "nested-lookup" package implements the most popular nested key selection algorithms.
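To show the underlying idea, here is a hand-written recursive search using only the standard library. It is a sketch of the general technique, not the nested-lookup package's own implementation. The `find_key` helper and the sample data are hypothetical.

```python
def find_key(document, key):
    """Yield every value stored under `key` at any depth of dicts and lists."""
    if isinstance(document, dict):
        for name, value in document.items():
            if name == key:
                yield value
            # Keep descending: the value may hold further matches.
            yield from find_key(value, key)
    elif isinstance(document, list):
        for item in document:
            yield from find_key(item, key)


# Hypothetical nested dataset (assumption for this example).
data = {
    "product": {
        "id": 1,
        "variants": [
            {"id": 10, "price": 9.99},
            {"id": 11, "price": 12.5},
        ],
    },
    "seller": {"id": 7},
}

ids = list(find_key(data, "id"))
prices = list(find_key(data, "price"))
print(ids, prices)
```

Results come back in document order, which is usually what a scraper wants when the same key appears at several levels.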

#python
#data-parsing

What are devtools and how are they used in web scraping?

The developer tools suite is used in web development but can also be used in web scraping to understand how target websites work. Here's how to use it.

#http
#data-parsing
#xpath
#css-selectors
#hidden-api

How to parse dynamic CSS classes when web scraping?

Dynamic CSS classes can be very difficult to scrape. There are a few tricks and common idioms to approach this, though.
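One such idiom, sketched here as an assumption rather than a full catalogue, is to match only the stable part of a generated class name and ignore the random suffix. In selector terms this corresponds to `[class*="price__"]` in CSS or `contains(@class, "price__")` in XPath. The example below does the same with the standard library on well-formed sample markup. The class names and the `select_by_class_prefix` helper are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Sample markup with generated class suffixes (assumption for this example).
doc = ET.fromstring("""
<div>
  <h1 class="heading title__x7Gh2">Blue Shoes</h1>
  <span class="price__Qz91k">$25</span>
  <span class="badge__p0LmN">New</span>
</div>
""")


def select_by_class_prefix(root, prefix):
    """Return elements with at least one class token starting with `prefix`."""
    # Anchor on the start of the attribute or on whitespace between tokens.
    pattern = re.compile(rf"(?:^|\s){re.escape(prefix)}")
    return [el for el in root.iter() if pattern.search(el.get("class", ""))]


title = select_by_class_prefix(doc, "title__")[0].text
price = select_by_class_prefix(doc, "price__")[0].text
print(title, price)
```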

#data-parsing

How to use XPath selectors in NodeJS when web scraping?

To parse HTML using XPath in NodeJS we can use one of two popular libraries: osmosis or xmldom. Here's how.

#nodejs
#xpath
#data-parsing

Articles Related to Data Parsing