Data Parsing Knowledgebase

Data parsing is a fundamental part of web scraping and data programming: it extracts data from various formats and transforms it into structured, usable forms. It involves interpreting raw data such as scraped HTML pages, complex backend JSON responses, or XML sitemaps. Because these sources can change often, parsing has to be reliable enough to keep producing a format that is easy to analyze or store.

In web scraping, data parsing covers many tasks, such as:

Parsing scraped HTML documents to extract relevant information using techniques like CSS Selectors or XPath.
Looking for hidden data in HTML elements like <script> tags or comments.
Extracting secret values from obfuscated or encoded responses.
Parsing complex JSON trees from GraphQL and difficult backend API responses.
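
As a rough illustration of the hidden-data case from the list above, the sketch below pulls a JSON payload out of a <script> tag using only the standard library. The page content, the `__DATA__` id and the `extract_script_json` helper are all hypothetical, and a regular expression is used only to keep the example short; a real HTML parser is usually more robust.

```python
import json
import re

# Hypothetical page: the product data is not in the visible markup,
# only as JSON inside a <script> tag.
HTML = """
<html><body>
  <h1>Product</h1>
  <script id="__DATA__" type="application/json">
    {"product": {"name": "Blue Mug", "price": 12.5}}
  </script>
</body></html>
"""


def extract_script_json(html: str, script_id: str) -> dict:
    """Return the JSON payload of the <script> tag with the given id."""
    pattern = re.compile(
        r'<script[^>]*\bid="' + re.escape(script_id) + r'"[^>]*>(.*?)</script>',
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"no <script id={script_id!r}> found")
    # json.loads tolerates the surrounding whitespace inside the tag
    return json.loads(match.group(1))


data = extract_script_json(HTML, "__DATA__")
print(data["product"]["name"])
```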

With all that in mind, there are many excellent libraries and tools to assist with any data parsing task in the context of web scraping.

See below for more on data parsing in the context of web scraping and data programming 👇

How to scrape HTML table to Excel Spreadsheet (.xlsx)?

To scrape tables to an Excel spreadsheet, we can use the bs4, requests and xlsxwriter packages for Python. Here's how.

How to select last element in XPath?

To select the last element in XPath, we cannot use negative indexing because a -1 index is not supported. Instead, the last() function can be used. Here's how.
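
A minimal sketch of last() in practice, using Python's built-in xml.etree.ElementTree, which supports a limited XPath subset that includes last(). The sample document is made up, and since ElementTree only accepts well-formed XML, real-world HTML would typically need a library such as lxml or parsel instead.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample document
DOC = """
<ul>
  <li>first</li>
  <li>second</li>
  <li>third</li>
</ul>
""".strip()

root = ET.fromstring(DOC)

# [last()] picks the final <li>; [last()-1] picks the one before it
last_item = root.find("./li[last()]")
second_to_last = root.find("./li[last()-1]")

print(last_item.text)
print(second_to_last.text)
```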

How to select all elements between two elements in XPath?

To select all elements between two different elements, the preceding-sibling or following-sibling axis selectors can be used. Here's how.
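
One possible way to express this, sketched with the lxml package (assumed to be installed). The document and the `specs` and `reviews` ids are invented for illustration. The expression takes every sibling that follows the first heading and keeps only those that still have the second heading somewhere after them.

```python
from lxml import html

# Invented sample document
DOC = """
<div>
  <h2 id="intro">Intro</h2>
  <p>skip me</p>
  <h2 id="specs">Specs</h2>
  <p>weight: 1kg</p>
  <p>color: blue</p>
  <h2 id="reviews">Reviews</h2>
  <p>skip me too</p>
</div>
"""

tree = html.fromstring(DOC)

# Siblings after #specs that are still followed by #reviews
between = tree.xpath(
    '//h2[@id="specs"]/following-sibling::*'
    '[following-sibling::h2[@id="reviews"]]'
)
texts = [element.text_content() for element in between]
print(texts)
```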

How to select dictionary key recursively in Python?

To select dictionary keys recursively in Python the "nested-lookup" package implements the most popular nested key selection algorithms.
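
To show the underlying idea, here is a small hand-written recursive lookup. It is only a sketch of the general technique, not the nested-lookup package's own implementation, and the `find_key` helper and sample data are hypothetical.

```python
from typing import Any, Iterator


def find_key(document: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(document, dict):
        for current_key, value in document.items():
            if current_key == key:
                yield value
            # keep descending: the value itself may contain the key again
            yield from find_key(value, key)
    elif isinstance(document, list):
        for item in document:
            yield from find_key(item, key)


# Hypothetical scraped dataset
data = {
    "product": {
        "name": "Blue Mug",
        "price": 12.5,
        "variants": [{"sku": "A1", "price": 3.0}],
    }
}

prices = list(find_key(data, "price"))
print(prices)
```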

What are some ways to parse JSON datasets in Python?

There are several popular options when it comes to JSON dataset parsing in Python. The most popular packages are JMESPath and JSONPath.

What are devtools and how they're used in web scraping?

The browser developer tools suite is built for web development, but it can also be used in web scraping to understand how target websites work. Here's how to use it.

How to parse dynamic CSS classes when web scraping?

Dynamic CSS classes can make pages very difficult to scrape. There are a few tricks and common idioms for approaching this, though.

How to select HTML elements by text using CSS Selectors?

It's not possible to select HTML elements by text in the original CSS selectors specification, but here are some alternative ways to do it.
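
One common alternative is to switch to XPath, which can match on text directly. The sketch below assumes the lxml package is installed and uses an invented snippet of pagination markup.

```python
from lxml import html

# Invented pagination markup
DOC = """
<div>
  <a href="/page/1">Previous</a>
  <a href="/page/3">Next page</a>
</div>
"""

tree = html.fromstring(DOC)

# Exact text match
exact = tree.xpath('//a[text()="Previous"]/@href')
# Partial text match with contains()
partial = tree.xpath('//a[contains(text(), "Next")]/@href')

print(exact, partial)
```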

Articles Related to Data Parsing

Ultimate Guide to JSON Parsing in Python

Learn JSON parsing in Python with this ultimate guide. Explore basic and advanced techniques using the json module and tools like ijson and nested-lookup.

Guide to Parsel - the Best HTML Parsing in Python

Learn to extract data from websites with Parsel, a Python library for HTML parsing using CSS selectors and XPath.

JSONL vs JSON

Learn the differences between JSON and JSONLines, their use cases, and efficiency, and why JSONLines excels in web scraping and real-time processing.
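
As a quick illustration of the difference, the sketch below writes the same made-up records as a single JSON document and as JSON Lines, then reads the JSON Lines form back one record at a time. The `read_jsonl` helper is hypothetical.

```python
import io
import json

# Made-up scraped records
records = [
    {"url": "https://example.com/1", "price": 10},
    {"url": "https://example.com/2", "price": 20},
]

# JSON: the whole dataset is one document that must be parsed at once
as_json = json.dumps(records)

# JSON Lines: one standalone JSON object per line, easy to append to
as_jsonl = "\n".join(json.dumps(record) for record in records) + "\n"


def read_jsonl(stream):
    """Yield one parsed record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


parsed = list(read_jsonl(io.StringIO(as_jsonl)))
print(parsed)
```

Because each line stands alone, a scraper can append results as they arrive and a reader can process them without loading the full file.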

Web Scraping and HTML Parsing with Jsoup and Java

Learn how to harness the power of jsoup, a lightweight and efficient Java library for web scraping and HTML parsing.

JSON vs XML: Key Differences and Modern Uses

JSON and XML are two major data formats encountered in web development. Here's how they differ and which one is better for your use case.

What is Parsing? From Raw Data to Insights

Learn about the fundamentals of parsing data, across formats like JSON, XML, HTML, and PDFs. Learn how to use Python parsers and AI models for efficient data extraction.

Web Scraping with Go

Learn web scraping with Golang, from native HTTP requests and HTML parsing to a step-by-step guide to using Colly, the Go web crawling package.

Intro to Parsing HTML and XML with Python and lxml

In this tutorial, we'll take a deep dive into lxml, a powerful Python library for parsing HTML and XML effectively. We'll start by explaining what lxml is, how to install it, and how to use it for parsing HTML and XML files. Finally, we'll go over a practical web scraping example with lxml.

How to Parse XML

In this article, we'll explain XML parsing. We'll start by defining XML files and their format, then cover how to navigate them for data extraction.

Web Scraping to Google Sheets

Google Sheets is an easy way to store scraped data. In this tutorial we'll take a look at how to use this free online database for storing scraped data!

Web Scraping Emails using Python

In this tutorial we'll take a look at email scraping: how to crawl pages and extract email addresses using Python, and what some of the common challenges are.

Web Scraping Phone Numbers with Python

In this article we'll dive into phone number scraping. We'll explore an example object and cover common phone number scraping challenges like obfuscation.

Intro to Web Scraping Images with Python

In this guide, we’ll explore how to scrape images from websites using different methods. We'll also cover the most common image scraping challenges and how to overcome them. By the end of this article, you will be an image scraping master!

Ultimate XPath Cheatsheet for HTML Parsing in Web Scraping

Ultimate companion for HTML parsing using XPath selectors. This cheatsheet contains all syntax explanations with interactive examples.

Ultimate CSS Selector Cheatsheet for HTML Parsing

Ultimate companion for HTML parsing using CSS selectors. This cheatsheet contains all syntax explanations with interactive examples.

JSON Parsing Made Easy with ChatGPT in Web Scraping

ChatGPT web scraping techniques allow for faster web scraping development. Here's how you can save a lot of time parsing JSON data with the help of ChatGPT!

Finding Hidden Web Data with ChatGPT Web Scraping

In this article we take a look at how to get assistance from LLMs for hidden web data scraping.

How to Parse Datetime Strings with Python and Dateparser

Dateparser is a popular Python package for parsing datetime strings. Here's how it can be used in web scraping and how to avoid common problems.

How to Scrape Sitemaps to Discover Scraping Targets

Usually, to find scrape targets we look at site search or category pages, but there's a better way: sitemaps! In this tutorial, we'll be taking a look at how to find and scrape sitemaps for target locations.
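
A minimal sketch of the parsing step, using only the standard library. The sitemap content below is made up and inlined as a string; in practice it would be downloaded from the target site. The namespace URI is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Made-up sitemap body; a real one would be fetched over HTTP
SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/1</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/product/2</loc></url>
</urlset>
""".strip()

# Sitemap elements live in a namespace, so queries need a prefix mapping
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP)
urls = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]
print(urls)
```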

Web Scraping Simplified - Scraping Microformats

In this short intro we'll be taking a look at web microformats. What are microformats and how can we take advantage of them in web scraping? We'll do a quick overview and some examples in Python using the extruct library.

Introduction to Parsing JSON with Python JSONPath

Intro to using Python with JSONPath, a library and query language for parsing JSON datasets.

Quick Intro to Parsing JSON with JMESPath in Python

Introduction to JMESPath, a JSON query language used in web scraping to parse scraped JSON datasets.

How to Scrape Hidden Web Data

The visible HTML doesn't always represent the whole dataset available on the page. In this article, we'll be taking a look at scraping of hidden web data. What is it and how can we scrape it using Python?

How to Ensure Web Scrapped Data Quality

Ensuring consistent quality of web-scraped data can be a difficult and exhausting task. In this article we'll be taking a look at two popular tools in Python, Cerberus and Pydantic, and how we can use them to validate data.

Creating Search Engine for any Website using Web Scraping

Guide for creating a search engine for any website using web scraping in Python: how to crawl data, index it and display it via a JS-powered GUI.

Web Scraping with Python

Introduction tutorial to web scraping with Python. How to collect and parse public data. Challenges, best practices and an example project.

Web Scraping With R Tutorial and Example Project

Introduction to web scraping with the R language. How to handle HTTP connections, parse HTML files, best practices, tips and an example project.

Web Scraping With Ruby

Introduction to web scraping with Ruby. How to handle HTTP connections, parse HTML files for data, best practices, tips and an example project.

Web Scraping With NodeJS and Javascript

In this article we'll take a look at scraping using JavaScript through NodeJS. We'll cover common web scraping libraries and frequently encountered challenges, and wrap everything up by scraping etsy.com.

How to Web Scrape with Puppeteer and NodeJS in 2025

Introduction to using Puppeteer in NodeJS for web scraping dynamic web pages and web apps. Tips and tricks, best practices and an example project.

Parsing HTML with CSS Selectors

Introduction to using CSS selectors to parse web-scraped content. Best practices, available tools and common challenges, explained through interactive examples.

Parsing HTML with Xpath

Introduction to XPath in the context of web scraping. How to extract data from HTML documents using XPath, best practices and available tools.

Web Scraping With PHP 101

Introduction to web scraping with PHP. How to handle HTTP connections, parse HTML files for data, best practices, tips and an example project.

How to Parse Web Data with Python and Beautifulsoup

Beautifulsoup is one of the most popular libraries in web scraping. In this tutorial, we'll take a hands-on overview of how to use it and what it is good for, and explore a real-life web scraping example.
