How to Parse Datetime Strings with Python and Dateparser

Parsing datetime strings is one of the most common challenges in web scraping, as the web is full of different date and time formats.

In this article, we'll take a look at dateparser, a Python package that can automatically parse datetime strings from almost any text format. In short, we'll be converting Python strings to datetime objects with the most hands-off approach available in Python.

What is Dateparser?

Dateparser is a smart date and time text parsing library for Python. It's a popular community package that can extract real datetime objects from almost any text containing date or time data.

How to use Dateparser?

The key function of Dateparser is dateparser.parse.

It takes a string as an argument and tries to find datetime data in it, returning None if nothing can be parsed. For example, let's say we scraped a bunch of date strings from a website. Let's put them through dateparser:

import dateparser

# Parsing dates in different formats
dates_in_different_formats = [
    "2023-06-07",    # ISO 8601 format
    "06/07/2023",    # US format
    "07/06/2023",    # European format
    "June 7, 2023",  # Long format
    "7 Jun 2023",    # Another common format
    "2023-06-07T15:25:10", # ISO 8601 with time
    "2023-06-07 15:25:10", # Space separated date and time
    "2023-06-07 15:25:10.555", # Time with milliseconds
]

for date_string in dates_in_different_formats:
    parsed_date = dateparser.parse(date_string)
    print(f"Original: {date_string}\n  Parsed: {parsed_date}")
"""
Original: 2023-06-07
  Parsed: 2023-06-07 00:00:00
Original: 06/07/2023
  Parsed: 2023-06-07 00:00:00
Original: 07/06/2023
  Parsed: 2023-07-06 00:00:00
Original: June 7, 2023
  Parsed: 2023-06-07 00:00:00
Original: 7 Jun 2023
  Parsed: 2023-06-07 00:00:00
Original: 2023-06-07T15:25:10
  Parsed: 2023-06-07 15:25:10
Original: 2023-06-07 15:25:10
  Parsed: 2023-06-07 15:25:10
Original: 2023-06-07 15:25:10.555
  Parsed: 2023-06-07 15:25:10.555000
"""

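Using Dateparser, we can parse all of these different formats without specifying any datetime formats ourselves, which makes date parsing in web scraping much more accessible. Note, however, that the ambiguous 07/06/2023 was read month-first as July 6 rather than June 7, a problem covered below under Date Object Order.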
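
For contrast, here is a minimal sketch of what the manual approach might look like using only the standard library's datetime.strptime, where every expected format has to be listed by hand. The KNOWN_FORMATS list and the parse_with_known_formats helper are hypothetical names made up for this illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list for illustration: every expected format must be spelled out.
KNOWN_FORMATS = [
    "%Y-%m-%d",              # 2023-06-07
    "%m/%d/%Y",              # 06/07/2023 (month first)
    "%B %d, %Y",             # June 7, 2023
    "%d %b %Y",              # 7 Jun 2023
    "%Y-%m-%dT%H:%M:%S",     # 2023-06-07T15:25:10
    "%Y-%m-%d %H:%M:%S",     # 2023-06-07 15:25:10
    "%Y-%m-%d %H:%M:%S.%f",  # 2023-06-07 15:25:10.555
]


def parse_with_known_formats(text: str) -> Optional[datetime]:
    """Try each known format in turn; return None if none of them match."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    return None


print(parse_with_known_formats("June 7, 2023"))             # 2023-06-07 00:00:00
print(parse_with_known_formats("2023-06-07 15:25:10.555"))  # 2023-06-07 15:25:10.555000
print(parse_with_known_formats("07.06.2023"))               # None, format not in the list
```

Any string outside the list comes back as None, so with this approach the list has to grow with every new website that is scraped.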

Common Problems

Datetime strings are still complicated and highly contextual, so not all problems can be solved by Dateparser automatically without specifying parsing preferences. Here are the top problems encountered with Dateparser and how to address them using dateparser settings.

Date Object Order

The most common issue with parsing datetime strings using Dateparser is that it can't reliably infer the order of day, month, and year in ambiguous dates.

For example, if we have a date like 07/06/2023, it's impossible to know whether it's the 7th of June or the 6th of July. To address this, the DATE_ORDER setting can be used:

import dateparser

# Day first and year first dates
dates = [
    "13/06/2023",
    "06/13/2023",
    "23/06/13",
    "13/06/23",
    "2023/06/13",
]

# Parsing with 'DMY' order
print("Parsing with 'DMY' order")
for date_string in dates:
    parsed_date = dateparser.parse(date_string, settings={'DATE_ORDER': 'DMY'})
    print(f"Original: {date_string}, Parsed: {parsed_date}")

# Parsing with 'YMD' order
print("\nParsing with 'YMD' order")
for date_string in dates:
    parsed_date = dateparser.parse(date_string, settings={'DATE_ORDER': 'YMD'})
    print(f"Original: {date_string}, Parsed: {parsed_date}")
"""
Parsing with 'DMY' order
Original: 13/06/2023, Parsed: 2023-06-13 00:00:00
Original: 06/13/2023, Parsed: None
Original: 23/06/13, Parsed: 2013-06-23 00:00:00
Original: 13/06/23, Parsed: 2023-06-13 00:00:00
Original: 2023/06/13, Parsed: None

Parsing with 'YMD' order
Original: 13/06/2023, Parsed: 2023-06-13 00:00:00
Original: 06/13/2023, Parsed: 2023-06-13 00:00:00
Original: 23/06/13, Parsed: 2023-06-13 00:00:00
Original: 13/06/23, Parsed: 2013-06-23 00:00:00
Original: 2023/06/13, Parsed: 2023-06-13 00:00:00
"""

The date order can usually be guessed from the geolocation or language of the scraped website. For example, US websites usually use the MDY format, while most of the rest of the world uses the DMY or YMD format.
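
As a rough sketch of that idea, a scraper could keep a small lookup table from a site's country to a DATE_ORDER value. The table contents, the DEFAULT_ORDER fallback, and the helper names below are assumptions chosen for illustration, not something Dateparser provides.

```python
from typing import Dict

# Hypothetical lookup table for illustration; extend it for the sites being scraped.
DATE_ORDER_BY_COUNTRY: Dict[str, str] = {
    "US": "MDY",  # United States: month first
    "GB": "DMY",  # United Kingdom: day first
    "DE": "DMY",  # Germany: day first
    "FR": "DMY",  # France: day first
    "JP": "YMD",  # Japan: year first
    "CN": "YMD",  # China: year first
}
DEFAULT_ORDER = "DMY"  # assumed fallback for countries missing from the table


def settings_for_country(country_code: str) -> Dict[str, str]:
    """Build a Dateparser settings dict with the date order for a country."""
    order = DATE_ORDER_BY_COUNTRY.get(country_code.upper(), DEFAULT_ORDER)
    return {"DATE_ORDER": order}


def parse_for_country(date_string: str, country_code: str):
    """Parse a scraped date using the order implied by the site's country."""
    # Third-party package, imported here so the lookup table can be used on its own.
    import dateparser

    return dateparser.parse(date_string, settings=settings_for_country(country_code))


print(settings_for_country("us"))  # {'DATE_ORDER': 'MDY'}
print(settings_for_country("JP"))  # {'DATE_ORDER': 'YMD'}
```

Keeping the mapping in one place means a wrong guess for a given site can be corrected by editing a single table entry.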

Handling Implicit Timezones

Scraped websites often use implicit timezones. For example, if we scrape a website that shows content relative to New York, the scraped datetime strings are likely to be in New York time.

For this, the timezone can be specified manually using the TIMEZONE setting:

import dateparser

# treat the time as US Eastern time (named timezone)
dateparser.parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': 'US/Eastern'})
# datetime.datetime(2012, 1, 12, 22, 0)

# treat the time as UTC+05:00 (fixed offset)
dateparser.parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': '+0500'})
# datetime.datetime(2012, 1, 12, 22, 0)
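
Note that the results above are naive datetime objects with no tzinfo attached. If the rest of a pipeline expects UTC, one option is to attach the implied timezone afterwards with the standard library's zoneinfo module (Python 3.9+). This is only a sketch that starts from the parsed value shown above. The to_utc helper is a hypothetical name, and America/New_York is used as the IANA name for New York time.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc(naive_local: datetime, tz_name: str) -> datetime:
    """Interpret a naive datetime as wall-clock time in tz_name and convert it to UTC."""
    aware_local = naive_local.replace(tzinfo=ZoneInfo(tz_name))
    return aware_local.astimezone(timezone.utc)


# The naive value from the example above, known to be New York time.
scraped = datetime(2012, 1, 12, 22, 0)
print(to_utc(scraped, "America/New_York"))  # 2012-01-13 03:00:00+00:00
```

Storing every scraped timestamp in UTC makes values from sites in different timezones directly comparable.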

Incomplete Dates

Some datetime strings can be implicitly incomplete. For example, if we scrape "December 2023" we don't know the exact date. For this, the PREFER_DAY_OF_MONTH setting can be used:

import dateparser

# by default, the missing day is filled in with the current day of the month
# (this example was run on the 16th)
dateparser.parse('December 2023')
# datetime.datetime(2023, 12, 16, 0, 0)

# prefer the last day of the month
dateparser.parse('December 2023', settings={'PREFER_DAY_OF_MONTH': 'last'})
# datetime.datetime(2023, 12, 31, 0, 0)

# prefer the first day of the month
dateparser.parse('December 2023', settings={'PREFER_DAY_OF_MONTH': 'first'})
# datetime.datetime(2023, 12, 1, 0, 0)
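
To make the options concrete, here is a standard-library sketch of how a missing day could be filled in under each preference. It models the behaviour shown above rather than reproducing Dateparser's internals. The resolve_day helper is hypothetical, and clamping the 'current' option to the month's length is an assumption of this sketch.

```python
import calendar
from datetime import date


def resolve_day(year: int, month: int, prefer: str, today: date) -> date:
    """Fill in a missing day of month according to a 'first', 'last' or 'current' preference."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    if prefer == "first":
        day = 1
    elif prefer == "last":
        day = last_day
    elif prefer == "current":
        # Assumption of this sketch: clamp so the 31st still fits in a shorter month.
        day = min(today.day, last_day)
    else:
        raise ValueError(f"unknown preference: {prefer!r}")
    return date(year, month, day)


today = date(2023, 6, 16)  # pretend the scraper runs on the 16th
print(resolve_day(2023, 12, "current", today))  # 2023-12-16
print(resolve_day(2023, 12, "last", today))     # 2023-12-31
print(resolve_day(2023, 12, "first", today))    # 2023-12-01
```

Because the default depends on the day the scraper runs, choosing 'first' or 'last' explicitly keeps results reproducible between runs.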

For cases where the year is implicit, the PREFER_DATES_FROM setting can be used:

import dateparser

# the outputs below depend on the current date; they assume the code
# is run on the 7th of a month after March 2023

# by default, the date is taken from the current year
dateparser.parse('March')
# datetime.datetime(2023, 3, 7, 0, 0)

# to imply the date is from the future (the next March)
dateparser.parse('March', settings={'PREFER_DATES_FROM': 'future'})
# datetime.datetime(2024, 3, 7, 0, 0)

# to imply the date is from the past (the most recent March)
dateparser.parse('March', settings={'PREFER_DATES_FROM': 'past'})
# datetime.datetime(2023, 3, 7, 0, 0)
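
The settings covered in this section can also be combined in a single dictionary and reused for every string scraped from the same site. The sketch below assumes a hypothetical site that shows day-first dates in US Eastern time. build_settings and parse_scraped are made-up helper names, and the chosen default values are examples only.

```python
from datetime import datetime
from typing import Dict, Optional


def build_settings(date_order: str, tz_name: str,
                   prefer_day: str = "first",
                   prefer_dates_from: str = "past") -> Dict[str, str]:
    """Collect the four settings discussed above into one reusable dict."""
    if sorted(date_order) != ["D", "M", "Y"]:
        raise ValueError(f"date_order must be a permutation of 'DMY', got {date_order!r}")
    return {
        "DATE_ORDER": date_order,
        "TIMEZONE": tz_name,
        "PREFER_DAY_OF_MONTH": prefer_day,
        "PREFER_DATES_FROM": prefer_dates_from,
    }


def parse_scraped(text: str, settings: Dict[str, str]) -> Optional[datetime]:
    """Parse one scraped string with the site's settings; None means parsing failed."""
    # Third-party package, imported here so build_settings can be used on its own.
    import dateparser

    return dateparser.parse(text, settings=settings)


# Example values for the hypothetical site described above.
SITE_SETTINGS = build_settings("DMY", "US/Eastern")
print(SITE_SETTINGS)
```

Defining the settings once per site avoids repeating the same dictionary at every call and keeps the parsing assumptions visible in one place.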

Summary

Dateparser is a powerful library for parsing datetime strings. It can parse dates in many different formats without having to specify any datetime formats ourselves. It also has many settings that can be used to handle common problems like implicit timezones and incomplete dates.

To see a real-life example of Dateparser in web scraping, see our tutorial on how to scrape eBay, where we use Dateparser to parse dates used in eBay listings.
