NLP | Using dateutil to parse dates.

dateutil is one of the most capable Python libraries for parsing dates and times. Its parser module can parse datetime strings in a wide range of formats, and its tz module provides everything needed to look up timezones. Combined, these modules make it easy to parse strings into timezone-aware datetime objects.

Installation :
dateutil can be installed using pip or easy_install. The package is published on PyPI as python-dateutil, so the commands are sudo pip install python-dateutil or sudo easy_install python-dateutil. Version 2.0 or later is required for Python 3 compatibility. The complete documentation can be found at http://labix.org/python-dateutil.
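
Once the library is installed, combining the parser and tz modules as described in the introduction might look like the following minimal sketch. The fixed UTC-5 offset and the variable names are assumptions chosen for illustration; a real application would more likely look up a named zone.

```python
from dateutil import parser, tz

# Parse a string with no timezone information: the result is a naive datetime.
naive = parser.parse('2010-09-25 10:36:28')

# Hypothetical fixed-offset zone (UTC-5), built with tz.tzoffset for illustration.
eastern = tz.tzoffset('EST', -5 * 3600)

# Attach the zone to get a timezone-aware datetime.
aware = naive.replace(tzinfo=eastern)

# Convert the aware datetime to UTC.
as_utc = aware.astimezone(tz.tzutc())

print(aware.isoformat())   # 2010-09-25T10:36:28-05:00
print(as_utc.isoformat())  # 2010-09-25T15:36:28+00:00
```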

Code: Parsing Examples



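# import the parser module from dateutil
from dateutil import parser

# weekday name, month name, time, then year
print(parser.parse('Thu Sep 25 10:36:28 2010'))

# long form with a 12-hour clock
print(parser.parse('Thursday, 25. September 2010 10:36AM'))

# numeric Month/Day/Year with a time
print(parser.parse('9 / 25 / 2010 10:36:28'))

# numeric Month/Day/Year without a time (defaults to midnight)
print(parser.parse('9 / 25 / 2010'))

# ISO 8601 with a trailing Z, which yields a UTC-aware datetime
print(parser.parse('2010-09-25T10:36:28Z'))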

Output :

2010-09-25 10:36:28
2010-09-25 10:36:00
2010-09-25 10:36:28
2010-09-25 00:00:00
2010-09-25 10:36:28+00:00

All it takes is importing the parser module and calling the parse() function with a datetime string. The parser can usually return a sensible datetime object, but if it cannot parse the string, it will raise a ValueError.
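
The error behaviour described above can be handled with an ordinary try/except. The helper name try_parse below is hypothetical and not part of dateutil; this is only a sketch of one way to fall back to None on unparseable input.

```python
from dateutil import parser


def try_parse(text):
    """Return a datetime for text, or None if dateutil cannot parse it.

    try_parse is a hypothetical helper name used for illustration only.
    """
    try:
        return parser.parse(text)
    except ValueError:
        # Raised when the string contains no recognizable date.
        return None


print(try_parse('2010-09-25 10:36:28'))
print(try_parse('hello world'))
```

Returning None keeps the calling code simple, at the cost of hiding why parsing failed; re-raising or logging the exception may be a better fit where the reason matters.
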
How it works :

  • Instead of using regular expressions, the parser looks for recognizable tokens and guesses what those tokens refer to.
  • The order of these tokens matters, since some cultures write dates as Month/Day/Year (the default order), while others use Day/Month/Year.
  • To deal with this problem, the parse() function takes an optional keyword argument, dayfirst, which defaults to False.
  • If it is set to True, the parser can correctly parse dates in the latter format.

print(parser.parse('16/6/2019', dayfirst=True))


Output :

2019-06-16 00:00:00
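
In a date such as 16/6/2019 the first number cannot be a month, so the effect of dayfirst is easier to see on a string where both readings are valid. The sample string below is an assumption picked for illustration.

```python
from dateutil import parser

# Both 5 and 6 are valid months, so the token order decides the result.
ambiguous = '5/6/2019'

month_first = parser.parse(ambiguous)               # default: Month/Day/Year
day_first = parser.parse(ambiguous, dayfirst=True)  # Day/Month/Year

print(month_first.date())  # 2019-05-06
print(day_first.date())    # 2019-06-05
```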

Another ordering issue can occur with two-digit years. For example, '11-6-19' is an ambiguous date format. Since dateutil defaults to the Month-Day-Year format, '11-6-19' is parsed to the year 2019. But if yearfirst=True is passed into parse(), it is parsed to the year 2011.

# default order: Month-Day-Year
print(parser.parse('11-6-19'))
# yearfirst=True: Year-Month-Day
print(parser.parse('11-6-19', yearfirst=True))


Output :

2019-11-06 00:00:00
2011-06-19 00:00:00

The dateutil parser can also do fuzzy parsing, which lets it ignore extraneous characters in a datetime string. The fuzzy argument defaults to False, in which case parse() raises a ValueError when it encounters unknown tokens. With fuzzy=True, a datetime object can usually be returned.
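
The difference between the two modes can be sketched as follows. The sample sentence and the variable names are assumptions made for illustration.

```python
from dateutil import parser

# A sentence with extra words around the date and time.
text = 'Today is September 25, 2010 at 10:36:28AM'

# With the default fuzzy=False, the unknown tokens cause a ValueError.
try:
    parser.parse(text)
    strict_failed = False
except ValueError:
    strict_failed = True

# With fuzzy=True, the unknown tokens are skipped.
fuzzy_result = parser.parse(text, fuzzy=True)

print(strict_failed)
print(fuzzy_result)
```

Fuzzy parsing is convenient for free-form text, but because it skips whatever it does not recognize, it is worth checking the result against expectations before relying on it.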


