Extract date from a specified column of a given Pandas DataFrame using Regex
Last Updated: 29 Aug, 2020
In this article, we will discuss how to extract only the valid dates from a specified column of a given DataFrame. A date counts as valid here if it is written in the form ‘mm/dd/yyyy’, which is the format the regular expression below matches.
Approach:
We use the regular expression \b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b to extract valid dates from the specified column of the DataFrame. It matches a two-digit month (01 to 12), a two-digit day (01 to 31) and a four-digit year, separated by slashes and bounded by word boundaries. We apply it with the re.findall() method, which returns a list of all matches found in a string. Now let us implement this in Python:
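Before applying the pattern to a DataFrame, it may help to try it on a few plain strings. The following is only an illustrative sketch: the names DATE_PATTERN, samples and results are assumptions introduced here, not part of the original code. Because the pattern contains three capturing groups, re.findall() returns a list of (month, day, year) tuples rather than whole date strings.

```python
import re

# Hypothetical name for the article's pattern:
# month 01-12, day 01-31, four-digit year, separated by slashes.
DATE_PATTERN = r'\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b'

# Illustrative inputs: a valid date, a date with month 15, and a date inside text.
samples = ['12/21/1998', '15/12/1998', 'born on 06/11/2000']

# Map each input string to the list of matches found in it.
results = {s: re.findall(DATE_PATTERN, s) for s in samples}

for text, matches in results.items():
    print(text, '->', matches)
```

Note that the pattern only checks the numeric ranges of each field, so a string such as 02/31/2020 would still match even though that calendar date does not exist.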
Step 1: Creating Dataframe
Python3
import pandas as pd
import re

df = pd.DataFrame({'Name': ['Akash', 'Shyam', 'Ayush',
                            'Diksha', 'Radhika'],
                   'date_of_birth': ['12/21/1998', '15/12/1998',
                                     '06/11/2000', '05/10/1998',
                                     '13/12/2010'],
                   'Age': [21, 12, 20, 21, 10]})

print("Printing the original dataframe")
df
Output:
Step 2: Extracting valid dates from the DataFrame in the format ‘mm/dd/yyyy’
Python3
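def checking_valid_dates(dt):
    # Return every mm/dd/yyyy match in dt as a (month, day, year) tuple;
    # the list is empty when dt contains no valid date.
    result = re.findall(
        r'\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b', dt)
    return result

# Apply the check to every value in the date_of_birth column
df['valid_date_of_birth'] = df['date_of_birth'].apply(checking_valid_dates)

print("\nPrinting the data frame with valid dates in the format mm/dd/yyyy:")
df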
Output:
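Since re.findall() returns a list of tuples when the pattern has capturing groups, each cell of valid_date_of_birth holds either a value such as [('12', '21', '1998')] or an empty list. If a plain date string per row is preferred, one possible alternative (not the approach used above) is pandas' Series.str.extract() with the inner groups made non-capturing. The name FULL_DATE is an assumption introduced for this sketch.

```python
import pandas as pd

# Hypothetical variant of the article's pattern: the whole date is one
# capturing group, and the month/day alternations are non-capturing (?:...).
FULL_DATE = r'\b((?:1[0-2]|0[1-9])/(?:3[01]|[12][0-9]|0[1-9])/[0-9]{4})\b'

df = pd.DataFrame({'date_of_birth': ['12/21/1998', '15/12/1998',
                                     '06/11/2000', '05/10/1998',
                                     '13/12/2010']})

# str.extract keeps the first match of the group in each row and
# yields NaN where the pattern does not match.
df['valid_date_of_birth'] = df['date_of_birth'].str.extract(
    FULL_DATE, expand=False)

print(df)
```

With this variant, the rows whose month field is 15 or 13 end up as NaN, which can then be filtered with dropna() or isna() if needed.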