
Extract date from a specified column of a given Pandas DataFrame using Regex

Last Updated : 29 Aug, 2020

In this article, we will discuss how to extract only the valid dates from a specified column of a given Pandas DataFrame. A date counts as valid here when it is written in the form ‘mm/dd/yyyy’, with a month from 01 to 12 and a day from 01 to 31.

Approach:

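We use a regular expression to pick the valid dates out of the specified column. The pattern is \b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b. Its first group matches a two-digit month from 01 to 12, its second group a two-digit day from 01 to 31, and its third group a four-digit year, with slashes between them. The pattern checks only these ranges, so an impossible calendar date such as 02/31/2020 still matches. We apply the pattern with the re.findall() method. Now let us implement this in Python.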
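Before building the data frame, it can help to try the pattern on a few plain strings. The following is a minimal sketch; the names DATE_PATTERN and find_dates are hypothetical helpers introduced only for this illustration and are not part of the article's code.

```python
import re

# The pattern quoted above: month / day / four-digit year.
# DATE_PATTERN and find_dates are illustrative names, not from the article.
DATE_PATTERN = re.compile(
    r'\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b')


def find_dates(text):
    """Return one (month, day, year) tuple per date found in text."""
    return DATE_PATTERN.findall(text)


print(find_dates('12/21/1998'))  # [('12', '21', '1998')]
print(find_dates('15/12/1998'))  # [] because 15 is not a valid month
print(find_dates('02/31/2020'))  # [('02', '31', '2020')], ranges only
```

Because the pattern contains three capturing groups, re.findall() returns tuples of the group contents rather than the matched text itself.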

Step 1: Creating Dataframe

Python3




# import pandas and the re module
import pandas as pd
import re

# create a data frame with the columns
# Name, date_of_birth and Age
df = pd.DataFrame({'Name': ['Akash', 'Shyam', 'Ayush',
                            'Diksha', 'Radhika'],

                   'date_of_birth': ['12/21/1998', '15/12/1998',
                                     '06/11/2000', '05/10/1998',
                                     '13/12/2010'],

                   'Age': [21, 12, 20, 21, 10]})

# print the original data frame
print("Printing the original dataframe")
print(df)


Output: the original data frame, five rows with the columns Name, date_of_birth and Age.

Step 2: Extracting valid dates from the data frame in the format ‘mm/dd/yyyy’

Python3




# function that returns the valid dates
# found in the given string
def checking_valid_dates(dt):

    # the regular expression matches dates
    # in the format mm/dd/yyyy; because it
    # has three capturing groups, findall
    # returns a list of (mm, dd, yyyy) tuples,
    # which is empty when nothing matches
    result = re.findall(
        r'\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/([0-9]{4})\b', dt)
    return result


# create the new column valid_date_of_birth
df['valid_date_of_birth'] = df['date_of_birth'].apply(
    checking_valid_dates)

print("\nPrinting the data frame with valid dates in the format mm/dd/yyyy:")
print(df)


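Output: the data frame gains a valid_date_of_birth column. Rows with a valid date hold a one-element list such as [('12', '21', '1998')]. The rows for 15/12/1998 and 13/12/2010 hold an empty list [], because 15 and 13 are not valid months.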
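If the goal is a column of plain date strings rather than lists of tuples, one possible variation is to use Series.str.extract() instead of re.findall(). The sketch below is an alternative to the article's approach, not a restatement of it. It assumes each cell holds exactly one date, so the pattern is anchored with ^ and $, and the inner groups are made non-capturing so that a single outer group returns the whole date. The names FULL_DATE and parsed are hypothetical. The optional pd.to_datetime() step adds the calendar check that the regular expression alone does not perform.

```python
import pandas as pd

# Same ranges as the article's pattern, but anchored to the whole cell and
# wrapped in one capturing group so the full date comes back as a string.
# FULL_DATE is an illustrative name, not from the article.
FULL_DATE = r'^((?:1[0-2]|0[1-9])/(?:3[01]|[12][0-9]|0[1-9])/[0-9]{4})$'

df = pd.DataFrame({
    'Name': ['Akash', 'Shyam', 'Ayush', 'Diksha', 'Radhika'],
    'date_of_birth': ['12/21/1998', '15/12/1998', '06/11/2000',
                      '05/10/1998', '13/12/2010'],
})

# expand=False returns a Series; rows that do not match become missing values.
df['valid_date_of_birth'] = df['date_of_birth'].str.extract(
    FULL_DATE, expand=False)

# Optional calendar check: errors='coerce' turns unparseable dates into NaT.
df['parsed'] = pd.to_datetime(df['valid_date_of_birth'],
                              format='%m/%d/%Y', errors='coerce')

print(df)
```

With this variation, invalid rows can be dropped with df.dropna(subset=['valid_date_of_birth']) if only the valid dates are needed.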


