Open In App

How to Filter DataFrame Rows Based on the Date in Pandas?

Last Updated : 04 Dec, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

Different regions follow different date conventions (YYYY-MM-DD, YYYY-DD-MM, DD/MM/YY, etc.).  It is difficult to work with such strings in the data. Pandas to_datetime() function allows converting the date and time in string format to datetime64. This datatype helps extract features of date and time ranging from ‘year’ to day, hour, minute, second, and even finer details like microseconds.

Filter DataFrame Rows Based on the Date in Pandas

To filter rows based on dates, first format the dates in the DataFrame to datetime64 type. Then use the DataFrame.loc[] and DataFrame.query[] functions from the Pandas package to specify a filter condition. As a result, acquire the subset of data, the filtered DataFrame. Let’s see some examples of the same.

Creating a Sample DataFrame

We will use a sample DataFrame consisting of the number of posts on a specific day. Convert the date in the sample data to datetime64 type as follows.

Python3




# Import Pandas package
import pandas as pd
 
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25', '2020-09-05',
                            '2020-09-12', '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02', '2020-12-10',
                            '2020-12-18']})
 
# Convert the date to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
 
# Display dataframe
df


Output:

   num_posts       date
0 4 2020-08-09
1 6 2020-08-25
2 3 2020-09-05
3 9 2020-09-12
4 1 2020-09-29
5 14 2020-10-15
6 2 2020-11-21
7 5 2020-12-02
8 7 2020-12-10
9 2 2020-12-18

Filter DataFrame Rows Based on the Date in Pandas

Below are the ways by which we can filter DataFrame rows based on the Date in Pandas:

  • Using DataFrame.loc[] Function
  • Using DateTimeIndex(dt)
  • Using DataFrame.query()
  • Using pandas.Series.dt.strftime()

Filter DataFrame Rows on Dates Using DataFrame.loc[] Function

Filter data based on dates using DataFrame.loc[] function, the loc[] function is used to access a group of rows and columns of a DataFrame through labels or a boolean array. In this example, the conditional statement in loc[] returns a boolean array with True value if row satisfies condition (date is in between 1st and 15th September) and False value otherwise. Then the loc[] function returns only those rows having True values. 

Python3




# Import Pandas package
import pandas as pd
 
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02',
                            '2020-12-10', '2020-12-18']})
 
# Convert the date to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
 
# Filter data between two dates
filtered_df = df.loc[(df['date'] >= '2020-09-01')
                     & (df['date'] < '2020-09-15')]
# Display
filtered_df


Output:

   num_posts       date
2 3 2020-09-05
3 9 2020-09-12

Filter DataFrame Rows Based on the Date Using DateTimeIndex(dt)

Use the DateTimeIndex(dt) to access separate date time attributes such as year, month, day, weekday, hours, minutes, seconds, microseconds etc. as a condition in loc[] function as follows. 

Note: The date values should be in datetime64 format.

Python3




# Import Pandas package
import pandas as pd
 
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02',
                            '2020-12-10', '2020-12-18']})
 
# Convert the date to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
 
# Filter data between two dates
filtered_df = df.loc[(df['date'] >= '2020-09-01')
                     & (df['date'] < '2020-09-15')]
 
# Display
print("\nPosts in December:")
print(filtered_df)
 
# Filter data for specific weekday (tuesday)
filtered_df = df.loc[df['date'].dt.weekday == 2]
 
# Display
print("\nPosts on all Tuesdays:")
print(filtered_df)


Output:

Posts in December:
num_posts date
2 3 2020-09-05
3 9 2020-09-12
Posts on all Tuesdays:
num_posts date
7 5 2020-12-02

Filter DataFrame Rows on Dates Using DataFrame.query()

Filter data based on dates using DataFrame.query() function, The query() function filters a Pandas DataFrame and selects rows by specifying a condition within quotes. As shown below, the condition inside query() is to select the data with dates in the month of August (range of dates is specified). The columns of the DataFrame are placed in the query namespace by default so the date column can be accessed without indexing (simply specify column name).

Python3




# Import Pandas package
import pandas as pd
 
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02',
                            '2020-12-10', '2020-12-18']})
 
# Convert the date to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
 
# Filter data between two dates
filtered_df = df.query("date >= '2020-08-01' \
                       and date < '2020-09-01'")
 
# Display
filtered_df


Output:

   num_posts       date
0 4 2020-08-09
1 6 2020-08-25

Filter DataFrame Rows on Dates Using pandas.Series.dt.strftime()

In this example, a sample DataFrame is created with columns ‘num_posts’ and ‘date’. The code then converts the ‘date’ column to datetime format and filters the DataFrame to include rows with dates between ‘2020-08-01’ and ‘2020-09-01’ using the dt.strftime('%Y-%m-%d') method. The filtered DataFrame is then displayed.

Python3




import pandas as pd
 
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02',
                            '2020-12-10', '2020-12-18']})
 
# Convert the date to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
 
# Filter data between two dates using dt.strftime()
filtered_df = df[df['date'].dt.strftime(
    '%Y-%m-%d').between('2020-08-01', '2020-09-01')]
 
# Display
print(filtered_df)


Output:

   num_posts       date
0 4 2020-08-09
1 6 2020-08-25


Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads