Pandas DataFrame duplicated() Method | Pandas Method

Last Updated : 02 Feb, 2024

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

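The pandas duplicated() method identifies duplicated rows in a DataFrame. It returns a boolean Series that is True for rows marked as duplicates and False otherwise. With the default settings, the first occurrence of each set of identical rows is False and every later occurrence is True.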

Example:

Python3

import pandas as pd 
df = pd.DataFrame({ 
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'], 
    'Age': [25, 32, 25, 37] 
}) 
duplicates = df[df.duplicated()] 
print(duplicates)

Output:

    Name  Age
2  Alice   25
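To see the boolean mask itself rather than the filtered rows, duplicated() can be called without using the result as an index. A minimal sketch reusing the example data above; the variable name mask is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Age': [25, 32, 25, 37]
})

# One boolean per row: True marks a row that repeats an earlier row
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False]
```

Only row 2 is True because it repeats row 0 in every column; indexing df with this mask is what produces the output shown above.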

Syntax

Syntax: DataFrame.duplicated(subset=None, keep='first')

Parameters:

subset: A column label or a list of column labels. Its default value is None, which means all columns are compared. When columns are passed, only those columns are considered when identifying duplicates.
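The effect of subset can be sketched with a small DataFrame in which two rows share a Name but differ in Age. The data here is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Age': [25, 32, 30, 37]  # illustrative data: the two Alices differ in Age
})

# Default subset=None compares all columns, so no row is a full repeat
all_cols = df.duplicated()

# subset=['Name'] compares only the Name column, so the second Alice is a duplicate
by_name = df.duplicated(subset=['Name'])

print(all_cols.tolist())  # [False, False, False, False]
print(by_name.tolist())   # [False, False, True, False]
```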

keep: Controls which occurrences are marked as duplicates. It accepts three values, and the default is 'first'.
–> If 'first', the first occurrence is considered unique and all later occurrences are marked as duplicates.
–> If 'last', the last occurrence is considered unique and all earlier occurrences are marked as duplicates.
–> If False, every occurrence of a repeated value is marked as a duplicate.
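A quick comparison of the three keep options on a single column where 'Alice' appears three times. The names DataFrame is hypothetical sample data:

```python
import pandas as pd

names = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice', 'Alice', 'Charlie']})

first = names.duplicated(keep='first')  # later repeats are duplicates
last = names.duplicated(keep='last')    # earlier repeats are duplicates
every = names.duplicated(keep=False)    # every occurrence of a repeated value

print(first.tolist())  # [False, False, True, True, False]
print(last.tolist())   # [True, False, True, False, False]
print(every.tolist())  # [True, False, True, True, False]
```

Bob and Charlie occur once, so they are False under every setting; only the Alice rows change depending on keep.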

Returns: A boolean Series with one value per row of the DataFrame, True for rows marked as duplicates.
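The returned Series shares the DataFrame's index, which is what makes filtering with df[mask] work. A short sketch, assuming a made-up string index purely to make the alignment visible:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Alice'], 'Age': [25, 32, 25]},
    index=['a', 'b', 'c'],  # hypothetical labels to show index alignment
)

mask = df.duplicated()

print(type(mask).__name__)  # Series
print(mask.dtype)           # bool
print(list(mask.index))     # ['a', 'b', 'c']

# The mask is aligned on the same labels, so it can select rows directly
print(df[mask])
```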

The examples below read data from a CSV file named employees.csv, which contains a First Name column.

Examples

Let’s look at some examples of using the pandas duplicated() method to identify duplicated rows in a DataFrame.

Example 1: Returning a boolean series

In the following example, duplicated() is called on the First Name column, which returns a boolean Series marking repeated first names.

Python

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# sorting by first name so that repeated names are adjacent
data.sort_values("First Name", inplace=True)

# making a bool series: True for each repeat of a name after its first occurrence
bool_series = data["First Name"].duplicated()

# displaying the first five rows
print(data.head())

# displaying only the rows marked as duplicates
print(data[bool_series])

Because the keep parameter was left at its default value of 'first', the first occurrence of each name is considered unique and every later occurrence is marked as a duplicate. Only those later occurrences appear in the filtered output.
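Because employees.csv is not reproduced here, the following sketch replaces it with a small hypothetical DataFrame that has a First Name column (the Team column and all values are invented). It follows the same steps as Example 1:

```python
import pandas as pd

# Hypothetical stand-in for employees.csv: made-up rows for illustration only
data = pd.DataFrame({
    'First Name': ['Maria', 'Douglas', 'Maria', 'Angela', 'Douglas'],
    'Team': ['Finance', 'Marketing', 'Legal', 'Sales', 'Finance'],
})

# Sort so that repeated names sit next to each other
data.sort_values('First Name', inplace=True)

# keep='first' (the default): the first occurrence in sorted order is unique
bool_series = data['First Name'].duplicated()

# Only the second Douglas and the second Maria are selected
print(data[bool_series])
```

Which of two identical names counts as the "first" depends on the sort order, so passing kind='stable' to sort_values() is one way to make the result predictable when ties matter.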

Example 2: Removing duplicates

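In this example, the keep parameter is set to False, so every occurrence of a repeated first name is marked as a duplicate. Inverting that mask keeps only the rows whose first name appears exactly once, removing all repeated names from the DataFrame.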

Python

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# sorting by first name
data.sort_values("First Name", inplace=True)

# making a bool series: keep=False marks every occurrence of a repeated name
bool_series = data["First Name"].duplicated(keep=False)

# displaying the bool series
print(bool_series)

# passing NOT of bool series to keep only names that occur once
data = data[~bool_series]

# displaying data (info() prints its own summary)
data.info()
print(data)

Since the duplicated() method returns True for duplicates, the NOT (~) of the series is taken so that only the rows with a unique first name remain in the DataFrame.
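Example 2 can likewise be sketched without the CSV file, using hypothetical rows. The last lines compare the result with drop_duplicates(), which, given the same subset and keep arguments, should select the same rows:

```python
import pandas as pd

# Hypothetical stand-in for employees.csv (made-up rows)
data = pd.DataFrame({
    'First Name': ['Maria', 'Douglas', 'Maria', 'Angela', 'Douglas', 'Jerry'],
    'Team': ['Finance', 'Marketing', 'Legal', 'Sales', 'Finance', 'Product'],
})

# keep=False marks every occurrence of a repeated name as True
bool_series = data['First Name'].duplicated(keep=False)

# ~ inverts the mask, keeping only names that occur exactly once
unique_rows = data[~bool_series]
print(unique_rows)

# drop_duplicates() with the same subset and keep selects the same rows
same = data.drop_duplicates(subset='First Name', keep=False)
print(unique_rows.equals(same))  # True
```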
