Python | Pandas Index.duplicated()
Python is a great language for data analysis, primarily because of its ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.
Pandas Index.duplicated() function returns a boolean array that indicates which values in the Index are duplicates. It does not remove anything from the Index. Duplicated values are marked as True in the resulting array. Either all duplicates, all except the first occurrence, or all except the last occurrence can be marked.
Syntax: Index.duplicated(keep='first')
Parameters :
keep : {'first', 'last', False}, default 'first'
Determines which occurrence, if any, in each set of duplicates is left unmarked.
-> 'first' : Mark duplicates as True except for the first occurrence.
-> 'last' : Mark duplicates as True except for the last occurrence.
-> False : Mark all duplicates as True.
Returns : numpy.ndarray of boolean values
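The three settings of keep can be compared side by side. The following is a minimal sketch; the sample Index is hypothetical and chosen only so that one value repeats three times.

```python
import pandas as pd

# Hypothetical sample data: 'a' appears at positions 0, 2 and 4.
idx = pd.Index(['a', 'b', 'a', 'c', 'a'])

# keep='first': every repeat after the first occurrence is marked True.
first_mask = idx.duplicated(keep='first')

# keep='last': every repeat before the last occurrence is marked True.
last_mask = idx.duplicated(keep='last')

# keep=False: every occurrence of a repeated value is marked True.
all_mask = idx.duplicated(keep=False)

print(first_mask.tolist())  # [False, False, True, False, True]
print(last_mask.tolist())   # [True, False, True, False, False]
print(all_mask.tolist())    # [True, False, True, False, True]
```

In each case the result has the same length as the Index, with one boolean per position.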
Example #1: Use the Index.duplicated() function to mark every duplicated value in the Index except its first occurrence.
import pandas as pd
# Create an Index that contains repeated labels
idx = pd.Index(['Labrador', 'Beagle', 'Labrador',
                'Lhasa', 'Husky', 'Beagle'])
idx
Let’s check which values in the Index are duplicates.
idx.duplicated(keep='first')
Output :
array([False, False,  True, False, False,  True])
As we can see in the output, the Index.duplicated() function has marked every occurrence of a duplicate value as True except the first occurrence.
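Because the result is a boolean array, it can also be used as a mask on the Index itself. The sketch below is one possible use of the mask rather than part of the function; the names mask, unique_idx and repeated are illustrative.

```python
import pandas as pd

idx = pd.Index(['Labrador', 'Beagle', 'Labrador',
                'Lhasa', 'Husky', 'Beagle'])

# True for every repeat after the first occurrence of a value.
mask = idx.duplicated(keep='first')

# Inverting the mask keeps only the first occurrence of each value.
unique_idx = idx[~mask]

# Applying the mask directly selects the repeated entries.
repeated = idx[mask]

print(unique_idx.tolist())  # ['Labrador', 'Beagle', 'Lhasa', 'Husky']
print(repeated.tolist())    # ['Labrador', 'Beagle']
```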
Example #2: Use the Index.duplicated() function to identify all the duplicate values. Here, every occurrence of a duplicated value is marked as True.
import pandas as pd
# None is included to show how a missing value is handled
idx = pd.Index([100, 50, 45, 100, 12, 50, None])
idx
Let’s identify all the duplicated values in the Index.
Note : The Index contains a missing value (NaN), which comes from the None entry.
idx.duplicated(keep=False)
Output :
array([ True,  True, False,  True, False,  True, False])
The function has marked every occurrence of a duplicated value as True. It has also treated the single occurrence of NaN as unique and has marked it False.
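The NaN is marked False here only because it occurs once. As a small sketch with hypothetical data, the following compares an Index holding one NaN with an Index holding two. When NaN repeats, pandas treats the repeats as duplicates of each other.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Index with a single NaN, one with two NaNs.
idx_one_nan = pd.Index([1.0, 2.0, np.nan])
idx_two_nan = pd.Index([1.0, np.nan, 2.0, np.nan])

# A single NaN has no other occurrence, so nothing is marked.
one_nan_mask = idx_one_nan.duplicated(keep=False)

# Two NaN entries are treated as duplicates of each other.
two_nan_mask = idx_two_nan.duplicated(keep=False)

print(one_nan_mask.tolist())  # [False, False, False]
print(two_nan_mask.tolist())  # [False, True, False, True]
```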
Last Updated : 14 Nov, 2022