Python is a great language for data analysis, primarily because of its ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.
An important part of data analysis is identifying duplicate values and removing them. The Pandas duplicated() method only identifies duplicate values; it does not remove them. It returns a boolean Series that is True for every element marked as a duplicate and False for every element treated as unique.
subset: Takes a column label or a list of column labels. Its default value is None, which means all columns are compared. When columns are passed, only those columns are considered when identifying duplicates.
keep: Controls which occurrences are marked as duplicates. It accepts three values (‘first’, ‘last’ and False), and the default is ‘first’.
–> If ‘first’, the first occurrence is treated as unique and every later occurrence of the same value is marked as a duplicate.
–> If ‘last’, the last occurrence is treated as unique and every earlier occurrence of the same value is marked as a duplicate.
–> If False, every occurrence of a repeated value is marked as a duplicate.
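The effect of subset and keep can be sketched on a small frame. The data below is hypothetical and only illustrates the behaviour described above; the Team column is an assumption added to show multi-column subsets.

```python
import pandas as pd

# Hypothetical sample data, not the article's CSV file.
df = pd.DataFrame({
    "First Name": ["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"],
    "Team": ["A", "B", "A", "C", "D", "E"],
})

# keep='first' (the default): later repeats are marked True.
first_mask = df.duplicated(subset="First Name", keep="first")

# keep='last': earlier repeats are marked True.
last_mask = df.duplicated(subset="First Name", keep="last")

# keep=False: every occurrence of a repeated value is marked True.
all_mask = df.duplicated(subset="First Name", keep=False)

# A list of labels compares the combination of those columns.
pair_mask = df.duplicated(subset=["First Name", "Team"])

print(pd.DataFrame({
    "first": first_mask,
    "last": last_mask,
    "False": all_mask,
    "name+team": pair_mask,
}))
```

Only the row at position 2 is flagged in the last column, because (Ann, A) is the only name and team pair that repeats.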
The examples below use a CSV file that contains a First Name column.
Example #1: Returning a boolean series
In the following example, a boolean series is returned on the basis of duplicate values in the First Name column.
Because the keep parameter was left at its default, ‘first’, the first occurrence of each name is treated as unique and every later occurrence is marked as a duplicate.
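A minimal sketch of this example follows. The original CSV file is not reproduced here, so a small hypothetical DataFrame stands in for it; the names and the Salary column are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the article's CSV data.
data = pd.DataFrame({
    "First Name": ["Maria", "John", "Maria", "Lena", "John"],
    "Salary": [100, 200, 300, 400, 500],
})

# Boolean Series: True where the name has already appeared earlier.
bool_series = data["First Name"].duplicated()

# Use the Series as a mask to view only the rows flagged as duplicates.
duplicate_rows = data[bool_series]

print(bool_series)
print(duplicate_rows)
```

The first "Maria" and the first "John" are marked False, while their second occurrences are marked True.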
Example #2: Removing duplicates
In this example, the keep parameter is set to False, so every occurrence of a repeated value is marked as a duplicate. Only values that occur exactly once are kept, and all duplicated values are removed from the data.
Since the duplicated() method returns True for duplicates, the NOT (~) of the series is taken so that only the unique values in the DataFrame are selected.