Python | Pandas dataframe.drop_duplicates()
Syntax of df.drop_duplicates()
Syntax: DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
- subset: A column label or a list of column labels. The default is None, which means all columns are compared. When columns are passed, only those columns are considered when identifying duplicates.
- keep: Controls which of the duplicate rows, if any, are kept. It accepts three values, and the default is 'first'.
  - If 'first', the first occurrence is kept and all later occurrences are treated as duplicates.
  - If 'last', the last occurrence is kept and all earlier occurrences are treated as duplicates.
  - If False, every occurrence is treated as a duplicate, so all of them are dropped.
- inplace: Boolean, default False. If True, the duplicate rows are removed from the original DataFrame itself instead of a new DataFrame being returned.
Return type: A DataFrame with the duplicate rows removed according to the arguments passed, or None if inplace=True.
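To make the effect of each parameter concrete, here is a minimal sketch on a small DataFrame. The column names and values are hypothetical and chosen only so that rows 0 and 2 are exact duplicates, while rows 1 and 3 share the same city.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are identical, rows 1 and 3 share a city
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
})

# keep='first' (default): the first occurrence survives, later copies are dropped
first = df.drop_duplicates()
print(first.index.tolist())      # [0, 1, 3]

# keep='last': the last occurrence survives, earlier copies are dropped
last = df.drop_duplicates(keep="last")
print(last.index.tolist())       # [1, 2, 3]

# keep=False: every row that has a duplicate is dropped
none_kept = df.drop_duplicates(keep=False)
print(none_kept.index.tolist())  # [1, 3]

# subset: only 'city' is compared, so row 3 now duplicates row 1 as well
by_city = df.drop_duplicates(subset="city")
print(by_city.index.tolist())    # [0, 1]

# inplace=True modifies df itself and returns None
result = df.drop_duplicates(inplace=True)
print(result)                    # None
print(df.index.tolist())         # [0, 1, 3]
```

Note that the surviving rows keep their original index labels; the gaps in the index show which rows were dropped.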
Output:
       A   B      C
0  TeamA  50   True
1  TeamB  40  False
3  TeamC  30  False
As we can see, the duplicate rows (including a repeated TeamA row) have been dropped, and only the first occurrence of each row is kept.
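The input for the output above is not shown, so the following sketch assumes one plausible input: a five-row DataFrame in which index 2 repeats the TeamB row and index 4 repeats the TeamA row. Calling drop_duplicates() with its default arguments on it yields the rows at index 0, 1 and 3.

```python
import pandas as pd

# Assumed input (the original data is not shown): rows 2 and 4 repeat rows 1 and 0
df = pd.DataFrame({
    "A": ["TeamA", "TeamB", "TeamB", "TeamC", "TeamA"],
    "B": [50, 40, 40, 30, 50],
    "C": [True, False, False, False, True],
})

# Default arguments: compare all columns and keep the first occurrence
result = df.drop_duplicates()
print(result)
```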
The following examples use data loaded from a CSV file.
Example 1: Removing rows with the same First Name
In the following example, rows that have the same First Name are removed and a new DataFrame is returned.
As a result, the rows with duplicate first names are removed from the DataFrame.
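Since the CSV file itself is not reproduced here, this sketch reads a small hypothetical CSV from a string with io.StringIO. Only the "First Name" column name comes from the example; the Salary column and all row values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the article's CSV file
csv_text = """First Name,Salary
Douglas,97308
Thomas,61933
Douglas,130590
Maria,130590
"""
data = pd.read_csv(io.StringIO(csv_text))

# Compare only the "First Name" column; the default keep='first' keeps the
# first row for each name and returns a new DataFrame
deduped = data.drop_duplicates(subset="First Name")
print(len(data), len(deduped))           # 4 3
print(deduped["First Name"].tolist())    # ['Douglas', 'Thomas', 'Maria']
```

Passing keep=False instead would drop both Douglas rows, and adding inplace=True would modify data directly rather than returning a new DataFrame.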
Example 2: Removing rows with all duplicate values
In this example, rows in which all values are duplicated are removed. Since the CSV file has no such row, a random row is first duplicated and inserted into the DataFrame.
After removing the duplicates, the length of the DataFrame is 999. Since the keep parameter was set to False, every copy of the duplicated row was removed, including the original, not just the inserted one.
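The steps of this example can be sketched on a small hypothetical DataFrame instead of the CSV file: sample one row, append it with pd.concat, then call drop_duplicates(keep=False). The data values and the random_state seed are assumptions made so the sketch is reproducible.

```python
import pandas as pd

# Hypothetical data standing in for the CSV file (no fully duplicated rows)
data = pd.DataFrame({
    "First Name": ["Douglas", "Thomas", "Maria", "Jerry"],
    "Salary": [97308, 61933, 130590, 138705],
})

# Pick one row at random and append a copy of it, renumbering the index
extra = data.sample(n=1, random_state=0)
data = pd.concat([data, extra], ignore_index=True)
print(len(data))      # 5

# keep=False drops every copy of a fully duplicated row, including the original
deduped = data.drop_duplicates(keep=False)
print(len(deduped))   # 3
```

With the default keep='first', only the appended copy would be removed and the length would return to 4, which is why keep=False ends up with fewer rows than the data had before the duplicate was inserted.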