In this article, we will discuss how to find duplicate rows in a DataFrame based on all columns or on a list of selected columns. For this we will use the
DataFrame.duplicated() method of Pandas.
Syntax : DataFrame.duplicated(subset = None, keep = 'first')
subset: Takes a column label or a list of column labels. Its default value is None, which means all columns are compared. When columns are passed, only those columns are considered when identifying duplicates.
keep: Controls which occurrences are marked as duplicates. It accepts three distinct values, and the default is 'first'.
- If 'first', the first occurrence is treated as unique and all later occurrences of the same values are marked as duplicates.
- If 'last', the last occurrence is treated as unique and all earlier occurrences of the same values are marked as duplicates.
- If False (the boolean, not a string), every occurrence of the same values is marked as a duplicate.
Returns: A boolean Series in which True marks a duplicate row.
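To make the three keep options concrete, here is a minimal sketch that compares the boolean masks they produce. The frame df_demo and its values are made up for illustration and are not part of the original article.

```python
import pandas as pd

# Hypothetical four-row frame: rows 0, 2 and 3 are identical.
df_demo = pd.DataFrame({"Name": ["A", "B", "A", "A"],
                        "Age": [1, 2, 1, 1]})

# keep='first' (default): the first occurrence is not marked.
first_mask = df_demo.duplicated()

# keep='last': the last occurrence is not marked.
last_mask = df_demo.duplicated(keep="last")

# keep=False: every occurrence of a repeated row is marked.
all_mask = df_demo.duplicated(keep=False)

print(pd.DataFrame({"first": first_mask,
                    "last": last_mask,
                    "False": all_mask}))
```

Each mask is a boolean Series aligned with the frame's index, so it can be used directly to filter rows, for example df_demo[all_mask].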
Let's create a simple DataFrame from a dictionary of lists, with the columns 'Name', 'Age' and 'City'.
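A sketch of such a frame is given here. The names, ages and cities are hypothetical sample values chosen so that some rows repeat; only the column names come from the text.

```python
import pandas as pd

# Hypothetical sample data (illustrative values only).
data = {
    "Name": ["Stuti", "Saumya", "Aaditya", "Saumya",
             "Saumya", "Saumya", "Aaditya", "Seema"],
    "Age": [28, 32, 25, 32, 32, 32, 40, 32],
    "City": ["Varanasi", "Delhi", "Mumbai", "Delhi",
             "Delhi", "Mumbai", "Dehradun", "Delhi"],
}

# Build the DataFrame from the dictionary of lists.
df = pd.DataFrame(data)
print(df)
```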
Example 1 : Select duplicate rows based on all columns.
Here, we do not pass any arguments, so both parameters take their default values, i.e. subset = None and keep = 'first'.
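A minimal sketch of this example, assuming the hypothetical sample data used for illustration (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

# Hypothetical sample data (illustrative values only).
df = pd.DataFrame({
    "Name": ["Stuti", "Saumya", "Aaditya", "Saumya",
             "Saumya", "Saumya", "Aaditya", "Seema"],
    "Age": [28, 32, 25, 32, 32, 32, 40, 32],
    "City": ["Varanasi", "Delhi", "Mumbai", "Delhi",
             "Delhi", "Mumbai", "Dehradun", "Delhi"],
})

# Defaults: compare all columns, treat the first occurrence as unique.
duplicate_rows = df[df.duplicated()]
print(duplicate_rows)
```

With this sample data, rows 3 and 4 are selected, because they repeat row 1 in every column.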
Example 2 : Select duplicate rows based on all columns, treating the last occurrence as unique.
If you want to mark every occurrence except the last one as a duplicate, pass keep = 'last' as an argument.
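The same idea with keep = 'last' can be sketched as follows, again on hypothetical sample data that is rebuilt so the snippet is self-contained:

```python
import pandas as pd

# Hypothetical sample data (illustrative values only).
df = pd.DataFrame({
    "Name": ["Stuti", "Saumya", "Aaditya", "Saumya",
             "Saumya", "Saumya", "Aaditya", "Seema"],
    "Age": [28, 32, 25, 32, 32, 32, 40, 32],
    "City": ["Varanasi", "Delhi", "Mumbai", "Delhi",
             "Delhi", "Mumbai", "Dehradun", "Delhi"],
})

# keep='last': the final occurrence is unique, earlier ones are duplicates.
duplicate_rows_last = df[df.duplicated(keep="last")]
print(duplicate_rows_last)
```

With this sample data, rows 1 and 3 are selected, and row 4 (the last of the three identical rows) is left out.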
Example 3 : Select duplicate rows based on selected columns only. To do this, pass the list of column names as the subset argument.
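As an illustration, the sketch below checks duplicates on the 'City' column alone. The choice of 'City' and the sample values are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data (illustrative values only).
df = pd.DataFrame({
    "Name": ["Stuti", "Saumya", "Aaditya", "Saumya",
             "Saumya", "Saumya", "Aaditya", "Seema"],
    "Age": [28, 32, 25, 32, 32, 32, 40, 32],
    "City": ["Varanasi", "Delhi", "Mumbai", "Delhi",
             "Delhi", "Mumbai", "Dehradun", "Delhi"],
})

# Only the 'City' column is compared; other columns are ignored.
duplicate_city_rows = df[df.duplicated(subset=["City"])]
print(duplicate_city_rows)
```

With this sample data, rows 3, 4, 5 and 7 are selected, since their cities already appear in an earlier row.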
Example 4 : Select duplicate rows based on more than one column name.
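A sketch using two columns follows. The pair 'Name' and 'Age' is chosen for illustration, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (illustrative values only).
df = pd.DataFrame({
    "Name": ["Stuti", "Saumya", "Aaditya", "Saumya",
             "Saumya", "Saumya", "Aaditya", "Seema"],
    "Age": [28, 32, 25, 32, 32, 32, 40, 32],
    "City": ["Varanasi", "Delhi", "Mumbai", "Delhi",
             "Delhi", "Mumbai", "Dehradun", "Delhi"],
})

# A row is a duplicate only if both 'Name' and 'Age' match an earlier row.
duplicate_name_age_rows = df[df.duplicated(subset=["Name", "Age"])]
print(duplicate_name_age_rows)
```

With this sample data, rows 3, 4 and 5 are selected. Row 6 is not, because 'Aaditya' appears earlier with a different age.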