How to Remove Duplicate Rows in R DataFrame?
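In this article, we will discuss how to remove duplicate rows from a dataframe in the R programming language.
Dataset in use: a dataframe with three columns (names, id, and subjects) and six rows, in which the last two rows repeat the first two.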
Method 1: Using distinct()
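This function is available in the dplyr package and returns the unique rows of a dataframe. We can remove rows that are duplicated across the entire dataframe, or remove rows that are duplicated in particular columns. Note that when columns are specified, distinct() returns only those columns unless .keep_all = TRUE is passed.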
Syntax:
distinct(dataframe)
distinct(dataframe, column1, column2, ..., column_n)
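As a minimal sketch of the column-wise form, the snippet below contrasts the default behaviour with dplyr's `.keep_all = TRUE` argument, which keeps the first row for each distinct value together with all of its columns. It assumes dplyr is installed; the dataframe `df` and the result names are illustrative and not part of the article's dataset.

```r
library(dplyr)

# illustrative dataframe (an assumption for this sketch)
df <- data.frame(
  names = c("manoj", "bobby", "manoj"),
  id = c(1, 2, 5),
  subjects = c("java", "python", "php")
)

# default: only the names column is returned
only_names <- distinct(df, names)

# .keep_all = TRUE: first row per distinct name, all columns kept
first_per_name <- distinct(df, names, .keep_all = TRUE)

print(only_names)
print(first_per_name)
```

Use the default when only the distinct values matter, and `.keep_all = TRUE` when the remaining columns of the first matching row are still needed.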
Example: R program to remove duplicate rows using distinct() function
R
# load the package
library(dplyr)

# create dataframe
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# remove all duplicate rows
print(distinct(data))

# remove duplicate rows in subjects column
print(distinct(data, subjects))

# remove duplicate rows in names column
print(distinct(data, names))
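Output:
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
  subjects
1     java
2   python
3      php
4     html
   names
1  manoj
2  bobby
3 sravan
4  deepu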
Method 2: Using duplicated()
duplicated() returns a logical vector that is TRUE for every element (or row) that repeats an earlier one. To keep only the first occurrence of each value, negate the result with the ! operator and use it to index the rows of the dataframe.
Syntax:
data[!duplicated(data$column_name), ]
where,
- data is the input dataframe
- column_name is the column in which duplicates are checked
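To make the indexing above concrete, here is a minimal base R sketch. The dataframe `df` and the result names are illustrative assumptions. It shows the logical vector that duplicated() produces, how to check duplicates across several columns at once by passing a subset of the dataframe, and the `fromLast = TRUE` argument, which keeps the last occurrence instead of the first.

```r
# illustrative dataframe (an assumption for this sketch)
df <- data.frame(
  names = c("manoj", "bobby", "manoj", "manoj"),
  id = c(1, 2, 1, 3)
)

# TRUE marks values that repeat an earlier one
dup_flags <- duplicated(df$names)
print(dup_flags)

# drop rows that are duplicated across both columns taken together
by_both <- df[!duplicated(df[c("names", "id")]), ]
print(by_both)

# keep the last occurrence of each name instead of the first
keep_last <- df[!duplicated(df$names, fromLast = TRUE), ]
print(keep_last)
```

The trailing comma inside the square brackets matters: it selects rows and keeps every column.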
Example: R program to remove duplicate rows using duplicated() function
R
# create dataframe
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# remove duplicate rows in subjects column
print(data[!duplicated(data$subjects), ])

# remove duplicate rows in names column
print(data[!duplicated(data$names), ])

# remove duplicate rows in id column
print(data[!duplicated(data$id), ])
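Output:
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html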
Method 3: Using unique()
This will get the unique rows from the dataframe.
Syntax:
unique(dataframe)
To get the unique values in a particular column (the result is a vector, not a dataframe):
Syntax:
unique(dataframe$column_name)
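The difference between the two forms can be sketched as follows, using an illustrative dataframe `df` (an assumption for this example). unique() on a dataframe returns a dataframe of distinct rows, while unique() on a single column returns a plain vector of values. Comparing row counts before and after is one simple way to see how many duplicate rows were dropped.

```r
# illustrative dataframe (an assumption for this sketch)
df <- data.frame(
  names = c("manoj", "bobby", "manoj"),
  id = c(1, 2, 1)
)

# distinct rows: the result is still a dataframe
unique_rows <- unique(df)
print(unique_rows)

# distinct values of one column: the result is a vector
unique_names <- unique(df$names)
print(unique_names)

# number of duplicate rows that were dropped
n_removed <- nrow(df) - nrow(unique_rows)
print(n_removed)
```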
Example: R program to get the unique values of each column using unique() function
R
# create dataframe
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# get unique values in subjects column
print(unique(data$subjects))

# get unique values in names column
print(unique(data$names))

# get unique values in id column
print(unique(data$id))
Output:
[1] "java"   "python" "php"    "html"
[1] "manoj"  "bobby"  "sravan" "deepu"
[1] 1 2 3 4
Example: R program to apply unique() function to the entire dataframe
R
# create dataframe
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# remove duplicate rows in entire dataframe
print(unique(data))
Output:
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html