Remove Duplicate Rows in R using dplyr
Last Updated: 21 Jul, 2021
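In this article, we remove duplicate rows from a dataframe in the R programming language using the dplyr package, and compare it with the base R functions duplicated() and unique().
Method 1: Using distinct() function
distinct() removes duplicate rows from a dataframe and returns the unique rows. When no columns are given, two rows count as duplicates only if they match in every column.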
Syntax:
distinct(dataframe)
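We can also remove duplicates based on one or more columns/variables in the dataframe. Only the listed columns are returned unless .keep_all = TRUE is also passed, in which case the first row of each combination is kept with all its columns.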
Syntax:
distinct(dataframe, column1, column2, ..., column_n)
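The difference between the default behaviour and .keep_all = TRUE is easiest to see side by side. The following is a minimal sketch, assuming dplyr is installed; the dataframe df and its values are hypothetical and are not the article's dataset.

```r
library(dplyr)

# Hypothetical toy data: row 4 repeats row 1, and rows 1 and 3 share a name
df <- data.frame(
  id   = c(1, 2, 3, 1),
  name = c("a", "b", "a", "a"),
  city = c("x", "y", "z", "x")
)

# Default: only the column passed to distinct() is returned
names_only <- distinct(df, name)

# .keep_all = TRUE: the first row for each name is kept with all its columns
first_rows <- distinct(df, name, .keep_all = TRUE)

print(names_only)
print(first_rows)
```

Here names_only has a single name column, while first_rows keeps id, name, and city for the first row of each name.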
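Dataset in use: a dataframe named data1 with 10 rows and three columns (id, name, address), in which rows 8, 9, and 10 repeat rows 1, 4, and 2. It is created at the top of each example.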
Example 1: R program to remove duplicate rows from the dataframe
R
library(dplyr)

# Sample data: rows 8, 9, and 10 repeat rows 1, 4, and 2
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# Keep one copy of each fully identical row
print(distinct(data1))
Output: the seven unique rows (id 1 to 7) are printed; the repeated rows 8, 9, and 10 are dropped.
Example 2: Remove duplicate rows based on a single column
R
library(dplyr)

# Sample data: rows 8, 9, and 10 repeat rows 1, 4, and 2
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# Unique values of the name column (only that column is returned)
print(distinct(data1, name))
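Output: a one-column dataframe with the seven unique names (sravan, ojaswi, bobby, gnanesh, rohith, pinkey, dhanush). The id and address columns are not returned.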
Example 3: Remove duplicate rows based on multiple columns
R
library(dplyr)

# Sample data: rows 8, 9, and 10 repeat rows 1, 4, and 2
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# Unique combinations of address and name
print(distinct(data1, address, name))
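Output: a two-column dataframe (address, name) with the seven unique address and name combinations.
Method 2: Using duplicated() function
duplicated() is a base R function that returns a logical vector marking each element (or row) that repeats an earlier one. Negating it with ! and using the result as a row index keeps only the first occurrence of each value.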
Syntax:
dataframe[!duplicated(dataframe$column_name), ]
Here, dataframe is the input dataframe and column_name is the column used to detect duplicates. For each value in that column only the first row is kept, and all columns of that row are returned.
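To make the indexing step concrete, the sketch below shows the logical vector that duplicated() produces for a short, hypothetical character vector, and what negating it selects. It uses base R only.

```r
# Hypothetical values; "hyd" appears three times
x <- c("hyd", "hyd", "ponnur", "hyd")

# TRUE marks an element that repeats an earlier one
dup <- duplicated(x)
print(dup)        # FALSE TRUE FALSE TRUE

# Negating the vector selects the first occurrence of each value
first_seen <- x[!dup]
print(first_seen) # "hyd" "ponnur"

# fromLast = TRUE scans from the end, so the last occurrence is the one kept
dup_from_last <- duplicated(x, fromLast = TRUE)
print(dup_from_last) # TRUE TRUE FALSE FALSE
```

The same logical vector works as a row index on a dataframe, which is what the syntax above relies on.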
Example: R program to remove duplicate data based on a particular column
R
# duplicated() is base R, so dplyr does not need to be loaded here

# Sample data: rows 8, 9, and 10 repeat rows 1, 4, and 2
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# Keep the first row for each distinct name
print(data1[!duplicated(data1$name), ])
print("=====================")

# Keep the first row for each distinct id
print(data1[!duplicated(data1$id), ])
print("=====================")

# Keep the first row for each distinct address
print(data1[!duplicated(data1$address), ])
print("=====================")
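Output: filtering on name or on id keeps rows 1 to 7 with all columns; filtering on address keeps rows 1, 3, 4, 5, and 7, the first row for each of the five addresses.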
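duplicated() also accepts a dataframe, which allows duplicates to be detected on a combination of columns rather than a single one. The sketch below assumes a small hypothetical dataframe df and uses base R only.

```r
# Hypothetical toy data: row 4 repeats the name/city pair of row 1
df <- data.frame(
  id   = c(1, 2, 3, 1),
  name = c("a", "b", "a", "a"),
  city = c("x", "y", "z", "x")
)

# Check for repeats on the name and city columns together
dup_pair <- duplicated(df[c("name", "city")])

# Keep the first row for each name/city combination, with all columns
unique_pairs <- df[!dup_pair, ]
print(unique_pairs)
```

Rows 1 and 3 share a name but differ in city, so both are kept; only row 4 is dropped.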
Method 3: Using unique() function
unique() is a base R function that removes duplicates and returns the unique data. Applied to a dataframe, it drops rows that are identical to an earlier row in every column.
Syntax:
unique(dataframe)
To get the unique values of a single column, select the column with $. The result is a vector, not a dataframe:
Syntax:
unique(dataframe$column_name)
Where dataframe is the input dataframe and column_name is the column in the dataframe.
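The two forms of the syntax return different kinds of object, which matters if later code expects a dataframe. A minimal base R sketch, using a hypothetical dataframe df:

```r
# Hypothetical toy data with a repeated name
df <- data.frame(
  id   = c(1, 2, 1),
  name = c("a", "b", "a")
)

# $ extracts the column, so unique() returns a plain character vector
name_vector <- unique(df$name)

# Single-bracket selection keeps the dataframe structure
name_frame <- unique(df["name"])

print(class(name_vector))
print(class(name_frame))
```

Selecting the column with single brackets is one way to keep a one-column dataframe when that shape is needed downstream.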
Example 1: R program to remove duplicates using unique() function
R
# unique() is base R, so dplyr does not need to be loaded here

# Sample data: rows 8, 9, and 10 repeat rows 1, 4, and 2
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# Drop rows that are identical to an earlier row
print(unique(data1))
Output: rows 1 to 7 of data1, with the repeated rows 8, 9, and 10 removed.
Example 2: R program to remove duplicates in a particular column
R
# unique() is base R, so dplyr does not need to be loaded here

# Sample data: rows 8, 9, and 10 repeat rows 1, 4, and 2
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# Unique values of each column, returned as vectors
print(unique(data1$id))
print(unique(data1$name))
print(unique(data1$address))
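Output: three vectors: the ids 1 to 7, the seven unique names, and the five unique addresses (hyd, ponnur, tenali, vijayawada, guntur).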
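As a quick cross-check, the three methods can be run on the same data and compared. This is a sketch assuming dplyr is installed; the dataframe df is hypothetical, and the comparison ignores row names and other attributes because the methods may label rows differently.

```r
library(dplyr)

# Hypothetical toy data: row 4 is an exact copy of row 1
df <- data.frame(
  id   = c(1, 2, 3, 1),
  name = c("a", "b", "a", "a"),
  city = c("x", "y", "z", "x")
)

via_distinct   <- distinct(df)          # dplyr
via_duplicated <- df[!duplicated(df), ] # base R, logical row index
via_unique     <- unique(df)            # base R

# Compare the cell values only, ignoring row names and other attributes
same_values <- function(a, b) {
  isTRUE(all.equal(a, b, check.attributes = FALSE))
}

print(same_values(via_distinct, via_duplicated))
print(same_values(via_distinct, via_unique))
```

For whole-row deduplication the three approaches are expected to keep the same rows; they differ mainly in how columns are selected and in whether a dataframe or a vector comes back.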