Remove Duplicate rows in R using Dplyr

Last Updated : 21 Jul, 2021

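In this article, we will remove duplicate rows from a dataframe in the R programming language using the dplyr package. We will also look at the base R functions duplicated() and unique(), which do the same job without any package.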

Method 1: distinct()

The distinct() function removes duplicate rows from a dataframe and returns only the unique rows.

Syntax:

distinct(dataframe)

We can also remove duplicates based on one or more columns/variables of the dataframe. In that case only the listed columns are returned, unless .keep_all = TRUE is passed as well.

Syntax:

distinct(dataframe, column1, column2, ..., column_n)
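
The difference between listing columns with and without .keep_all can be seen in a minimal sketch. The dataframe df and its values below are hypothetical and are used only for illustration; they are not the dataset from this article.

```r
library(dplyr)

# Hypothetical toy data: the first and third rows are exact duplicates
df <- data.frame(id = c(1, 2, 1, 3),
                 name = c("a", "b", "a", "a"),
                 city = c("x", "y", "x", "z"))

# Listing a column returns only that column, one row per distinct value
names_only <- distinct(df, name)

# .keep_all = TRUE keeps every column and takes the first row for each name
first_per_name <- distinct(df, name, .keep_all = TRUE)

print(names_only)
print(first_per_name)
```

Use .keep_all = TRUE when the goal is to drop whole rows that repeat a key column, rather than to list the distinct values of that column.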

Dataset in use:

Example 1: R program to remove duplicate rows from the dataframe

R




# load the package
library(dplyr)
  
# create dataframe with three columns
# named id,name and address
data1=data.frame(id=c(1,2,3,4,5,6,7,1,4,2),
                   
                 name=c('sravan','ojaswi','bobby',
                        'gnanesh','rohith','pinkey',
                        'dhanush','sravan','gnanesh',
                        'ojaswi'),
                   
                 address=c('hyd','hyd','ponnur','tenali',
                           'vijayawada','vijayawada','guntur',
                           'hyd','tenali','hyd'))
  
# remove duplicate rows
print(distinct(data1))


Output:

Example 2: Remove duplicate rows based on a single column

R




# load the package
library(dplyr)
  
# create dataframe with three columns 
# named id,name and address
data1=data.frame(id=c(1,2,3,4,5,6,7,1,4,2),
                   
                 name=c('sravan','ojaswi','bobby',
                        'gnanesh','rohith','pinkey',
                        'dhanush','sravan','gnanesh',
                        'ojaswi'),
                   
                 address=c('hyd','hyd','ponnur','tenali',
                           'vijayawada','vijayawada','guntur',
                           'hyd','tenali','hyd'))
  
# keep one row per distinct name; only the
# name column is returned (add .keep_all = TRUE
# to keep the other columns as well)
print(distinct(data1,name))


Output:

Example 3: Remove duplicate rows based on multiple columns

R




# load the package
library(dplyr)
  
# create dataframe with three columns 
# named id,name and address
data1=data.frame(id=c(1,2,3,4,5,6,7,1,4,2),
                   
                 name=c('sravan','ojaswi','bobby',
                        'gnanesh','rohith','pinkey',
                        'dhanush','sravan','gnanesh',
                        'ojaswi'),
                   
                 address=c('hyd','hyd','ponnur','tenali',
                           'vijayawada','vijayawada','guntur',
                           'hyd','tenali','hyd'))
  
# keep one row per distinct combination of
# address and name; only these two columns
# are returned
print(distinct(data1,address,name))


Output:

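Method 2: Using duplicated() function

duplicated() is a base R function, so it does not need the dplyr package. It returns a logical vector that is TRUE for every element that repeats an earlier one. Negating it with ! therefore marks the first occurrence of each value, and indexing the dataframe with that vector keeps only the unique rows.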

Syntax:

dataframe[!duplicated(dataframe$column_name), ]

Here, dataframe is the input dataframe and column_name is the column used to detect duplicates. For each value in that column, only the first row is kept.
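
As a rough illustration of what duplicated() returns and how it extends to several columns, consider the sketch below. The dataframe df is a hypothetical toy example, not the dataset from this article.

```r
# Hypothetical toy data for illustration
df <- data.frame(id = c(1, 2, 1, 3),
                 name = c("a", "b", "a", "a"),
                 city = c("x", "y", "x", "z"))

# TRUE marks a value that already appeared earlier in the vector
dup_name <- duplicated(df$name)
print(dup_name)

# Passing several columns as a dataframe compares whole rows of those columns
dup_pair <- duplicated(df[c("name", "city")])
unique_pairs <- df[!dup_pair, ]
print(unique_pairs)

# fromLast = TRUE scans from the end, so the last occurrence is kept instead
last_per_name <- df[!duplicated(df$name, fromLast = TRUE), ]
print(last_per_name)
```

Unlike distinct() with a column list, this form always keeps all columns of the original dataframe, because the logical vector is only used to select rows.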

Example: R program to remove duplicate data based on a particular column

R




# duplicated() is part of base R,
# so no package needs to be loaded
  
# create dataframe with three columns
# named id,name and address
data1=data.frame(id=c(1,2,3,4,5,6,7,1,4,2),
                   
                 name=c('sravan','ojaswi','bobby',
                        'gnanesh','rohith','pinkey',
                        'dhanush','sravan','gnanesh',
                        'ojaswi'),
                   
                 address=c('hyd','hyd','ponnur','tenali',
                           'vijayawada','vijayawada','guntur',
                           'hyd','tenali','hyd'))
  
# remove duplicate rows using duplicated()
# function based on name column
print(data1[!duplicated(data1$name), ] )
print("=====================")
  
# remove duplicate rows using duplicated()
# function based on id column
print(data1[!duplicated(data1$id), ] )
print("=====================")
  
# remove duplicate rows using duplicated()
# function based on address column
print(data1[!duplicated(data1$address), ] )
print("=====================")


Output:

Method 3: Using unique() function

The unique() function, which is also part of base R, removes duplicate rows from a dataframe and returns the unique rows.

Syntax:

unique(dataframe)

To get the unique values of a single column, pass the column along with the name of the dataframe. The result is the set of unique values in that column, not a dataframe.

Syntax:

unique(dataframe$column_name)

Here, dataframe is the input dataframe and column_name is the column in the dataframe.
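
The two forms of unique() return different kinds of objects, which the following minimal sketch tries to show. The dataframe df is hypothetical and serves only as an illustration.

```r
# Hypothetical toy data for illustration
df <- data.frame(id = c(1, 2, 1),
                 name = c("a", "b", "a"))

# A column extracted with $ is not a dataframe,
# so unique() returns just the distinct values
name_values <- unique(df$name)

# Selecting with single brackets keeps a dataframe,
# so unique() returns a one-column dataframe
name_rows <- unique(df["name"])

print(is.data.frame(name_values))
print(is.data.frame(name_rows))
```

If the result is going to be passed to another function that expects a dataframe, the single-bracket form is usually the more convenient of the two.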

Example 1: R program to remove duplicates using unique() function

R




# unique() is part of base R,
# so no package needs to be loaded
  
# create dataframe with three columns
# named id,name and address
data1=data.frame(id=c(1,2,3,4,5,6,7,1,4,2),
                   
                 name=c('sravan','ojaswi','bobby',
                        'gnanesh','rohith','pinkey',
                        'dhanush','sravan','gnanesh',
                        'ojaswi'),
                   
                 address=c('hyd','hyd','ponnur','tenali',
                           'vijayawada','vijayawada','guntur',
                           'hyd','tenali','hyd'))
  
# get unique data from the dataframe
print(unique(data1))


Output:

Example 2: R program to get the unique values of a particular column

R




# unique() is part of base R,
# so no package needs to be loaded
  
# create dataframe with three columns
# named id,name and address
data1=data.frame(id=c(1,2,3,4,5,6,7,1,4,2),
                   
                 name=c('sravan','ojaswi','bobby',
                        'gnanesh','rohith','pinkey',
                        'dhanush','sravan','gnanesh',
                        'ojaswi'),
                   
                 address=c('hyd','hyd','ponnur','tenali',
                           'vijayawada','vijayawada','guntur',
                           'hyd','tenali','hyd'))
  
# get the unique values of the id column
print(unique(data1$id))
  
# get the unique values of the name column
print(unique(data1$name))
  
# get the unique values of the address column
print(unique(data1$address))


Output:


