How To Remove Duplicates From Vector In R

Last Updated : 29 Jan, 2024

A vector is a basic data structure that represents an ordered collection of elements of the same data type. It is one-dimensional and can contain numeric, character, or logical values. Note that a vector in C++ and a vector in the R programming language are not the same: in C++, a vector is a dynamic array that can grow or shrink in size, whereas in R a vector is the fundamental data structure itself.

In R, a vector is created using the c() function, which stands for “combine” or “concatenate”.
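The "same data type" rule can be seen by passing mixed values to c(). The short sketch below uses illustrative variable names (mixedVector and numericFromLogical are not part of the original article) and shows that R coerces every element to one common type rather than raising an error.

```r
# Mixing types in c() coerces every element to one common type.
mixedVector <- c(1, "two", TRUE)

# All three elements are now character strings.
print(mixedVector)
print(class(mixedVector))

# Logical values combined with numbers become numeric (TRUE -> 1, FALSE -> 0).
numericFromLogical <- c(10, TRUE, FALSE)
print(numericFromLogical)
```

This matters for duplicate removal because values are compared after coercion, so 1 and "1" end up as the same element once they share a character vector.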

Initializing Numerical Vector

R

# Initializing a numeric vector
numericVector <- c(1, 2, 3, 4, 5, 6)

# Printing the numericVector
cat("numericVector: ", numericVector, "\n")

Output:

numericVector:  1 2 3 4 5 6

Initializing Character Vector

R

# Initializing a character vector
characterVector <- c("Anakin", "Luke", "Ezra", "order66")

# Printing the characterVector
cat("characterVector: ", characterVector, "\n")

Output:

characterVector:  Anakin Luke Ezra order66

Initializing Logical Vector

R

# Initializing a logical vector
logicalVector <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Printing the logicalVector
cat("logicalVector: ", logicalVector, "\n")

Output:

logicalVector:  TRUE FALSE FALSE FALSE FALSE TRUE FALSE

Removing the duplicates from vector

In R, the unique() function is commonly used to eliminate duplicate values from a vector. In data science, where R is frequently used, the data being analyzed must be of high quality, and this matters all the more when working with large volumes of data.

Removing duplicates from a vector is a fundamental step in data cleaning and Exploratory Data Analysis (EDA). Eliminating redundant or repeated values makes results more consistent and reliable and avoids unnecessary repetition in the dataset.
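Before removing anything, it can help to check whether a vector contains duplicates at all and how many. The sketch below is one possible way to do that with base R functions; the readings vector is hypothetical data made up for illustration.

```r
# Hypothetical example data, used only for illustration.
readings <- c(4, 8, 15, 8, 16, 4, 23)

# anyDuplicated() returns the index of the first duplicate, or 0 if there is none.
firstDuplicateIndex <- anyDuplicated(readings)
print(firstDuplicateIndex)

# duplicated() flags repeated elements, so summing the flags counts them.
duplicateCount <- sum(duplicated(readings))
print(duplicateCount)

# table() shows how often each value occurs.
valueCounts <- table(readings)
print(valueCounts)
```

A result of 0 from anyDuplicated() means the vector is already free of duplicates and no cleaning step is needed.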

unique() Function

R provides the unique() function, which removes duplicates from a vector. It keeps the first occurrence of each value and preserves the original order of the elements.

Using unique() on numerical vector

R

# Creating a vector with duplicates
myVector <- c(1, 2, 2, 3, 4, 5, 6, 6, 5, 7, 9)

# Using the unique() function to remove duplicates
uniqueVector <- unique(myVector)

# Printing the result
print(uniqueVector)

Output:

[1] 1 2 3 4 5 6 7 9

Using unique() function on character vector

R

# Creating duplicated character vector.
duplicatedCharVec <- c("Anakin" , "anakin", "Anakin", "Luke", "Ashoka")
 
# Using unique() function to remove duplicates from duplicatedCharVec.
CharVec <- unique(duplicatedCharVec)
 
# Printing the CharVec.
print(CharVec)

Output:

[1] "Anakin" "anakin" "Luke"   "Ashoka"
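Note that "Anakin" and "anakin" both survive in the output above because string comparison in R is case-sensitive. If values that differ only by case should count as duplicates, one possible approach is sketched below. It is an illustration rather than the article's method, and it relies on duplicated(), which is covered in the next section; the name caseInsensitiveVec is made up for this example.

```r
# Same character vector as in the example above.
duplicatedCharVec <- c("Anakin", "anakin", "Anakin", "Luke", "Ashoka")

# Compare lowercase copies, but keep the original spelling of each first occurrence.
caseInsensitiveVec <- duplicatedCharVec[!duplicated(tolower(duplicatedCharVec))]

# Printing the result
print(caseInsensitiveVec)
```

Calling unique(tolower(duplicatedCharVec)) would also work, but it returns the lowercased strings instead of the original spelling.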

duplicated() Function with Indexing

The duplicated() function takes a vector as input and returns a logical vector of the same length, indicating whether each element is a duplicate (i.e., it has already occurred earlier in the vector).
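To make the returned logical vector concrete, the sketch below prints it for a small input. It also uses the fromLast argument of duplicated(), which scans from the end so that the last occurrence of each value is kept instead of the first. The variable names flags, flagsFromLast, and keptLast are illustrative.

```r
# Small vector with duplicates
myVector <- c(1, 2, 2, 3, 1)

# TRUE marks an element that has already appeared earlier in the vector.
flags <- duplicated(myVector)
print(flags)

# With fromLast = TRUE the scan starts at the end,
# so earlier copies are flagged and the last occurrence is kept.
flagsFromLast <- duplicated(myVector, fromLast = TRUE)
print(flagsFromLast)

# Keeping only the last occurrence of each value
keptLast <- myVector[!flagsFromLast]
print(keptLast)
```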

Using duplicated() function along with indexing on numerical vector

R

# Example vector with duplicates.
myVector <- c(1,2,2,2,3,3,3,4,3,4,3,6)
 
# Removing duplicates using duplicated() and indexing.
uniqueVector <- myVector[!duplicated(myVector)]
 
# Printing the uniqueVector
print(uniqueVector)

Output:

[1] 1 2 3 4 6

Using duplicated() function along with indexing on character vector

R

# Example vector with duplicates.
myVector <- c("Anakin", "Luke","Anakin","Ezra","Darth Vader","Obi-Wan")
 
# Removing duplicates using duplicated() and indexing.
uniqueVector <- myVector[!duplicated(myVector)]
 
# Printing the uniqueVector
print(uniqueVector)

Output:

[1] "Anakin"      "Luke"        "Ezra"        "Darth Vader" "Obi-Wan"
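For plain vectors, unique() and indexing with !duplicated() return the same values. One practical difference appears with named vectors: indexing keeps the element names, while unique() drops them. The sketch below illustrates this with a hypothetical named vector called scores.

```r
# Hypothetical named vector, used only for illustration.
scores <- c(anakin = 90, luke = 85, leia = 90, ezra = 70)

# unique() returns the distinct values without names.
viaUnique <- unique(scores)
print(viaUnique)

# Indexing with !duplicated() keeps the name of each first occurrence.
viaDuplicated <- scores[!duplicated(scores)]
print(viaDuplicated)
```

The same logical index can also be reused to filter another vector of the same length, which is not possible with the result of unique().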

Using `dplyr` Package

The dplyr package is used for data manipulation tasks and helps make code more readable and efficient.
Key functions provided by dplyr include:
filter() : Filter rows based on specified conditions.
select() : Select specific columns.
mutate() : Add new variables or modify existing ones.
arrange() : Reorder rows based on variable values.
group_by() : Group data by one or more variables.
summarize() : Summarize data, typically using aggregate functions.
distinct() : Get distinct (unique) rows.

For more on the dplyr package, see: dplyr Package in R Programming

Using distinct() function of dplyr Package to remove duplicate values from numerical vector

R

# Install and load the dplyr package if not already installed
# install.packages("dplyr")
library(dplyr)
 
# Example vector with duplicates
myVector <- c(1,2,2,3,3,2,1,5,6)
 
# distinct() works on data frames, so wrap the vector in a
# one-column data frame, remove duplicate rows, then extract the column
uniqueVector <- distinct(data.frame(value = myVector))$value
 
# Printing the value of uniqueVector
print(uniqueVector)

Output:

[1] 1 2 3 5 6

Using distinct() function of dplyr Package to remove duplicate values from character Vector

R

# Install and load the dplyr package if not already installed
# install.packages("dplyr")
library(dplyr)
 
# Example vector with duplicates
myVector <- c("Anakin","Ezra","Luke","Anakin")
 
# distinct() works on data frames, so wrap the vector in a
# one-column data frame, remove duplicate rows, then extract the column
uniqueVector <- distinct(data.frame(value = myVector))$value
 
# Printing the value of uniqueVector
print(uniqueVector)

Output:

[1] "Anakin" "Ezra"   "Luke"
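Since distinct() operates on rows, it is most useful when the values already live in a data frame. The sketch below assumes dplyr is installed and uses a hypothetical data frame named jedi with made-up scores. It shows the .keep_all argument of distinct(), which retains the other columns of each first matching row, and pull(), which extracts a column as a plain vector.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration.
jedi <- data.frame(
  name  = c("Anakin", "Luke", "Anakin", "Ezra"),
  score = c(90, 85, 70, 60)
)

# Keep the first row for each distinct name and retain the other columns.
distinctByName <- distinct(jedi, name, .keep_all = TRUE)
print(distinctByName)

# pull() extracts a single column as a plain vector.
uniqueNames <- pull(distinct(jedi, name), name)
print(uniqueNames)
```

For a standalone vector, unique() or duplicated() is usually simpler, because no data frame has to be built first.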
