How To Remove Duplicates From Vector In R

A vector is a basic data structure that represents an ordered, one-dimensional collection of elements of the same data type, such as numeric, character, or logical values. Note that a vector in R is not the same as a vector in C++. In C++, std::vector is a standard-library container, a dynamic array that can grow or shrink in size; in the R Programming Language, the vector is the fundamental data structure of the language itself.
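To illustrate that the vector is R's basic building block, the minimal sketch below shows that even a single number is a vector of length 1, and that c() coerces mixed inputs to one common type. The variable names x and mixed are illustrative only.

```r
# Even a single value is a vector of length 1 in R
x <- 42
print(is.vector(x))   # [1] TRUE
print(length(x))      # [1] 1

# All elements of an atomic vector share one type, so mixing
# types in c() coerces everything to the most general one (character here)
mixed <- c(1, "a", TRUE)
print(class(mixed))   # [1] "character"
print(mixed)          # [1] "1"    "a"    "TRUE"
```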

In R, a vector is commonly created with the c() function, which stands for “combine” or “concatenate”. The examples below initialize numeric, character, and logical vectors.
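Besides combining individual values, c() also concatenates existing vectors. The sketch below (the names first, second, and combined are hypothetical) shows that the result is a single flat vector rather than a nested structure.

```r
# c() flattens its arguments into one vector
first <- c(1, 2)
second <- c(3, 4)
combined <- c(first, second, 5)

print(combined)          # [1] 1 2 3 4 5
print(length(combined))  # [1] 5
```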

Initializing a Numeric Vector

# Initializing a numeric vector
numericVector <- c(1, 2, 3, 4, 5, 6)

# Printing numericVector
cat("numericVector: ", numericVector, "\n")

Output:

numericVector:  1 2 3 4 5 6

Initializing a Character Vector

# Initializing a character vector
characterVector <- c("Anakin", "Luke", "Ezra", "order66")

# Printing characterVector
cat("characterVector: ", characterVector, "\n")

Output:

characterVector:  Anakin Luke Ezra order66

Initializing a Logical Vector

# Initializing a logical vector
logicalVector <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Printing logicalVector
cat("logicalVector: ", logicalVector, "\n")

Output:

logicalVector:  TRUE FALSE FALSE FALSE FALSE TRUE FALSE

Removing Duplicates from a Vector

R provides several ways to eliminate duplicate values from a vector; the most common is the base unique() function. Because R is widely used in data science, where the quality of the data being analyzed determines how meaningful the results are, removing duplicates is a routine task, especially when working with large volumes of data.

Removing duplicates is a fundamental step in data cleaning and Exploratory Data Analysis (EDA). Eliminating redundant or repeated values improves data quality, leads to more consistent and reliable results, and avoids unnecessary repetition in the dataset.
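Before removing duplicates during EDA, it can help to check whether any exist and which values repeat. A minimal sketch using only base R functions (anyDuplicated(), duplicated(), and table()); the vector is the same one used in the unique() example below.

```r
myVector <- c(1, 2, 2, 3, 4, 5, 6, 6, 5, 7, 9)

# Index of the first element that repeats an earlier value (0 if none)
print(anyDuplicated(myVector))    # [1] 3

# Number of extra copies that deduplication would remove
print(sum(duplicated(myVector)))  # [1] 3

# Values that occur more than once, with their counts
counts <- table(myVector)
print(counts[counts > 1])
```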

unique() Function

Base R provides the unique() function, which returns a copy of the vector with duplicate elements removed. The first occurrence of each value is kept, in its original order.
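A quick check of that behavior on made-up vectors: unique() keeps first occurrences in their original order, treats NA as an ordinary value (so repeated NAs collapse to one), and works the same way on logical vectors.

```r
# First occurrences are kept in order; repeated NAs collapse to one
withNA <- c(3, NA, 1, 3, NA, 2)
print(unique(withNA))   # [1]  3 NA  1  2

# unique() on a logical vector
flags <- c(TRUE, FALSE, FALSE, TRUE)
print(unique(flags))    # [1]  TRUE FALSE
```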

Using unique() on a Numeric Vector

# Creating a vector with duplicates
myVector <- c(1, 2, 2, 3, 4, 5, 6, 6, 5, 7, 9)

# Using unique() to remove duplicates
uniqueVector <- unique(myVector)

# Printing the result
print(uniqueVector)

Output:

[1] 1 2 3 4 5 6 7 9

Using unique() on a Character Vector

# Creating a character vector with duplicates
duplicatedCharVec <- c("Anakin", "anakin", "Anakin", "Luke", "Ashoka")

# Using unique() to remove duplicates from duplicatedCharVec.
# unique() is case-sensitive, so "Anakin" and "anakin" count as different values.
CharVec <- unique(duplicatedCharVec)

# Printing CharVec
print(CharVec)

Output:

[1] "Anakin" "anakin" "Luke"   "Ashoka"
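As the output shows, unique() is case-sensitive. If "Anakin" and "anakin" should count as the same value, one option (a sketch, not the only approach) is to detect duplicates on a lower-cased copy using tolower() together with duplicated(), which is described in the next section, while keeping the original spelling of the first occurrence.

```r
duplicatedCharVec <- c("Anakin", "anakin", "Anakin", "Luke", "Ashoka")

# Compare lower-cased copies, but index into the original vector
# so the first spelling encountered is the one that is kept
caseInsensitive <- duplicatedCharVec[!duplicated(tolower(duplicatedCharVec))]

print(caseInsensitive)   # [1] "Anakin" "Luke"   "Ashoka"
```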

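duplicated() Function with Indexing

The duplicated() function takes a vector as input and returns a logical vector of the same length indicating whether each element is a duplicate, i.e., whether the same value has already occurred earlier in the vector. Negating this result with ! and using it as an index keeps only the first occurrence of each value.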
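The short sketch below (using a made-up vector x) prints the intermediate logical vectors so the indexing step is visible: duplicated() marks repeats with TRUE, and ! flips them so that only first occurrences are selected.

```r
x <- c(1, 2, 2, 3, 1)

print(duplicated(x))      # [1] FALSE FALSE  TRUE FALSE  TRUE
print(!duplicated(x))     # [1]  TRUE  TRUE FALSE  TRUE FALSE

# Logical indexing keeps the positions marked TRUE
print(x[!duplicated(x)])  # [1] 1 2 3
```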

Using duplicated() with Indexing on a Numeric Vector

# Example vector with duplicates
myVector <- c(1, 2, 2, 2, 3, 3, 3, 4, 3, 4, 3, 6)

# Removing duplicates using duplicated() and indexing
uniqueVector <- myVector[!duplicated(myVector)]

# Printing uniqueVector
print(uniqueVector)

Output:

[1] 1 2 3 4 6
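duplicated() also accepts a fromLast argument. With fromLast = TRUE the scan starts from the end, so indexing with its negation keeps the last occurrence of each value instead of the first. Combining both directions flags every element whose value appears more than once, which is one way to keep only values that were never repeated. A sketch on the same vector (lastKept, repeated, and singletons are illustrative names):

```r
myVector <- c(1, 2, 2, 2, 3, 3, 3, 4, 3, 4, 3, 6)

# Keep the last occurrence of each value instead of the first
lastKept <- myVector[!duplicated(myVector, fromLast = TRUE)]
print(lastKept)      # [1] 1 2 4 3 6

# Drop every value that appears more than once
repeated <- duplicated(myVector) | duplicated(myVector, fromLast = TRUE)
singletons <- myVector[!repeated]
print(singletons)    # [1] 1 6
```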

Using duplicated() with Indexing on a Character Vector

# Example character vector with duplicates
myVector <- c("Anakin", "Luke", "Anakin", "Ezra", "Darth Vader", "Obi-Wan")

# Removing duplicates using duplicated() and indexing
uniqueVector <- myVector[!duplicated(myVector)]

# Printing uniqueVector
print(uniqueVector)

Output:

[1] "Anakin"      "Luke"        "Ezra"        "Darth Vader" "Obi-Wan"

Using the dplyr Package

The dplyr package is widely used for data manipulation in R and helps make code more readable and efficient. Its key functions include:
filter(): Filter rows based on specified conditions.
select(): Select specific columns.
mutate(): Add new variables or modify existing ones.
arrange(): Reorder rows based on variable values.
group_by(): Group data by one or more variables.
summarize(): Summarize data, typically using aggregate functions.
distinct(): Keep only distinct (unique) rows.
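Since distinct() operates on data frame rows, it is most useful when duplicates span several columns. A minimal sketch with a hypothetical data frame named characters: called with no column names, distinct() drops rows that are identical in every column; naming a column deduplicates on that column alone, and .keep_all = TRUE retains the remaining columns from the first matching row.

```r
library(dplyr)

# Hypothetical example data
characters <- data.frame(
  name = c("Anakin", "Luke", "Anakin", "Anakin"),
  side = c("Jedi", "Jedi", "Jedi", "Sith"),
  stringsAsFactors = FALSE
)

# Rows identical in every column are reduced to one (3 rows remain)
allColumns <- distinct(characters)
print(allColumns)

# One row per name, keeping the other columns of the first occurrence (2 rows remain)
byName <- distinct(characters, name, .keep_all = TRUE)
print(byName)
```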

For a fuller introduction to dplyr, see the article "dplyr Package in R Programming".

Using dplyr's distinct() on a Numeric Vector

Because distinct() works on the rows of a data frame rather than on a bare vector, the vector is first wrapped in a one-column data frame, and the deduplicated column is then extracted with $value.

# Install the dplyr package once if it is not already installed
# install.packages("dplyr")
library(dplyr)

# Example vector with duplicates
myVector <- c(1, 2, 2, 3, 3, 2, 1, 5, 6)

# Wrap the vector in a data frame, remove duplicate rows, and extract the column
uniqueVector <- distinct(data.frame(value = myVector))$value

# Printing uniqueVector
print(uniqueVector)

Output:

[1] 1 2 3 5 6
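The same steps can also be written as a pipeline using the %>% operator (re-exported by dplyr) and pull(), which extracts a single column as a vector. This sketch repeats the example vector so it runs on its own, and it should give the same result as base R's unique().

```r
library(dplyr)

myVector <- c(1, 2, 2, 3, 3, 2, 1, 5, 6)

# Wrap in a data frame, keep distinct values, then pull the column back out
uniqueVector <- data.frame(value = myVector) %>%
  distinct(value) %>%
  pull(value)

print(uniqueVector)                               # [1] 1 2 3 5 6
print(identical(uniqueVector, unique(myVector)))  # [1] TRUE
```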

Using dplyr's distinct() on a Character Vector

# Install the dplyr package once if it is not already installed
# install.packages("dplyr")
library(dplyr)

# Example vector with duplicates
myVector <- c("Anakin", "Ezra", "Luke", "Anakin")

# stringsAsFactors = FALSE keeps the column as character in R versions before 4.0
uniqueVector <- distinct(data.frame(value = myVector, stringsAsFactors = FALSE))$value

# Printing uniqueVector
print(uniqueVector)

Output:

[1] "Anakin" "Ezra"   "Luke"
