
How to find duplicate values in a list in R

In this article, we will see how to find duplicate values in a list in the R Programming Language in different scenarios.

Finding duplicate values in a List

In R, the duplicated() function is used to find duplicate values in R objects such as vectors, lists, and data frames. Given a List, it returns a logical vector (TRUE/FALSE values) of the same length as the List. An element is marked TRUE if an identical element appears earlier in the List; otherwise it is marked FALSE. The first occurrence of a value is therefore always FALSE, and only its later repeats are TRUE.

Syntax:

duplicated(List_name)

Here, List_name is the input list.

Let's have a list with 10 values and find the duplicate values.

# Create a List
List_data <- list(1, 2, 3, 4, 5, 6, 7, 5, 4, 3)
print(List_data)

# Find duplicates in the above List
print(duplicated(List_data))

Output:

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

[[7]]
[1] 7

[[8]]
[1] 5

[[9]]
[1] 4

[[10]]
[1] 3

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE

We can see that the last three elements of the List (5, 4 and 3) repeat earlier elements, so TRUE is returned for them.
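The logical vector can also be used as an index to pull out the repeated elements themselves. The following is a minimal sketch under the same input as above; the variable names dup_flags, dup_positions and dup_values are illustrative and not part of the original example.

```r
# Same 10-element list as in the example above
List_data <- list(1, 2, 3, 4, 5, 6, 7, 5, 4, 3)

# Logical vector: TRUE where an element repeats an earlier one
dup_flags <- duplicated(List_data)

# Positions of the repeated elements (8, 9 and 10 here)
dup_positions <- which(dup_flags)
print(dup_positions)

# The repeated elements themselves, flattened to a vector (5, 4 and 3 here)
dup_values <- unlist(List_data[dup_flags])
print(dup_values)
```

Note that unlist() is only convenient here because every element is a single number; for lists with mixed or nested elements, keeping List_data[dup_flags] as a list is likely the safer choice.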

Let's create a List that holds 2 lists and find the duplicates in each of them separately.

# Create a List with 2 lists
List_data <- list(list1 = list(100, 200, 300, 300, 300),
                  list2 = list("Java", "HTML", "PHP", "JSP", "Statistics"))
print(List_data)

# Find duplicates in list1 from List_data
print(duplicated(List_data$list1))

# Find duplicates in list2 from List_data
print(duplicated(List_data$list2))

Output:

$list1
$list1[[1]]
[1] 100

$list1[[2]]
[1] 200

$list1[[3]]
[1] 300

$list1[[4]]
[1] 300

$list1[[5]]
[1] 300


$list2
$list2[[1]]
[1] "Java"

$list2[[2]]
[1] "HTML"

$list2[[3]]
[1] "PHP"

$list2[[4]]
[1] "JSP"

$list2[[5]]
[1] "Statistics"

[1] FALSE FALSE FALSE TRUE TRUE

[1] FALSE FALSE FALSE FALSE FALSE

There are two duplicates in list1 (the second and third occurrences of 300) and none in list2.
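Instead of calling duplicated() once per inner list, the same check can be applied to every inner list in one step with lapply(). This is only a sketch of one possible approach; the name dup_by_sublist is illustrative.

```r
# Same nested list as in the example above
List_data <- list(list1 = list(100, 200, 300, 300, 300),
                  list2 = list("Java", "HTML", "PHP", "JSP", "Statistics"))

# Apply duplicated() to each inner list; the result is a named list
# holding one logical vector per inner list
dup_by_sublist <- lapply(List_data, duplicated)
print(dup_by_sublist)
```

The result keeps the names list1 and list2, so dup_by_sublist$list1 gives the same vector as duplicated(List_data$list1).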

Let's create a List having three vectors and find the duplicates in each vector.

# Create a List with 3 vectors
List_data <- list(Id = c(1, 2, 3, 4, 5, 4, 5),
                  Subject = c("Java", "HTML", "HTML", "Python"),
                  Marks = c(100, 89, 78, 69, 80))
print(List_data)

# Find duplicates in the Id
duplicated(List_data$Id)

# Find duplicates in the Subject
duplicated(List_data$Subject)

# Find duplicates in the Marks
duplicated(List_data$Marks)

Output:

$Id
[1] 1 2 3 4 5 4 5

$Subject
[1] "Java" "HTML" "HTML" "Python"

$Marks
[1] 100 89 78 69 80

[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE

[1] FALSE FALSE TRUE FALSE

[1] FALSE FALSE FALSE FALSE FALSE

  1. Id holds two duplicate values, i.e., the second occurrences of 4 and 5.
  2. Subject holds one duplicate value, i.e., the second occurrence of "HTML".
  3. There are no duplicates in the Marks vector.
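Because duplicated() leaves the first occurrence as FALSE, it does not by itself show every position that takes part in a repeat. A common way to get that, sketched below for the Id vector, is to combine a forward pass with a backward pass using the fromLast argument of duplicated(). The names all_occurrences and repeated_values are illustrative.

```r
# Id vector from the example above
Id <- c(1, 2, 3, 4, 5, 4, 5)

# Forward pass flags later repeats; the backward pass (fromLast = TRUE)
# flags earlier ones. Combining them marks every occurrence of a repeated value.
all_occurrences <- duplicated(Id) | duplicated(Id, fromLast = TRUE)
print(all_occurrences)

# Distinct values that occur more than once (4 and 5 here)
repeated_values <- unique(Id[duplicated(Id)])
print(repeated_values)
```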

Let's create a List having 2 vectors and return the total number of duplicate elements in each vector. To do this, we pass the result of duplicated() to the sum() function. Since sum() treats TRUE as 1 and FALSE as 0, the result is the number of TRUE values.

# Create a List with 2 vectors
List_data <- list(Id = c(1, 2, 3, 4, 5, 4, 5),
                  Subject = c("Java", "HTML", "HTML", "Python"))
print(List_data)

# Count duplicates in the Id
sum(duplicated(List_data$Id))

# Count duplicates in the Subject
sum(duplicated(List_data$Subject))

Output:

$Id
[1] 1 2 3 4 5 4 5

$Subject
[1] "Java" "HTML" "HTML" "Python"

[1] 2

[1] 1

There are 2 duplicates in the Id vector and one duplicate in the Subject vector.
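The per-vector counts can also be collected in a single call rather than one sum() per vector. The sketch below uses vapply() for this and adds anyDuplicated(), a base R function that returns the index of the first duplicate (or 0 if there is none). The names dup_counts and first_dup_id are illustrative.

```r
# Same list of 2 vectors as in the example above
List_data <- list(Id = c(1, 2, 3, 4, 5, 4, 5),
                  Subject = c("Java", "HTML", "HTML", "Python"))

# Count the duplicates in every vector of the list at once;
# the result is a named integer vector with one count per vector
dup_counts <- vapply(List_data, function(x) sum(duplicated(x)), integer(1))
print(dup_counts)

# Index of the first duplicate in Id (position 6, the second 4)
first_dup_id <- anyDuplicated(List_data$Id)
print(first_dup_id)
```

When only a yes/no answer is needed, checking anyDuplicated(x) > 0 avoids building the full logical vector.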

Conclusion

In conclusion, identifying duplicate values in a list in R is an essential part of data cleaning and quality assurance. The duplicated() function flags the repeated elements of a list or vector, and combining it with sum() gives the number of duplicates.
