
Count Unique Values by Group in R


This article shows how to count the number of unique values within each group in the R programming language. Consider the following example.

Suppose you have a dataset with multiple columns like this:

 

   class age age_group
1      A  20     YOUNG
2      B  15       KID
3      C  45       OLD
4      B  14       KID
5      A  21     YOUNG
6      A  22     YOUNG
7      C  47       OLD
8      A  18     YOUNG
9      B  16       KID
10     C  50       OLD
11     A  23     YOUNG

In this dummy dataset, class, age, and age_group are the column names, and the task is to count the number of unique age values within each age_group.

The resulting dataset should look like this:

 

  age_group unique_count
1     YOUNG            5
2       KID            3
3       OLD            3
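To make the target concrete, the count for a single group can be worked out by hand. The following is a minimal sketch in base R, assuming the same age and age_group values that the article's example code uses; the variable names kid_ages and kid_unique_count are illustrative only.

```r
# Ages and groups, copied from the article's example code
age <- c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23)
age_group <- c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
               "OLD", "YOUNG", "KID", "OLD", "YOUNG")

# Keep only the ages that belong to the KID group
kid_ages <- age[age_group == "KID"]

# unique() drops repeated values; length() counts what is left
kid_unique_count <- length(unique(kid_ages))
print(kid_unique_count)  # [1] 3
```

Repeating this for every group is exactly what the grouped methods automate.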

 

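Method 1: Using the aggregate() function

The aggregate() function splits the data into groups and applies a function to each group, producing a single summary value per group. Here the summary function is length(unique(x)), which counts the distinct values in each group.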

Example:

R

# Count unique values by group

# Create the dataset
# class column
x <- c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A")

# age column
y <- c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23)

# age_group column
z <- c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
       "OLD", "YOUNG", "KID", "OLD", "YOUNG")

# Build the data frame and print it
df <- data.frame(class = x, age = y, age_group = z)
df

# Count the distinct age values within each age_group
aggregate(age ~ age_group, data = df, FUN = function(x) length(unique(x)))

                    

Output:

Output 1.
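In the example dataset every age is different, so the unique count happens to equal the number of rows in each group. The sketch below uses a small hypothetical data frame (people, not part of the article's data) in which one age repeats, to show that aggregate() with length(unique(x)) counts distinct values rather than rows. It also renames the summary column, since aggregate() keeps the original column name by default.

```r
# Hypothetical data: the age 15 appears twice in the KID group
people <- data.frame(
  age_group = c("KID", "KID", "KID", "OLD", "OLD", "OLD"),
  age       = c(15, 15, 14, 45, 47, 50)
)

# Count the distinct ages within each group
result <- aggregate(age ~ age_group, data = people,
                    FUN = function(x) length(unique(x)))

# aggregate() names the summary column "age"; rename it for clarity
names(result)[names(result) == "age"] <- "unique_count"
print(result)
#   age_group unique_count
# 1       KID            2
# 2       OLD            3
```

Both groups have three rows, but KID has only two distinct ages.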

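Method 2: Using the dplyr package and the group_by() function

dplyr is a widely used R package for data wrangling. It provides a set of functions for data manipulation, including group_by() for grouping rows, summarise() for reducing each group to a single row, and n_distinct() for counting unique values.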

Example:

R

# Count unique values by group

# Load dplyr
library("dplyr")

# Create the dataset
# class column
x <- c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A")

# age column
y <- c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23)

# age_group column
z <- c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
       "OLD", "YOUNG", "KID", "OLD", "YOUNG")

# Build the data frame
df <- data.frame(class = x, age = y, age_group = z)

# Group by age_group, then count the distinct
# age values within each group
df %>%
  group_by(age_group) %>%
  summarise(unique_count = n_distinct(age))

                    

Output:

Output 2.
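As a quick sanity check, the two methods can be run on the same data and compared. The sketch below assumes dplyr is installed and rebuilds the article's data frame; the names base_counts, dplyr_counts, and merged are illustrative. The results are joined on age_group rather than compared by position, so the check does not depend on how each method orders the groups.

```r
library(dplyr)

# Same data as in the article's examples
df <- data.frame(
  class = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age = c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# Method 1: base R aggregate(); the count lands in a column named "age"
base_counts <- aggregate(age ~ age_group, data = df,
                         FUN = function(x) length(unique(x)))

# Method 2: dplyr; the count lands in a column named "unique_count"
dplyr_counts <- df %>%
  group_by(age_group) %>%
  summarise(unique_count = n_distinct(age))

# Join the two results on age_group and compare the counts
merged <- merge(base_counts, as.data.frame(dplyr_counts), by = "age_group")
methods_agree <- all(merged$age == merged$unique_count)
print(methods_agree)
```

If the two approaches agree, methods_agree is TRUE for every group.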



Last Updated : 05 Apr, 2021