
Count Unique Values by Group in R

Last Updated : 05 Apr, 2021

In this article, we discuss how to count the number of unique values by group in the R programming language. Consider the following example.

Suppose you have a dataset with multiple columns like this:

   class age age_group
1      A  20     YOUNG
2      B  15       KID
3      C  45       OLD
4      B  14       KID
5      A  21     YOUNG
6      A  22     YOUNG
7      C  47       OLD
8      A  18     YOUNG
9      B  16       KID
10     C  50       OLD
11     A  23     YOUNG

In this dummy dataset, class, age, and age_group are the column names, and the task is to count the number of unique age values within each age_group.

The resulting dataset should look like this:

  age_group unique_count
1     YOUNG            5
2       KID            3
3       OLD            3


Method 1: Using aggregate function

The aggregate() function splits the data into groups, applies a function to each group, and returns a single summary value per group. Here the function is length(unique(x)), which counts the distinct values in each group.

Example:

R




# Count Unique values by group
  
# Creating dataset 
# creating class column
x <- c("A","B","C","B","A","A","C","A","B","C","A")
  
# creating age column
y <- c(20,15,45,14,21,22,47,18,16,50,23)
  
# creating age_group column
z <- c("YOUNG","KID","OLD","KID","YOUNG","YOUNG",
      "OLD","YOUNG","KID","OLD","YOUNG")
  
# creating dataframe
df <- data.frame(class=x,age=y,age_group=z)
df
  
# applying aggregate function: count distinct ages per age_group
aggregate(age ~ age_group, data = df, FUN = function(x) length(unique(x)))


Output:

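Result of the aggregate() call (printed after the data frame itself):

  age_group age
1       KID   3
2       OLD   3
3     YOUNG   5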
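
The summary column returned by aggregate() keeps the name of the counted column, age. As a minimal sketch (the names res and check are illustrative and not part of the original example), the column can be renamed to unique_count to match the target table, and the counts can be cross-checked with base R's tapply():

```r
# Same data as in the example above
df <- data.frame(
  class = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age = c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# Count distinct ages per age_group
res <- aggregate(age ~ age_group, data = df,
                 FUN = function(x) length(unique(x)))

# Rename the summary column (illustrative name, taken from the target table)
names(res)[names(res) == "age"] <- "unique_count"
print(res)

# Cross-check with tapply(), which returns the counts named by group
check <- tapply(df$age, df$age_group, function(x) length(unique(x)))
print(check)
```

Both approaches should agree; tapply() is convenient when only the counts are needed rather than a data frame.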

Method 2: Using dplyr package and group_by function

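dplyr is one of the most widely used R packages for data wrangling. It provides a set of tools for data manipulation, including group_by() to group rows, summarise() to reduce each group to a single row, and n_distinct() to count unique values.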

Example:

R




# Count Unique values by group
  
# loading dplyr
library("dplyr")
  
# Creating dataset 
# creating class column
x <- c("A","B","C","B","A","A","C","A","B","C","A")
  
# creating age column
y <- c(20,15,45,14,21,22,47,18,16,50,23)
  
# creating age_group column
z <- c("YOUNG","KID","OLD","KID","YOUNG","YOUNG",
      "OLD","YOUNG","KID","OLD","YOUNG")
  
# creating dataframe
df <- data.frame(class=x,age=y,age_group=z)
  
# grouping by the age_group column and
# counting the unique age values
# within each group
df %>%
  group_by(age_group) %>%
  summarise(unique_count = n_distinct(age))


Output:

The result is a tibble with one row per age_group: KID has 3 unique ages, OLD has 3, and YOUNG has 5.
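
n_distinct() can also be applied to more than one column in the same summarise() call. The following is a minimal sketch, assuming dplyr is installed; the output column names unique_ages and unique_classes and the variable counts are illustrative, not part of the original example:

```r
library(dplyr)

# Same data as in the example above
df <- data.frame(
  class = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age = c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# One summary row per age_group, with a distinct count for each column
counts <- df %>%
  group_by(age_group) %>%
  summarise(
    unique_ages = n_distinct(age),
    unique_classes = n_distinct(class)
  )
print(counts)
```

Naming the summary columns explicitly keeps the result readable; without a name, summarise() labels the column with the expression itself.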


