Count Unique Values by Group in R

Last Updated : 05 Apr, 2021

In this article, we discuss how to count the number of unique values per group in the R programming language. Consider the following example.

Suppose you have a dataset with multiple columns like this:

	class	age	age_group
1	A	20	YOUNG
2	B	15	KID
3	C	45	OLD
4	B	14	KID
5	A	21	YOUNG
6	A	22	YOUNG
7	C	47	OLD
8	A	19	YOUNG
9	B	16	KID
10	C	50	OLD
11	A	23	YOUNG

In this dummy dataset, class, age, and age_group are the column names, and our task is to count the number of unique age values within each age_group.

The resulting dataset should look like this:

	age_group	unique_count
1	YOUNG	5
2	KID	3
3	OLD	3
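To double-check these expected counts before looking at the two methods, here is a quick sketch that uses base R's tapply(). This is not one of the article's methods. tapply() splits the age vector by age_group and applies length(unique(v)) to each piece. The variable name unique_count is only chosen to match the table above.

```r
# Rebuild the example data (same values as the dataset shown above)
df <- data.frame(
  class     = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age       = c(20, 15, 45, 14, 21, 22, 47, 19, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# For each age_group, count how many distinct ages occur in that group
unique_count <- tapply(df$age, df$age_group, function(v) length(unique(v)))
unique_count
```

The result is a named array that holds the same counts as the table. It is ordered alphabetically by group (KID, OLD, YOUNG) and not in the order shown above.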

Method 1: Using the aggregate() function

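The aggregate() function groups the rows of a data frame by one or more variables and applies a summary function to each group, producing a single summary value per group. Here, the summary function is length(unique(x)), which counts the distinct ages in each age_group.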
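Conceptually, this is a split-apply-combine operation. The sketch below performs those steps by hand with split() and sapply() to show what "grouping, then summarising" means. It illustrates the idea only and is not how aggregate() is implemented internally.

```r
ages   <- c(20, 15, 45, 14, 21, 22, 47, 19, 16, 50, 23)
groups <- c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
            "OLD", "YOUNG", "KID", "OLD", "YOUNG")

# Split: one vector of ages per group (list elements KID, OLD, YOUNG)
by_group <- split(ages, groups)

# Apply + combine: reduce each group to one number, the count of distinct ages
counts <- sapply(by_group, function(v) length(unique(v)))
counts
```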

Example:

R

# Count Unique values by group 
  
# Creating dataset  
# creating class column 
x <- c("A","B","C","B","A","A","C","A","B","C","A") 
  
# creating age column 
y <- c(20,15,45,14,21,22,47,19,16,50,23) 
  
# creating age_group column 
z <- c("YOUNG","KID","OLD","KID","YOUNG","YOUNG", 
      "OLD","YOUNG","KID","OLD","YOUNG") 
  
# creating dataframe 
df <- data.frame(class = x, age = y, age_group = z)
df

# applying aggregate function: for each age_group,
# count the unique values of age
aggregate(age ~ age_group, data = df, FUN = function(x) length(unique(x)))

Output:

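After df is printed, the aggregate() call returns one row per group, sorted alphabetically by age_group:

  age_group age
1       KID   3
2       OLD   3
3     YOUNG   5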
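aggregate() names the result column after the summarised variable (age) and sorts the groups alphabetically. As a result, its output does not look exactly like the expected table. The following is a minimal sketch of one way to rename the column to unique_count and reorder the rows. The hard-coded wanted_order vector is a hypothetical helper that is used only to reproduce the order of the expected table.

```r
df <- data.frame(
  class     = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age       = c(20, 15, 45, 14, 21, 22, 47, 19, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

res <- aggregate(age ~ age_group, data = df,
                 FUN = function(v) length(unique(v)))

# Rename the summary column from "age" to "unique_count"
names(res)[names(res) == "age"] <- "unique_count"

# Hypothetical ordering, chosen only to match the expected table
wanted_order <- c("YOUNG", "KID", "OLD")
res <- res[match(wanted_order, res$age_group), ]
rownames(res) <- NULL
res
```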

Method 2: Using the dplyr package and the group_by() function

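dplyr is one of the most widely used R packages for data wrangling. It provides a consistent set of functions (verbs) for data manipulation, such as group_by() to define groups and summarise() to compute one summary value per group. Its n_distinct() function counts the number of distinct values in a vector.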

Example:

R

# Count Unique values by group 
  
# loading dplyr 
library("dplyr") 
  
# Creating dataset  
# creating class column 
x <- c("A","B","C","B","A","A","C","A","B","C","A") 
  
# creating age column 
y <- c(20,15,45,14,21,22,47,19,16,50,23) 
  
# creating age_group column 
z <- c("YOUNG","KID","OLD","KID","YOUNG","YOUNG", 
      "OLD","YOUNG","KID","OLD","YOUNG") 
  
# creating dataframe 
df <- data.frame(class=x,age=y,age_group=z) 
  
# group the rows by age_group, then count the
# distinct age values within each group
df %>% 
  group_by(age_group) %>% 
  summarise(unique_count = n_distinct(age))
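The same counts can also be produced by first removing duplicate (age_group, age) pairs and then counting the remaining rows per group. In dplyr, this pattern is usually written as distinct() followed by count(). The sketch below applies the same idea in base R with unique() and table(), so it runs without any extra packages. This is a swapped-in alternative, not part of the dplyr method above.

```r
df <- data.frame(
  class     = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age       = c(20, 15, 45, 14, 21, 22, 47, 19, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# Keep one row per distinct (age_group, age) combination
pairs <- unique(df[, c("age_group", "age")])

# Each remaining row is a distinct age within its group,
# so counting rows per group gives the number of unique ages
unique_count <- table(pairs$age_group)
unique_count
```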