Count Unique Values by Group in R
In this article, we discuss how to count the number of unique values by group in the R programming language. Consider the following example.
Suppose you have a dataset with multiple columns like this:
   class age age_group
1      A  20     YOUNG
2      B  15       KID
3      C  45       OLD
4      B  14       KID
5      A  21     YOUNG
6      A  22     YOUNG
7      C  47       OLD
8      A  18     YOUNG
9      B  16       KID
10     C  50       OLD
11     A  23     YOUNG
In this dummy dataset, class, age, and age_group are the column names, and our task is to count the number of unique age values within each age_group.
The resulting dataset should look like this:
  age_group unique_count
1     YOUNG            5
2       KID            3
3       OLD            3
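Before grouping, it may help to see what counting unique values means for a single vector. The following is a minimal base R sketch; the vector name ages and its values are made up for illustration and are not part of the dataset above.

```r
# Hypothetical vector containing one repeated value (21 appears twice)
ages <- c(20, 21, 22, 21)

# unique() drops the duplicates
unique(ages)

# length() of the de-duplicated vector is the number of unique values
n_unique <- length(unique(ages))
n_unique
```

Both methods below apply this same idea once per group instead of once for the whole column.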
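Method 1: Using the aggregate() function
The aggregate() function splits the data into groups and applies a summary function to each group, producing a single value per group. In the formula age ~ age_group, the left side is the column to summarise and the right side is the grouping column.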
Example:
R
# Count unique values by group

# Create the dataset
# class column
x <- c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A")

# age column
y <- c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23)

# age_group column
z <- c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG", "OLD", "YOUNG", "KID", "OLD", "YOUNG")

# Build the data frame and print it
df <- data.frame(class = x, age = y, age_group = z)
df

# Count the unique age values within each age_group
aggregate(age ~ age_group, df, function(x) length(unique(x)))
Output:

Output 1.
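By default, aggregate() names the summary column after the summarised variable (age here), not unique_count as in the target table. One possible way to rename it is sketched below; the variable name result is an assumption made for this illustration.

```r
# Same data as in the example above
df <- data.frame(
  class = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age = c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# Count the distinct ages within each age_group
result <- aggregate(age ~ age_group, data = df,
                    FUN = function(x) length(unique(x)))

# Rename the summary column to match the target table
names(result)[names(result) == "age"] <- "unique_count"
result
```

Note that aggregate() returns the groups in sorted order (KID, OLD, YOUNG), so the rows appear in a different order than in the target table.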
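Method 2: Using the dplyr package and the group_by() function
dplyr is one of the most widely used R packages for data wrangling. It provides a consistent set of functions for data manipulation. Here, group_by() groups the rows by age_group, and summarise() with n_distinct() counts the distinct age values in each group.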
Example:
R
# Count unique values by group

# Load dplyr
library(dplyr)

# Create the dataset
# class column
x <- c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A")

# age column
y <- c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23)

# age_group column
z <- c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG", "OLD", "YOUNG", "KID", "OLD", "YOUNG")

# Build the data frame
df <- data.frame(class = x, age = y, age_group = z)

# Group by age_group and count the unique age values in each group
df %>%
  group_by(age_group) %>%
  summarise(n_distinct(age))
Output:

Output 2.
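When summarise() is called without a column name, the result column is labelled with the expression itself. A minimal sketch of one way to get closer to the target table, assuming dplyr is installed: name the column inside summarise() and sort with arrange(). The variable name result is an assumption made for this illustration.

```r
library(dplyr)

# Same data as in the example above
df <- data.frame(
  class = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age = c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

result <- df %>%
  group_by(age_group) %>%
  # Name the summary column explicitly
  summarise(unique_count = n_distinct(age)) %>%
  # Put the group with the most unique values first
  arrange(desc(unique_count))

result
```

Naming the column in summarise() avoids a separate renaming step, which is one reason this form is often preferred for longer pipelines.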