Dplyr – Groupby on multiple columns using variable names in R

Last Updated : 23 Sep, 2021

The group_by() function groups the rows of a data frame by the columns passed to it as arguments. When the column names are stored in a character vector rather than typed directly, the vector is wrapped in all_of() and passed through across(), so that group_by() treats the strings as column names. all_of() raises an error if any of the named columns is missing from the data frame.

Syntax:

group_by(col1, col2,..)

Grouping is usually followed by the summarize() function, which computes one summary value per group. Each result is stored in a new column with the name given in the call, and can be computed with any aggregate function such as mean() or sum(). Let us first look at the simpler case and group by only one column.

Example: Groupby on single column using variable names

R

library(data.table)
library(dplyr)

# create a sample data table
data_frame <- data.table(col1 = rep(LETTERS[1:3], each = 2),
                         col2 = c(1, 1, 3, 4, 5, 6),
                         col3 = 1)

print("Original DataFrame")
print(data_frame)

# name of the column to group by, stored in a variable
grp <- c("col1")

# mean of col2 within each col1 group
data_frame %>%
  group_by(across(all_of(grp))) %>%
  summarize(mean_col2 = mean(col2))

Output

[1] "Original DataFrame"
   col1 col2 col3
1:    A    1    1
2:    A    1    1
3:    B    3    1
4:    B    4    1
5:    C    5    1
6:    C    6    1
# A tibble: 3 x 2
  col1  mean_col2
  <chr>     <dbl>
1 A           1
2 B           3.5
3 C           5.5

Since there are three groups, A, B, and C, the mean is calculated for each of these three groups.
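
As a quick check, grouping through a character vector should give the same result as grouping with the bare column name. The sketch below is an illustration only: it uses a plain data.frame instead of a data.table to stay self-contained, and the names df, by_variable, and by_name are hypothetical, not part of the example above.

```r
library(dplyr)

# Same values as the example above, held in a plain data.frame (assumption)
df <- data.frame(col1 = rep(LETTERS[1:3], each = 2),
                 col2 = c(1, 1, 3, 4, 5, 6),
                 col3 = 1)

# Column name stored in a variable
grp <- c("col1")

# Grouping through the character vector
by_variable <- df %>%
  group_by(across(all_of(grp))) %>%
  summarize(mean_col2 = mean(col2))

# Grouping with the column name typed directly
by_name <- df %>%
  group_by(col1) %>%
  summarize(mean_col2 = mean(col2))

# Compare the two results as ordinary data frames
all.equal(as.data.frame(by_variable), as.data.frame(by_name))
```

The variable-based form is mainly useful when the grouping columns are only known at run time, for example when they are passed in as a function argument.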

Example: Applying group_by over multiple columns using the variable name

R

library(data.table)
library(dplyr)

# create a sample data table
data_frame <- data.table(col1 = rep(LETTERS[1:3], each = 2),
                         col2 = c(1, 1, 3, 4, 5, 6),
                         col3 = 1)

print("Original DataFrame")
print(data_frame)

# names of the columns to group by, stored in a variable
grp <- c("col1", "col2")

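# sum of col2 within each (col1, col2) group
data_frame %>%
  group_by(across(all_of(grp))) %>%
  summarize(sum_col2 = sum(col2))

Output

# A tibble: 5 x 3
# Groups:   col1 [3]
  col1   col2 sum_col2
  <chr> <dbl>    <dbl>
1 A         1        2
2 B         3        3
3 B         4        4
4 C         5        5
5 C         6        6

Grouping by both col1 and col2 yields five distinct groups, and col2 is summed within each of them. Only group (A, 1) contains two rows, so its sum is 2. The result remains grouped by col1, as the Groups line shows.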
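
Because the grouping columns are held in a character vector, the pattern can be wrapped in a small reusable function. The following is a minimal sketch under stated assumptions: the helper name sum_by, its arguments, the output column total, and the choice of col3 as the summed column are all hypothetical and do not come from the example above. It relies on the .data pronoun to look up the value column by name, and on .groups = "drop" (available in dplyr 1.0.0 and later) to return an ungrouped result.

```r
library(dplyr)

# Hypothetical helper: sums one column within groups named by a character vector
sum_by <- function(data, group_cols, value_col) {
  data %>%
    group_by(across(all_of(group_cols))) %>%
    # .data[[value_col]] looks up the column whose name is stored in value_col
    summarize(total = sum(.data[[value_col]]), .groups = "drop")
}

# Same values as the example above, held in a plain data.frame (assumption)
df <- data.frame(col1 = rep(LETTERS[1:3], each = 2),
                 col2 = c(1, 1, 3, 4, 5, 6),
                 col3 = 1)

# Sum col3 within each (col1, col2) group
result <- sum_by(df, c("col1", "col2"), "col3")
print(result)
```

With .groups = "drop" the returned tibble carries no grouping, which avoids surprises in later steps; leaving the argument out keeps the grouping by the first column, as in the output above.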
