Dplyr – Groupby on multiple columns using variable names in R

Last Updated: 23 Sep, 2021

The group_by() function groups the rows of a data frame by the columns passed to it as arguments. When the names of the grouping columns are stored as strings in a variable, passing that variable straight to group_by() does not select those columns. Instead, the variable is wrapped in all_of() inside across(), as in group_by(across(all_of(grp))). This selects every column whose name appears in grp.
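all_of() is strict about the names it receives. The minimal sketch below uses a hypothetical data frame df and a deliberately wrong column name "missing_col" and assumes dplyr 1.0.0 or later. all_of() raises an error for a name that does not exist, while its lenient counterpart any_of() silently skips such names.

```r
library(dplyr)

# small hypothetical data frame for illustration
df <- data.frame(col1 = c("A", "A", "B"), col2 = c(1, 2, 3))

# "missing_col" is deliberately not a column of df
grp <- c("col1", "missing_col")

# all_of() requires every name in grp to exist, so this call errors
strict <- tryCatch(
  df %>% group_by(across(all_of(grp))),
  error = function(e) "error"
)

# any_of() ignores names that do not exist and groups by the rest
lenient <- df %>% group_by(across(any_of(grp)))

print(strict)
print(group_vars(lenient))
```

The strict behaviour is usually what you want, because a typo in grp then fails loudly instead of quietly grouping by fewer columns.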

Syntax:

group_by(col1, col2, ...)

where col1, col2, ... are the unquoted names of the grouping columns.
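The variable-name form is meant to give the same grouping as the bare-name syntax. A small sketch on a hypothetical data frame (the output column name total is an illustrative choice) compares the two forms directly. .groups = "drop" is used so that both results come back ungrouped and can be compared as plain tables.

```r
library(dplyr)

# hypothetical data frame for illustration
df <- data.frame(col1 = c("A", "A", "B", "B"),
                 col2 = c(1, 1, 3, 4),
                 col3 = c(10, 20, 30, 40))

# grouping with bare column names, as in the syntax above
by_name <- df %>%
  group_by(col1, col2) %>%
  summarize(total = sum(col3), .groups = "drop")

# the same grouping, with the column names held in a character vector
grp <- c("col1", "col2")
by_var <- df %>%
  group_by(across(all_of(grp))) %>%
  summarize(total = sum(col3), .groups = "drop")

print(by_name)
print(all.equal(by_name, by_var))
```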

group_by() is usually followed by summarize(), which computes summary statistics for each group and returns one row per group. Each argument to summarize() has the form new_name = expression: the new column takes the name on the left, and its values come from an aggregate function such as mean() or sum() applied to a column. Let us first look at a simpler case and group by only one column.
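Several statistics can be computed in a single summarize() call. The sketch below reuses the column names from the examples in this article on a plain data.frame; the output names sum_col2, max_col2 and n_rows are illustrative choices. Each name = expression pair becomes one column of the result, and n() counts the rows in each group.

```r
library(dplyr)

df <- data.frame(col1 = rep(LETTERS[1:3], each = 2),
                 col2 = c(1, 1, 3, 4, 5, 6))

grp <- "col1"

# each name = expression pair in summarize() becomes one output column
stats <- df %>%
  group_by(across(all_of(grp))) %>%
  summarize(mean_col2 = mean(col2),
            sum_col2  = sum(col2),
            max_col2  = max(col2),
            n_rows    = n())

print(stats)
```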

Example: Groupby on single column using variable names

R

library(data.table)
library(dplyr)
  
# creating first data frame
data_frame <- data.table(col1 = rep(LETTERS[1:3],each=2),
                         col2 = c(1,1,3,4,5,6),
                         col3 = 1
                        )
  
print ("Original DataFrame")
print (data_frame)
  
# deciding the column to group by
grp <- c('col1')
  
# calculating mean of col2 based on col1 group
data_frame %>% 
  group_by(across(all_of(grp))) %>% 
  summarize(mean_col2 = mean(col2))

Output

[1] "Original DataFrame"
   col1 col2 col3
1:    A    1    1
2:    A    1    1
3:    B    3    1
4:    B    4    1
5:    C    5    1
6:    C    6    1
# A tibble: 3 x 2
  col1  mean_col2
  <chr>     <dbl>
1 A           1
2 B           3.5
3 C           5.5

Since col1 contains three distinct values, A, B and C, the rows are split into three groups and the mean of col2 is computed separately for each of them.
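For a single grouping column, the .data pronoun is another common way to refer to a column whose name is stored in a variable. This minimal sketch assumes dplyr 1.0.0 or later and uses the same data as the example above, as a plain data.frame. .data[[grp]] looks up exactly one column, so for several columns the across(all_of()) pattern used in this article is the more convenient choice.

```r
library(dplyr)

df <- data.frame(col1 = rep(LETTERS[1:3], each = 2),
                 col2 = c(1, 1, 3, 4, 5, 6))

grp <- "col1"

# .data[[grp]] refers to the column whose name is stored in grp
res <- df %>%
  group_by(.data[[grp]]) %>%
  summarize(mean_col2 = mean(col2))

print(res)
```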

Example: Applying group_by over multiple columns using the variable name

R

library(data.table)
library(dplyr)
  
# creating first data frame
data_frame <- data.table(col1 = rep(LETTERS[1:3],each=2),
                         col2 = c(1,1,3,4,5,6),
                         col3 = 1
                        )
  
print ("Original DataFrame")
print (data_frame)
  
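# deciding the columns to group by
grp <- c('col1','col2')
  
# calculating the sum of col2 for each (col1, col2) group
data_frame %>% 
  group_by(across(all_of(grp))) %>% 
  summarize(sum_col2 = sum(col2))

Output

# A tibble: 5 x 3
# Groups:   col1 [3]
  col1   col2 sum_col2
  <chr> <dbl>    <dbl>
1 A         1        2
2 B         3        3
3 B         4        4
4 C         5        5
5 C         6        6

Each distinct combination of col1 and col2 forms its own group. The two rows with (A, 1) are therefore summed into a single row, while every other combination occurs only once, giving five groups in total.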
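The "# Groups: col1 [3]" line in the output above appears because, by default, summarize() removes only the last grouping variable, so the result stays grouped by col1. The sketch below wraps the pattern in a hypothetical helper, sum_by(), which is not part of dplyr. It takes the grouping columns and the value column as strings and passes .groups = "drop" so that the result comes back ungrouped.

```r
library(dplyr)

# hypothetical helper: group by the columns named in group_cols and
# sum the column named in value_col; .groups = "drop" ungroups the result
sum_by <- function(df, group_cols, value_col) {
  df %>%
    group_by(across(all_of(group_cols))) %>%
    summarize(total = sum(.data[[value_col]]), .groups = "drop")
}

df <- data.frame(col1 = rep(LETTERS[1:3], each = 2),
                 col2 = c(1, 1, 3, 4, 5, 6),
                 col3 = 1)

res <- sum_by(df, c("col1", "col2"), "col3")
print(res)
print(group_vars(res))
```

Because col3 is 1 in every row, total here simply counts the rows in each (col1, col2) group.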


