Dplyr – Groupby on multiple columns using variable names in R

The group_by() function groups the rows of a data frame by the columns passed to it as arguments. When the column names are stored in a character vector instead of being typed directly, the vector is wrapped in all_of() inside across(): all_of() selects the columns named in the vector, and across() makes that selection usable inside group_by().

Syntax:

group_by(col1, col2, ...)

Grouping is usually followed by the summarize() function, which computes one summary value per group. The result is stored in a new column with the name given in the call, and its value can come from any aggregate function such as mean() or sum(). Let us first look at the simpler case and group by only one column.

Example: Groupby on single column using variable names

R

library(data.table)
library(dplyr)

# creating the data table
data_frame <- data.table(col1 = rep(LETTERS[1:3], each = 2),
                         col2 = c(1, 1, 3, 4, 5, 6),
                         col3 = 1
                        )

print("Original DataFrame")
print(data_frame)

# name of the column to group by, stored in a variable
grp <- c('col1')

# calculating the mean of col2 for each group of col1
data_frame %>%
  group_by(across(all_of(grp))) %>%
  summarize(mean_col2 = mean(col2))


Output

[1] "Original DataFrame"
   col1 col2 col3
1:    A    1    1
2:    A    1    1
3:    B    3    1
4:    B    4    1
5:    C    5    1
6:    C    6    1
# A tibble: 3 x 2
  col1  mean_col2
  <chr>     <dbl>
1 A           1
2 B           3.5
3 C           5.5

Since there are three groups, A, B, and C, the mean is calculated for each of these three groups. 
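
To see why the variable is wrapped in across(all_of()), it can help to compare it with passing the variable directly. The following is a minimal sketch, assuming a plain data.frame named df with the same col1 and col2 values as above (df, wrong and right are illustrative names, not part of the original example). When grp is passed directly, group_by() treats its value as a new constant column named grp, so every row lands in a single group.

```r
library(dplyr)

# Same col1 and col2 values as the example above, as a plain data.frame
df <- data.frame(col1 = rep(LETTERS[1:3], each = 2),
                 col2 = c(1, 1, 3, 4, 5, 6))

grp <- "col1"

# Passing the variable directly: group_by() creates a constant column
# called "grp" holding the string "col1", so there is only one group
wrong <- df %>%
  group_by(grp) %>%
  summarize(mean_col2 = mean(col2))

# Wrapping it in across(all_of()) looks up the column whose name is stored in grp
right <- df %>%
  group_by(across(all_of(grp))) %>%
  summarize(mean_col2 = mean(col2))

print(nrow(wrong))   # one row: a single group
print(nrow(right))   # three rows: groups A, B and C
```

Only the second call groups by the values of col1, which is the behaviour the example above relies on.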

Example: Applying group_by over multiple columns using the variable name

R

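library(data.table)
library(dplyr)

# creating the data table
data_frame <- data.table(col1 = rep(LETTERS[1:3], each = 2),
                         col2 = c(1, 1, 3, 4, 5, 6),
                         col3 = 1
                        )

print("Original DataFrame")
print(data_frame)

# names of the columns to group by, stored in a variable
grp <- c('col1', 'col2')

# calculating the sum of col2 for each combination of col1 and col2
data_frame %>%
  group_by(across(all_of(grp))) %>%
  summarize(sum_col2 = sum(col2))


Output

[1] "Original DataFrame"
   col1 col2 col3
1:    A    1    1
2:    A    1    1
3:    B    3    1
4:    B    4    1
5:    C    5    1
6:    C    6    1
# A tibble: 5 x 3
# Groups:   col1 [3]
  col1   col2 sum_col2
  <chr> <dbl>    <dbl>
1 A         1        2
2 B         3        3
3 B         4        4
4 C         5        5
5 C         6        6

The combinations of col1 and col2 form five distinct groups, and the sum of col2 is calculated for each of them. Only the group (A, 1) contains two rows, so its sum is 2.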
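
Because the grouping columns are held in a character vector, the same pattern can be wrapped in a small function and reused with one or several column names. The sketch below is illustrative only: sum_by is a hypothetical helper, df is a plain data.frame with the same values as the example, and col3 is summed instead of col2 so that the summarized column is not also a grouping column. It also passes .groups = "drop" to summarize(), which returns an ungrouped result; by default summarize() drops only the last grouping level, which is why the output above is still grouped by col1.

```r
library(dplyr)

# Same values as the example above, as a plain data.frame
df <- data.frame(col1 = rep(LETTERS[1:3], each = 2),
                 col2 = c(1, 1, 3, 4, 5, 6),
                 col3 = 1)

# Hypothetical helper: group by the columns named in `cols`, then sum col3
sum_by <- function(data, cols) {
  data %>%
    group_by(across(all_of(cols))) %>%
    summarize(sum_col3 = sum(col3), .groups = "drop")
}

by_one <- sum_by(df, "col1")              # one grouping column
by_two <- sum_by(df, c("col1", "col2"))   # two grouping columns

print(by_one)
print(by_two)
```

Calling the helper with a longer vector of names is all that is needed to group by more columns; the body of the function does not change.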


Last Updated : 23 Sep, 2021