Dplyr – Groupby on multiple columns using variable names in R
The group_by() function groups the rows of a data frame by the columns passed to it as arguments. When the column names are stored in a character vector instead of being typed directly, the vector is wrapped in across(all_of(...)) so that dplyr looks up each string as a column name and groups by all of them.
Syntax:
group_by(col1, col2,..)
Grouping is usually followed by the summarize() function, which computes one row of summary statistics per group. Each summary column is given a name of your choice and is computed with an aggregate function such as mean() or sum(). Let us first look at the simpler case and group by only one column.
Example: Groupby on single column using variable names
R
library(data.table)
library(dplyr)

# Create a small data table
data_frame <- data.table(
  col1 = rep(LETTERS[1:3], each = 2),
  col2 = c(1, 1, 3, 4, 5, 6),
  col3 = 1
)
print("Original DataFrame")
print(data_frame)

# Name of the grouping column, stored in a variable
grp <- c("col1")

# Group by the column named in grp and compute the mean of col2
data_frame %>%
  group_by(across(all_of(grp))) %>%
  summarize(mean_col2 = mean(col2))
Output
[1] "Original DataFrame"
   col1 col2 col3
1:    A    1    1
2:    A    1    1
3:    B    3    1
4:    B    4    1
5:    C    5    1
6:    C    6    1
# A tibble: 3 x 2
  col1  mean_col2
  <chr>     <dbl>
1 A           1
2 B           3.5
3 C           5.5
Since there are three groups, A, B, and C, the mean is calculated for each of these three groups.
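To see why the variable is wrapped in across(all_of(...)), the following minimal sketch compares it with passing the variable to group_by() directly. The data frame df is a hypothetical copy of the example data built with base R, and the names direct and by_name are illustrative only.

```r
library(dplyr)

# Hypothetical toy data, mirroring the example above
df <- data.frame(
  col1 = rep(LETTERS[1:3], each = 2),
  col2 = c(1, 1, 3, 4, 5, 6)
)

grp <- "col1"  # grouping column name held in a variable

# Passing the variable directly: df has no column called `grp`, so dplyr
# evaluates the variable and groups by the constant string "col1"
direct <- df %>%
  group_by(grp) %>%
  summarize(mean_col2 = mean(col2))

# Wrapping it in across(all_of()): the string is looked up as a column name
by_name <- df %>%
  group_by(across(all_of(grp))) %>%
  summarize(mean_col2 = mean(col2))

print(direct)
print(by_name)
```

With this toy data, direct should collapse into a single group, while by_name should contain one row per value of col1.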
Example: Applying group_by over multiple columns using the variable name
R
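library(data.table)
library(dplyr)

# Create a small data table
data_frame <- data.table(
  col1 = rep(LETTERS[1:3], each = 2),
  col2 = c(1, 1, 3, 4, 5, 6),
  col3 = 1
)
print("Original DataFrame")
print(data_frame)

# Names of the grouping columns, stored in a variable
grp <- c("col1", "col2")

# Group by every column named in grp and compute the sum of col2
data_frame %>%
  group_by(across(all_of(grp))) %>%
  summarize(sum_col2 = sum(col2))
Output
# A tibble: 5 x 3
# Groups: col1 [3]
  col1   col2 sum_col2
  <chr> <dbl>    <dbl>
1 A         1        2
2 B         3        3
3 B         4        4
4 C         5        5
5 C         6        6
Each distinct combination of col1 and col2 forms its own group, which gives five groups. The two identical rows (A, 1) fall into the same group, so their sum is 2.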
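The "# Groups: col1 [3]" line in the output above appears because summarize() removes only the last grouping level by default, so the result is still grouped by col1. The following is a minimal sketch, assuming dplyr 1.0.0 or later, of how the .groups argument can return an ungrouped result. The helper sum_by() and the data frame df are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Hypothetical toy data, mirroring the example above
df <- data.frame(
  col1 = rep(LETTERS[1:3], each = 2),
  col2 = c(1, 1, 3, 4, 5, 6),
  col3 = 1
)

# Hypothetical helper: sum col2 within the groups named in group_cols
sum_by <- function(data, group_cols) {
  data %>%
    group_by(across(all_of(group_cols))) %>%
    # .groups = "drop" removes all grouping from the result
    summarize(sum_col2 = sum(col2), .groups = "drop")
}

res <- sum_by(df, c("col1", "col2"))
print(res)

# Grouping variables still attached to the result (none are expected)
print(group_vars(res))
```

Wrapping the pipeline in a function like this keeps the grouping columns as an ordinary character argument, so the same code can be reused for any set of column names.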
Last Updated: 23 Sep, 2021