Group by one or more variables using dplyr in R

The group_by() function divides a data frame into groups based on the values in the specified columns. The columns to group by are passed as arguments to the function, and one or more column names may be given. The rows themselves are not changed: group_by() returns a grouped tibble, and later operations on it are applied group by group.

Syntax:

group_by(col1, col2, …)
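As a minimal sketch of what the call returns, the snippet below groups a small data frame and inspects the grouping metadata. The data frame `df` and its values are hypothetical, chosen only for illustration; `is_grouped_df()` and `group_vars()` are dplyr helpers for checking the result.

```r
library(dplyr)

# hypothetical toy data with fixed values (not from the article)
df <- data.frame(col1 = c(6, 7, 7, 6),
                 col2 = c("a", "b", "a", "b"))

# group_by() attaches grouping information; the rows stay as they are
grouped <- df %>% group_by(col1)

print(is_grouped_df(grouped))  # is the result a grouped tibble?
print(group_vars(grouped))     # names of the grouping columns
```

The grouped object has the same four rows as `df`; only the grouping information is new.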

Example 1: Group by one variable

R

# loading the required library
library("dplyr")
 
# creating a data frame
data_frame <- data.frame(col1 = sample(6:7, 9 , replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1,4,5,1,NA,NA,2,NA,2))
 
print ("Original DataFrame")
print (data_frame)
 
print ("Modified DataFrame")
 
# grouping the data frame by col1
data_frame %>% group_by(col1)

Output

[1] "Original DataFrame" 
  col1 col2 col3
1    6    a    1 
2    7    b    4 
3    7    c    5 
4    6    a    1 
5    7    b   NA 
6    6    c   NA 
7    6    a    2 
8    6    b   NA 
9    7    c    2 
[1] "Modified DataFrame" 
# A tibble: 9 x 3 
# Groups:   col1 [2]    
   col1 col2   col3
  <int> <chr> <dbl>
1     6 a         1 
2     7 b         4 
3     7 c         5 
4     6 a         1 
5     7 b        NA 
6     6 c        NA 
7     6 a         2 
8     6 b        NA 
9     7 c         2
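The grouped output looks almost identical to the original because grouping only takes effect once another operation follows it. As a sketch, assuming the goal is a per-group summary, the code below rebuilds the data frame with the fixed values shown in the output above (instead of calling sample(), so the result is reproducible) and computes the mean of col3 within each col1 group. The column name `mean_col3` is chosen here for illustration.

```r
library(dplyr)

# same values as in the output above, fixed instead of sampled
data_frame <- data.frame(col1 = c(6, 7, 7, 6, 7, 6, 6, 6, 7),
                         col2 = rep(letters[1:3], 3),
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

# one row per col1 group; NA values are ignored when averaging
group_means <- data_frame %>%
  group_by(col1) %>%
  summarise(mean_col3 = mean(col3, na.rm = TRUE))

print(group_means)
```

The result has one row for each of the two groups reported in the "# Groups: col1 [2]" header.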

Grouping can also be done on multiple columns of the data frame. To do this, pass the names of all the grouping columns to the function. Each distinct combination of values in those columns forms one group.

Example 2: Group by multiple columns

R

# loading the required library
library("dplyr")
 
# creating a data frame
data_frame <- data.frame(col1 = sample(6:7, 9 , replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1,4,5,1,NA,NA,2,NA,2))
 
print ("Original DataFrame")
print (data_frame)
 
print ("Modified DataFrame")
 
# grouping the data frame by col1 and col2
data_frame %>% group_by(col1, col2)

Output

[1] "Original DataFrame" 
  col1 col2 col3
1    7    a    1
2    7    b    4 
3    7    c    5 
4    6    a    1 
5    6    b   NA 
6    6    c   NA 
7    7    a    2 
8    6    b   NA 
9    6    c    2 
[1] "Modified DataFrame" 
# A tibble: 9 x 3 
# Groups:   col1, col2 [6]    
   col1 col2   col3
  <int> <chr> <dbl>
1     7 a         1 
2     7 b         4 
3     7 c         5 
4     6 a         1 
5     6 b        NA 
6     6 c        NA 
7     7 a         2 
8     6 b        NA 
9     6 c         2
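To check how the rows are split when two columns are used, the sketch below rebuilds the data frame with the fixed values from the output above (again avoiding sample() for reproducibility), counts the groups with n_groups(), and tallies the rows in each col1/col2 combination. The column name `rows` is chosen here for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# same values as in the output above, fixed instead of sampled
data_frame <- data.frame(col1 = c(7, 7, 7, 6, 6, 6, 7, 6, 6),
                         col2 = rep(letters[1:3], 3),
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

grouped <- data_frame %>% group_by(col1, col2)

# number of distinct (col1, col2) combinations
print(n_groups(grouped))

# rows per combination; .groups = "drop" returns an ungrouped result
group_sizes <- grouped %>%
  summarise(rows = n(), .groups = "drop")

print(group_sizes)
```

The group count matches the "# Groups: col1, col2 [6]" header, and the per-group row counts add up to the nine rows of the data frame.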


Last Updated : 16 Dec, 2021