Group by one or more variables using Dplyr in R

  • Last Updated : 23 Aug, 2021

The group_by() function from the dplyr package divides data into groups based on the values in one or more columns. The columns to group by are passed as arguments to the function, and more than one column name may be given.

Syntax:

group_by(col1, col2, …)
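On its own, group_by() does not change any rows; it only attaches grouping information that later dplyr verbs use. The sketch below illustrates this with a small hypothetical data frame (the names scores, team and points are assumptions made for illustration, not part of the article's examples).

```r
library(dplyr)

# hypothetical data, used only for illustration
scores <- data.frame(team   = c("a", "b", "a", "b"),
                     points = c(1, 4, 5, 3))

# attach grouping information; the rows themselves are unchanged
grouped <- scores %>% group_by(team)

group_vars(grouped)   # names of the grouping columns
n_groups(grouped)     # number of distinct groups

# a later verb such as summarise() then works once per group
totals <- grouped %>% summarise(total = sum(points))
print(totals)
```

Here summarise() returns one row per team because the data was grouped first.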

Example 1: Group by one variable

R






# loading the required library
library("dplyr")

# creating a data frame
# (col1 is sampled at random, so its values vary between runs)
data_frame <- data.frame(col1 = sample(6:7, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

print("Original DataFrame")
print(data_frame)

print("Modified DataFrame")

# grouping the data frame by col1
data_frame %>% group_by(col1)

Output

[1] "Original DataFrame"
  col1 col2 col3
1    6    a    1
2    7    b    4
3    7    c    5
4    6    a    1
5    7    b   NA
6    6    c   NA
7    6    a    2
8    6    b   NA
9    7    c    2
[1] "Modified DataFrame"
# A tibble: 9 x 3
# Groups:   col1 [2]
   col1 col2   col3
  <int> <chr> <dbl>
1     6 a         1
2     7 b         4
3     7 c         5
4     6 a         1
5     7 b        NA
6     6 c        NA
7     6 a         2
8     6 b        NA
9     7 c         2
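The grouped output above has the same rows as the original; only the "# Groups: col1 [2]" line shows that grouping was applied. As a minimal sketch of what the grouping enables, the code below rebuilds the data frame with the col1 values copied from the output above (instead of sample(), so the result is reproducible) and computes a per-group summary. The summary column names n and mean_col3 are assumptions chosen for illustration.

```r
library(dplyr)

# col1 values copied from the Example 1 output so the run is reproducible
data_frame <- data.frame(col1 = c(6, 7, 7, 6, 7, 6, 6, 6, 7),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

# one summary row per value of col1
per_group <- data_frame %>%
  group_by(col1) %>%
  summarise(n = n(),                              # rows in the group
            mean_col3 = mean(col3, na.rm = TRUE)) # mean, ignoring NA

print(per_group)
```

Passing na.rm = TRUE matters here because col3 contains NA values; without it, the mean of any group containing an NA would itself be NA.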

Grouping can also be done on multiple columns of the data frame. To do this, pass the names of all the grouping columns to the function.

Example 2: Group by multiple columns

R




# loading the required library
library("dplyr")

# creating a data frame
# (col1 is sampled at random, so its values vary between runs)
data_frame <- data.frame(col1 = sample(6:7, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

print("Original DataFrame")
print(data_frame)

print("Modified DataFrame")

# grouping the data frame by col1 and col2
data_frame %>% group_by(col1, col2)

Output

[1] "Original DataFrame"
  col1 col2 col3
1    7    a    1
2    7    b    4
3    7    c    5
4    6    a    1
5    6    b   NA
6    6    c   NA
7    7    a    2
8    6    b   NA
9    6    c    2
[1] "Modified DataFrame"
# A tibble: 9 x 3
# Groups:   col1, col2 [6]
   col1 col2   col3
  <int> <chr> <dbl>
1     7 a         1
2     7 b         4
3     7 c         5
4     6 a         1
5     6 b        NA
6     6 c        NA
7     7 a         2
8     6 b        NA
9     6 c         2
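With two grouping columns, each distinct (col1, col2) combination forms its own group, which is why the output above reports 6 groups. The sketch below counts the rows in each combination and then removes the grouping again. It reuses the col1 values from the output above in place of sample(), and assumes dplyr 1.0.0 or later for the .groups argument of summarise(). The names group_sizes and plain are chosen for illustration.

```r
library(dplyr)

# col1 values copied from the Example 2 output so the run is reproducible
data_frame <- data.frame(col1 = c(7, 7, 7, 6, 6, 6, 7, 6, 6),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

# one output row per (col1, col2) combination;
# .groups = "drop" returns an ungrouped result
group_sizes <- data_frame %>%
  group_by(col1, col2) %>%
  summarise(n = n(), .groups = "drop")

print(group_sizes)

# ungroup() removes the grouping from a grouped data frame
plain <- data_frame %>%
  group_by(col1, col2) %>%
  ungroup()
```

Calling ungroup() once the per-group work is done avoids later verbs being applied group by group unintentionally.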


