Group by one or more variables using dplyr in R
Last Updated: 16 Dec, 2021
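The group_by() function divides a data frame into groups based on the values in the specified columns. The columns to group by are passed as arguments, and more than one column name may be given. Grouping does not change the data itself; it only records the groups so that later operations such as summarise() are applied to each group separately.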
Syntax:
group_by(col1, col2, …)
Example 1: Group by one variable
R
library(dplyr)

# col1 is drawn at random, so the values differ from run to run
data_frame <- data.frame(col1 = sample(6:7, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

print("Original DataFrame")
print(data_frame)

print("Modified DataFrame")
# group_by() returns a grouped tibble; the rows themselves are unchanged
data_frame %>% group_by(col1)
Output
[1] "Original DataFrame"
  col1 col2 col3
1    6    a    1
2    7    b    4
3    7    c    5
4    6    a    1
5    7    b   NA
6    6    c   NA
7    6    a    2
8    6    b   NA
9    7    c    2
[1] "Modified DataFrame"
# A tibble: 9 x 3
# Groups:   col1 [2]
   col1 col2   col3
  <int> <chr> <dbl>
1     6 a         1
2     7 b         4
3     7 c         5
4     6 a         1
5     7 b        NA
6     6 c        NA
7     6 a         2
8     6 b        NA
9     7 c         2
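Grouping on its own leaves the rows unchanged, which is why the output above differs from the input only in the "Groups" header. The effect shows up once a summary is computed. The following is a minimal sketch, assuming the same column names as in Example 1 but with fixed col1 values instead of sample(), so that the result is reproducible; the names group_means, mean_col3 and rows are chosen here for illustration only.

```r
library(dplyr)

# Fixed values (instead of sample()) so the result is the same on every run
data_frame <- data.frame(col1 = c(6, 7, 7, 6, 7, 6, 6, 6, 7),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

# One output row per value of col1:
# the mean of col3 (ignoring NA) and the number of rows in the group
group_means <- data_frame %>%
  group_by(col1) %>%
  summarise(mean_col3 = mean(col3, na.rm = TRUE),
            rows = n())

print(group_means)
```

Because the data frame is grouped by col1, summarise() returns one row for col1 = 6 and one for col1 = 7 rather than a single overall mean.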
Grouping can also be done on multiple columns of the data frame. To do this, pass the names of all the grouping columns to the function. Each distinct combination of values in those columns then forms its own group.
Example 2: Group by multiple columns
R
library(dplyr)

# col1 is drawn at random, so the values differ from run to run
data_frame <- data.frame(col1 = sample(6:7, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

print("Original DataFrame")
print(data_frame)

print("Modified DataFrame")
# Each distinct (col1, col2) pair becomes one group
data_frame %>% group_by(col1, col2)
Output
[1] "Original DataFrame"
  col1 col2 col3
1    7    a    1
2    7    b    4
3    7    c    5
4    6    a    1
5    6    b   NA
6    6    c   NA
7    7    a    2
8    6    b   NA
9    6    c    2
[1] "Modified DataFrame"
# A tibble: 9 x 3
# Groups:   col1, col2 [6]
   col1 col2   col3
  <int> <chr> <dbl>
1     7 a         1
2     7 b         4
3     7 c         5
4     6 a         1
5     6 b        NA
6     6 c        NA
7     7 a         2
8     6 b        NA
9     6 c         2
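The "Groups: col1, col2 [6]" header in the output above reports the grouping columns and the number of groups. The sketch below shows one way to inspect that grouping and summarise it, assuming the col1 values printed in Example 2 are typed in directly rather than sampled; the names grouped, counts, rows and non_missing are illustrative and not part of the original example.

```r
library(dplyr)

# Same values as the Example 2 output, entered by hand for reproducibility
data_frame <- data.frame(col1 = c(7, 7, 7, 6, 6, 6, 7, 6, 6),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

grouped <- data_frame %>% group_by(col1, col2)

# Names of the grouping columns and the number of distinct groups
print(group_vars(grouped))
print(n_groups(grouped))

# One row per (col1, col2) combination;
# .groups = "drop" returns an ungrouped tibble
counts <- grouped %>%
  summarise(rows = n(),
            non_missing = sum(!is.na(col3)),
            .groups = "drop")

print(counts)
```

Dropping the groups after summarising is a design choice: it avoids later operations being applied per group by accident. Calling ungroup() on a grouped tibble has the same effect.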