Group data.table by Multiple Columns in R
Last Updated :
23 Sep, 2021
In this article, we will discuss how to group data.table by multiple columns in R programming language.
The package data.table can be used to work with data tables and subsetting and organizing data. It can be downloaded and installed into the workspace using the following command :
library(data.table)
The column at a specified index can be extracted using the list subsetting, i.e. [, operator. The new column can be added in the second argument assigned to a predefined or a user-defined function defined over a set of columns of data.table. The by argument can be added to group the data using a set of columns from the data table. The list() method can be used to specify a set of columns of the data.table to group the data by.
Example: Group data.table by multiple columns
R
library (data.table)
data_frame <- data.table (col1 = rep ( LETTERS [1:3],each=2),
col2 = c (5:10),
col3 = c ( TRUE , FALSE )
)
print ( "Original DataFrame" )
print (data_frame)
data_mod <- data_frame[ , count:= sum (col2),
by = list (col1,col3)]
print ( "Modified DataFrame" )
print (data_mod)
|
Output:
[1] "Original DataFrame"
col1 col2 col3
1: A 5 TRUE
2: A 6 FALSE
3: B 7 TRUE
4: B 8 FALSE
5: C 9 TRUE
6: C 10 FALSE
[1] "Modified DataFrame"
col1 col2 col3 count
1: A 5 TRUE 5
2: A 6 FALSE 6
3: B 7 TRUE 7
4: B 8 FALSE 8
5: C 9 TRUE 9
6: C 10 FALSE 10
In the above example, since none of the groups are the same, therefore, the new column “count” values are equivalent to the col2 values.
In case there are columns belonging to the same groups, the sum is generated corresponding to each column.
Example: Group data.table by multiple columns
R
library (data.table)
data_frame <- data.table (col1 = rep ( LETTERS [1:3],each=2),
col2 = c (1:6),
col3 = TRUE
)
print ( "Original DataFrame" )
print (data_frame)
data_mod <- data_frame[ , count:= sum (col2),
by = list (col1,col3)]
print ( "Modified DataFrame" )
print (data_mod)
|
Output:
[1] "Original DataFrame"
col1 col2 col3
1: A 1 TRUE
2: A 2 TRUE
3: B 3 TRUE
4: B 4 TRUE
5: C 5 TRUE
6: C 6 TRUE
[1] "Modified DataFrame"
col1 col2 col3 count
1: A 1 TRUE 3
2: A 2 TRUE 3
3: B 3 TRUE 7
4: B 4 TRUE 7
5: C 5 TRUE 11
6: C 6 TRUE 11
Grouping of data can also be done using all the columns of the data.table, as indicated in the following code snippet.
Example: Group data.table by multiple columns
R
library (data.table)
data_frame <- data.table (col1 = rep ( LETTERS [1:3],each=2),
col2 = c (1,1,3,4,5,6),
col3 = 1
)
print ( "Original DataFrame" )
print (data_frame)
data_mod <- data_frame[ , count:= sum (col2),
by = list (col1,col2,col3)]
print ( "Modified DataFrame" )
print (data_mod)
|
Output:
[1] "Original DataFrame"
col1 col2 col3
1: A 1 1
2: A 1 1
3: B 3 1
4: B 4 1
5: C 5 1
6: C 6 1
[1] "Modified DataFrame"
col1 col2 col3 count
1: A 1 1 1
2: A 1 1 1
3: B 3 1 3
4: B 4 1 4
5: C 5 1 5
6: C 6 1 6
Like Article
Suggest improvement
Share your thoughts in the comments
Please Login to comment...