Group data.table by Multiple Columns in R

Last Updated : 23 Sep, 2021

In this article, we will discuss how to group data.table by multiple columns in R programming language.

The package data.table can be used to work with data tables and subsetting and organizing data. It can be downloaded and installed into the workspace using the following command :

library(data.table)

The column at a specified index can be extracted using the list subsetting, i.e. [, operator. The new column can be added in the second argument assigned to a predefined or a user-defined function defined over a set of columns of data.table. The by argument can be added to group the data using a set of columns from the data table. The list() method can be used to specify a set of columns of the data.table to group the data by.

Example: Group data.table by multiple columns

R

library(data.table) 
  
# creating first data frame 
data_frame <- data.table(col1 = rep(LETTERS[1:3],each=2), 
                  col2 = c(5:10), 
                  col3 = c(TRUE,FALSE) 
                  ) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# group by col1,col3 
data_mod <- data_frame[ , count:=sum(col2),  
                       by = list(col1,col3)]   
  
# print modified data frame 
print ("Modified DataFrame")  
print(data_mod)                                         

Output:

[1] "Original DataFrame"
   col1 col2  col3 
1:    A    5  TRUE 
2:    A    6 FALSE 
3:    B    7  TRUE 
4:    B    8 FALSE 
5:    C    9  TRUE 
6:    C   10 FALSE 
[1] "Modified DataFrame" 
   col1 col2  col3 count 
1:    A    5  TRUE     5 
2:    A    6 FALSE     6 
3:    B    7  TRUE     7 
4:    B    8 FALSE     8 
5:    C    9  TRUE     9 
6:    C   10 FALSE    10

In the above example, since none of the groups are the same, therefore, the new column “count” values are equivalent to the col2 values.

In case there are columns belonging to the same groups, the sum is generated corresponding to each column.

Example: Group data.table by multiple columns

R

library(data.table) 
  
# creating first data frame 
data_frame <- data.table(col1 = rep(LETTERS[1:3],each=2), 
                         col2 = c(1:6), 
                         col3 = TRUE
                        ) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# group by col1,col3 
data_mod <- data_frame[ , count:=sum(col2),  
                       by = list(col1,col3)]   
  
# print modified data frame 
print ("Modified DataFrame")  
print(data_mod)

Output:

[1] "Original DataFrame" 
   col1 col2 col3 
1:    A    1 TRUE 
2:    A    2 TRUE 
3:    B    3 TRUE 
4:    B    4 TRUE 
5:    C    5 TRUE 
6:    C    6 TRUE 
[1] "Modified DataFrame" 
   col1 col2 col3 count 
1:    A    1 TRUE     3 
2:    A    2 TRUE     3 
3:    B    3 TRUE     7 
4:    B    4 TRUE     7 
5:    C    5 TRUE    11 
6:    C    6 TRUE    11

Grouping of data can also be done using all the columns of the data.table, as indicated in the following code snippet.

Example: Group data.table by multiple columns

R

library(data.table) 
  
# creating first data frame 
data_frame <- data.table(col1 = rep(LETTERS[1:3],each=2), 
                         col2 = c(1,1,3,4,5,6), 
                         col3 = 1 
                        ) 
print ("Original DataFrame") 
print (data_frame) 
  
# group by col1,col3 
data_mod <- data_frame[ , count:=sum(col2),  
                       by = list(col1,col2,col3)]   
  
# print modified data frame 
print ("Modified DataFrame")  
print(data_mod)

Output:

[1] "Original DataFrame" 
col1 col2 col3 
1:    A    1    1 
2:    A    1    1 
3:    B    3    1 
4:    B    4    1 
5:    C    5    1 
6:    C    6    1 
[1] "Modified DataFrame" 
col1 col2 col3 count 
1:    A    1    1     1 
2:    A    1    1     1 
3:    B    3    1     3 
4:    B    4    1     4 
5:    C    5    1     5 
6:    C    6    1     6

Suggest improvement

Summarize Multiple Columns of data.table by Group in R

Share your thoughts in the comments