Open In App

Group data.table by Multiple Columns in R

Last Updated : 23 Sep, 2021
Improve
Improve
Like Article
Like
Save
Share
Report

In this article, we will discuss how to group data.table by multiple columns in R programming language.

The package data.table can be used to work with data tables and subsetting and organizing data. It can be downloaded and installed into the workspace using the following command : 

library(data.table)

The column at a specified index can be extracted using the list subsetting, i.e. [, operator. The new column can be added in the second argument assigned to a predefined or a user-defined function defined over a set of columns of data.table. The by argument can be added to group the data using a set of columns from the data table. The list() method can be used to specify a set of columns of the data.table to group the data by. 

Example: Group data.table by multiple columns

R




library(data.table)
  
# creating first data frame
data_frame <- data.table(col1 = rep(LETTERS[1:3],each=2),
                  col2 = c(5:10),
                  col3 = c(TRUE,FALSE)
                  )
  
print ("Original DataFrame")
print (data_frame)
  
# group by col1,col3
data_mod <- data_frame[ , count:=sum(col2), 
                       by = list(col1,col3)]  
  
# print modified data frame
print ("Modified DataFrame"
print(data_mod)                                        


Output:

[1] "Original DataFrame"
   col1 col2  col3 
1:    A    5  TRUE 
2:    A    6 FALSE 
3:    B    7  TRUE 
4:    B    8 FALSE 
5:    C    9  TRUE 
6:    C   10 FALSE 
[1] "Modified DataFrame" 
   col1 col2  col3 count 
1:    A    5  TRUE     5 
2:    A    6 FALSE     6 
3:    B    7  TRUE     7 
4:    B    8 FALSE     8 
5:    C    9  TRUE     9 
6:    C   10 FALSE    10

In the above example, since none of the groups are the same, therefore, the new column “count” values are equivalent to the col2 values. 

In case there are columns belonging to the same groups, the sum is generated corresponding to each column.

Example:  Group data.table by multiple columns

R




library(data.table)
  
# creating first data frame
data_frame <- data.table(col1 = rep(LETTERS[1:3],each=2),
                         col2 = c(1:6),
                         col3 = TRUE
                        )
  
print ("Original DataFrame")
print (data_frame)
  
# group by col1,col3
data_mod <- data_frame[ , count:=sum(col2), 
                       by = list(col1,col3)]  
  
# print modified data frame
print ("Modified DataFrame"
print(data_mod)


Output:

[1] "Original DataFrame" 
   col1 col2 col3 
1:    A    1 TRUE 
2:    A    2 TRUE 
3:    B    3 TRUE 
4:    B    4 TRUE 
5:    C    5 TRUE 
6:    C    6 TRUE 
[1] "Modified DataFrame" 
   col1 col2 col3 count 
1:    A    1 TRUE     3 
2:    A    2 TRUE     3 
3:    B    3 TRUE     7 
4:    B    4 TRUE     7 
5:    C    5 TRUE    11 
6:    C    6 TRUE    11

Grouping of data can also be done using all the columns of the data.table, as indicated in the following code snippet.

Example: Group data.table by multiple columns

R




library(data.table)
  
# creating first data frame
data_frame <- data.table(col1 = rep(LETTERS[1:3],each=2),
                         col2 = c(1,1,3,4,5,6),
                         col3 = 1
                        )
print ("Original DataFrame")
print (data_frame)
  
# group by col1,col3
data_mod <- data_frame[ , count:=sum(col2), 
                       by = list(col1,col2,col3)]  
  
# print modified data frame
print ("Modified DataFrame"
print(data_mod)


Output:

[1] "Original DataFrame" 
col1 col2 col3 
1:    A    1    1 
2:    A    1    1 
3:    B    3    1 
4:    B    4    1 
5:    C    5    1 
6:    C    6    1 
[1] "Modified DataFrame" 
col1 col2 col3 count 
1:    A    1    1     1 
2:    A    1    1     1 
3:    B    3    1     3 
4:    B    4    1     4 
5:    C    5    1     5 
6:    C    6    1     6


Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads