Convert dataframe to data.table in R

  • Last Updated : 17 May, 2021

In this article, we will discuss how to convert a data.frame to a data.table in the R programming language. data.table is an R package that provides an enhanced version of data.frame. Characteristics of data.table:

  • data.table doesn’t set or use row names
  • row numbers are printed with a : for better readability
  • columns of character type are never converted to factors by default in data.table. (data.frame() did convert them by default before R 4.0.0, where stringsAsFactors defaulted to TRUE.)
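
The characteristics above can be checked with a short sketch. It assumes the data.table package is installed; the object and column names (dt, name, value) are made up for illustration.

```r
library(data.table)

# Build a small data.table directly (hypothetical column names)
dt <- data.table(name = c("x", "y"), value = c(10, 20))

# Character columns stay character; no factor conversion
name_class <- class(dt$name)

# No custom row names: only the default 1..n identifiers exist
row_ids <- rownames(dt)

print(dt)
print(name_class)
print(row_ids)
```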

Method 1 : Using setDT() method

While data.frame is part of base R, data.table objects come from the data.table package, which must be installed with install.packages("data.table") and loaded with library(data.table). The setDT() function converts a data.frame or a list to a data.table by reference: the original object is modified in place, so the data is not copied and no assignment is needed.

Syntax: setDT(x)

Arguments : 



  • x : A named or unnamed list, data.frame or data.table.
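
Since x may also be a list, here is a minimal sketch of setDT() applied to a named list. The list scores and its element names are hypothetical, and the sketch assumes all elements have the same length.

```r
library(data.table)

# A named list with equal-length elements (hypothetical example data)
scores <- list(id = 1:3, grade = c("A", "B", "C"))

# setDT() converts in place; no assignment is needed
setDT(scores)

print(class(scores))
print(scores)
```

Each list element becomes one column, and the element names become the column names.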

Example 1:

R




# using the required library
library(data.table)
  
# declare a dataframe
data_frame <- data.frame(col1 = c(1:7),
                         col2 = LETTERS[1:7],
                         col3 = letters[1:7])
  
print ("Original DataFrame")
print (data_frame)
  
# converting into data.table
setDT(data_frame)
print ("Resultant DataFrame")
print (data_frame)

Output

[1] "Original DataFrame"
    col1 col2 col3
1    1    A    a
2    2    B    b
3    3    C    c
4    4    D    d
5    5    E    e
6    6    F    f
7    7    G    g
[1] "Resultant DataFrame"
     col1 col2 col3
1:    1    A    a
2:    2    B    b
3:    3    C    c
4:    4    D    d
5:    5    E    e
6:    6    F    f
7:    7    G    g

All missing (NA) values stored in the data.frame are preserved in the data.table. Row names are not kept by default; the data.table instead prints row numbers from 1 to the number of rows. The data.table package also provides is.data.table(), which returns TRUE if its argument is a data.table and FALSE otherwise.

Example 2:

R




# using the required library
library(data.table)
  
# declare a dataframe
data_frame <- data.frame(col1 = c(1, NA, 4, NA, 3, NA),       
                         col2 = c("a", NA, "b", "e", "f", "G"),
                         row.names = c("row1","row2","row3",
                                       "row4","row5","row6"))
  
print ("Original DataFrame")
print (data_frame)
  
# converting into data.table
setDT(data_frame)
print ("Resultant DataFrame")
print (data_frame)
  
# checking if the dataframe is data table
print ("Check if data table")
print (is.data.table(data_frame))

Output

[1] "Original DataFrame"
     col1 col2
row1    1    a
row2   NA <NA>
row3    4    b
row4   NA    e
row5    3    f
row6   NA    G
[1] "Resultant DataFrame"
   col1 col2
1:    1    a
2:   NA <NA>
3:    4    b
4:   NA    e
5:    3    f
6:   NA    G
[1] "Check if data table"
[1] TRUE

Explanation: The original object is a data.frame with row names row1 to row6. setDT() converts it in place: the row names are dropped, and each row is printed with a row number followed by a colon. The missing values (NA) are carried over unchanged. Because the conversion modifies data_frame itself, is.data.table(data_frame) returns TRUE.
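
If the row names should survive the conversion, setDT() also accepts a keep.rownames argument. The following is a minimal sketch on a smaller, made-up data.frame named df:

```r
library(data.table)

df <- data.frame(col1 = c(1, NA, 4),
                 col2 = c("a", NA, "b"),
                 row.names = c("row1", "row2", "row3"))

# keep.rownames = TRUE moves the row names into a column named "rn"
setDT(df, keep.rownames = TRUE)

print(names(df))
print(df)
```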



Method 2 : Using as.data.table() method

The as.data.table() function coerces a data.frame or a list to a data.table when the conversion is possible. Unlike setDT(), it does not modify the original object: it returns a new data.table, and the input remains a data.frame.

Syntax: as.data.table(x,keep.rownames=FALSE)

Arguments :

  • x : A named or unnamed list, data.frame or data.table.
  • keep.rownames : Default is FALSE. For data.frames, TRUE retains the data.frame's row names in a new column named rn. keep.rownames = "id" names the column "id" instead.
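
The three forms of keep.rownames can be compared side by side. This is a small sketch; the data.frame df and the variable names are made up for illustration.

```r
library(data.table)

df <- data.frame(col1 = c(1, 4, 3),
                 row.names = c("row1", "row2", "row3"))

dt_default <- as.data.table(df)                        # row names dropped
dt_rn      <- as.data.table(df, keep.rownames = TRUE)  # stored in column "rn"
dt_id      <- as.data.table(df, keep.rownames = "id")  # stored in column "id"

print(names(dt_default))
print(names(dt_rn))
print(names(dt_id))
```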

Example:

R




# using the required library
library(data.table)
  
# declare a dataframe
data_frame <- data.frame(col1 = c(1, NA, 4, NA, 3, NA),       
                         col2 = c("a", NA, "b", "e", "f", "G"),
                         row.names = c("row1","row2","row3",
                                       "row4","row5","row6"))
  
print ("Original DataFrame")
print (data_frame)
  
# converting into data.table
dt <- as.data.table(data_frame, keep.rownames = TRUE)
print ("Resultant DataFrame")
print (dt)
print ("Check if data table")
print (is.data.table(dt))

Output

[1] "Original DataFrame"
     col1 col2
row1    1    a
row2   NA <NA>
row3    4    b
row4   NA    e
row5    3    f
row6   NA    G
[1] "Resultant DataFrame"
     rn col1 col2
1: row1    1    a
2: row2   NA <NA>
3: row3    4    b
4: row4   NA    e
5: row5    3    f
6: row6   NA    G
[1] "Check if data table"
[1] TRUE

Explanation: The original data.frame has a name for each row. When it is converted with keep.rownames set to TRUE, the row names become a separate column "rn", and each row is preceded by a row number followed by a colon. The original data.frame is left unchanged, so is.data.table(data_frame) returns FALSE, while is.data.table(dt) on the result of as.data.table() returns TRUE.
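
The difference between the input and the returned object can be confirmed with a short check. This is a sketch on a made-up data.frame named df:

```r
library(data.table)

df <- data.frame(col1 = c(1, NA, 4), col2 = c("a", NA, "b"))
dt <- as.data.table(df)

print(is.data.table(df))   # the original object is untouched
print(is.data.table(dt))   # the returned object is a data.table
print(is.data.frame(dt))   # a data.table is also a data.frame
```

Because a data.table still inherits from data.frame, it can generally be passed to functions that expect a data.frame.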



