
Convert Column Classes of Data Table in R

Last Updated : 26 May, 2021

The data.table package simplifies data manipulation in the R programming language, including subsetting, grouping, and updating the columns of a data table.

The sapply() function applies a function to each element of an R object such as a list, a data frame, or a data table, and simplifies the result to a vector or matrix where possible. With FUN = class, it returns the class of every column of the data table.

Syntax:

sapply ( data-table , FUN)

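The class of a particular column can be changed by explicit conversion. With data.table, the := operator updates the column in place (by reference), so no reassignment is needed. Assigning the result to another variable only creates a second name for the same table; call copy() on the table first if the original must be preserved.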

Syntax:

data.table [ , col-name := conv-func(col-name) ]

In this syntax, conv-func is the explicit conversion function applied to the column: for instance, as.character() for character conversion, as.numeric() for numeric conversion, and as.factor() for factor conversion.
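
Because := works by reference, it can help to see how it differs from working on a copy. The following is a minimal sketch, assuming a small hypothetical table named dt (its name and columns are illustrative and not part of the examples in this article):

```r
library(data.table)

# Hypothetical example table; the name dt and its columns are illustrative only
dt <- data.table(id = 1:3, score = c("10", "20", "30"))

# copy() makes an independent table, so the conversion below leaves dt untouched
dt_copy <- copy(dt)
dt_copy[, score := as.numeric(score)]

class(dt$score)       # still "character"
class(dt_copy$score)  # now "numeric"

# Plain assignment does not copy: dt_alias and dt are the same table
dt_alias <- dt
dt_alias[, id := as.character(id)]
class(dt$id)          # "character", because dt was modified through dt_alias
```

Using copy() costs extra memory, so it is usually worth it only when the unconverted table is still needed afterwards.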

Example:

R




library(data.table)
  
# creating a data table; rep_len() recycles the shorter
# columns explicitly to match the 6 values of col3
data_table <- data.table(col1 = rep_len(1:5, 6),
                         col2 = rep_len(letters[1:5], 6),
                         col3 = factor(sample(5:10)))
  
print("Original DataTable")
print(data_table)
  
# getting class of columns
sapply(data_table, class)
  
# convert col3 from factor to character; := updates data_table
# in place, so data_table_mod refers to the same table
data_table_mod <- data_table[ , col3 := as.character(col3)]
  
print("Modified DataTable")
print(data_table_mod)
sapply(data_table_mod, class)


Output

[1] "Original DataTable"
   col1 col2 col3
1:    1    a    5
2:    2    b    8
3:    3    c    7
4:    4    d    6
5:    5    e    9
6:    1    a   10
       col1        col2        col3 
  "integer" "character"    "factor" 
[1] "Modified DataTable"
   col1 col2 col3
1:    1    a    5
2:    2    b    8
3:    3    c    7
4:    4    d    6
5:    5    e    9
6:    1    a   10
       col1        col2        col3 
  "integer" "character" "character" 

However, converting a character or factor column to numeric is meaningful only if its values can be parsed as numbers. In the following code, col2 holds letters, so as.numeric() cannot parse them: R issues the warning "NAs introduced by coercion" and every value in the column is replaced by NA.

Example:

R




library(data.table)
  
# creating a data table; rep_len() recycles the shorter
# columns explicitly to match the 6 values of col3
data_table <- data.table(col1 = rep_len(1:5, 6),
                         col2 = rep_len(letters[1:5], 6),
                         col3 = factor(sample(5:10)))
  
print("Original DataTable")
print(data_table)
  
# getting class of columns
sapply(data_table, class)
  
# convert col2 from character to numeric; the letters
# cannot be parsed as numbers, so they become NA
data_table_mod <- data_table[ , col2 := as.numeric(col2)]
  
print("Modified DataTable")
print(data_table_mod)
sapply(data_table_mod, class)


Output

[1] "Original DataTable"
   col1 col2 col3
1:    1    a   10
2:    2    b    6
3:    3    c    5
4:    4    d    9
5:    5    e    7
6:    1    a    8
       col1        col2        col3 
  "integer" "character"    "factor" 
[1] "Modified DataTable"
   col1 col2 col3
1:    1   NA   10
2:    2   NA    6
3:    3   NA    5
4:    4   NA    9
5:    5   NA    7
6:    1   NA    8
     col1      col2      col3 
"integer" "numeric"  "factor" 
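
Factor columns whose labels look like numbers need extra care, because as.numeric() applied directly to a factor returns the internal level codes rather than the labels. The sketch below illustrates the difference on a hypothetical table dt with a factor column grp (both names are assumptions made for this illustration):

```r
library(data.table)

# Hypothetical table: a factor column whose labels are numbers
dt <- data.table(grp = factor(c("10", "6", "5", "9")))

# as.numeric() on a factor returns the internal level codes, not the labels
dt[, codes := as.numeric(grp)]

# Going through as.character() first recovers the printed values
dt[, values := as.numeric(as.character(grp))]

print(dt)
```

The values column holds 10, 6, 5, 9, while the codes column holds only the positions of those labels among the factor levels.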

The lapply() function applies a function to each element of a list and always returns a list. Because a data frame or data table is a list of columns, lapply() visits each column in turn, which makes it convenient for converting several columns at once.

Syntax: 

lapply( obj , FUN)

Parameters:

obj : The object whose elements the function is applied to

FUN : The function applied to each element of the supplied object
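
As a quick illustration of how lapply() differs from sapply(), the following sketch applies class to the columns of a small hypothetical table dt (the table and the variable col_classes are made up for this example):

```r
library(data.table)

# Hypothetical table used only for illustration
dt <- data.table(x = 1:3, y = c("a", "b", "c"))

# A data table is a list of columns, so lapply() visits one column at a time
col_classes <- lapply(dt, class)   # named list, one element per column
str(col_classes)

# sapply() runs the same loop but simplifies the result to a named character vector
sapply(dt, class)
```

The list returned by lapply() is what makes it suitable on the right-hand side of :=, where each list element becomes one updated column.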

The same approach converts several specified columns to factor in one step. The columns are updated by reference using `:=`, e.g., DT[ , names(DT) := lapply(.SD, as.factor)], so no copy of the data is made. Since a factor stores every distinct value as a level, both integer and character columns can be converted without any values being lost.

Example:

R




library(data.table)
  
# creating a data table; rep_len() recycles the shorter
# columns explicitly to match the 6 values of col3
data_table <- data.table(col1 = rep_len(1:5, 6),
                         col2 = rep_len(letters[1:5], 6),
                         col3 = factor(sample(5:10)))
  
print("Original DataTable")
print(data_table)
  
# getting class of columns
sapply(data_table, class)
  
# columns to convert into factor type
cols <- c("col1", "col2")
  
# change the class of the selected columns by reference;
# .SD holds only the columns named in .SDcols
data_table_mod <- data_table[ ,
                             (cols) := lapply(.SD, as.factor),
                             .SDcols = cols]
  
print("Modified DataTable")
print(data_table_mod)
sapply(data_table_mod, class)


Output

[1] "Original DataTable"
   col1 col2 col3
1:    1    a    5
2:    2    b    8
3:    3    c    7
4:    4    d    6
5:    5    e    9
6:    1    a   10
       col1        col2        col3 
  "integer" "character"    "factor" 
[1] "Modified DataTable"
   col1 col2 col3
1:    1    a    5
2:    2    b    8
3:    3    c    7
4:    4    d    6
5:    5    e    9
6:    1    a   10
    col1     col2     col3 
"factor" "factor" "factor" 
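
When the columns to convert are not known by name, they can be picked by their current class instead. This is a minimal sketch, assuming a hypothetical table dt and a helper variable char_cols (neither appears in the examples above), that converts every character column to factor:

```r
library(data.table)

# Hypothetical table; column names are illustrative only
dt <- data.table(id = 1:4,
                 city = c("Pune", "Delhi", "Pune", "Agra"),
                 code = c("a", "b", "a", "c"))

# Names of all columns that are currently character
char_cols <- names(dt)[sapply(dt, is.character)]

# Convert just those columns to factor, by reference
dt[, (char_cols) := lapply(.SD, as.factor), .SDcols = char_cols]

sapply(dt, class)
```

The integer column id is left as it is, because it is not listed in char_cols.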

