Convert DataFrame Column to Numeric in R

Last Updated : 16 May, 2021

In this article, we are going to see how to convert a dataframe column to numeric in the R programming language.

Every dataframe column has a class, which indicates the data type of the elements in that column. To convert a column to numeric, all of its elements must therefore be values that can be represented as numbers.

The sapply() method can be used to retrieve the class of each column as a named character vector, by applying class() to every column. The dataframe used for the operations below is as follows:

R

# declare a dataframe 
# different data type have been  
# indicated for different cols 
data_frame <- data.frame( 
                col1 = as.character(1:4),  
                col2 = factor(4:7),  
                col3 = letters[2:5],  
                col4 = 97:100, stringsAsFactors = FALSE) 
  
print("Original DataFrame") 
print (data_frame) 
  
# indicating the data type of  
# each variable  
sapply(data_frame, class) 

Output:

[1] "Original DataFrame"
 col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"

The transform() method returns a modified copy of the data object passed as its first argument. The original object is not changed, so the result has to be explicitly saved, either into the same dataframe or into a new one. It can be used to add new variables to the data or to modify existing ones.

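Syntax: transform(`_data`, ...)

Arguments :

`_data` – The data object to be modified

... – One or more arguments of the form tag = value, where tag names the column to add or modify and value is an expression evaluated using the columns of the data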
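As a minimal sketch of how transform() is typically called, the snippet below modifies one column and adds another in a single call. The dataframe df and its columns id, score and score_half are hypothetical names used only for illustration.

```r
# small illustrative dataframe (hypothetical names, not part of the examples below)
df <- data.frame(id = c("1", "2", "3"),
                 score = c(10L, 20L, 30L),
                 stringsAsFactors = FALSE)

# modify an existing column and add a new one in a single call;
# transform() returns a new dataframe and leaves df untouched
df_new <- transform(df, id = as.numeric(id), score_half = score / 2)

sapply(df, class)      # id is still "character" in the original
sapply(df_new, class)  # id is "numeric" in the returned copy
```

Because the original object is left as it was, the returned value has to be assigned, either to a new name as here or back to df.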

Example 1: Converting factor type columns to numeric

Converting a factor directly with as.numeric() does not necessarily preserve the values that are displayed: the result contains the internal integer codes of the factor levels rather than the level labels. The result of the transform operation has to be saved in a variable in order to work with it further. The following code snippet illustrates this:

R

# declare a dataframe 
# different data type have been 
# indicated for different cols 
data_frame <- data.frame( 
                col1 = as.character(1:4),  
                col2 = factor(4:7),  
                col3 = letters[2:5],  
                col4 = 97:100, stringsAsFactors = FALSE) 
  
print("Original DataFrame") 
print (data_frame) 
  
# indicating the data type of each  
# variable  
sapply(data_frame, class) 
  
# converting factor type column to  
# numeric  
data_frame_mod <- transform( 
  data_frame,col2 = as.numeric(col2)) 
  
print("Modified DataFrame") 
print (data_frame_mod) 
  
# indicating the data type of each variable  
sapply(data_frame_mod, class) 

Output:

[1] "Original DataFrame"
 col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified DataFrame"
 col1 col2 col3 col4
1    1    1    b   97
2    2    2    c   98
3    3    3    d   99
4    4    4    e  100
      col1        col2        col3        col4
"character"   "numeric" "character"   "integer"

Explanation: The original values in col2 range from 4 to 7, while in the modified dataframe they are the integers 1 to 4. as.numeric() applied to a factor returns the internal integer codes of the levels, not the labels that are printed, so a direct conversion from factor to numeric does not preserve the data.
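The difference between the labels and the codes of a factor can be inspected directly. The following is a small sketch on a standalone factor f (a name chosen here for illustration) built the same way as col2:

```r
# a factor built like col2 in the example above
f <- factor(4:7)

levels(f)      # the labels that are printed: "4" "5" "6" "7"
as.integer(f)  # the internal codes: 1 2 3 4
as.numeric(f)  # the same codes, returned as doubles
```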

In order to preserve the data, the factor column first needs to be explicitly converted to character using as.character(), and only then to numeric.

R

# declare a dataframe 
# different data type have been  
# indicated for different cols 
data_frame <- data.frame( 
                col1 = as.character(1:4),  
                col2 = factor(4:7),  
                col3 = letters[2:5],  
                col4 = 97:100, stringsAsFactors = FALSE) 
  
print("Original DataFrame") 
print (data_frame) 
  
# indicating the data type of each 
# variable  
sapply(data_frame, class) 
  
# converting factor type column to  
# numeric  
data_frame_mod <- transform( 
  data_frame, col2 = as.numeric(as.character(col2))) 
  
print("Modified DataFrame") 
print (data_frame_mod) 
  
# indicating the data type of each 
# variable  
sapply(data_frame_mod, class) 

Output:

[1] "Original DataFrame"
 col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified DataFrame"
 col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"   "numeric" "character"   "integer"

Explanation: To keep the values unchanged, col2 is first converted with as.character(), which yields the level labels as strings, and then with as.numeric(), which parses those strings into the numbers that were displayed originally.
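An equivalent idiom, mentioned in R's own help page for factor, converts the level labels once and then indexes them by the factor. The sketch below compares both approaches on a hypothetical factor f whose labels are not in numeric order:

```r
# hypothetical factor with numeric-looking labels
f <- factor(c("10", "2", "33", "2"))

# approach used in the article: labels as strings, then parse
via_char <- as.numeric(as.character(f))

# alternative: parse each distinct label once, then index by the factor codes
via_levels <- as.numeric(levels(f))[f]

via_char
via_levels
identical(via_char, via_levels)
```

Both give the original numbers; the second form only parses each distinct label once, which can matter for long columns with few levels.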

Example 2: Converting character type columns to numeric

Character type columns, whether they hold single characters or longer strings, can be converted into numeric values only if every string represents a number. Otherwise, the values that cannot be parsed are coerced to missing (NA) values and R issues a warning.

The example below shows both cases: col1 holds numeric strings and converts cleanly, while col3 holds letters, so every value is replaced by NA and the data is lost.

R

# declare a dataframe 
# different data type have been  
# indicated for different cols 
data_frame <- data.frame( 
                col1 = as.character(6:9),  
                col2 = factor(4:7),  
                col3 = letters[2:5],  
                col4 = 97:100, stringsAsFactors = FALSE) 
  
print("Original DataFrame") 
print (data_frame) 
  
# indicating the data type of each  
# variable  
sapply(data_frame, class) 
  
# converting character type column 
# to numeric  
data_frame_col1 <- transform( 
  data_frame,col1 = as.numeric(col1)) 
  
print("Modified col1 DataFrame") 
print (data_frame_col1) 
  
# indicating the data type of each  
# variable  
sapply(data_frame_col1, class) 
  
  
# converting character type column  
# to numeric  
data_frame_col3 <- transform( 
  data_frame,col3 = as.numeric(col3)) 
  
print("Modified col3 DataFrame") 
print (data_frame_col3) 
  
# indicating the data type of each 
# variable  
sapply(data_frame_col3, class) 

Output:

[1] "Original DataFrame"
 col1 col2 col3 col4
1    6    4    b   97
2    7    5    c   98
3    8    6    d   99
4    9    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified col1 DataFrame"
 col1 col2 col3 col4
1    6    4    b   97
2    7    5    c   98
3    8    6    d   99
4    9    7    e  100
      col1        col2        col3        col4
 "numeric"    "factor" "character"   "integer"
[1] "Modified col3 DataFrame"
 col1 col2 col3 col4
1    6    4   NA   97
2    7    5   NA   98
3    8    6   NA   99
4    9    7   NA  100
      col1        col2        col3        col4
"character"    "factor"   "numeric"   "integer"
Warning message:
In eval(substitute(list(...)), `_data`, parent.frame()) :
 NAs introduced by coercion

Explanation: sapply() shows that the class of col3 is character; it holds single-letter strings. When transform() applies as.numeric() to it, these strings are converted to missing (NA) values, because a letter cannot be parsed as a number. So, this leads to data loss. col1, in contrast, holds the strings "6" to "9", which convert to the numbers 6 to 9 without loss.
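Before overwriting a column, it can help to check which values would turn into NA. The following is a minimal sketch on a hypothetical character vector x; the variable names are chosen for illustration only.

```r
# hypothetical character vector with one value that is not a number
x <- c("6", "7", "b", "9.5")

# attempt the conversion, silencing the coercion warning for this check
converted <- suppressWarnings(as.numeric(x))

# values that were present before but are NA after the conversion
bad <- is.na(converted) & !is.na(x)

x[bad]     # the strings that would be lost
any(bad)   # TRUE if the conversion is lossy
```

If any(bad) is FALSE, the column can be converted without losing information.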

A character column can still be turned into numbers by first converting it to a factor using as.factor() and then to numeric using as.numeric(). The resulting numbers are only integer codes, assigned according to the lexicographic sort order of the distinct column values, so the numeric column no longer carries the actual strings and should not be read as a measurement. The example below omits stringsAsFactors = FALSE. The output shown appears to come from an R version before 4.0.0, where data.frame() stored strings as factors by default; in R 4.0.0 and later, col1 and col3 are reported as character, while the result for col3 is the same.

R

# declare a dataframe 
# different data type have been  
# indicated for different cols 
data_frame <- data.frame( 
                col1 = as.character(6:9),  
                col2 = factor(4:7),  
                col3 = c("Geeks","For","Geeks","Gooks"),  
                col4 = 97:100) 
  
print("Original DataFrame") 
print (data_frame) 
  
# indicating the data type of each 
# variable  
sapply(data_frame, class) 
  
# converting character type column  
# to numeric  
data_frame_col3 <- transform( 
  data_frame,col3 = as.numeric(as.factor(col3))) 
  
print("Modified col3 DataFrame") 
print (data_frame_col3) 
  
# indicating the data type of each 
# variable  
sapply(data_frame_col3, class) 

Output:

[1] "Original DataFrame"
 col1 col2  col3 col4
1    6    4 Geeks   97
2    7    5   For   98
3    8    6 Geeks   99
4    9    7 Gooks  100
    col1      col2      col3      col4
"factor"  "factor"  "factor" "integer"
[1] "Modified col3 DataFrame"
 col1 col2 col3 col4
1    6    4    2   97
2    7    5    1   98
3    8    6    2   99
4    9    7    3  100
    col1      col2      col3      col4
"factor"  "factor" "numeric" "integer"

Explanation : The first and third strings in col3 are the same and are therefore assigned the same numeric value. The distinct values are sorted in ascending order and then assigned consecutive integers. “For” comes first in lexicographic order and is assigned 1, “Geeks” is assigned 2 for both of its occurrences, and “Gooks” is assigned 3. Thus, the type of col3 changes to numeric.
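If the original strings are still needed later, one option is to keep the factor levels as a lookup table alongside the codes. This is a sketch under that assumption; the names f, codes and lookup are illustrative.

```r
# the same strings as col3 in the example above
col3 <- c("Geeks", "For", "Geeks", "Gooks")

f <- factor(col3)
codes <- as.numeric(f)   # integer codes stored as doubles
lookup <- levels(f)      # sorted distinct strings, position = code

codes
lookup[codes]            # maps the codes back to the original strings
```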

Example 3: Converting logical type columns to numeric

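When a logical column is converted by way of a factor, TRUE is assigned the numeric value 2 and FALSE the value 1, because the factor levels are sorted as FALSE, TRUE and numbered from 1. Applying as.numeric() directly to a logical column gives 1 for TRUE and 0 for FALSE instead.

In the example below, the column of logical values is first transformed to a factor using as.factor(), and these values are then converted using as.numeric(), which simply returns the integer identifiers of the two levels. The output shown appears to come from an R version before 4.0.0, where data.frame() stored strings as factors by default; in R 4.0.0 and later, col1 and col3 are reported as character.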

R

# declare a dataframe 
# different data type have been 
# indicated for different cols 
data_frame <- data.frame( 
                col1 = as.character(6:9),  
                col2 = factor(4:7),  
                col3 = c("Geeks","For","Geeks","Gooks"),  
                col4 = 97:100, 
                col5 = c(TRUE,FALSE,TRUE,FALSE)) 
  
print("Original DataFrame") 
print (data_frame) 
  
# indicating the data type of each  
# variable  
sapply(data_frame, class) 
  
# converting logical type column  
# to numeric  
data_frame_col5 <- transform( 
  data_frame,col5 = as.numeric(as.factor(col5))) 
print("Modified col5 DataFrame") 
print (data_frame_col5) 
  
# indicating the data type of each  
# variable  
sapply(data_frame_col5, class) 

Output:

[1] "Original DataFrame"
 col1 col2  col3 col4  col5
1    6    4 Geeks   97  TRUE
2    7    5   For   98 FALSE
3    8    6 Geeks   99  TRUE
4    9    7 Gooks  100 FALSE
    col1      col2      col3      col4      col5
"factor"  "factor"  "factor" "integer" "logical"
[1] "Modified col5 DataFrame"
 col1 col2  col3 col4 col5
1    6    4 Geeks   97    2
2    7    5   For   98    1
3    8    6 Geeks   99    2
4    9    7 Gooks  100    1
    col1      col2      col3      col4      col5
"factor"  "factor"  "factor" "integer" "numeric"

Explanation : sapply() shows that the class of col5 is logical, that is, it consists of TRUE and FALSE values. After transform() is applied, these logical values are mapped to the integers 2 and 1, and the class of col5 becomes numeric.
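For comparison, the following sketch shows the direct conversion of a logical vector next to the conversion by way of a factor. The vector flags is a hypothetical stand-in for col5.

```r
# the same logical values as col5 in the example above
flags <- c(TRUE, FALSE, TRUE, FALSE)

direct <- as.numeric(flags)                 # TRUE -> 1, FALSE -> 0
via_factor <- as.numeric(as.factor(flags))  # TRUE -> 2, FALSE -> 1

direct
via_factor
```

The direct form is usually the one wanted when the column is used as a 0/1 indicator, for example when summing it to count the TRUE values.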
