
Convert DataFrame Column to Numeric in R

Last Updated : 16 May, 2021

In this article, we will see how to convert a data frame column to numeric in the R programming language.

Every data frame column has a class, which indicates the data type of its elements. To convert a column to numeric, every element of that column must be convertible to a numeric value.

The sapply() function can be used to retrieve the class of each column in the form of a named character vector. The data frame used for the operations below is as follows:

R




# declare a data frame whose
# columns have different
# data types
data_frame <- data.frame(
                col1 = as.character(1:4), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of 
# each variable 
sapply(data_frame, class)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"

The transform() function returns a modified copy of the data object passed as its first argument. The original object is not changed, so the result has to be assigned explicitly, either back to the same data frame or to a new one. It can be used to add new variables to the data or to modify existing ones.

Syntax: transform(`_data`, ...)

Arguments : 

  • `_data` – The data object to be modified
  • ... – One or more arguments of the form tag = value, where tag is the name of the column to add or modify
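
As a small illustration of the two uses described above, the sketch below modifies one existing column and adds a new one in a single transform() call. The data frame `df` and the column names `x`, `y` and `x_doubled` are made up for this example.

```r
# hypothetical data frame used only for this illustration
df <- data.frame(x = c("1", "2", "3"), y = 4:6,
                 stringsAsFactors = FALSE)

# modify an existing column (x) and add a new one (x_doubled);
# transform() returns a new data frame, so the result is assigned
df_new <- transform(df,
                    x = as.numeric(x),
                    x_doubled = as.numeric(x) * 2)

# class of each column in the result
sapply(df_new, class)

# the original data frame is left unchanged
class(df$x)
```

Inside the call, `x` refers to the column of the original data frame, which is why as.numeric(x) is repeated for the new column.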

Example 1: Converting factor type columns to numeric 

A conversion does not always preserve the data: values may be changed or lost. The result of transform() has to be saved in a variable in order to work with it further. The following code snippet illustrates this:

R




# declare a data frame whose
# columns have different
# data types
data_frame <- data.frame(
                col1 = as.character(1:4), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting factor type column to 
# numeric 
data_frame_mod <- transform(
  data_frame,col2 = as.numeric(col2))
  
print("Modified DataFrame")
print (data_frame_mod)
  
# indicating the data type of each variable 
sapply(data_frame_mod, class)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified DataFrame"
  col1 col2 col3 col4
1    1    1    b   97
2    2    2    c   98
3    3    3    d   99
4    4    4    e  100
      col1        col2        col3        col4
"character"   "numeric" "character"   "integer" 

Explanation: The original values in col2 range from 4 to 7, while in the modified data frame they are 1 to 4. Applying as.numeric() directly to a factor returns the internal integer codes of its levels, not the values shown as labels, so the data is not preserved.

To preserve the values, the factor column first has to be converted explicitly to character with as.character(), and only then to numeric.

R




# declare a data frame whose
# columns have different
# data types
data_frame <- data.frame(
                col1 = as.character(1:4), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each
# variable 
sapply(data_frame, class)
  
# converting factor type column to 
# numeric 
data_frame_mod <- transform(
  data_frame, col2 = as.numeric(as.character(col2)))
  
print("Modified DataFrame")
print (data_frame_mod)
  
# indicating the data type of each
# variable 
sapply(data_frame_mod, class)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified DataFrame"
  col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"   "numeric" "character"   "integer" 

Explanation: col2 is first converted with as.character(), which yields the level labels as strings, and then with as.numeric(), so the resulting numbers are the same as the values originally displayed.
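
The behaviour in both snippets follows from how a factor is stored: as integer codes that index into a vector of level labels. The sketch below uses a standalone factor `f` (a name chosen for this example) to show the codes, the labels, and two ways of recovering the labelled values as numbers. The second form, as.numeric(levels(f))[f], is the alternative mentioned in R's factor documentation.

```r
f <- factor(4:7)

levels(f)        # the labels: "4" "5" "6" "7"
as.integer(f)    # the internal codes: 1 2 3 4

# route used above: labels -> character -> numeric
via_character <- as.numeric(as.character(f))

# alternative: convert the levels once, then index by the codes
via_levels <- as.numeric(levels(f))[f]

via_character    # 4 5 6 7
via_levels       # 4 5 6 7
```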

Example 2: Converting character type columns to numeric 

Character columns, whether they hold single characters or longer strings, can be converted to numeric only if their values represent numbers. Otherwise, the values are coerced to missing (NA) values and R issues a warning.

The following example shows this data loss: the letters in col3 are replaced by NA because they cannot be interpreted as numbers.

R




# declare a data frame whose
# columns have different
# data types
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting character type column
# to numeric 
data_frame_col1 <- transform(
  data_frame,col1 = as.numeric(col1))
  
print("Modified col1 DataFrame")
print (data_frame_col1)
  
# indicating the data type of each 
# variable 
sapply(data_frame_col1, class)
  
  
# converting character type column 
# to numeric 
data_frame_col3 <- transform(
  data_frame,col3 = as.numeric(col3))
  
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable 
sapply(data_frame_col3, class)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1    6    4    b   97
2    7    5    c   98
3    8    6    d   99
4    9    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified col1 DataFrame"
  col1 col2 col3 col4
1    6    4    b   97
2    7    5    c   98
3    8    6    d   99
4    9    7    e  100
      col1        col2        col3        col4
 "numeric"    "factor" "character"   "integer"
[1] "Modified col3 DataFrame"
  col1 col2 col3 col4
1    6    4   NA   97
2    7    5   NA   98
3    8    6   NA   99
4    9    7   NA  100
      col1        col2        col3        col4
"character"    "factor"   "numeric"   "integer"
Warning message:
In eval(substitute(list(...)), `_data`, parent.frame()) :
 NAs introduced by coercion

Explanation: sapply() shows that col3 has class character and holds single letters. When as.numeric() is applied to it inside transform(), these letters are converted to NA, because they cannot be interpreted as numbers. This leads to data loss.
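
Before converting a character column, it can help to check which values would fail to parse. The following is a minimal sketch; the vector `x` and the name `is_convertible` are made up for this illustration, and suppressWarnings() is used only to silence the coercion warning during the check.

```r
# hypothetical character vector mixing numeric-looking and other strings
x <- c("6", "7.5", "b", "1e3")

# TRUE where as.numeric() can parse the string
is_convertible <- !is.na(suppressWarnings(as.numeric(x)))

is_convertible       # TRUE TRUE FALSE TRUE
x[!is_convertible]   # "b"
```

Values that are already NA in the input are also reported as not convertible by this check, so they may need to be handled separately.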

NA values can be avoided by first converting the strings to a factor with as.factor() and then converting the factor to numeric with as.numeric(). In the output below, col1 and col3 of the original data frame are already factors, which is what data.frame() produces when stringsAsFactors is TRUE (the default before R 4.0.0). The resulting numbers are only the integer codes of the factor levels, which are assigned according to the lexicographic sort order of the distinct column values. The numeric column on its own no longer contains the actual strings, so this conversion also loses information, and the numbers can easily be misread as measurements.

R




# declare a data frame whose
# columns have different data types;
# stringsAsFactors = TRUE stores the string columns
# as factors (the default before R 4.0.0)
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = c("Geeks","For","Geeks","Gooks"), 
                col4 = 97:100, stringsAsFactors = TRUE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each
# variable 
sapply(data_frame, class)
  
# converting col3 to numeric
# via its factor codes
data_frame_col3 <- transform(
  data_frame,col3 = as.numeric(as.factor(col3)))
  
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable 
sapply(data_frame_col3, class)


Output:

[1] "Original DataFrame"
  col1 col2  col3 col4
1    6    4 Geeks   97
2    7    5   For   98
3    8    6 Geeks   99
4    9    7 Gooks  100
    col1      col2      col3      col4
"factor"  "factor"  "factor" "integer"
[1] "Modified col3 DataFrame"
  col1 col2 col3 col4
1    6    4    2   97
2    7    5    1   98
3    8    6    2   99
4    9    7    3  100
    col1      col2      col3      col4
"factor"  "factor" "numeric" "integer" 

Explanation: The first and third strings in col3 are the same and are therefore assigned the same numeric value. The distinct values are sorted in ascending order and numbered accordingly: "For" comes first in lexicographic order and is assigned 1, both instances of "Geeks" are mapped to 2, and "Gooks" is assigned 3. The class of col3 thus changes to numeric.
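
The mapping between codes and strings can be inspected with levels(). The sketch below works on a standalone vector named `col3` (reusing the column's values outside the data frame, purely for illustration) and shows that the strings can be recovered only while the factor or its levels are still available.

```r
col3 <- c("Geeks", "For", "Geeks", "Gooks")
f <- as.factor(col3)

codes <- as.numeric(f)   # 2 1 2 3
levels(f)                # "For" "Geeks" "Gooks"

# the codes index into the levels, so the labels can be looked up again
levels(f)[codes]         # "Geeks" "For" "Geeks" "Gooks"
```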

Example 3: Converting logical type columns to numeric 

In the approach shown here, TRUE is assigned the numeric value 2 and FALSE the value 1. Note that applying as.numeric() directly to a logical column gives 1 for TRUE and 0 for FALSE instead.

The column of logical values is first converted to a factor using as.factor(), whose levels are "FALSE" and "TRUE", in that order. as.numeric() then returns the integer code of each level, which is 1 for FALSE and 2 for TRUE.

R




# declare a data frame whose
# columns have different data types;
# stringsAsFactors = TRUE stores the string columns
# as factors (the default before R 4.0.0)
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = c("Geeks","For","Geeks","Gooks"), 
                col4 = 97:100,
                col5 = c(TRUE,FALSE,TRUE,FALSE),
                stringsAsFactors = TRUE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting logical type column 
# to numeric via a factor
data_frame_col5 <- transform(
  data_frame,col5 = as.numeric(as.factor(col5)))
print("Modified col5 DataFrame")
print (data_frame_col5)
  
# indicating the data type of each 
# variable 
sapply(data_frame_col5, class)


Output:

[1] "Original DataFrame"
  col1 col2  col3 col4  col5
1    6    4 Geeks   97  TRUE
2    7    5   For   98 FALSE
3    8    6 Geeks   99  TRUE
4    9    7 Gooks  100 FALSE
    col1      col2      col3      col4      col5
"factor"  "factor"  "factor" "integer" "logical"
[1] "Modified col5 DataFrame"
  col1 col2  col3 col4 col5
1    6    4 Geeks   97    2
2    7    5   For   98    1
3    8    6 Geeks   99    2
4    9    7 Gooks  100    1
    col1      col2      col3      col4      col5
"factor"  "factor"  "factor" "integer" "numeric" 

Explanation: sapply() shows that col5 has class logical, that is, it consists of TRUE and FALSE values. After transform() is applied, these values are mapped to the integer codes of the corresponding factor levels, and the class of col5 becomes numeric.
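
For comparison, the sketch below applies both conversions to a standalone logical vector (`flags` is a name chosen for this example). The values 2 and 1 in the output above come from the factor route; a direct as.numeric() call gives 1 and 0.

```r
flags <- c(TRUE, FALSE, TRUE, FALSE)

# direct conversion: TRUE -> 1, FALSE -> 0
direct <- as.numeric(flags)

# conversion via a factor: codes follow the level order
via_factor <- as.numeric(as.factor(flags))

direct                      # 1 0 1 0
via_factor                  # 2 1 2 1
levels(as.factor(flags))    # "FALSE" "TRUE"
```

Which form is appropriate depends on how the numbers are used later; the direct form has the advantage that sum() and mean() of the result count the TRUE values.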


