Open In App

Convert Multiple Columns to Numeric Using dplyr

In data analysis with R Programming Language, it's common to encounter datasets where certain columns must be converted to numeric type for further study or modeling. In this article, we'll explore how to efficiently convert multiple columns to numeric using the dplyr package in R.

Identifying Columns for Conversion

To convert columns to numeric, it is important to first identify which columns require conversion. This can be achieved by examining the structure of the data frame using the str() function or the glimpse() function from the dplyr package. These functions assist in identifying non-numeric columns.

The 'str()' function provides a compact display of the internal structure of an R object. It's beneficial for identifying the data types of each column in a data frame. For example:

# Inspect data frame structure

str(data)

Another option is to use the 'glimpse()' function from the dplyr package. It provides a concise summary of a data frame, showing the data type and the first few values of each column. For example:

# Inspect data frame structure with glimpse

glimpse(data)

By using these functions, we can quickly identify columns that are currently stored as non-numeric data types, such as characters or factors, and determine which ones need to be converted to numeric for further analysis or modeling.

# Sample data
data <- data.frame(
  ID = 1:5,
  col1 = c("10", "20", "30", "40", "50"),
  col2 = c("15.5", "25.5", "35.5", "45.5", "55.5"),
  col3 = c("NA", "60", "70", "80", "90")
)

# Inspect data frame structure
str(data)

Output:

'data.frame':    5 obs. of  4 variables:
$ ID : int 1 2 3 4 5
$ col1: chr "10" "20" "30" "40" ...
$ col2: chr "15.5" "25.5" "35.5" "45.5" ...
$ col3: chr "NA" "60" "70" "80" ...

Converting Data Types

In R, data types play a crucial role in data analysis and manipulation. Often, datasets contain columns with different data types, such as character, factor, or integer. To perform various analytical tasks effectively, it's essential to ensure that data types are appropriate for the intended operations. One common task is converting columns to numeric type, especially when dealing with numerical data.

# Convert a single column to numeric

data$numeric_column <- as.numeric(data$numeric_column)

By using these functions, we can quickly identify columns that are currently stored as non-numeric data types, such as characters or factors, and determine which ones need to be converted to numeric for further analysis or modeling.

Using mutate_at() for Multiple Columns

The 'dplyr' package offers a concise and efficient way to manipulate data frames in R. It provides functions like 'mutate()', 'select()', 'filter()', and 'summarise()' for various data manipulation tasks. When it comes to converting multiple columns to numeric type, the 'mutate_at()' function is particularly useful.

# Sample data
data <- data.frame(
  ID = 1:5,
  col1 = c("10", "20", "30", "40", "50"),
  col2 = c("15.5", "25.5", "35.5", "45.5", "55.5"),
  col3 = c("50", "60", "70", "80", "90")
)

# Display original data
print("Original Data:")
str(data)

# Load required library
library(dplyr)

# Convert multiple columns to numeric
data <- data %>%
  mutate_at(vars(col1, col2, col3), as.numeric)

# Display converted data
print("Converted Data:")
str(data)

Output:

[1] "Original Data:"
'data.frame': 5 obs. of 4 variables:
$ ID : int 1 2 3 4 5
$ col1: chr "10" "20" "30" "40" ...
$ col2: chr "15.5" "25.5" "35.5" "45.5" ...
$ col3: chr "50" "60" "70" "80" ...

[1] "Converted Data:"
'data.frame': 5 obs. of 4 variables:
$ ID : int 1 2 3 4 5
$ col1: num 10 20 30 40 50
$ col2: num 15.5 25.5 35.5 45.5 55.5
$ col3: num 50 60 70 80 90

The mutate_at() function from the dplyr package allows us to apply transformations to multiple columns at once.

Dealing with NAs

After conversion, it's essential to check for missing values (NA) in the newly converted columns. Depending on the analysis, we may choose to impute missing values or exclude them from further calculations.

# Sample data
data <- data.frame(
  ID = 1:5,
  col1 = c("10", "20", "30", "40", "50"),
  col2 = c("15.5", "25.5", "35.5", "45.5", "55.5"),
  col3 = c("NA", "60", "70", "80", "90")
)

# Display original data
print("Original Data:")
str(data)

# Load required library
library(dplyr)

# Convert multiple columns to numeric
data <- data %>%
  mutate_at(vars(col1, col2, col3), as.numeric)

# Display converted data
print("Converted Data:")
str(data)

Output:

[1] "Original Data:"
'data.frame': 5 obs. of 4 variables:
$ ID : int 1 2 3 4 5
$ col1: chr "10" "20" "30" "40" ...
$ col2: chr "15.5" "25.5" "35.5" "45.5" ...
$ col3: chr "NA" "60" "70" "80" ...

Warning message:
There was 1 warning in `mutate()`.
ℹ In argument: `col3 = .Primitive("as.double")(col3)`.
Caused by warning:
! NAs introduced by coercion

[1] "Converted Data:"
'data.frame': 5 obs. of 4 variables:
$ ID : int 1 2 3 4 5
$ col1: num 10 20 30 40 50
$ col2: num 15.5 25.5 35.5 45.5 55.5
$ col3: num NA 60 70 80 90

The warning message you received indicates that there were NA values in one of the columns (col3) during the conversion process, which were replaced with NA after coercion. To rectify this warning and handle NA values more gracefully, you can modify the code to explicitly handle the conversion and replace any non-numeric values with NA using the coalesce() function.

parse_number() function from the readr package is used to convert the values to numeric. It extracts the numeric values from the character strings while ignoring non-numeric characters.

To load 'the readr' package use:

library(reader)

Handle missing values or Non-Convertible values

To handle missing values or non-convertible values when converting columns to numeric, we can use the mutate() function along with the across() function from the dplyr package, combined with error handling mechanisms such as tryCatch() or coalesce().

Using tryCatch() for Error Handling

In this example, we'll use the tryCatch() function to attempt the conversion of columns to numeric. If an error occurs during the conversion (e.g., due to non-convertible values), we'll replace the problematic values with NA.

# Sample data
data <- data.frame(
  col1 = c("10", "20", "30", "a"),
  col2 = c("15.5", "25.5", "35.5", "b")
)

# Convert multiple columns to numeric with error handling
data <- data %>%
  mutate(across(everything(), ~ {
    as.numeric(tryCatch(., error = function(e) NA))
  }))

# Print converted data
print("Converted Data:")
str(data)

Output:

[1] "Converted Data:"
'data.frame': 4 obs. of 2 variables:
$ col1: num 10 20 30 NA
$ col2: num 15.5 25.5 35.5 NA

Using coalesce() to Replace Missing Values

In this example, we'll use the coalesce() function to replace missing values with NA before converting columns to numeric. This ensures that missing values are properly handled during the conversion process.

# Sample data
data <- data.frame(
  col1 = c("10", "20", "30", NA),
  col2 = c("15.5", "25.5", "35.5", NA)
)
str(data)
# Convert multiple columns to numeric, handling missing values
data <- data %>%
  mutate(across(everything(), ~ {
    as.numeric(coalesce(., NA))
  }))

# Print converted data
print("Converted Data:")
str(data)

Output:

'data.frame':    4 obs. of  2 variables:
$ col1: chr "10" "20" "30" NA
$ col2: chr "15.5" "25.5" "35.5" NA

[1] "Converted Data:"
'data.frame': 4 obs. of 2 variables:
$ col1: num 10 20 30 NA
$ col2: num 15.5 25.5 35.5 NA

Converting Columns Based on Specific Criteria

Suppose we have a dataset where we want to convert only numeric columns to numeric type, leaving non-numeric columns unchanged. We can achieve this by using a condition inside the across() function.

# Load necessary libraries
library(dplyr)
library(readr)  # For parse_number()

# Sample data
data <- data.frame(
  col1 = c("10", "20", "30", "a"),
  col2 = c("15.5", "25.5", "35.5", "b"),
  col3 = c(1, 2, 3, 4),
  col4 = c("text1", "text2", "text3", "text4"),
  stringsAsFactors = FALSE
)
str(data)
# Convert character columns to numeric using parse_number()
data <- data %>%
  mutate(across(where(is.character), ~parse_number(.)))

# Print the converted data
print("Converted Data:")
str(data)

Output:

'data.frame':    4 obs. of  4 variables:
$ col1: chr "10" "20" "30" "a"
$ col2: chr "15.5" "25.5" "35.5" "b"
$ col3: num 1 2 3 4
$ col4: chr "text1" "text2" "text3" "text4"

[1] "Converted Data:"
'data.frame': 4 obs. of 4 variables:
$ col1: num 10 20 30 NA
..- attr(*, "problems")= tibble [1 × 4] (S3: tbl_df/tbl/data.frame)
.. ..$ row : int 4
.. ..$ col : int NA
.. ..$ expected: chr "a number"
.. ..$ actual : chr "a"
$ col2: num 15.5 25.5 35.5 NA
..- attr(*, "problems")= tibble [1 × 4] (S3: tbl_df/tbl/data.frame)
.. ..$ row : int 4
.. ..$ col : int NA
.. ..$ expected: chr "a number"
.. ..$ actual : chr "b"
$ col3: num 1 2 3 4
$ col4: num 1 2 3 4

Conclusion

In this article, we've demonstrated how to convert multiple columns to numeric type using the dplyr package in R. By leveraging functions like mutate_at() and handling conversion errors and missing values effectively, we can streamline our data preprocessing workflow and prepare our data for analysis or modeling tasks.

Article Tags :