Skip to content
Related Articles
Get the best out of our app
GeeksforGeeks App
Open App
geeksforgeeks
Browser
Continue

Related Articles

How to Normalize and Standardize Data in R?

Improve Article
Save Article
Like Article
Improve Article
Save Article
Like Article

In this article, we will be looking at the various techniques to scale data,  Min-Max Normalization, Z-Score Standardization, and Log Transformation in the R programming language.

Loading required packages and dataset:

Let’s install and load the required packages. And also create a dataframe as a sample dataset.

R




# load packages and data
install.packages("caret")
library(caret)
 
# creating a dataset
data = data.frame(var1=c(120, 345, 145, 122, 596, 285, 211),
                  var2=c(10, 15, 45, 22, 53, 28, 12),
                  var3=c(-34, 0.05, 0.15, 0.12, -6, 0.85, 0.11))
 
data

Output:

Summary of Data:

Let’s check out the summary of the data before scaling it. As we can see from the output, each variable/feature has a different range of values (which can be inferred from min and max values) and thus need scaling to bring the values within a fixed range.

R




# import the library
library(caret)
 
# creating the dataset
data = data.frame(var1 = c(120,345,145,122,596,285,211),
           var2 = c(10,15,45,22,53,28,12),
           var3 = c(-34,0.05,0.15,0.12,-6,0.85,0.11))
 
# summary of data
summary(data)

 
 

Output:

 

Normalization:

Method 1: Min-Max Normalization

 

This technique rescales values to be in the range between 0 and 1. Also, the data ends up with smaller standard deviations, which can suppress the effect of outliers.

 

Example: Let’s write a custom function to implement Min-Max Normalization. 

 

 Min-Max Normalization

 

This is the formula for Min-Max Normalization. Let’s use this formula and create a custom user-defined function, minMax which takes in one value at a time and computes the scaled value such that it lies between 0 and 1. Here new_max(A) is 1 and new_min(A) is 0 as we trying in scale down/up the values in the range [0,1].

 

This helps in handling the outliers well and suppresses them overall. 

 

R




# import the library
library(caret)
 
# dataset
data = data.frame(var1 = c(120,345,145,122,596,285,211),
           var2 = c(10,15,45,22,53,28,12),
           var3 = c(-34,0.05,0.15,0.12,-6,0.85,0.11))
 
# custom function to implement min max scaling
minMax <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
 
#normalise data using custom function
normalisedMydata <- as.data.frame(lapply(data, minMax))
head(normalisedMydata)

 
 

Output:

 

 

Let’s now check if the values of the 4 columns are rescaled between 0 and 1 using a summary of the data (min and max are 0 and 1 respectively).

 

R




# checking summary after normalization
summary(normalisedMydata)

Output:

Example: Using an in-built function and caret package to perform Min-Max Normalization 

Here the method, preProcess( ) takes a tuple with value “range” to implement min-max scaling and this preprocessed data is sent to predict( ) function to get the final normalized data using the min-max scaling method.  

Syntax:

preProcess(x, method = c(“center”, “scale”), … na.remove = TRUE )

Arguments:

  • x – a matrix or data frame
  • method – a character vector specifying the type of processing
  • na.remove – true/false to specify removal of missing values

R




# import the library
library(caret)
 
# dataset
data = data.frame(var1 = c(120,345,145,122,596,285,211),
           var2 = c(10,15,45,22,53,28,12),
           var3 = c(-34,0.05,0.15,0.12,-6,0.85,0.11))
 
# preprocess the data
preproc <- preProcess(mydata, method=c("range"))
 
# perform normalization
norm <- predict(preproc, mydata)
head(norm)

 
 

Output:

 

 

This technique tends to center the rescaled data around the mean, but it doesn’t handle outliers very well. So to tackle this we go for standardization.

 

Method 2: Log Transformation

 

Not all real-life data would follow a gaussian distribution nor would be less skewed. So to tackle this Log Transformation technique can be used.

 

Example: Using log( ) function

 

Let’s log transform a particular column var2 in data and view it’s summary.

 

Syntax:

 

log(x, base = exp(1))

 

Arguments:

 

  • x – a numeric or complex vector
  • base – a positive or complex number

 

Log( ) function takes in numeric vector or complex vector of the data and performs log transformation.

 

R




# import the library
library(caret)
 
# dataset
data = data.frame(var1 = c(120,345,145,122,596,285,211),
           var2 = c(10,15,45,22,53,28,12),
           var3 = c(-34,0.05,0.15,0.12,-6,0.85,0.11))
 
# log transform on var2 column of data
logTransformed = log(mydata$var2)
logTransformed

Output:

Log Transformation

Standardization:

Standardization is a technique in which all the features have a mean around zero and have roughly unit variance (mean = 0 and standard deviation = 1). And also makes sure that outliers get weighted more than other values.

Example : Using Standard scale( ) function

Function:

scale(x, center = TRUE, scale = TRUE)

Arguments:

  • x – a numeric matrix(like object)
  • center – either a logical value or numeric-alike vector of length equal to the number of columns of x
  • scale – either a logical value or a numeric-alike vector of length equal to the number of columns of x

scale( ) function (a part of caret package in R) takes in a matrix or dataframe object and scales the data points such that the mean and standard deviation is 0 and 1 respectively.

R




# import the library
library(caret)
 
# dataset
data = data.frame(var1 = c(120,345,145,122,596,285,211),
           var2 = c(10,15,45,22,53,28,12),
           var3 = c(-34,0.05,0.15,0.12,-6,0.85,0.11))
 
# standardize the data using scale() function
standardizedData <- as.data.frame(scale(data))
head(standardizedData)

 
 

Output:

 

 

Example: Using an in-built function in the caret library to preprocess and then standardize the data.

 

Here the method, preProcess( ) will take a tuple with values “center” and “scale” to implement standardization. This preprocessed data is sent to predict( ) to standardize the data such that the mean is 0 and the standard deviation is 1.

 

R




# import the library
library(caret)
 
# dataset
data = data.frame(var1 = c(120,345,145,122,596,285,211),
           var2 = c(10,15,45,22,53,28,12),
           var3 = c(-34,0.05,0.15,0.12,-6,0.85,0.11))
 
# using caret lib to preprocess data
preproc1 <- preProcess(data, method=c("center", "scale"))
 
# standardize the preprocessed data
norm1 <- predict(preproc1,data)
head(norm1)

 
 

Output:

 

 


My Personal Notes arrow_drop_up
Last Updated : 14 Jan, 2022
Like Article
Save Article
Similar Reads
Related Tutorials