
How to Normalize Data in R?

Last Updated : 01 Aug, 2023

In this article, we will discuss how to normalize data in the R programming language.

What is Normalization?

Normalization is a pre-processing step in many kinds of data problems. It plays an important role in fields such as soft computing and cloud computing, where data are manipulated and their range is scaled down or up before they are used in later stages. There are many normalization techniques, including Min-Max normalization, Z-score normalization, and Decimal scaling normalization.
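Decimal scaling is the only technique named above that the methods below do not cover. It divides every value by a power of ten, 10^j, where j is the smallest integer that brings the largest absolute value below 1. The following is a minimal base-R sketch, assuming the values are finite and not all zero. The helper name `decimal_scale()` is hypothetical and does not come from base R or any package.

```r
# Hypothetical helper: decimal scaling normalization.
# Divides every value by 10^j, where j is the smallest integer
# such that max(abs(x)) / 10^j < 1.
# Assumes x is finite and not all zeros.
decimal_scale <- function(x) {
  j <- floor(log10(max(abs(x)))) + 1
  x / 10^j
}

gfg <- c(244, 753, 596, 645, 874, 141, 639, 465, 999, 654)
decimal_scale(gfg)
```

The largest absolute value here is 999, so j = 3 and each value is divided by 1000 (244 becomes 0.244, 999 becomes 0.999).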

What is Data Normalization?

Data transformation operations, such as normalization and aggregation, are additional data preprocessing procedures that contribute to the success of the data extraction process.

Data normalization consists of rescaling numeric columns to a standard scale. It is generally considered part of producing clean data.

Method 1: Normalize data with log transformation in base R

To normalize data with a log transformation, call the built-in log() function and pass it the data (a numeric vector, or a data frame whose columns are all numeric). The result is the data on a logarithmic scale, which compresses the range of large values. The input values must be positive: log(0) returns -Inf, and negative values return NaN with a warning.

The log() function computes logarithms, by default natural logarithms (base e).

Syntax:

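log(x, base = exp(1))

Parameters:

  • x: a numeric or complex vector.
  • base: a positive or complex number, the base with respect to which logarithms are computed. Defaults to e = exp(1).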

Example: Normalize data 

R




# Create data
gfg <- c(244, 753, 596, 645, 874, 141,
          639, 465, 999, 654)
 
# Normalize the data with the natural log
gfg1 <- log(gfg)
gfg1


Output:

 [1] 5.497168 6.624065 6.390241 6.469250 6.773080 4.948760 6.459904 6.142037 6.906755
[10] 6.483107
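log() and log1p() belong to R's Math group generic, so they can also be applied directly to a data frame whose columns are all numeric. The sketch below adds a hypothetical second column, `counts`, that contains zeros. Because log(0) is -Inf, log1p(), which computes log(1 + x), is a common alternative for non-negative data that include zeros.

```r
# Data frame with the article's values plus a hypothetical count column
df <- data.frame(
  gfg    = c(244, 753, 596, 645, 874, 141, 639, 465, 999, 654),
  counts = c(0, 3, 7, 1, 0, 12, 5, 2, 9, 4)  # hypothetical data containing zeros
)

# Natural log of every column (the zeros in counts become -Inf)
log(df)

# log(1 + x) keeps zeros finite: log1p(0) is 0
df_log1p <- log1p(df)
df_log1p

# Base-10 log of a single column
log10(df$gfg)
```

Both calls return a data frame with the same column names, so the transformed data can be used directly in later steps.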

Method 2: Normalize Data with Standard Scaling in R

To normalize data with this method, call the built-in scale() function and pass it the data to be scaled. By default, scale() standardizes the data: it subtracts the mean and divides by the standard deviation, so the result has mean 0 and standard deviation 1. The values are not confined to the range -1 to 1; in the example below they run from about -1.75 to 1.52.

scale() is a generic function whose default method centers and/or scales the columns of a numeric matrix.

Syntax:

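scale(x, center = TRUE, scale = TRUE)

Parameters:

  • x: a numeric matrix or matrix-like object, such as a numeric vector or data frame.
  • center: TRUE to subtract each column's mean, FALSE for no centering, or a numeric vector with one value per column to subtract.
  • scale: TRUE to divide each column by its root-mean-square (which equals its standard deviation once the column is centered), FALSE for no scaling, or a numeric vector with one value per column to divide by.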
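scale() works column by column and takes `center` and `scale` arguments, both TRUE by default. A short sketch on a small hypothetical matrix shows their effect: with centering only, each column has its mean subtracted; with both (the default), each column ends up with mean 0 and standard deviation 1.

```r
# Small hypothetical matrix with two columns on very different scales
m <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))

# Centering only: each column minus its mean
centered <- scale(m, center = TRUE, scale = FALSE)
centered

# Centering and scaling (the defaults): mean 0, standard deviation 1 per column
standardized <- scale(m)
standardized
```

After centering, column b becomes -10, 0, 10; after full standardization, both columns become -1, 0, 1, so their original difference in scale disappears.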

Example: Normalize data

R




# Create data
gfg <- c(244,753,596,645,874,141,639,465,999,654)
 
# Standardize the data; scale() returns a matrix, so convert it to a data frame
gfg <- as.data.frame(scale(gfg))
gfg


Output:

            V1
1  -1.36039519
2   0.57921588
3  -0.01905315
4   0.16766775
5   1.04030220
6  -1.75289016
7   0.14480397
8  -0.51824578
9   1.51663105
10  0.20196343
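scale() stores the center and scale it used as attributes of its result. A minimal sketch of reading them back and reusing them on new data is shown below; the values in `new_obs` are hypothetical.

```r
gfg <- c(244, 753, 596, 645, 874, 141, 639, 465, 999, 654)
z <- scale(gfg)

# The mean and standard deviation used are stored as attributes
center <- attr(z, "scaled:center")  # 601
spread <- attr(z, "scaled:scale")   # sd(gfg), about 262.42

# Smallest and largest standardized values
range(z)

# Reuse the same center and scale on hypothetical new observations
new_obs <- c(100, 601, 1200)
scale(new_obs, center = center, scale = spread)
```

range(z) confirms the standardized values run from about -1.75 to 1.52. Passing the stored center and scale explicitly keeps new observations on the same scale as the original data; 601 (the original mean) maps to 0.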

Method 3: Normalize Data using Min-Max Scaling

To normalize with this method, first install and load the caret package. Then call preProcess() with method = "range" to estimate the transformation, and call predict() to apply it. This rescales the data to the range 0 to 1: the minimum value maps to 0 and the maximum value maps to 1.

The preProcess() function estimates a transformation from the training data; the fitted transformation can then be applied with predict() to any data set that has the same variables.

Syntax:

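preProcess(x, method = c("center", "scale"), ...)

Parameters:

  • x: a matrix or data frame. Non-numeric predictors are allowed but are ignored.
  • method: a character vector specifying the type of processing, for example "range" for min-max scaling to [0, 1].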

Example: Normalize data

R




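# install.packages("caret")  # run once if caret is not installed
library(caret)
 
# Create data
gfg <- c(244,753,596,645,874,141,639,465,999,654)
 
# Estimate the min-max ("range") transformation from the data
ss <- preProcess(as.data.frame(gfg), method = c("range"))
 
# Apply the transformation
gfg <- predict(ss, as.data.frame(gfg))
gfg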


Output:

         gfg
1  0.1200466
2  0.7132867
3  0.5303030
4  0.5874126
5  0.8543124
6  0.0000000
7  0.5804196
8  0.3776224
9  1.0000000
10 0.5979021
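The same min-max scaling can be written in base R with the formula (x - min) / (max - min). The sketch below mirrors the fit-then-apply pattern of preProcess() and predict(), but the helper names `fit_min_max()` and `apply_min_max()` are hypothetical and are not part of caret. It assumes the training data are not all equal (max > min).

```r
# Hypothetical helpers: min-max scaling in base R, without caret.
# fit_min_max() records the minimum and maximum of the training data;
# apply_min_max() maps values with (x - min) / (max - min).
fit_min_max <- function(x) {
  list(min = min(x), max = max(x))
}

apply_min_max <- function(params, x) {
  (x - params$min) / (params$max - params$min)
}

gfg <- c(244, 753, 596, 645, 874, 141, 639, 465, 999, 654)
params <- fit_min_max(gfg)
gfg_scaled <- apply_min_max(params, gfg)  # same values as the caret output above
gfg_scaled

# New values outside the training range fall outside [0, 1]
new_scaled <- apply_min_max(params, c(100, 1200))
new_scaled
```

Here min = 141 and max = 999, so 244 maps to 103 / 858, about 0.12. Values are not clipped, so data outside the training range land below 0 or above 1.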

Method 4: Normalize Data using Z-Score Standardization

In statistics, standardizing a variable means computing its z-scores. Standardization puts variables measured on different scales onto a common scale so that they can be compared. To standardize a vector, subtract the vector's mean from each element and divide the result by the vector's standard deviation.
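Written out, using the sample standard deviation that R's sd() computes (denominator n - 1), and worked for the first value of the example vector below (n = 10, sum 6010, sum of squared deviations 619796):

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}

\bar{x} = \frac{6010}{10} = 601, \qquad
s = \sqrt{\frac{619796}{9}} \approx 262.42, \qquad
z_1 = \frac{244 - 601}{262.42} \approx -1.3604
```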

R




# Input vector
gfg <- c(244, 753, 596, 645, 874, 141, 639, 465, 999, 654)
 
# Z-score standardization
gfg_standardized <- (gfg - mean(gfg)) / sd(gfg)
 
# View the standardized vector
print(gfg_standardized)


Output:

 [1] -1.36039519  0.57921588 -0.01905315  0.16766775  1.04030220 -1.75289016
 [7]  0.14480397 -0.51824578  1.51663105  0.20196343
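These manual z-scores are the same numbers scale() produced in Method 2. A short sketch to check this is shown below. It also computes a population-style z-score (dividing by n instead of n - 1), which some texts use; that variant is included only for comparison and is not what sd() or scale() compute.

```r
gfg <- c(244, 753, 596, 645, 874, 141, 639, 465, 999, 654)

# Manual z-scores (sd() divides by n - 1)
gfg_standardized <- (gfg - mean(gfg)) / sd(gfg)

# scale() returns a one-column matrix; as.vector() drops its dimensions and attributes
from_scale <- as.vector(scale(gfg))
all.equal(gfg_standardized, from_scale)  # TRUE

# Population-style z-scores (denominator n), for comparison only
n <- length(gfg)
pop_sd <- sqrt(sum((gfg - mean(gfg))^2) / n)
gfg_pop_z <- (gfg - mean(gfg)) / pop_sd
gfg_pop_z
```

Because the population standard deviation is slightly smaller, the population-style z-scores are slightly larger in magnitude; for a vector of length 10 the difference is a factor of sqrt(10 / 9), about 1.054.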

