Regularization is a regression technique that shrinks, or constrains, the coefficient estimates towards zero. A penalty is added to the parameters of the model in order to reduce the freedom of the model, which discourages it from fitting noise in the training data. Regularization can be broadly classified into Ridge Regression, Lasso Regression, and Elastic Net Regression.

### Implementation in R

In R, we need a handful of packages installed before we can start working with regularization. The required packages are:

- **glmnet** for ridge regression and lasso regression
- **dplyr** for data cleaning
- **psych** for computing the trace of a matrix (`tr()`)
- **caret** for training the elastic net model

To install these packages, use **install.packages()** in the R console. After installing them successfully, load them in the R script with the **library()** command. We can then apply any of the three regularization techniques below.
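For instance, the setup might look like this (a one-time install, then loading the packages at the start of each session):

```r
# One-time installation (requires an internet connection)
install.packages(c("glmnet", "dplyr", "psych", "caret"))

# Load the packages at the start of the script
library(glmnet)  # ridge, lasso and elastic net models
library(dplyr)   # data manipulation
library(psych)   # tr() for the trace of a matrix
library(caret)   # model training and tuning utilities
```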

#### Ridge Regression

**Ridge Regression** is a modified version of linear regression, also known as **L2 Regularization**. Unlike ordinary linear regression, its loss function includes a penalty term proportional to the sum of the squared coefficients, which discourages large coefficient values and thereby reduces the model's complexity. To implement ridge regression in R we use the "**glmnet**" package; the **cv.glmnet()** function selects the penalty parameter by cross-validation.
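In symbols, ridge regression minimizes the residual sum of squares plus an L2 penalty, where the parameter λ ≥ 0 controls the strength of the shrinkage:

```latex
\min_{\beta} \; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \; + \; \lambda \sum_{j=1}^{p} \beta_j^2
```

With λ = 0 this reduces to ordinary least squares; as λ grows, the coefficients are shrunk further towards zero.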

**Example:**

In this example, we will implement the ridge regression technique on the **mtcars** dataset. Our task is to predict miles per gallon (mpg) from the other characteristics of the cars. We use the **set.seed()** function to set a seed for reproducibility. We select the value of lambda in three ways:

- by performing 10-fold cross-validation
- based on information criteria (AIC and BIC)
- the optimal lambda according to each of those criteria

## R

```r
# Regularization
# Ridge Regression in R
# Load libraries, get data & set
# seed for reproducibility
set.seed(123)
library(glmnet)
library(dplyr)
library(psych)

data("mtcars")
# Center y; X will be standardized
# in the modelling function
y <- mtcars %>% select(mpg) %>%
  scale(center = TRUE, scale = FALSE) %>%
  as.matrix()
X <- mtcars %>% select(-mpg) %>% as.matrix()

# Perform 10-fold cross-validation to select lambda
lambdas_to_try <- 10^seq(-3, 5, length.out = 100)

# Setting alpha = 0 implements ridge regression
ridge_cv <- cv.glmnet(X, y, alpha = 0,
                      lambda = lambdas_to_try,
                      standardize = TRUE, nfolds = 10)

# Plot cross-validation results
plot(ridge_cv)

# Best cross-validated lambda
lambda_cv <- ridge_cv$lambda.min

# Fit final model, get its sum of squared
# residuals and multiple R-squared
model_cv <- glmnet(X, y, alpha = 0, lambda = lambda_cv,
                   standardize = TRUE)
y_hat_cv <- predict(model_cv, X)
ssr_cv <- t(y - y_hat_cv) %*% (y - y_hat_cv)
rsq_ridge_cv <- cor(y, y_hat_cv)^2

# Selecting lambda based on the information criteria
X_scaled <- scale(X)
aic <- c()
bic <- c()
for (lambda in seq(lambdas_to_try)) {
  # Run model
  model <- glmnet(X, y, alpha = 0,
                  lambda = lambdas_to_try[lambda],
                  standardize = TRUE)

  # Extract coefficients and residuals (remove first
  # row for the intercept)
  betas <- as.vector((as.matrix(coef(model))[-1, ]))
  resid <- y - (X_scaled %*% betas)

  # Compute hat-matrix and degrees of freedom
  ld <- lambdas_to_try[lambda] * diag(ncol(X_scaled))
  H <- X_scaled %*% solve(t(X_scaled) %*% X_scaled + ld) %*% t(X_scaled)
  df <- tr(H)

  # Compute information criteria
  aic[lambda] <- nrow(X_scaled) * log(t(resid) %*% resid) + 2 * df
  bic[lambda] <- nrow(X_scaled) * log(t(resid) %*% resid) +
    2 * df * log(nrow(X_scaled))
}

# Plot information criteria against tried values of lambdas
plot(log(lambdas_to_try), aic, col = "orange", type = "l",
     ylim = c(190, 260), ylab = "Information Criterion")
lines(log(lambdas_to_try), bic, col = "skyblue3")
legend("bottomright", lwd = 1, col = c("orange", "skyblue3"),
       legend = c("AIC", "BIC"))

# Optimal lambdas according to both criteria
lambda_aic <- lambdas_to_try[which.min(aic)]
lambda_bic <- lambdas_to_try[which.min(bic)]

# Fit final models, get their sum of
# squared residuals and multiple R-squared
model_aic <- glmnet(X, y, alpha = 0, lambda = lambda_aic,
                    standardize = TRUE)
y_hat_aic <- predict(model_aic, X)
ssr_aic <- t(y - y_hat_aic) %*% (y - y_hat_aic)
rsq_ridge_aic <- cor(y, y_hat_aic)^2

model_bic <- glmnet(X, y, alpha = 0, lambda = lambda_bic,
                    standardize = TRUE)
y_hat_bic <- predict(model_bic, X)
ssr_bic <- t(y - y_hat_bic) %*% (y - y_hat_bic)
rsq_ridge_bic <- cor(y, y_hat_bic)^2

# The higher the lambda, the more the
# coefficients are shrunk towards zero
res <- glmnet(X, y, alpha = 0, lambda = lambdas_to_try,
              standardize = FALSE)
plot(res, xvar = "lambda")
legend("bottomright", lwd = 1, col = 1:6,
       legend = colnames(X), cex = .7)
```


**Output:**

#### Lasso Regression

Moving forward to **Lasso Regression**, also known as **L1 Regularization** or the **Least Absolute Shrinkage and Selection Operator (LASSO)**. It too is a modified version of linear regression whose loss function is altered to reduce the model's complexity, this time by penalizing the sum of the absolute values of the coefficients. In R, we implement lasso regression using the same "**glmnet**" package as ridge regression.
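The lasso objective differs from ridge only in the penalty term, which uses absolute values instead of squares:

```latex
\min_{\beta} \; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \; + \; \lambda \sum_{j=1}^{p} |\beta_j|
```

This seemingly small change is what allows lasso to set coefficients exactly to zero, which ridge never does.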

**Example:**

Again we use the **mtcars** dataset in this example, and we select the lambda value as in the previous example.

## R

```r
# Regularization
# Lasso Regression
# Load libraries, get data & set
# seed for reproducibility
set.seed(123)
library(glmnet)
library(dplyr)
library(psych)

data("mtcars")
# Center y; X will be standardized in the modelling function
y <- mtcars %>% select(mpg) %>%
  scale(center = TRUE, scale = FALSE) %>%
  as.matrix()
X <- mtcars %>% select(-mpg) %>% as.matrix()

# Perform 10-fold cross-validation to select lambda
lambdas_to_try <- 10^seq(-3, 5, length.out = 100)

# Setting alpha = 1 implements lasso regression
lasso_cv <- cv.glmnet(X, y, alpha = 1,
                      lambda = lambdas_to_try,
                      standardize = TRUE, nfolds = 10)

# Plot cross-validation results
plot(lasso_cv)

# Best cross-validated lambda
lambda_cv <- lasso_cv$lambda.min

# Fit final model, get its sum of squared
# residuals and multiple R-squared
model_cv <- glmnet(X, y, alpha = 1, lambda = lambda_cv,
                   standardize = TRUE)
y_hat_cv <- predict(model_cv, X)
ssr_cv <- t(y - y_hat_cv) %*% (y - y_hat_cv)
rsq_lasso_cv <- cor(y, y_hat_cv)^2

# The higher the lambda, the more the
# coefficients are shrunk towards zero
res <- glmnet(X, y, alpha = 1, lambda = lambdas_to_try,
              standardize = FALSE)
plot(res, xvar = "lambda")
legend("bottomright", lwd = 1, col = 1:6,
       legend = colnames(X), cex = .7)
```


**Output:**

If we compare the lasso and ridge regression techniques, we will notice that they are broadly similar, but they differ in a few important ways:

- Unlike ridge, lasso can set some of its coefficients exactly to zero, effectively performing variable selection.
- With correlated predictors, ridge tends to give them similar coefficients, while lasso tends to keep one of the coefficients large and shrink the rest towards zero.
- Ridge works well when many predictors have effects of similar magnitude, while lasso works well when only a small number of predictors are significant and the rest are close to zero.

#### Elastic Net Regression

We shall now move on to **Elastic Net Regression**, which can be stated as a convex combination of the lasso and ridge penalties. We could work with the **glmnet** package here as well, but instead we will see how the **caret** package can be used to tune and fit an elastic net model.
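In glmnet's parameterization, the elastic net penalty mixes the two terms through a parameter α ∈ [0, 1], where α = 0 recovers ridge and α = 1 recovers lasso:

```latex
\lambda \left( \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 \; + \; \alpha \sum_{j=1}^{p} |\beta_j| \right)
```

The caret code below searches over both α and λ at once, rather than fixing α in advance.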

**Example:**

## R

```r
# Regularization
# Elastic Net Regression
library(caret)

# Set training control
train_control <- trainControl(method = "repeatedcv",
                              number = 5,
                              repeats = 5,
                              search = "random",
                              verboseIter = TRUE)

# Train the model
elastic_net_model <- train(mpg ~ .,
                           data = cbind(y, X),
                           method = "glmnet",
                           preProcess = c("center", "scale"),
                           tuneLength = 25,
                           trControl = train_control)

# Check multiple R-squared
y_hat_enet <- predict(elastic_net_model, X)
rsq_enet <- cor(y, y_hat_enet)^2

print(y_hat_enet)
print(rsq_enet)
```


**Output:**

```
> print(y_hat_enet)
         Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive   Hornet Sportabout             Valiant
        2.13185747          1.76214273          6.07598463          0.50410531         -3.15668592          0.08734383
        Duster 360           Merc 240D            Merc 230            Merc 280           Merc 280C          Merc 450SE
       -5.23690809          2.82725225          2.85570982         -0.19421572         -0.16329225         -4.37306992
        Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental   Chrysler Imperial            Fiat 128
       -3.83132657         -3.88886320         -8.00151118         -8.29125966         -8.08243188          6.98344302
       Honda Civic      Toyota Corolla       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
        8.30013895          7.74742320          3.93737683         -3.13404917         -2.56900144         -5.17326892
  Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa      Ford Pantera L        Ferrari Dino
       -4.02993835          7.36692700          5.87750517          6.69642869         -2.02711333          0.06597788
     Maserati Bora          Volvo 142E
       -5.90030273          4.83362156
> print(rsq_enet)
         [,1]
mpg 0.8485501
```
