# Regularization in Machine Learning

**Prerequisites:** Gradient Descent

**Overfitting** occurs when a machine learning model fits its training set so closely that it fails to perform well on unseen data.

Regularization is a technique that reduces error on unseen data by constraining the model so that it fits the training set appropriately and avoids overfitting.

The commonly used regularization techniques are:

- L1 regularization
- L2 regularization
- Dropout regularization

This article focuses on L1 and L2 regularization.

A regression model that uses the **L1 regularization** technique is called **LASSO (Least Absolute Shrinkage and Selection Operator)** regression. **Lasso regression** adds the *“absolute value of magnitude”* of the coefficients as a penalty term to the loss function (L).

A regression model that uses the **L2 regularization** technique is called **Ridge regression**. **Ridge regression** adds the *“squared magnitude”* of the coefficients as a penalty term to the loss function (L).
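To make the two penalty terms concrete, here is a minimal sketch in plain Python. The helper names `l1_penalty` and `l2_penalty` and the coefficient values are assumptions chosen for illustration; they do not come from any particular library.

```python
def l1_penalty(weights):
    """Lasso-style penalty: sum of the absolute values of the coefficients."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Ridge-style penalty: sum of the squared coefficients."""
    return sum(w * w for w in weights)


# Hypothetical coefficient vector, chosen only for illustration.
weights = [0.5, -2.0, 0.0, 3.0]

l1 = l1_penalty(weights)  # 0.5 + 2.0 + 0.0 + 3.0
l2 = l2_penalty(weights)  # 0.25 + 4.0 + 0.0 + 9.0
print(l1, l2)
```

Note how the squared penalty is dominated by the largest coefficient, while the absolute-value penalty weighs every coefficient in proportion to its size.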

**NOTE:** Regularization does not change the output function (y_hat). Only the loss function changes.

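The output function, where $f$ is the identity for linear regression and the sigmoid for logistic regression:

$$\hat{y} = f(wx + b)$$

The loss function before regularization:

$$L(\hat{y}, y)$$

The loss function after regularization, where the penalty is the L1 or L2 term described above, scaled by a regularization constant $\lambda$:

$$L(\hat{y}, y) + \lambda \cdot \text{penalty}(w)$$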

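We define the loss function in logistic regression as:

$$L(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$$

where $\hat{y} = \sigma(wx + b)$ and $\sigma(z) = 1 / (1 + e^{-z})$ is the sigmoid function.

**Loss function with no regularization:**

$$L = -\left[ y \log \sigma(wx + b) + (1 - y) \log(1 - \sigma(wx + b)) \right]$$

Let's say the model overfits the training data when trained with the above loss.

**Loss function with L1 regularization:**

$$L = -\left[ y \log \sigma(wx + b) + (1 - y) \log(1 - \sigma(wx + b)) \right] + \lambda \|w\|_{1}$$

**Loss function with L2 regularization:**

$$L = -\left[ y \log \sigma(wx + b) + (1 - y) \log(1 - \sigma(wx + b)) \right] + \lambda \|w\|^{2}_{2}$$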
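The three loss functions can be sketched for a single training example as follows. This is an illustrative sketch, not a reference implementation: it assumes a scalar feature `x`, uses the standard cross-entropy form with a leading minus sign and a sigmoid output, and leaves the bias `b` unpenalized. The function names, the argument `lam` (the regularization constant), and the sample values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, x, y):
    """Cross-entropy loss for one example with label y in {0, 1}."""
    y_hat = sigmoid(w * x + b)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


def log_loss_l1(w, b, x, y, lam):
    """Cross-entropy plus the L1 penalty lam * |w|."""
    return log_loss(w, b, x, y) + lam * abs(w)


def log_loss_l2(w, b, x, y, lam):
    """Cross-entropy plus the L2 penalty lam * w**2."""
    return log_loss(w, b, x, y) + lam * w * w


# Hypothetical values, chosen only for illustration.
w, b, x, y, lam = 2.0, 0.0, 1.0, 1, 0.1

base = log_loss(w, b, x, y)
with_l1 = log_loss_l1(w, b, x, y, lam)  # base + 0.1 * |2.0|
with_l2 = log_loss_l2(w, b, x, y, lam)  # base + 0.1 * 2.0**2
print(base, with_l1, with_l2)
```

The prediction `sigmoid(w * x + b)` is identical in all three functions; only the value being minimized differs.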

**lambda** ($\lambda$) is a hyperparameter known as the regularization constant. It must be greater than zero:

$$\lambda > 0$$
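To see what the regularization constant does in practice, the sketch below runs plain gradient descent on an L2-regularized logistic loss for two values of lambda. Everything here is an assumption made for illustration: the function name `fit_logistic_l2`, the four-point toy dataset, the learning rate, and the step count. The loss is averaged over the examples and the bias is not penalized.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_l2(xs, ys, lam, lr=0.1, steps=500):
    """Gradient descent on mean cross-entropy + lam * w**2 (bias unpenalized)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # For sigmoid + cross-entropy, d(loss)/d(wx + b) is (y_hat - y).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + 2.0 * lam * w
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical, linearly separable toy data with one feature.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w_weak, _ = fit_logistic_l2(xs, ys, lam=0.01)   # weak regularization
w_strong, _ = fit_logistic_l2(xs, ys, lam=1.0)  # strong regularization
print(w_weak, w_strong)
```

On this toy data the weight learned with the larger lambda ends up closer to zero, which is the shrinkage effect the penalty term is meant to produce. Choosing lambda for a real problem is usually done by validation rather than by hand.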