**Prerequisites:** Gradient Descent

**Overfitting** is a phenomenon that occurs when a machine learning model fits the training set too closely and is not able to perform well on unseen data.

Regularisation is a technique used to reduce error by fitting the function appropriately on the given training set while avoiding overfitting.

The commonly used regularisation techniques are:

- L1 regularisation
- L2 regularisation
- Dropout regularisation

This article focuses on L1 and L2 regularisation.

A regression model which uses the **L1 regularisation** technique is called **LASSO (Least Absolute Shrinkage and Selection Operator)** regression.

A regression model that uses the **L2 regularisation** technique is called **Ridge regression**.

**Lasso regression** adds the *"absolute value of magnitude"* of the coefficients as a penalty term to the loss function (L).

**Ridge regression** adds the *"squared magnitude"* of the coefficients as a penalty term to the loss function (L).
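As a quick numerical illustration (the weight vector and lambda value here are made up for this sketch), the two penalty terms differ only in how they aggregate the coefficients:

```python
import numpy as np

# Hypothetical weight vector and regularisation constant, for illustration only
w = np.array([0.5, -2.0, 0.0, 1.5])
lam = 0.1

l1_penalty = lam * np.sum(np.abs(w))  # lambda * ||w||_1  (Lasso)
l2_penalty = lam * np.sum(w ** 2)     # lambda * ||w||_2^2 (Ridge)

print(l1_penalty)  # 0.1 * (0.5 + 2.0 + 0.0 + 1.5) = 0.4
print(l2_penalty)  # 0.1 * (0.25 + 4.0 + 0.0 + 2.25) = 0.65
```

Note that the L1 penalty grows linearly with each coefficient's magnitude, while the L2 penalty punishes large coefficients much more heavily than small ones.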

**NOTE** that during regularisation the output function (y_hat) does not change; the change is only in the loss function.


We define the loss function in logistic regression as:

L(y_hat, y) = -(y log y_hat + (1 - y) log(1 - y_hat))

where y_hat = sigmoid(wx + b). The leading minus sign makes the loss non-negative, since the log of a probability is never positive.
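A minimal sketch of this loss in code (the function name `binary_cross_entropy` and the epsilon clipping are my own additions; clipping avoids taking log of exactly 0 or 1):

```python
import math

def binary_cross_entropy(y_hat, y, eps=1e-12):
    # Clip the prediction away from 0 and 1 so log() stays finite
    y_hat = min(max(y_hat, eps), 1 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

print(binary_cross_entropy(0.9, 1))  # small loss: confident and correct
print(binary_cross_entropy(0.9, 0))  # large loss: confident but wrong
```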

**Loss function with no regularisation:**

L = -(y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b)))

Let's say the model trained with the above loss function overfits the data.

**Loss function with L1 regularisation:**

L = -(y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b))) + lambda * ||w||_1

**Loss function with L2 regularisation:**

L = -(y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b))) + lambda * ||w||_2^2
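Putting the pieces together, here is a sketch of both regularised losses for a single training example (the example data, parameters, and lambda are assumed values, not from the article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single training example and model parameters
x = np.array([1.0, 2.0])
y = 1
w = np.array([0.3, -0.4])
b = 0.1
lam = 0.01

# Output function y_hat is unchanged by regularisation
y_hat = sigmoid(np.dot(w, x) + b)
base_loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Only the loss changes: add the penalty term
loss_l1 = base_loss + lam * np.sum(np.abs(w))  # Lasso
loss_l2 = base_loss + lam * np.sum(w ** 2)     # Ridge

print(base_loss, loss_l1, loss_l2)
```

Both regularised losses are strictly larger than the unregularised loss whenever the weights are nonzero, which is what pushes gradient descent toward smaller weights.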

**lambda** is a hyperparameter known as the regularisation constant, and it is always greater than zero:

lambda > 0
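Because lambda > 0, the penalty term contributes its own gradient during gradient descent. A sketch of those extra gradient terms (using the sign function as the L1 subgradient; the learning rate value is an assumption for illustration):

```python
import numpy as np

w = np.array([0.5, -0.2, 0.0])
lam = 0.1
lr = 0.01  # assumed learning rate

# Gradients of the penalty terms alone (the data-loss gradient is added on top)
grad_l1 = lam * np.sign(w)  # d/dw [lambda * ||w||_1]   -> constant-size push toward 0
grad_l2 = 2 * lam * w       # d/dw [lambda * ||w||_2^2] -> proportional shrinkage

# One L2 update step: every weight shrinks in proportion to its size ("weight decay")
w_after_l2 = w - lr * grad_l2
print(w_after_l2)
```

This difference in gradients is why L1 tends to drive small weights exactly to zero (feature selection), while L2 merely shrinks all weights toward zero without eliminating them.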