Prerequisites: Gradient Descent
Overfitting occurs when a Machine Learning model fits its training set too closely and therefore fails to perform well on unseen data.
Regularisation is a technique used to reduce the error on unseen data by fitting the function appropriately on the given training set, thereby avoiding overfitting.
The commonly used regularisation techniques are:
- L1 regularisation
- L2 regularisation
- Dropout regularisation
This article focuses on L1 and L2 regularisation.
A regression model that uses the L1 regularisation technique is called LASSO (Least Absolute Shrinkage and Selection Operator) regression.
A regression model that uses the L2 regularisation technique is called Ridge regression.
Lasso regression adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function (L).
Ridge regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function (L).
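The two penalty terms can be sketched in a few lines of plain Python. This is only an illustration: the function names, the toy data and the choice of a sum-of-squared-errors base loss are assumptions made for the example, not something fixed by this article.

```python
def l1_penalty(weights):
    """Sum of the absolute values of the coefficients (Lasso penalty)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of the squared coefficients (Ridge penalty)."""
    return sum(w * w for w in weights)


def squared_error(weights, bias, X, y):
    """Unregularised loss: sum of squared residuals of a linear model."""
    total = 0.0
    for features, target in zip(X, y):
        y_hat = sum(w * x for w, x in zip(weights, features)) + bias
        total += (target - y_hat) ** 2
    return total


def lasso_loss(weights, bias, X, y, lam):
    """Base loss plus lam times the L1 penalty."""
    return squared_error(weights, bias, X, y) + lam * l1_penalty(weights)


def ridge_loss(weights, bias, X, y, lam):
    """Base loss plus lam times the L2 penalty."""
    return squared_error(weights, bias, X, y) + lam * l2_penalty(weights)


# Toy values, made up purely for illustration.
X = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]]
y = [3.0, 2.0, 4.0]
weights, bias, lam = [0.5, -2.0], 1.0, 0.1

print(l1_penalty(weights))  # 2.5
print(l2_penalty(weights))  # 4.25
print(lasso_loss(weights, bias, X, y, lam))
print(ridge_loss(weights, bias, X, y, lam))
```

In this sketch only the weights are penalised, not the bias, and the large coefficient (-2.0) contributes far more to the L2 penalty than to the L1 penalty because it is squared.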
Note that regularisation does not change the output function (y_hat). Only the loss function changes.
The output function:
The loss function before regularisation:
The loss function after regularisation:
We define the loss function in Logistic Regression as:
L(y_hat, y) = -[y log(y_hat) + (1 - y) log(1 - y_hat)]
where y_hat = sigmoid(wx + b).
Loss function with no regularisation:
L = -[y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b))]
Let's say the model overfits the data when trained with the above loss function.
Loss function with L1 regularisation:
L = -[y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b))] + lambda * ||w||_1
Loss function with L2 regularisation:
L = -[y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b))] + lambda * ||w||_2^2
lambda is a hyperparameter known as the regularisation constant, and it is greater than zero:
lambda > 0
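As a rough illustration of the regularised logistic loss, the following sketch evaluates it for a single training example. The function names, the `kind` argument and the sample values are hypothetical, and the prediction is assumed to be the sigmoid of wx + b, as is standard for logistic regression.

```python
import math


def sigmoid(z):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, x, y):
    """Unregularised logistic loss for one example (x, y), with y in {0, 1}."""
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


def regularised_log_loss(w, b, x, y, lam, kind="l2"):
    """Logistic loss plus lam * ||w||_1 (kind="l1") or lam * ||w||_2^2 (kind="l2")."""
    if lam <= 0:
        raise ValueError("lambda must be greater than zero")
    if kind == "l1":
        penalty = sum(abs(wi) for wi in w)
    elif kind == "l2":
        penalty = sum(wi * wi for wi in w)
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    # The prediction y_hat is untouched; only the loss gains a penalty term.
    return log_loss(w, b, x, y) + lam * penalty


# Sample values, made up purely for illustration.
w, b = [1.5, -0.5], 0.2
x, y = [2.0, 1.0], 1
lam = 0.01

print(log_loss(w, b, x, y))
print(regularised_log_loss(w, b, x, y, lam, kind="l1"))
print(regularised_log_loss(w, b, x, y, lam, kind="l2"))
```

For any non-zero weights both regularised values are larger than the plain loss, and the gap grows with lambda, so a larger lambda pushes gradient descent towards smaller coefficients.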