Regularization in Machine Learning

Prerequisites: Gradient Descent

Often, a regression model overfits the data it is trained on. The primary reasons for overfitting are covered in a separate article. Using regularisation, we try to reduce the complexity of the regression function without actually reducing the degree of the underlying polynomial function.

This technique relies on the fact that if the highest-order terms of a polynomial have very small coefficients, the function behaves approximately like a polynomial of lower degree.
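To make this concrete, the sketch below compares a cubic whose x³ coefficient is tiny with the quadratic left after dropping that term. The coefficients, the helper `poly`, and the interval [-3, 3] are arbitrary choices made for illustration.

```python
# Illustrative sketch: a cubic with a tiny highest-order coefficient
# behaves almost exactly like the quadratic obtained by dropping that term.
# The coefficients below are arbitrary example values.

def poly(coeffs, x):
    """Evaluate sum(c_k * x**k), with coefficients listed from degree 0 upward."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

cubic = [1.0, 2.0, -0.5, 1e-4]   # 1 + 2x - 0.5x^2 + 0.0001x^3
quadratic = cubic[:-1]           # same polynomial without the x^3 term

xs = [i / 10 for i in range(-30, 31)]   # sample points in [-3, 3]
max_gap = max(abs(poly(cubic, x) - poly(quadratic, x)) for x in xs)
print(f"largest difference on [-3, 3]: {max_gap:.4f}")   # prints 0.0027
```

The largest difference is about 0.0027, while the polynomial itself takes values between about -9.5 and 3 on this interval, so for practical purposes the cubic acts like a degree-2 function.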

Typically, regularisation is done by adding a complexity term to the cost function which will give a higher cost as the complexity of the underlying polynomial function increases.

J(\theta) = \sum_{m} (\theta^{T}X - y)^{2} + \lambda\theta^{2}

The formula is written in matrix form. Each squared term denotes squaring every element of the corresponding matrix and summing the results, so J(\theta) is a single number. This is the most widely used form of the regularised cost function, but it is not the only one.
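The cost above can be evaluated directly. The sketch below is one way to do it in plain Python, assuming each row of X is one training example and reading \lambda\theta^{2} as λ times the sum of the squared elements of θ; the function name `regularised_cost` is hypothetical. Following the formula as written, every element of θ is penalised, including a bias parameter if X has a constant column.

```python
# Sketch of J(theta) = sum_m (theta^T x_m - y_m)^2 + lambda * sum_j theta_j^2
# using plain lists so the example has no dependencies.

def regularised_cost(theta, X, y, lam):
    """Squared-error cost plus the lambda * theta^2 complexity term.

    theta: list of n parameters
    X:     list of m training examples, each a list of n features
    y:     list of m targets
    lam:   regularisation strength lambda (>= 0)
    """
    # Data term: sum over the m examples of (theta^T x - y)^2
    data_term = sum(
        (sum(t * xj for t, xj in zip(theta, x)) - target) ** 2
        for x, target in zip(X, y)
    )
    # Complexity term: lambda times the sum of squared parameters
    penalty = lam * sum(t ** 2 for t in theta)
    return data_term + penalty

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # constant bias column + one feature
y = [2.0, 3.0, 4.0]
theta = [1.0, 1.0]                          # y = 1 + x fits these points exactly
print(regularised_cost(theta, X, y, 0.0))   # 0.0: perfect fit, no penalty
print(regularised_cost(theta, X, y, 0.5))   # 1.0: only the penalty 0.5 * (1 + 1) remains
```

Even a perfect fit now has a non-zero cost when λ > 0, so minimising J trades some training accuracy for smaller parameters.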

Regularised regressions are categorized on the basis of the complexity terms added to the cost function.

Regression    Complexity Term
Ridge         \lambda\theta^{2}
Lasso         \lambda|\theta|
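The two penalties can be compared by fitting both on a toy data set. The sketch below minimises the cost J(θ) with either the ridge or the lasso term. Ridge uses plain gradient descent, since the gradient of λθ² is 2λθ. The lasso term |θ| has no derivative at zero, so the sketch swaps in proximal gradient descent (ISTA), which follows each gradient step on the data term with a soft-thresholding step. This solver choice, the helper names (`data_gradient`, `soft_threshold`, `fit`), the data, the learning rate and the step count are all assumptions of this example. As in the formula above, the bias parameter is penalised too.

```python
# Sketch: ridge vs lasso on a toy data set (illustrative values only).
# Ridge: plain gradient descent. Lasso: proximal gradient descent (ISTA),
# because |theta| is not differentiable at 0.

def data_gradient(theta, X, y):
    """Gradient of sum_m (theta^T x_m - y_m)^2 with respect to theta."""
    grad = [0.0] * len(theta)
    for x, target in zip(X, y):
        err = sum(t * xj for t, xj in zip(theta, x)) - target
        for j, xj in enumerate(x):
            grad[j] += 2 * err * xj
    return grad

def soft_threshold(v, k):
    """Shrink v towards 0 by k, returning exactly 0 when |v| <= k."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0

def fit(X, y, lam, kind, lr=0.01, steps=5000):
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = data_gradient(theta, X, y)
        if kind == "ridge":
            # Gradient of lam * sum(theta_j^2) is 2 * lam * theta_j
            theta = [t - lr * (g + 2 * lam * t) for t, g in zip(theta, grad)]
        elif kind == "lasso":
            # Gradient step on the data term, then the proximal step for lam * |theta|
            theta = [soft_threshold(t - lr * g, lr * lam) for t, g in zip(theta, grad)]
        else:
            raise ValueError(f"unknown penalty: {kind}")
    return theta

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # bias column + one feature
y = [1.0, 3.0, 5.0, 7.0]                               # exactly y = 1 + 2x

for lam in (0.0, 1.0, 10.0):
    ridge = fit(X, y, lam, "ridge")
    lasso = fit(X, y, lam, "lasso")
    print(f"lambda={lam}: ridge={[round(t, 3) for t in ridge]} "
          f"lasso={[round(t, 3) for t in lasso]}")
```

On this data both fits should recover θ ≈ (1, 2) at λ = 0. As λ grows, ridge shrinks both parameters smoothly (about (0.923, 1.897) at λ = 1 and (0.6, 1.267) at λ = 10), whereas at λ = 10 the lasso fit sets the bias parameter to exactly zero and keeps the slope near 2.071. Driving some coefficients exactly to zero is the characteristic difference between the lasso and ridge penalties.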