The **Normal Equation** is an analytical approach to Linear Regression with a least-squares cost function. It gives the value of θ directly, without using Gradient Descent. This is an effective, time-saving option when working with a dataset that has a small number of features.

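**The Normal Equation is as follows:**

$$\theta = (X^T X)^{-1} X^T y$$

In the above equation,

**θ :** the hypothesis parameters that best fit the data, i.e. those that minimize the cost.

**X :** the matrix of input feature values, one row per instance.

**y :** the vector of output values, one entry per instance.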
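As a rough illustration, the formula θ = (XᵀX)⁻¹Xᵀy can be evaluated with NumPy in a few lines. This is a minimal sketch, not a reference implementation: the function name `normal_equation` and the toy data are made up for this example, and it assumes XᵀX is invertible.

```python
import numpy as np

def normal_equation(X, y):
    """Return theta minimizing ||X @ theta - y||^2 via the normal equation.

    X is assumed to already contain the leading column of ones (x_0 = 1).
    """
    # Solve (X^T X) theta = X^T y instead of forming the inverse explicitly;
    # this is the same equation, but numerically better behaved.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data (illustrative only) generated from y = 1 + 2*x,
# so theta should come out close to [1, 2].
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # prepend the column x_0 = 1
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
print(theta)
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a design choice of this sketch; both correspond to the same formula when XᵀX is invertible.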

#### Maths Behind the Equation

Given the hypothesis function

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

where,

**n :** the number of features in the data set.

**$x_0$ :** 1 (for vector multiplication)

Notice that this is a dot product between the θ and x values. So, for convenience, we can write it as:

$$h_\theta(x) = \theta^T x$$
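To make the dot-product view concrete, the sketch below computes the hypothesis for one instance both as a term-by-term sum and as a single dot product. The parameter and feature values are made up for illustration.

```python
import numpy as np

theta = np.array([0.5, 2.0, -1.0])  # theta_0, theta_1, theta_2 (made-up values)
features = np.array([3.0, 4.0])     # x_1, x_2 for one instance (made-up values)

# Prepend x_0 = 1 so that theta_0 acts as the intercept.
x = np.concatenate(([1.0], features))

# Term-by-term sum: theta_0*x_0 + theta_1*x_1 + theta_2*x_2
h_sum = sum(t * xi for t, xi in zip(theta, x))

# The same value written as one dot product, theta^T x
h_dot = theta @ x

print(h_sum, h_dot)
```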

The goal in Linear Regression is to minimize the **cost function**:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where,

**$x^{(i)}$ :** the input values of the i-th training example.

**m :** the number of training instances.

**n :** the number of data-set features.

**$y^{(i)}$ :** the expected result of the i-th instance.
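The least-squares cost, (1/2m) times the sum of squared differences between prediction and expected result, can be written as a plain loop over the training instances. The sketch below assumes each row of `X` already starts with x_0 = 1; the function name `cost` and the data are illustrative.

```python
import numpy as np

def cost(theta, X, y):
    """Least-squares cost: (1 / 2m) * sum of squared prediction errors."""
    m = len(y)
    total = 0.0
    for i in range(m):
        prediction = theta @ X[i]          # h_theta(x_i) = theta^T x_i
        total += (prediction - y[i]) ** 2  # squared residual of instance i
    return total / (2 * m)

# Illustrative data lying exactly on the line y = 1 + 2*x.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

cost_perfect = cost(np.array([1.0, 2.0]), X, y)  # parameters that fit exactly
cost_zero = cost(np.zeros(2), X, y)              # all-zero parameters
print(cost_perfect, cost_zero)
```

The cost is zero for the parameters that reproduce the data exactly and grows as the predictions move away from the expected results.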

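Let us represent the cost function in vector form.

$$\begin{bmatrix} h_\theta(x^{(1)}) \\ h_\theta(x^{(2)}) \\ \vdots \\ h_\theta(x^{(m)}) \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

We have ignored the factor 1/2m here because a positive constant factor does not change which θ minimizes the cost. It was included for mathematical convenience when calculating the gradient for Gradient Descent, but it is no longer needed here.

Expanding each hypothesis gives

$$\begin{bmatrix} \theta_0 x^{(1)}_0 + \theta_1 x^{(1)}_1 + \dots + \theta_n x^{(1)}_n \\ \theta_0 x^{(2)}_0 + \theta_1 x^{(2)}_1 + \dots + \theta_n x^{(2)}_n \\ \vdots \\ \theta_0 x^{(m)}_0 + \theta_1 x^{(m)}_1 + \dots + \theta_n x^{(m)}_n \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

where,

**$x^{(i)}_j$ :** the value of the j-th feature in the i-th training example.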

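This can further be reduced to

$$X\theta - y$$

where X is the m × (n+1) matrix whose i-th row holds the feature values of the i-th training example, and y is the vector of expected results.

But the cost sums the square of each residual, and we cannot simply square the above expression, because the square of a vector/matrix is not the same as squaring each of its entries. To get the sum of squared residuals, multiply the transpose of the vector by the vector itself. So, the final expression is

$$(X\theta - y)^T (X\theta - y)$$

Therefore, the cost function is

$$J(\theta) = (X\theta - y)^T (X\theta - y)$$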
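A quick numerical check can make this step less abstract: for a residual vector r = Xθ − y, the product rᵀr equals the sum of the squared residuals. The sketch below uses randomly generated, purely illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 5 instances, 2 features plus the column x_0 = 1.
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])
y = rng.normal(size=5)
theta = rng.normal(size=3)

residual = X @ theta - y

# Squaring each residual and summing the results ...
sum_of_squares = np.sum(residual ** 2)

# ... gives the same number as the transpose-times-vector product r^T r.
quadratic_form = residual @ residual

print(sum_of_squares, quadratic_form)
```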

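So, now we get the value of θ by setting the derivative of the cost function with respect to θ to zero:

$$\frac{\partial J}{\partial \theta} = 2 X^T X \theta - 2 X^T y = 0$$

$$X^T X \theta = X^T y$$

Assuming $X^T X$ is invertible, this gives

$$\theta = (X^T X)^{-1} X^T y$$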

So, this is the final derived **Normal Equation, with θ giving the minimum cost value.**
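One way to sanity-check the result is to confirm numerically that the gradient of the squared-error cost, 2Xᵀ(Xθ − y), vanishes at the θ obtained from the Normal Equation, and that this θ agrees with NumPy's general least-squares solver. The data, noise level, and variable names below are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (illustrative): m instances, n features plus the column x_0 = 1.
m, n = 50, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
true_theta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_theta + 0.1 * rng.normal(size=m)  # add a little noise

# theta from the Normal Equation, solved as (X^T X) theta = X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of (X theta - y)^T (X theta - y) with respect to theta.
gradient = 2 * X.T @ (X @ theta - y)

# Independent check using NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.abs(gradient).max())
print(np.abs(theta - theta_lstsq).max())
```

Both printed values should be close to zero, up to floating-point error.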
