**Normal Equation** is an analytical approach to Linear Regression with a least-squares cost function. It lets us find the value of θ directly, without using Gradient Descent. This approach is an effective and time-saving option when we are working with a dataset that has a small number of features.

**The Normal Equation is as follows:**

$$\theta = (X^T X)^{-1} X^T y$$

In the above equation,

**θ :** the hypothesis parameters that best fit the data.

**X :** the input feature values of each instance.

**y :** the output value of each instance.
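As a rough illustration, the closed form θ = (XᵀX)⁻¹Xᵀy can be sketched with NumPy. The function name `normal_equation` and the toy data are assumptions made for this example, and the sketch solves the linear system with `np.linalg.solve` rather than forming the inverse explicitly, which is generally the more numerically stable choice.

```python
import numpy as np


def normal_equation(X, y):
    """Return theta minimising the squared error, via the normal equation.

    X is an (m, n) feature matrix without the bias column; y has shape (m,).
    (Hypothetical helper written for this example.)
    """
    m = X.shape[0]
    # Prepend the constant feature x_0 = 1 so theta[0] acts as the intercept.
    X_b = np.hstack([np.ones((m, 1)), X])
    # Solve (X^T X) theta = X^T y instead of inverting X^T X explicitly.
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)


# Hypothetical noise-free toy data generated from y = 4 + 3x.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 4.0 + 3.0 * X[:, 0]

theta = normal_equation(X, y)
print(theta)  # approximately [4. 3.]
```

Because the toy data contains no noise, the recovered parameters match the intercept and slope used to generate it, up to floating-point error.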

#### Maths Behind the Equation

Given the hypothesis function

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

where,

**n :** the number of features in the data set.

**$x_0$ :** 1 (for vector multiplication)

Notice that this is the dot product between the θ and x values. So, for convenience, we can write it as:

$$h_\theta(x) = \theta^T x$$
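To make the dot-product form concrete, here is a small sketch that evaluates the hypothesis both as a term-by-term sum and as θᵀx. The parameter and feature values are made up for illustration.

```python
import numpy as np

# Hypothetical parameters theta_0, theta_1, theta_2 (n = 2 features).
theta = np.array([4.0, 3.0, -2.0])
# Hypothetical feature values x_1, x_2 of a single instance.
features = np.array([1.5, 0.5])

# Prepend x_0 = 1 so the intercept theta_0 takes part in the dot product.
x = np.concatenate(([1.0], features))

# Term-by-term sum: theta_0*x_0 + theta_1*x_1 + theta_2*x_2
h_sum = sum(theta[j] * x[j] for j in range(len(theta)))
# Same value written as the dot product theta^T x
h_dot = theta @ x

print(h_sum, h_dot)  # both are 7.5
```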

The goal in Linear Regression is to minimize the **cost function**:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{i}) - y^{i} \right)^2$$

where,

**$x^{i}$ :** the input value of the i-th training example.

**m :** the number of training instances.

**n :** the number of data-set features.

**$y^{i}$ :** the expected result of the i-th instance.
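The least-squares cost, with its usual 1/2m factor, can be sketched as a direct loop over the training instances. The function name `cost` and the three-point data set are assumptions made for this example; `X_b` already contains the column of ones for x₀.

```python
import numpy as np


def cost(theta, X_b, y):
    """Least-squares cost: 1/(2m) * sum over i of (h(x^i) - y^i)^2."""
    m = len(y)
    total = 0.0
    for i in range(m):
        prediction = theta @ X_b[i]          # h_theta(x^i) = theta^T x^i
        total += (prediction - y[i]) ** 2    # squared residual of instance i
    return total / (2 * m)


# Hypothetical data from y = 1 + 2x; the first column is x_0 = 1.
X_b = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

perfect = cost(np.array([1.0, 2.0]), X_b, y)   # exact fit, cost 0.0
shifted = cost(np.array([0.0, 2.0]), X_b, y)   # each residual is -1: 3 / (2*3) = 0.5
print(perfect, shifted)
```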

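Let us represent the cost function in vector form.

$$\begin{bmatrix} h_\theta(x^{1}) \\ h_\theta(x^{2}) \\ \vdots \\ h_\theta(x^{m}) \end{bmatrix} - \begin{bmatrix} y^{1} \\ y^{2} \\ \vdots \\ y^{m} \end{bmatrix}$$

We have dropped the factor 1/2m here because it does not change the value of θ at which the minimum occurs. It was included for mathematical convenience when computing the gradient for gradient descent, and it is no longer needed here.

$$\begin{bmatrix} \theta^T x^{1} \\ \theta^T x^{2} \\ \vdots \\ \theta^T x^{m} \end{bmatrix} - y = \begin{bmatrix} \theta_0 x^{1}_{0} + \theta_1 x^{1}_{1} + \cdots + \theta_n x^{1}_{n} \\ \theta_0 x^{2}_{0} + \theta_1 x^{2}_{1} + \cdots + \theta_n x^{2}_{n} \\ \vdots \\ \theta_0 x^{m}_{0} + \theta_1 x^{m}_{1} + \cdots + \theta_n x^{m}_{n} \end{bmatrix} - y$$

**$x^{i}_{j}$ :** the value of the j-th feature in the i-th training example.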

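This can further be reduced to

$$X\theta - y$$

where X is the matrix whose i-th row holds the feature values of the i-th training example (including $x_0 = 1$).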

But each residual value must be squared. We cannot simply square the above expression, because the square of a vector or matrix is not the same as the square of each of its entries. To get the sum of the squared values, multiply the transpose of the vector by the vector itself. So, the final expression is

$$(X\theta - y)^T (X\theta - y)$$

Therefore, the cost function is

$$J(\theta) = (X\theta - y)^T (X\theta - y)$$
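A quick numerical check can confirm that multiplying the residual vector by its transpose gives the sum of the squared residuals. The random data below is purely illustrative, and the variable names are assumptions for this sketch.

```python
import numpy as np

# Illustrative random data: 5 instances, 3 columns (bias column included).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
theta = rng.normal(size=3)
y = rng.normal(size=5)

residual = X @ theta - y                 # the vector X*theta - y

elementwise = np.sum(residual ** 2)      # square each residual, then sum
vector_form = residual.T @ residual      # (X*theta - y)^T (X*theta - y)

print(np.isclose(elementwise, vector_form))  # True
```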

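Now we obtain the value of θ by taking the derivative of the cost function and setting it to zero:

$$\frac{\partial J(\theta)}{\partial \theta} = \frac{\partial}{\partial \theta} \left[ (X\theta - y)^T (X\theta - y) \right] = 2X^T X\theta - 2X^T y$$

$$2X^T X\theta - 2X^T y = 0$$

$$X^T X\theta = X^T y$$

Assuming $X^T X$ is invertible, this gives

$$\theta = (X^T X)^{-1} X^T y \qquad (1)$$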

So, this is the final derived **Normal Equation, with θ giving the minimum cost value.**
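As a sanity check on the result, the sketch below computes θ from the normal equation on synthetic data, confirms that the gradient 2XᵀXθ − 2Xᵀy is numerically zero at that θ, and compares the answer with NumPy's own least-squares solver `np.linalg.lstsq`. The data, the seed, and the "true" parameters are all assumptions invented for this example.

```python
import numpy as np

# Synthetic data: m instances, n features, plus a bias column of ones.
rng = np.random.default_rng(42)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
true_theta = np.array([2.0, -1.0, 0.5, 3.0])      # hypothetical parameters
y = X @ true_theta + 0.1 * rng.normal(size=m)     # targets with small noise

# Normal equation, solved as the linear system (X^T X) theta = X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of (X*theta - y)^T (X*theta - y); it should vanish at the minimum.
gradient = 2 * X.T @ X @ theta - 2 * X.T @ y

# Reference solution from NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(gradient, 0.0, atol=1e-8))
print(np.allclose(theta, theta_lstsq))
```

Solving the linear system avoids computing the inverse directly; when XᵀX is singular or badly conditioned, a least-squares routine such as `np.linalg.lstsq` is usually the safer option.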