ML | Normal Equation in Linear Regression
The Normal Equation is an analytical approach to Linear Regression with a least-squares cost function. It gives the value of θ directly, without using Gradient Descent. This is an effective and time-saving option when the dataset has a small number of features, since the method requires inverting an (n+1) × (n+1) matrix, whose cost grows roughly cubically with the number of features. The method is based on the mathematical concept of maxima and minima: the partial derivatives of a differentiable function are zero at a minimum or maximum point. So, in the Normal Equation method, we find the minimum of the cost function by taking its partial derivative with respect to each weight and setting it to zero.
The Normal Equation is as follows:
$$\theta = (X^T X)^{-1} X^T y$$
In the above equation,
θ: the hypothesis parameters that best fit the data.
X: the matrix of input feature values, one row per instance.
y: the vector of output values, one per instance.
Maths Behind the equation:
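Given the hypothesis function
$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$$
where
n: the number of features in the data set.
x0: 1 (so that the intercept θ0 fits into the vector multiplication)
Notice that this is a dot product between the θ and x values. So, for convenience, we can write it as:
$$h_\theta(x) = \theta^T x$$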
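To make the dot-product form concrete, here is a minimal sketch in NumPy. The parameter and feature values are made up for illustration and do not come from the article.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
theta = np.array([1.0, 2.0, 3.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 0.5, -1.0])      # x_0 = 1, followed by two feature values

# Term-by-term sum: theta_0*x_0 + theta_1*x_1 + theta_2*x_2
h_sum = sum(theta[j] * x[j] for j in range(len(theta)))

# The same quantity written as a single dot product, theta^T x
h_dot = theta @ x

print(h_sum, h_dot)
```

Both expressions evaluate the same hypothesis; the dot-product form is the one that generalizes to a whole data matrix at once.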
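The objective in Linear Regression is to minimize the cost function:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
where
x^(i): the input values of the ith training example.
m: the number of training instances.
n: the number of data-set features.
y^(i): the expected result of the ith instance.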
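Let us represent the cost function in vector form. Stacking the residuals of all m training examples gives
$$\begin{bmatrix} h_\theta(x^{(1)}) \\ h_\theta(x^{(2)}) \\ \vdots \\ h_\theta(x^{(m)}) \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
We have ignored 1/2m here, as a constant positive factor does not change where the minimum lies. It was used for mathematical convenience when computing the gradient for gradient descent, but it is no longer needed here. Expanding each hypothesis value gives
$$\begin{bmatrix} \theta_0 x^{(1)}_0 + \theta_1 x^{(1)}_1 + \dots + \theta_n x^{(1)}_n \\ \theta_0 x^{(2)}_0 + \theta_1 x^{(2)}_1 + \dots + \theta_n x^{(2)}_n \\ \vdots \\ \theta_0 x^{(m)}_0 + \theta_1 x^{(m)}_1 + \dots + \theta_n x^{(m)}_n \end{bmatrix} - y$$
where
x^(i)_j: the value of the jth feature in the ith training example.
This can further be reduced to
$$X\theta - y$$
But each residual value is squared. We cannot simply square the above expression, because the square of a vector/matrix is not equal to the square of each of its values. So, to get the sum of the squared values, multiply the transpose of the vector by the vector itself. The final expression is
$$(X\theta - y)^T (X\theta - y)$$
Therefore, the cost function is
$$J(\theta) = (X\theta - y)^T (X\theta - y)$$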
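As a quick sanity check of the vector form, the sketch below compares the sum of squared residuals with the product of the residual vector's transpose and the vector itself. The toy data and the parameter guess are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 4 examples, 1 feature plus the x_0 = 1 column.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([[1.0], [3.0], [5.0], [8.0]])
theta = np.array([[1.0], [2.0]])   # an arbitrary parameter guess

residuals = X @ theta - y                     # shape (4, 1)
cost_sum = np.sum(residuals ** 2)             # sum of squared residuals
cost_vec = (residuals.T @ residuals).item()   # (X theta - y)^T (X theta - y)

print(cost_sum, cost_vec)
```

The two numbers agree, which is why the transpose product can stand in for the summation in the cost function.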
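So, now find the value of θ by taking the partial derivative of the cost function and setting it to zero:
$$\frac{\partial J(\theta)}{\partial \theta} = \frac{\partial}{\partial \theta}\left[ (X\theta - y)^T (X\theta - y) \right] = 2X^T X\theta - 2X^T y$$
$$2X^T X\theta - 2X^T y = 0$$
$$X^T X\theta = X^T y$$
$$\theta = (X^T X)^{-1} X^T y$$
This is the Normal Equation, and the θ it yields gives the minimum cost value, provided that X^T X is invertible.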
Let’s implement the Normal Equation:
[[ 0.52804151] [30.65896337]]
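A minimal sketch of such an implementation is given here. It assumes a synthetic one-feature dataset generated with NumPy (the generating line y = 4 + 3x, the noise level, and the seed are all assumptions), so the fitted values will differ from the θ printed above.

```python
import numpy as np

# Hypothetical synthetic data: y = 4 + 3x + Gaussian noise.
rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.normal(scale=0.5, size=(m, 1))

# Add x_0 = 1 to every instance so theta_0 acts as the intercept.
X_b = np.c_[np.ones((m, 1)), X]

# Normal Equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

print(theta)   # first row: intercept, second row: slope
```

The explicit inverse mirrors the formula; in practice `np.linalg.solve(X_b.T @ X_b, X_b.T @ y)` or `np.linalg.lstsq` is usually preferred for numerical stability.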
Try to predict for a new data instance:
Before adding x0:
[[-2]
 [ 4]]
After adding x0:
[[ 1. -2.]
 [ 1.  4.]]
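The prediction step can be sketched as follows. The θ values are copied from the output printed earlier in the article and the two new inputs are the ones shown above; the variable names are assumptions.

```python
import numpy as np

# theta as printed earlier in the article (intercept, slope).
theta = np.array([[0.52804151], [30.65896337]])

X_new = np.array([[-2], [4]])
print("Before adding x0:\n", X_new)

# Prepend a column of ones so theta_0 acts as the intercept.
X_new_b = np.c_[np.ones((X_new.shape[0], 1)), X_new]
print("After adding x0:\n", X_new_b)

# Prediction is the same matrix product used in the hypothesis: X theta
y_pred = X_new_b @ theta
print("Predicted values:\n", y_pred)
```

Up to rounding of the printed θ, the result matches the predicted values the article reports in its final output.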
Plot the output:
Verify the above using sklearn LinearRegression class:
Best value of theta: [0.52804151] [[30.65896337]]
predicted value: [[-60.78988524]
 [123.16389501]]
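A comparison of this kind might look like the sketch below. It uses hypothetical synthetic data rather than the article's dataset, so only the agreement between the two methods carries over, not the specific numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data (not the article's dataset).
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.normal(scale=0.5, size=(100, 1))

# Normal Equation on the design matrix with the x_0 = 1 column.
X_b = np.c_[np.ones((len(X), 1)), X]
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# scikit-learn fits the intercept itself, so it takes X without x_0.
lr = LinearRegression()
lr.fit(X, y)

print("Normal Equation:", theta.ravel())
print("sklearn:", lr.intercept_, lr.coef_.ravel())
```

`intercept_` corresponds to θ0 and `coef_` to the remaining weights, which is why the article's output prints them as two separate arrays.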