ML | Normal Equation in Linear Regression


The Normal Equation is an analytical approach to Linear Regression with a least-squares cost function. We can directly find the value of θ without using Gradient Descent, which makes it an effective and time-saving option when working with a dataset that has a small number of features.

The Normal Equation is as follows:

\theta = (X^{T}X)^{-1}X^{T}y

In the above equation,
θ: the hypothesis parameters that best fit the data.
X: the input feature matrix, with one row per instance.
y: the output value of each instance.
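Concretely, with m training instances and n features (plus the bias entry x0 = 1 prepended to every instance), the shapes involved are

X \in \mathbb{R}^{m \times (n+1)}, \qquad \theta \in \mathbb{R}^{(n+1) \times 1}, \qquad y \in \mathbb{R}^{m \times 1}

so X^{T}X is only an (n+1) \times (n+1) matrix. Inverting it is cheap precisely when the number of features is small, which is why the Normal Equation suits such data sets.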

Maths Behind the Equation

Given the hypothesis function

h_{\theta}(x) = \theta_{0}x_{0} + \theta_{1}x_{1} + \ldots + \theta_{n}x_{n}

where
n: the number of features in the data set.
x0: 1 (for vector multiplication).

Notice that this is a dot product between θ and x. So, for convenience, we can write it as:

h_{\theta}(x) = \theta^{T}x
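For example (an illustrative calculation with made-up numbers), with n = 2, \theta = [1, 2, 3]^{T}, and x = [1, 4, 5]^{T} (where x_{0} = 1):

h_{\theta}(x) = \theta^{T}x = 1 \cdot 1 + 2 \cdot 4 + 3 \cdot 5 = 24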

The motive in Linear Regression is to minimize the cost function:

J(\theta) = \frac{1}{2m} \sum_{i = 1}^{m} \left[h_{\theta}(x^{(i)}) - y^{(i)}\right]^{2}

where

x^{(i)}: the input (feature vector) of the ith training example.
m: the number of training instances.
n: the number of features in the data set.
y^{(i)}: the expected output of the ith instance.

Let us represent the cost function in vector form:

J(\theta) = \begin{bmatrix} h_{\theta}(x^{(1)}) - y^{(1)} \\ h_{\theta}(x^{(2)}) - y^{(2)} \\ \vdots \\ h_{\theta}(x^{(m)}) - y^{(m)} \end{bmatrix}^{2}

We have ignored the constant 1/2m here, as it makes no difference to the minimizing θ; it was used for mathematical convenience while calculating gradient descent, but it is no longer needed here. Writing each hypothesis h_{\theta}(x^{(i)}) as a row of the feature matrix times the parameter vector:

J(\theta) = \left( \begin{bmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{bmatrix} \begin{bmatrix} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} \right)^{2}

x_{j}^{(i)}: the value of the jth feature in the ith training example.

This can further be reduced to

X\theta - y

But each residual value must be squared, and we cannot simply square the above expression: the square of a vector/matrix is not equal to the square of each of its values. So, to get the squared values, we multiply the residual vector by its transpose. Therefore, the cost function is

J(\theta) = (X\theta - y)^{T}(X\theta - y)
So, now we obtain the value of θ using the derivative. Expanding the cost function:

J(\theta) = \theta^{T}X^{T}X\theta - \theta^{T}X^{T}y - y^{T}X\theta + y^{T}y

Since y^{T}X\theta is a scalar, it is equal to its own transpose \theta^{T}X^{T}y, so

J(\theta) = \theta^{T}X^{T}X\theta - 2\theta^{T}X^{T}y + y^{T}y

Differentiating with respect to θ and setting the gradient to zero:

\frac{\partial J(\theta)}{\partial \theta} = 2X^{T}X\theta - 2X^{T}y = 0

X^{T}X\theta = X^{T}y

\theta = (X^{T}X)^{-1}X^{T}y

Because J(θ) is a convex quadratic in θ, this stationary point is the global minimum. So, this is the finally derived Normal Equation, with θ giving the minimum cost value.
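As a quick sanity check of the formula, here is a minimal NumPy sketch (the three data points and variable names are illustrative, not part of the original article) that applies θ = (XᵀX)⁻¹Xᵀy to points lying exactly on the line y = 2x + 1; the recovered parameters should be θ = [1, 2]:

import numpy as np

# Illustrative data: three points on the line y = 2x + 1.
# The first column of X is the bias term x0 = 1.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# theta = (X^T X)^(-1) X^T y, computed with a linear solve
# rather than an explicit matrix inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)    # [1. 2.]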





Now let's generate a data set and apply the Normal Equation to it in Python.

# This code may not run on the GFG IDE
# as the required modules may not be found.

# Import required modules.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

# Create the data set.
x, y = make_regression(n_samples=100, n_features=1,
                       n_informative=1, noise=10, random_state=10)

# Plot the generated data set.
plt.scatter(x, y, s=10)
plt.xlabel("Feature_1 --->")
plt.ylabel("Target_Variable --->")
plt.title('Simple Linear Regression')
plt.show()

# Convert the target variable array from 1d to 2d.
y = y.reshape(100, 1)

Let's implement the Normal Equation:


# code
# Adding x0 = 1 to each instance.
x_new = np.array([np.ones(len(x)), x.flatten()]).T
# Using the Normal Equation.
theta_best_values = np.linalg.inv(x_new.T.dot(x_new)).dot(x_new.T).dot(y)
# Display the best values obtained.
print(theta_best_values)

[[ 0.52804151]

Try to predict for a new data instance:


# code
# Sample data instance.
x_sample = np.array([[-2], [4]])
# Adding x0 = 1 to each instance.
x_sample_new = np.array([np.ones(len(x_sample)), x_sample.flatten()]).T
# Display the sample.
print("Before adding x0:\n", x_sample)
print("After adding x0:\n", x_sample_new)

Before adding x0:
 [[-2]
 [ 4]]
After adding x0:
 [[ 1. -2.]
 [ 1.  4.]]


# code
# Predict the values for the given data instances.
predict_value = x_sample_new.dot(theta_best_values)
print(predict_value)

Plot the output:


# code
# Plot the output.
plt.scatter(x, y, s=10)
plt.plot(x_sample, predict_value, color="r")
plt.xlabel("Feature_1 --->")
plt.ylabel("Target_Variable --->")
plt.title('Simple Linear Regression')
plt.show()

Verify the above using the sklearn LinearRegression class:


# code
# Verification.
from sklearn.linear_model import LinearRegression

lr = LinearRegression()    # Object.
lr.fit(x, y)               # fit method.

# Print the obtained theta values.
print("Best value of theta:", lr.intercept_, lr.coef_, sep='\n')
print("predicted value:", lr.predict(x_sample), sep='\n')
Best value of theta:

predicted value:
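One caveat worth noting: explicitly inverting XᵀX, as done above, can be numerically unstable when XᵀX is ill-conditioned (for example, with highly correlated features). A minimal alternative sketch, reusing x_new and y from the Normal Equation code above, uses np.linalg.lstsq, which solves the same least-squares problem through a more robust factorization:

# A numerically stabler alternative to the explicit inverse.
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(x_new, y, rcond=None)
print(theta_lstsq)    # should match theta_best_values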
