
Ordinary Least Squares (OLS) using statsmodels

In this article, we will use Python’s statsmodels module to implement the Ordinary Least Squares (OLS) method of linear regression.
Introduction : 
A linear regression model establishes the relation between a dependent variable (y) and at least one independent variable (x). For a single independent variable: 

$$\hat{y} = b_1 x + b_0$$

In the OLS method, we choose the values of $b_1$ and $b_0$ such that the sum of the squared differences between the calculated and observed values of y is minimised. 
Formula for OLS:

$$S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_1 x_i - b_0)^2 = \sum_{i=1}^{n} \hat{\epsilon}_i^2$$

Where, 
$\hat{y}_i$ = predicted value for the ith observation 
$y_i$ = actual value for the ith observation 
$\hat{\epsilon}_i$ = error/residual for the ith observation 
n = total number of observations
To get the values of $b_0$ and $b_1$ which minimise S, we take the partial derivative of S with respect to each coefficient and equate it to zero.
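
For a single independent variable, equating those partial derivatives to zero leads to a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept is the mean of y minus the slope times the mean of x. The following is a minimal NumPy sketch of that result. The function name `ols_closed_form`, the names `b0` (intercept) and `b1` (slope), and the demo data are assumptions made for illustration and are not part of statsmodels.

```python
import numpy as np

def ols_closed_form(x, y):
    """Return (b0, b1) minimising the sum of squared residuals of y = b1*x + b0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope: sum of cross-deviations divided by sum of squared deviations of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # intercept: the fitted line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# hypothetical noise-free data lying exactly on y = 2x + 1
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 * x_demo + 1.0

b0, b1 = ols_closed_form(x_demo, y_demo)
print(b0, b1)
```

On real data the estimates from this sketch should agree, up to floating-point error, with the const and x1 coefficients that statsmodels reports for the same inputs.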
Modules used : 

pip install statsmodels

The code in this article also imports pandas, NumPy and Matplotlib.

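Approach :

  • Read the data and define the variables x and y. In the example below they are read from a csv file using pandas.
  • Add the constant (intercept) term to x using add_constant(), since OLS() does not add one by itself.
  • Perform the regression with OLS() and fit the model by calling fit() on the returned object.
  • Print the summary table of the fitted model using summary().
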
Syntax : statsmodels.api.OLS(y, x) 
Parameters : 

  • y : the dependent (response) variable
  • x : the independent variable(s); an intercept is not included by default, so a constant column has to be added explicitly


 


Code: 
 

import statsmodels.api as sm
import pandas as pd
 
# reading data from the csv
data = pd.read_csv('train.csv')
 
# defining the variables
x = data['x'].tolist()
y = data['y'].tolist()
 
# adding the constant term
x = sm.add_constant(x)
 
# performing the regression
# and fitting the model
result = sm.OLS(y, x).fit()
 
# printing the summary table
print(result.summary())

                    

Output : 
 

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.989
Model:                            OLS   Adj. R-squared:                  0.989
Method:                 Least Squares   F-statistic:                 2.709e+04
Date:                Fri, 26 Jun 2020   Prob (F-statistic):          1.33e-294
Time:                        15:55:38   Log-Likelihood:                -757.98
No. Observations:                 300   AIC:                             1520.
Df Residuals:                     298   BIC:                             1527.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.4618      0.360     -1.284      0.200      -1.169       0.246
x1             1.0143      0.006    164.598      0.000       1.002       1.026
==============================================================================
Omnibus:                        1.034   Durbin-Watson:                   2.006
Prob(Omnibus):                  0.596   Jarque-Bera (JB):                0.825
Skew:                           0.117   Prob(JB):                        0.662
Kurtosis:                       3.104   Cond. No.                         120.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


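Description of some of the terms in the table : 

  • R-squared : the coefficient of determination, i.e. the proportion of the variance in y that is explained by the model.
  • Adj. R-squared : R-squared adjusted for the number of independent variables in the model.
  • F-statistic : tests whether the model as a whole explains a significant part of the variance in y; Prob (F-statistic) is its p-value.
  • coef : the estimated values of the constant term and of the coefficient of x.
  • std err : the standard error of each coefficient estimate.
  • t : the t-statistic, i.e. coef divided by std err, for the hypothesis that the coefficient is zero.
  • P>|t| : the p-value of that t-statistic.
  • [0.025 0.975] : the bounds of the 95% confidence interval of each coefficient.
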
Predicting values: 
From the results table, we note the coefficient of x (1.0143) and the constant term (-0.4618). These values are substituted into the original equation, and the resulting regression line is plotted using matplotlib.
Code: 
 

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
 
# reading data from the csv
data = pd.read_csv('train.csv')
 
# plotting the original values
x = data['x'].tolist()
y = data['y'].tolist()
plt.scatter(x, y)
 
# finding the maximum and minimum
# values of x, to get the
# range of data
max_x = data['x'].max()
min_x = data['x'].min()
 
# range of values for plotting
# the regression line
x_line = np.linspace(min_x, max_x, 100)
 
# the substituted equation
y_line = 1.0143 * x_line - 0.4618
 
# plotting the regression line
# against x (not against the array index)
plt.plot(x_line, y_line, 'r')
plt.show()

                    

Output: 

[Figure: scatter plot of the data points with the fitted regression line drawn in red]
