
Standard Error of the Regression vs. R-squared

Last Updated : 06 Oct, 2023

Regression is a statistical technique used to establish a relationship between dependent and independent variables. It predicts continuous values within a given range. The general equation of linear regression is given by

y = mx + c

  • Here y is the dependent variable: the variable whose value changes when the independent variable changes.
  • x is the independent variable; y depends on x. Note that there can be more than one independent variable.
  • m is the slope.
  • c is the y-intercept.

There are different types of regression: Linear Regression, Ridge Regression, Polynomial Regression, and Lasso Regression. Since regression analysis predicts continuous values within a given range, we need evaluation metrics to analyze the performance of the machine learning model. In regression analysis, we measure how much the predicted values deviate from the actual values. Common evaluation metrics for regression analysis include Mean Squared Error, Mean Absolute Error, and R-squared.
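Before diving in, here is a minimal sketch (with made-up predicted values, for illustration only) of how these metrics can be computed with scikit-learn's metrics module:

Python3

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values, for illustration only
y_true = np.array([2, 6, 8, 18, 10])
y_pred = np.array([3, 5, 9, 17, 11])

print('MSE :', mean_squared_error(y_true, y_pred))   # mean squared deviation
print('MAE :', mean_absolute_error(y_true, y_pred))  # mean absolute deviation
print('R2  :', r2_score(y_true, y_pred))             # coefficient of determination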

Standard Error of Regression

Standard error is a statistical measure of the average distance between the observed values and the regression line. It describes how much the actual data is spread around the line; in other words, it measures how much the actual dependent values deviate from the predicted values. Since it is an error, the lower its value, the better the prediction.
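For simple linear regression, this quantity, often denoted S, is commonly computed as

S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}

where y_i are the observed values, \hat{y}_i are the values predicted by the regression line, and n - 2 is the degrees of freedom (two parameters, the slope and the intercept, are estimated).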

Suppose we want to estimate the slope as well as the y-intercept from the independent and dependent variables. Let us look at the Python code.

Python3

import numpy as np
import statsmodels.api as sm

x = np.array([2, 3, 4, 9, 5])    # Independent variable
y = np.array([2, 6, 8, 18, 10])  # Dependent variable

# Add a constant term (intercept) to the independent variable
x = sm.add_constant(x, prepend=True)

# Fit a linear regression model using ordinary least squares (OLS)
regression = sm.OLS(y, x).fit()

# Display the regression summary
regression.summary()

Output:

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.984
Model:                            OLS   Adj. R-squared:                  0.978
Method:                 Least Squares   F-statistic:                     182.8
Date:                Thu, 05 Oct 2023   Prob (F-statistic):           0.000875
Time:                        11:18:32   Log-Likelihood:                -5.1249
No. Observations:                   5   AIC:                             14.25
Df Residuals:                       3   BIC:                             13.47
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.2192      0.837     -1.456      0.241      -3.883       1.445
x1             2.1781      0.161     13.519      0.001       1.665       2.691
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.045
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.624
Skew:                          -0.678   Prob(JB):                        0.732
Kurtosis:                       1.925   Cond. No.                         11.5
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Standard Error:

Python3

# Get the standard errors of regression coefficients
error = regression.bse
 
# Print the standard errors
print('Constant error :',error[0])
print('x1 error       :',error[1])


Output:

Constant error : 0.8371869364710968
x1 error       : 0.16111670104454195

After executing the code, we get a list comprising two values. In the above code, we defined two NumPy arrays and imported the statsmodels library for the statistical calculations. After that, we added a constant term and then fitted a linear regression model on the independent and dependent values. The model analyses the values and establishes a relationship between the dependent and independent variables; in effect, it estimates the slope and the y-intercept. Then, using the bse attribute, we obtain the standard errors of the intercept and the slope respectively. Here the intercept has a standard error of about 0.837, whereas the slope has a standard error of about 0.161.
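Note that bse returns the standard errors of the estimated coefficients. The standard error of the regression itself (the average distance of the data from the regression line, as defined above) can be derived from the fitted model's residual mean square; here is a minimal sketch, assuming the regression object from the code above is still in scope:

Python3

import numpy as np

# Residual standard error: the square root of the mean squared residual,
# i.e. sqrt(SSE / (n - 2)) for simple linear regression
print('Standard error of the regression :', np.sqrt(regression.mse_resid))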

R-squared

R-squared, also known as the coefficient of determination, is a statistical measure of how well the regression line fits the observed values. In other words, it explains how much of the variability in the dependent variable is accounted for by the independent variables. Therefore, the higher the value of R², the stronger the relationship between the independent and dependent variables.
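Formally, R-squared is computed as

R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}

where SS_{res} is the residual sum of squares and SS_{tot} is the total sum of squares about the mean \bar{y}.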

R-squared can be calculated using either the statsmodels library or the scikit-learn library. Let us look at each of them.

Python3

# Get the r square value
r_squared = regression.rsquared
 
# Print
print('R-Squared :',r_squared)


Output:

R-Squared : 0.9838496264009963

In the above code, we defined two NumPy arrays and imported the statsmodels library for the statistical calculations. After that, we added a constant term and then fitted a linear regression model on the independent and dependent values. The model analyses the values and establishes a relationship between the dependent and independent variables. Now, to determine the strength of this relationship, we use the rsquared attribute to calculate the coefficient of determination.
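As mentioned above, the same value can also be obtained with scikit-learn's r2_score. Here is a minimal sketch, assuming the regression model, x, and y from the earlier code are still in scope:

Python3

from sklearn.metrics import r2_score

# Predictions from the fitted statsmodels model
y_pred = regression.predict(x)

# Compare the actual and the predicted values
print('R-Squared :', r2_score(y, y_pred))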

Interpreting standard error vs. R-squared

Standard error tells us, on average, how far the actual data points lie from the values on the regression line. The smaller this distance, the better the model captures the data. R-squared, on the other hand, focuses on the strength of the relationship between the dependent and independent variables: it measures how much of the variability in the dependent variable is explained by changes in the independent variables. R-squared lies between 0 and 1; 0 means the independent variables explain none of the variability in the dependent variable, while 1 means they explain all of it.
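To make this concrete, here is a small sketch (with randomly generated, illustrative data): fitting a line to data with a strong linear signal yields an R-squared close to 1, while fitting a line to pure noise yields an R-squared close to 0.

Python3

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
X = sm.add_constant(x)

y_signal = 3 * x + 2 + rng.normal(0, 0.5, 50)  # strong linear relationship
y_noise = rng.normal(0, 5, 50)                 # unrelated to x

print('R-squared (signal):', sm.OLS(y_signal, X).fit().rsquared)  # near 1
print('R-squared (noise) :', sm.OLS(y_noise, X).fit().rsquared)   # near 0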

Using Both Metrics Together

These metrics can be used together to identify overfitting, to compare two models, and so on. Let us have a look at each of these uses:

Overfitting occurs when a model gives accurate predictions on the training data but performs poorly on new, unseen data. Regression models can overfit, and these two metrics can help detect it: if a model has a very high R-squared and a very low standard error on the training data but performs much worse on unseen data, it is likely overfitting, as the sketch below illustrates.
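One way to check this, sketched below with made-up data, is to compare R-squared on the training data against R-squared on held-out data; a large gap suggests overfitting:

Python3

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up data: a noisy linear relationship
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 30).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(0, 3, 30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

# A very flexible model can fit the training data almost perfectly
model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())
model.fit(X_train, y_train)

# A large train/test gap in R-squared suggests overfitting
print('Train R-squared:', model.score(X_train, y_train))
print('Test R-squared :', model.score(X_test, y_test))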

Comparison of two models: since both are evaluation metrics, they can act as benchmarks for comparing two or more regression models, helping users choose the regression model that provides better results (see the sketch below).
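For example, here is a minimal sketch (reusing the data from earlier) that fits two candidate models to the same data and compares their R-squared and residual standard errors side by side:

Python3

import numpy as np
import statsmodels.api as sm

x = np.array([2, 3, 4, 9, 5])
y = np.array([2, 6, 8, 18, 10])

# Candidate 1: a straight line; candidate 2: a line plus a quadratic term
X1 = sm.add_constant(x)
X2 = sm.add_constant(np.column_stack([x, x ** 2]))

for name, X in [('linear   ', X1), ('quadratic', X2)]:
    fit = sm.OLS(y, X).fit()
    s = np.sqrt(fit.mse_resid)  # residual standard error
    print(name, '| R-squared:', round(fit.rsquared, 3),
          '| standard error:', round(s, 3))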

When to use standard error and R-squared?

Both metrics play an important role in the analysis of a regression model. Standard error is used when we want to measure the spread of data points around the regression line. R-squared is used when we want to measure the goodness of fit. Combined, they can be used to check whether the model is overfitting.

Choosing the Right Metric

All of these metrics play an important role in the evaluation of a regression model, and each has its own specific role in the analysis. R-squared, also known as the coefficient of determination, is used to measure the strength of the relationship between two variables. Standard error is essentially a standard deviation that measures the spread of the data around the regression line. Hence, we should use it when we want to measure the spread of the original data around the predicted values.

Limitations

The main drawback of the coefficient of determination is that, on its own, it cannot tell us whether the model is good or bad overall; to evaluate that, we should also use other metrics such as MSE and MAE. On the other hand, the main limitation of the standard error of the regression is that it assumes a linear relationship between the variables and may not give accurate values for nonlinear relationships.

Standard Error of the Regression vs. R-squared

Characteristic | R-Squared | Standard Error
--- | --- | ---
Objective | Determines the strength of the relationship between the independent and dependent variables. | Finds the average distance between the actual values and the predicted values on the regression line.
Measurement focus | Lies between 0 and 1. | Not bounded between 0 and 1; it is expressed in the units of the dependent variable.
Purpose and use | Used for comparing two or more models. | Helps to determine the precision of the predictions.
Interpretation | The higher the value, the more of the variability in the dependent variable is explained. | The lower the value, the better the model.

Relationship between the two metrics: there is no formula-based relationship, but in practice, as the standard error decreases, R-squared tends to increase, and vice versa.


