ML | R-squared in Regression Analysis
R-squared (R², also called the coefficient of determination) is a statistical measure of the goodness of fit of a regression model. Its ideal value is 1: the closer R-squared is to 1, the better the model fits the data.
R-squared compares the residual sum of squares (SSres) with the total sum of squares (SStot). The total sum of squares is the sum of the squared vertical distances between the data points and the horizontal line at the mean of the observed values.
The residual sum of squares is the sum of the squared vertical distances between the data points and the best-fitted line.
R-squared is calculated using the following formula:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
Where SSres is the residual sum of squares and SStot is the total sum of squares.
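The calculation R² = 1 − SSres / SStot can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `r_squared` and the sample data are made up for this example, and the line is fitted with `np.polyfit` purely for convenience.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return R^2 = 1 - SSres / SStot for observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Illustrative data (hypothetical, roughly y = 2x)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Fit a straight line by ordinary least squares
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

r2 = r_squared(y, y_hat)
print(round(r2, 4))  # close to 1, since the data are nearly linear
```

A perfect prediction gives SSres = 0 and therefore an R-squared of exactly 1.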
The goodness of fit of regression models can be compared using R-squared: the closer the value is to 1, the better the model fits the data.
Note: R-squared can also be negative. This happens when SSres exceeds SStot, i.e. when the fitted model predicts worse than simply predicting the mean of the observed values for every data point.
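A small numeric sketch of the negative case, using made-up values: predicting the mean everywhere gives an R-squared of 0, and predictions that trend in the opposite direction to the data give a negative value. The helper `r_squared` is a hypothetical name defined here for illustration only.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return R^2 = 1 - SSres / SStot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

mean_pred = np.full_like(y, y.mean())  # always predict the mean (3.0)
bad_pred = y[::-1].copy()              # predictions trend the wrong way

r2_mean = r_squared(y, mean_pred)
r2_bad = r_squared(y, bad_pred)

print(r2_mean)  # 0.0  (SSres equals SStot = 10)
print(r2_bad)   # -3.0 (SSres = 40 is four times SStot)
```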
Limitations of the R-squared method:
- The value of R-squared always increases or stays the same when new variables are added to the model, regardless of whether the added variable is significant (i.e. R-squared never decreases when attributes are added). As a result, non-significant attributes can be added to the model and still raise the R-squared value.
- This is because SStot depends only on the observed values and so stays constant, while a least-squares fit can always reproduce the previous fit by giving the new attribute a coefficient of zero, so SSres cannot increase on the training data. Any chance correlation with the new attribute lowers SSres, so R-squared rises, which can lead to an overfitted, poor regression model.
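The limitation above can be observed with a short experiment. The following is a sketch under stated assumptions: the data are synthetic, the model is ordinary least squares with an intercept fitted via `np.linalg.lstsq`, R-squared is measured on the training data, and `fit_r_squared` is a hypothetical helper name. Pure-noise features are appended one at a time, and the training R-squared is recorded after each refit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)   # y truly depends on x only

def fit_r_squared(X, y):
    """Fit OLS with an intercept on X and return the training R^2."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)           # unaffected by the features
    return 1.0 - ss_res / ss_tot

X = x.reshape(-1, 1)
scores = [fit_r_squared(X, y)]
for _ in range(5):
    noise = rng.normal(size=(n, 1))    # feature unrelated to y
    X = np.hstack([X, noise])
    scores.append(fit_r_squared(X, y))

# Training R^2 after adding 0, 1, ..., 5 noise features
print([round(s, 4) for s in scores])
```

The printed sequence should be non-decreasing (up to floating-point rounding) even though none of the added features carries information about `y`.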