R-squared Regression Analysis in R Programming


Regression models predict the value of one variable (the dependent variable) from other variables (the independent variables). To measure how well such a model fits the data, R-squared, also called the coefficient of determination, is used. For a linear model fitted by least squares with an intercept, R-squared lies between 0 and 1. A coefficient of determination of 1 (or 100%) means that the model's fitted values match the observed values of the dependent variable exactly.

R-squared compares the residual sum of squares (SSres) with the total sum of squares (SStot). The residual sum of squares is the sum of the squared vertical distances between the data points and the best-fitted line.

[Figure: data points and the best-fitted line]

The total sum of squares is the sum of the squared vertical distances between the data points and the average line (the horizontal line at the mean of the dependent variable).

[Figure: data points and the average line]
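
As a minimal sketch of the two sums of squares described above, the following R code computes SSres and SStot by hand. It assumes the exam marks used in this article's example; the variable names ss_res, ss_tot and r2_manual are illustrative only.

```r
# Exam marks from the article's example (name column omitted for brevity)
exam <- data.frame(math = c(87, 98, 67, 90),
                   estimated = c(65, 87, 56, 100))

# Fit the regression line of math on estimated
model <- lm(math ~ estimated, data = exam)

# SSres: squared vertical distances from the points to the fitted line
ss_res <- sum(residuals(model)^2)

# SStot: squared vertical distances from the points to the mean of math
ss_tot <- sum((exam$math - mean(exam$math))^2)

# R-squared as one minus the ratio of the two sums
r2_manual <- 1 - ss_res / ss_tot
r2_manual
```

The smaller SSres is relative to SStot, the closer this value gets to 1.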

Formula for R-squared Regression Analysis

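For a linear regression model fitted by least squares with an intercept, R-squared can be written as the ratio of the explained sum of squares to the total sum of squares, which is equivalent to 1 - SSres/SStot:

R^2 = \frac{\sum (\widehat{y_i} - \bar{y})^2}{\sum (y_i - \bar{y})^2}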

where,
y_i: the observed values of the dependent variable
\bar y: the average/mean of the observed values
\widehat{y_i}: the fitted values predicted by the model
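
The formula can be checked directly in R. The sketch below assumes the same exam marks as the article's example and uses the illustrative names y, y_hat, y_bar and r2_formula; it is one way to verify the formula, not the only one.

```r
# Exam marks from the article's example
exam <- data.frame(math = c(87, 98, 67, 90),
                   estimated = c(65, 87, 56, 100))
model <- lm(math ~ estimated, data = exam)

y     <- exam$math      # observed values y_i
y_hat <- fitted(model)  # fitted values from the model
y_bar <- mean(y)        # mean of the observed values

# Explained sum of squares divided by total sum of squares
r2_formula <- sum((y_hat - y_bar)^2) / sum((y - y_bar)^2)

# The two values should agree up to floating-point error
r2_formula
summary(model)$r.squared
```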

Find the Coefficient of Determination (R-squared) in R

Finding the coefficient of determination (R-squared) in the R language takes only a few steps:

  • Make a data frame in R.
  • Fit a linear regression model with lm() and save it in a new variable.
  • Call summary() on that variable and extract its r.squared component, which holds the coefficient of determination.




# Creating a data frame of exam marks
exam <- data.frame(name = c("ravi", "shaily",
                            "arsh", "monu"),
                   math = c(87, 98, 67, 90),
                   estimated = c(65, 87, 56, 100))

# Printing data frame
exam

# Fitting the linear regression model of math on estimated
model <- lm(math ~ estimated, data = exam)

# Extracting R-squared parameter from summary
summary(model)$r.squared

Output:

    name   math   estimated
1   ravi   87        65
2 shaily   98        87
3   arsh   67        56
4   monu   90       100

[1] 0.5672797
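
This value can be checked by hand. For a simple linear regression with a single predictor, R-squared equals the squared Pearson correlation between the two variables. Taking x as estimated and y as math (the symbols S_xx, S_yy and S_xy are notation introduced here for the check):

```latex
\bar{x} = 77, \qquad \bar{y} = 85.5

S_{xx} = \sum (x_i - \bar{x})^2 = 144 + 100 + 441 + 529 = 1214

S_{yy} = \sum (y_i - \bar{y})^2 = 2.25 + 156.25 + 342.25 + 20.25 = 521

S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = -18 + 125 + 388.5 + 103.5 = 599

R^2 = \frac{S_{xy}^2}{S_{xx} \, S_{yy}} = \frac{599^2}{1214 \times 521} = \frac{358801}{632494} \approx 0.5673
```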

Note: If the fitted values match the observed values exactly, the R-squared value is 1, as in the following example where estimated is identical to math.




# Creating a data frame of exam marks
# (estimated is identical to math here, so the fit is exact)
exam <- data.frame(name = c("ravi", "shaily",
                            "arsh", "monu"),
                   math = c(87, 98, 67, 90),
                   estimated = c(87, 98, 67, 90))

# Printing data frame
exam

# Fitting the linear regression model of math on estimated
model <- lm(math ~ estimated, data = exam)

# Extracting R-squared parameter from summary
summary(model)$r.squared

Output:

    name   math   estimated
1   ravi   87        87
2 shaily   98        98
3   arsh   67        67
4   monu   90        90

[1] 1

Limitations of Using the R-squared Method

  • The value of R-squared never decreases when a new variable is added to the model, whether or not that variable is significant. As a result, non-significant attributes can be added to the model and still raise its R-squared value.
  • This is because SStot depends only on the observed values and stays constant, while the least-squares fit can always keep SSres the same or reduce it by exploiting the new attribute, even if the correlation is due to chance. R-squared therefore rises or stays the same, which can lead to a poor regression model.
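
A small simulation illustrates this limitation. The data below are synthetic and the names x, y, noise, fit_small and fit_big are hypothetical; the seed is arbitrary. The predictor noise is generated independently of y, yet adding it cannot lower R-squared.

```r
set.seed(42)        # arbitrary seed, only for reproducibility
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n)   # y depends on x only
noise <- rnorm(n)       # unrelated to y by construction

fit_small <- lm(y ~ x)          # model with the real predictor
fit_big   <- lm(y ~ x + noise)  # same model plus the irrelevant predictor

r2_small <- summary(fit_small)$r.squared
r2_big   <- summary(fit_big)$r.squared

# R-squared of the larger model is never below that of the smaller one
c(small = r2_small, big = r2_big)

# summary() also reports adjusted R-squared, which penalizes extra predictors
c(small = summary(fit_small)$adj.r.squared,
  big   = summary(fit_big)$adj.r.squared)
```

Comparing the adjusted values instead of the raw ones is one common way to judge whether an added variable is worth keeping.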

Last Updated : 28 Jul, 2020