R-squared Regression Analysis in R Programming

Regression models predict the value of one variable (the dependent variable) from one or more other variables (the independent variables). To measure how accurate such a prediction is, we use R-squared Regression Analysis, also called the coefficient of determination. The value of R-squared lies between 0 and 1; a coefficient of determination of 1 (or 100%) means the model predicts the dependent variable perfectly.

R-squared compares the residual sum of squares (SSres) with the total sum of squares (SStot). The residual sum of squares is the sum of the squared vertical distances between the data points and the best-fitted line, i.e. between the observed and fitted values.



The total sum of squares is the sum of the squared vertical distances between the data points and the horizontal line at the mean of the dependent variable.

Formula for R-squared Regression Analysis

The formula for R-squared Regression Analysis is given as follows:

R² = 1 − SSres / SStot = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²

where
yᵢ : the observed (experimental) values of the dependent variable
ȳ : the average/mean of the observed values
ŷᵢ : the fitted value for observation i
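The formula can be checked directly in base R. The sketch below uses the same exam marks as the worked example that follows; the variable names (ss_res, ss_tot, r_squared) are our own:

```r
# Manual R-squared computation following the formula above
y <- c(87, 98, 67, 90)            # observed values y_i
x <- c(65, 87, 56, 100)           # predictor
fit <- lm(y ~ x)                  # least-squares fitted line
y_hat <- fitted(fit)              # fitted values y_hat_i
ss_res <- sum((y - y_hat)^2)      # residual sum of squares
ss_tot <- sum((y - mean(y))^2)    # total sum of squares
r_squared <- 1 - ss_res / ss_tot
r_squared                         # same value as summary(fit)$r.squared
```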

Find the Coefficient of Determination (R²) in R

The coefficient of determination (R²) is easy to compute in the R programming language.

The steps to follow are:

# Creating a data frame of exam marks
exam <- data.frame(name = c("ravi", "shaily", "arsh", "monu"),
                   math = c(87, 98, 67, 90),
                   estimated = c(65, 87, 56, 100))
 
# Printing data frame
exam

Output:

    name math estimated
1   ravi   87        65
2 shaily   98        87
3   arsh   67        56
4   monu   90       100

Fit the model and compute its summary

# Fitting the linear regression model
model <- lm(math ~ estimated, data = exam)

# Printing the model summary
summary(model)

Output:

Call:
lm(formula = math ~ estimated, data = exam)

Residuals:
     1      2      3      4 
 7.421  7.566 -8.138 -6.848 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  47.5074    24.0563   1.975    0.187
estimated     0.4934     0.3047   1.619    0.247

Residual standard error: 10.62 on 2 degrees of freedom
Multiple R-squared:  0.5673,	Adjusted R-squared:  0.3509
F-statistic: 2.622 on 1 and 2 DF,  p-value: 0.2468
# Extracting R-squared parameter from summary
summary(model)$r.squared

Output:

[1] 0.5672797
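As a sanity check on this value: for a simple linear regression with a single predictor, R-squared equals the squared Pearson correlation between the two variables, so it can be recovered without fitting a model at all:

```r
# For a one-predictor linear model, R-squared equals the squared
# Pearson correlation between the response and the predictor.
exam <- data.frame(math = c(87, 98, 67, 90),
                   estimated = c(65, 87, 56, 100))
model <- lm(math ~ estimated, data = exam)
summary(model)$r.squared            # 0.5672797
cor(exam$math, exam$estimated)^2    # identical value
```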

Note: If the predictions match the observed values exactly, the R-squared value is 1.

# Creating a data frame of exam marks
exam <- data.frame(name = c("ravi", "shaily", "arsh", "monu"),
                   math = c(87, 98, 67, 90),
                   estimated = c(87, 98, 67, 90))
 
# Printing data frame
exam
 
# Fitting the linear regression model
model <- lm(math ~ estimated, data = exam)

# Printing the model summary
summary(model)

Output:

    name math estimated
1   ravi   87        87
2 shaily   98        98
3   arsh   67        67
4   monu   90        90

Call:
lm(formula = math ~ estimated, data = exam)

Residuals:
         1          2          3          4 
 2.618e-15 -2.085e-15 -1.067e-15  5.330e-16 

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 0.000e+00  9.494e-15 0.000e+00        1    
estimated   1.000e+00  1.101e-16 9.086e+15   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.512e-15 on 2 degrees of freedom
Multiple R-squared:      1,	Adjusted R-squared:      1
F-statistic: 8.256e+31 on 1 and 2 DF,  p-value: < 2.2e-16

The summary() function gives all the information about the fitted model. The key lines are:

Residual standard error: 2.512e-15 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 8.256e+31 on 1 and 2 DF, p-value: < 2.2e-16

Here the model fits the data perfectly: because the estimated column is identical to the math column, R-squared is exactly 1, and the residual standard error is essentially zero (the tiny nonzero residuals are floating-point rounding).
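The summary object also stores the adjusted R-squared, which penalises a model for using extra predictors, alongside the plain R-squared. Using the original (imperfect) exam data:

```r
# Extracting both R-squared variants from the summary object
exam <- data.frame(math = c(87, 98, 67, 90),
                   estimated = c(65, 87, 56, 100))
model <- lm(math ~ estimated, data = exam)
summary(model)$r.squared        # 0.5672797
summary(model)$adj.r.squared    # 0.3509 (matches the summary output)
```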

Limitation of Using R-square Method

R-squared has some important limitations. It never decreases when more predictors are added to a model, even if those predictors are pure noise, so on its own it can reward overfitting; adjusted R-squared, which penalises extra predictors, is a better guide when comparing models. A high R-squared also does not prove that the model is correct or that the relationship is causal, and a low R-squared does not necessarily make a model useless when the data are inherently noisy.

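A key limitation of R-squared, that it never decreases when predictors are added, even pure noise, can be demonstrated directly. In this sketch (all variable names are our own), noise is generated independently of y:

```r
# Adding an unrelated predictor never lowers R-squared
set.seed(42)
y     <- rnorm(50)                # response
x1    <- rnorm(50)                # a candidate predictor
noise <- rnorm(50)                # pure noise, unrelated to y
r2_one <- summary(lm(y ~ x1))$r.squared
r2_two <- summary(lm(y ~ x1 + noise))$r.squared
r2_two >= r2_one                  # always TRUE
```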
