
Root-Mean-Square Error in R Programming


Root mean square error (RMSE) is the square root of the mean of the squared errors. RMSE is widely used as a general-purpose error metric for numerical predictions. Because it is scale-dependent, it is suitable for comparing the prediction errors of different models or model configurations for the same variable, but not for comparing errors across different variables. For a regression model, it measures how closely the fitted line matches the data points. The formula for calculating RMSE is:
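$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{predicted}_i - \mathrm{actual}_i\right)^2}$$

where,
predicted_i = The predicted value for the i-th observation.
actual_i = The observed (actual) value for the i-th observation.
N = Total number of observations.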

Note: The differences between the actual values and the predicted values are known as residuals.
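The formula translates almost word for word into base R. The following is a minimal sketch, assuming two numeric vectors of equal length; the helper name rmse_manual and the sample values are made up for illustration and are not part of any package:

```r
# Hypothetical helper (not from any package): RMSE computed from the formula
rmse_manual <- function(actual, predicted) {
  # Guard against silently recycling vectors of different lengths
  stopifnot(length(actual) == length(predicted))
  res <- actual - predicted   # one residual per observation
  sqrt(mean(res^2))           # square, average, then take the square root
}

# Small made-up vectors for illustration
actual <- c(3, 5, 7)
predicted <- c(2, 5, 9)

# Residuals are 1, 0 and -2, so the result equals sqrt(5 / 3)
rmse_manual(actual, predicted)
```

Because the residuals are squared before averaging, their sign does not matter, and large errors weigh more heavily than small ones.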

Implementation of RMSE

The rmse() function in the Metrics package calculates the root mean square error between actual values and predicted values.

Syntax:
rmse(actual, predicted)

Parameters:
actual: The ground truth numeric vector.
predicted: The predicted numeric vector, where each element in the vector is a prediction for the corresponding element in actual.

Example 1:
Let’s define two vectors: actual, which holds the ground truth values, and predicted, which holds the predicted values. Each element of predicted is a prediction for the corresponding element of actual.




# R program to illustrate RMSE
  
# Importing the required package
library(Metrics)
  
# Taking two vectors
actual = c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)         
predicted = c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)      
  
# Calculating RMSE using rmse()         
result = rmse(actual, predicted)
  
# Printing the value
print(result)       


Output:

[1] 0.3464102
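The scale dependence mentioned in the introduction can be seen on these same vectors. The sketch below uses a small base-R helper (the name rmse_base is hypothetical) so that it runs without extra packages, and assumes, purely for illustration, that the values are re-expressed in a unit 100 times smaller:

```r
# Hypothetical base-R helper so the snippet runs without extra packages
rmse_base <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

actual <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)

rmse_original <- rmse_base(actual, predicted)

# Same data in a unit 100 times smaller (for example metres to centimetres)
rmse_rescaled <- rmse_base(actual * 100, predicted * 100)

rmse_original   # about 0.3464, the value reported by rmse() above
rmse_rescaled   # 100 times larger, although the predictions are no better or worse
```

This is why RMSE values are only comparable between models that predict the same variable in the same units.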

Example 2:
In this example, let’s use the trees data set from the datasets package, which contains measurements from a study of black cherry trees.




# Importing required packages
library(datasets)   # provides the trees data set
library(dplyr)      # provides %>% and select(), used below
  
# Access the data from R’s datasets package
data(trees)
  
# Display the data in the trees dataset    
trees           


Output:

    Girth Height Volume
1    8.3     70   10.3
2    8.6     65   10.3
3    8.8     63   10.2
4   10.5     72   16.4
5   10.7     81   18.8
6   10.8     83   19.7
7   11.0     66   15.6
8   11.0     75   18.2
9   11.1     80   22.6
10  11.2     75   19.9
11  11.3     79   24.2
12  11.4     76   21.0
13  11.4     76   21.4
14  11.7     69   21.3
15  12.0     75   19.1
16  12.9     74   22.2
17  12.9     85   33.8
18  13.3     86   27.4
19  13.7     71   25.7
20  13.8     64   24.9
21  14.0     78   34.5
22  14.2     80   31.7
23  14.5     74   36.3
24  16.0     72   38.3
25  16.3     77   42.6
26  17.3     81   55.4
27  17.5     82   55.7
28  17.9     80   58.3
29  18.0     80   51.5
30  18.0     80   51.0
31  20.6     87   77.0




# Look at the structure
# Of the variables
str(trees)     


Output:

'data.frame':   31 obs. of  3 variables:
 $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

This data set consists of 31 observations of 3 numeric variables describing black cherry trees: trunk girth, height and volume. Now let’s fit a linear regression model to predict the volume of a trunk from its girth. A simple linear regression model is appropriate here, and R makes it straightforward with the base function lm().

How well does the model predict a tree’s volume from its girth? To find out, use predict(), a generic R function for making predictions from fitted models. It takes as arguments the fitted linear regression model and a data frame containing the values of the predictor variable for which we want predicted values of the response.




# Building a linear model
# relating tree volume to girth
fit_1 <- lm(Volume ~ Girth, data = trees)

# select() returns a one-column data frame holding Girth
trees.Girth <- trees %>% select(Girth)

# Use predict() to predict volume; newdata must be a data frame
# whose column names match the model's predictors
data.predicted <- predict(fit_1, newdata = trees.Girth)
data.predicted


Output:

        1         2         3         4         5         6         7         8         9 
 5.103149  6.622906  7.636077 16.248033 17.261205 17.767790 18.780962 18.780962 19.287547 
       10        11        12        13        14        15        16        17        18 
19.794133 20.300718 20.807304 20.807304 22.327061 23.846818 28.406089 28.406089 30.432431 
       19        20        21        22        23        24        25        26        27 
32.458774 32.965360 33.978531 34.991702 36.511459 44.110244 45.630001 50.695857 51.709028 
       28        29        30        31 
53.735371 54.241956 54.241956 67.413183 
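To see where these numbers come from, the predictions can be rebuilt by hand from the fitted coefficients. The following is a sketch using only base R; the variable names coefs and manual are chosen here for illustration:

```r
# Refit the single-predictor model (trees ships with base R's datasets package)
fit_1 <- lm(Volume ~ Girth, data = trees)

# Named vector holding the intercept and the slope for Girth
coefs <- coef(fit_1)

# Prediction for each tree: intercept + slope * girth
manual <- coefs[["(Intercept)"]] + coefs[["Girth"]] * trees$Girth

# predict() without newdata returns the fitted values for the training data
all.equal(unname(predict(fit_1)), manual)
```

For a simple linear model, predict() does nothing more than evaluate this straight-line equation at each supplied girth.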

Now we have both the actual volumes of the cherry tree trunks and the volumes predicted by the linear regression model. Finally, use the rmse() function to compute the root mean square error between the actual and the predicted values.




# Load the Metrics package 
library(Metrics)
  
# Applying rmse() function 
rmse(trees$Volume, predict(fit_1, data.frame(Girth = trees.Girth)))


Output:

[1] 4.11254
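The same value can be obtained without the Metrics package, because the residuals stored in the fitted model are exactly the differences between actual and predicted volumes. The following is a minimal base-R sketch; the variable name rmse_from_residuals is chosen here for illustration:

```r
# Refit the model so the snippet is self-contained
fit_1 <- lm(Volume ~ Girth, data = trees)

# residuals() returns actual minus fitted values for the training data
rmse_from_residuals <- sqrt(mean(residuals(fit_1)^2))
rmse_from_residuals   # about 4.1125, matching the rmse() output above

# The residual standard error reported by summary() divides by the
# residual degrees of freedom (n - 2) instead of n, so it is slightly larger
summary(fit_1)$sigma
```

Keeping the two quantities apart avoids confusion when comparing an RMSE with the "Residual standard error" line printed by summary().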

The RMSE is 4.11254, expressed in the same units as Volume (cubic feet), against observed volumes that range from 10.2 to 77.0. This is a reasonable result for a linear model with a single predictor, and it can likely be reduced further by adding more predictors (a multiple regression model). In summary, finding the root mean square error in R is easy: the rmse() function from the Metrics package does it in a single call.
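The remark about adding predictors can be checked on the same data. The sketch below assumes Height as the extra predictor and uses a small base-R helper (the name rmse_base is hypothetical); it is one possible comparison, not the only way to extend the model:

```r
# Hypothetical base-R helper so the snippet runs without extra packages
rmse_base <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Single-predictor model from the example, and a model that also uses Height
fit_1 <- lm(Volume ~ Girth, data = trees)
fit_2 <- lm(Volume ~ Girth + Height, data = trees)

# predict() without newdata returns fitted values for the training data
rmse_1 <- rmse_base(trees$Volume, predict(fit_1))
rmse_2 <- rmse_base(trees$Volume, predict(fit_2))

c(girth_only = rmse_1, girth_and_height = rmse_2)
```

Both values are computed on the data used to fit the models, and for least-squares fits this in-sample RMSE cannot increase when a predictor is added. A comparison on held-out observations would give a fairer picture of predictive accuracy.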



Last Updated : 22 Jul, 2020