Root mean squared error (RMSE) is the square root of the mean of the squared errors. It is widely used as a general-purpose error metric for numerical predictions. RMSE is a good measure of accuracy, but only for comparing the prediction errors of different models or model configurations for a particular variable, not across variables, because it is scale-dependent. In regression, it measures how closely the fitted line's predictions match the observed data points. The formula for calculating RMSE is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\text{predicted}_i - \text{actual}_i\right)^2}$$
where
predicted_i = the predicted value for the i-th observation,
actual_i = the observed (actual) value for the i-th observation,
N = the total number of observations.
Note: The differences between the actual values and the predicted values are known as residuals.
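The definition above translates almost directly into base R. The sketch below defines a hypothetical helper, rmse_base() (not part of any package), that computes the residuals, squares them, averages them over N and takes the square root. The last line illustrates the scale dependence mentioned earlier: expressing the same data in units ten times smaller multiplies the RMSE by ten.

```r
# Hypothetical helper: RMSE computed straight from its definition, base R only
rmse_base <- function(actual, predicted) {
  stopifnot(is.numeric(actual), is.numeric(predicted),
            length(actual) == length(predicted))
  res <- actual - predicted   # residuals, one per observation
  sqrt(mean(res^2))           # square, average over N, then take the root
}

actual    <- c(3, 5, 2)
predicted <- c(2, 5, 4)
rmse_base(actual, predicted)  # sqrt((1 + 0 + 4) / 3) = sqrt(5/3), about 1.291

# Scale dependence: multiplying both vectors by 10 multiplies the RMSE by 10
rmse_base(10 * actual, 10 * predicted)
```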
Implementation of RMSE
The rmse() function from the Metrics package in R calculates the root mean squared error between actual values and predicted values.
Syntax:
rmse(actual, predicted)
Parameters:
actual: The ground truth numeric vector.
predicted: The predicted numeric vector, where each element in the vector is a prediction for the corresponding element in actual.
Example 1:
Let’s define two vectors: actual, containing the ground-truth numeric values, and predicted, containing the predicted numeric values, where each element of predicted is a prediction for the corresponding element of actual.
library(Metrics)
# Ground-truth values and the corresponding predictions
actual <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)
# Root mean squared error between the two vectors
result <- rmse(actual, predicted)
print(result)
Output:
[1] 0.3464102
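Working the formula by hand for these six pairs shows where 0.3464102 comes from:

$$
\begin{aligned}
\text{predicted}_i - \text{actual}_i &= -0.5,\ 0.1,\ 0.5,\ -0.1,\ 0.2,\ -0.4 \\
\left(\text{predicted}_i - \text{actual}_i\right)^2 &= 0.25,\ 0.01,\ 0.25,\ 0.01,\ 0.04,\ 0.16 \\
\frac{1}{6}\sum_{i=1}^{6}\left(\text{predicted}_i - \text{actual}_i\right)^2 &= \frac{0.72}{6} = 0.12 \\
\mathrm{RMSE} &= \sqrt{0.12} \approx 0.3464102
\end{aligned}
$$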
Example 2:
In this example, let’s use the trees data set from the datasets package, which contains measurements from a study of black cherry trees.
library(datasets)  # provides the trees data set
library(dplyr)     # used below for select() and %>%
data(trees)
trees
str(trees)
Output:
Girth Height Volume
1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
4 10.5 72 16.4
5 10.7 81 18.8
6 10.8 83 19.7
7 11.0 66 15.6
8 11.0 75 18.2
9 11.1 80 22.6
10 11.2 75 19.9
11 11.3 79 24.2
12 11.4 76 21.0
13 11.4 76 21.4
14 11.7 69 21.3
15 12.0 75 19.1
16 12.9 74 22.2
17 12.9 85 33.8
18 13.3 86 27.4
19 13.7 71 25.7
20 13.8 64 24.9
21 14.0 78 34.5
22 14.2 80 31.7
23 14.5 74 36.3
24 16.0 72 38.3
25 16.3 77 42.6
26 17.3 81 55.4
27 17.5 82 55.7
28 17.9 80 58.3
29 18.0 80 51.5
30 18.0 80 51.0
31 20.6 87 77.0
Output of str(trees):
'data.frame': 31 obs. of 3 variables:
$ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
$ Height: num 70 65 63 72 81 83 66 75 80 75 ...
$ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
This data set consists of 31 observations of 3 numeric variables describing black cherry trees: trunk girth (actually the diameter, in inches), height (in feet) and volume (in cubic feet). Now, let's fit a linear regression model that predicts the Volume of a trunk from its Girth. A simple linear regression model is enough for this, and R makes it straightforward with the base function lm(). How well will the model predict a tree's volume from its girth? To find out, use predict(), a generic R function for making predictions from the results of model-fitting functions. predict() takes as arguments the fitted model and the values of the predictor variable for which we want predicted values of the response variable.
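predict() is not limited to the rows the model was trained on. The sketch below passes a new data frame as newdata, using three girth values (10, 15 and 20) chosen arbitrarily for illustration, and then checks with coef() that, for a straight-line model, each prediction is simply intercept + slope * Girth. The names new_girths, pred_new and manual are illustrative only.

```r
# Same simple linear regression as fit_1 in this article (trees ships with base R)
fit_1 <- lm(Volume ~ Girth, data = trees)

# Illustrative girth values, chosen arbitrarily
new_girths <- data.frame(Girth = c(10, 15, 20))

# One predicted Volume per row of newdata
pred_new <- predict(fit_1, newdata = new_girths)
pred_new

# For a straight line, the prediction is intercept + slope * Girth
manual <- coef(fit_1)[1] + coef(fit_1)[2] * new_girths$Girth
all.equal(unname(pred_new), unname(manual))  # TRUE
```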
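# Fit a simple linear regression of Volume on Girth
fit_1 <- lm(Volume ~ Girth, data = trees)
# Data frame holding only the predictor column
trees.Girth <- trees %>% select(Girth)
# Predicted Volume for every tree in the data
data.predicted <- predict(fit_1, newdata = trees.Girth)
data.predicted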
Output:
1 2 3 4 5 6 7 8 9
5.103149 6.622906 7.636077 16.248033 17.261205 17.767790 18.780962 18.780962 19.287547
10 11 12 13 14 15 16 17 18
19.794133 20.300718 20.807304 20.807304 22.327061 23.846818 28.406089 28.406089 30.432431
19 20 21 22 23 24 25 26 27
32.458774 32.965360 33.978531 34.991702 36.511459 44.110244 45.630001 50.695857 51.709028
28 29 30 31
53.735371 54.241956 54.241956 67.413183
Now we have both the actual volumes of the cherry tree trunks and the volumes predicted by the linear regression model. Finally, use the rmse() function to compute the error between the actual and the predicted values.
library(Metrics)
# RMSE between the observed volumes and the model's predictions
rmse(trees$Volume, data.predicted)
Output:
[1] 4.11254
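As a cross-check that does not need the Metrics package, the same value can be computed in base R from the model's residuals. The sketch also relates it to sigma(), the residual standard error reported by summary(lm), which divides the residual sum of squares by the residual degrees of freedom (n - 2 here) rather than by n, so it comes out slightly larger than the RMSE.

```r
fit_1 <- lm(Volume ~ Girth, data = trees)

# RMSE from the residuals (actual minus fitted), base R only
rmse_fit <- sqrt(mean(residuals(fit_1)^2))
rmse_fit  # should agree with the rmse() result above

# Residual standard error uses n - 2 degrees of freedom instead of n
n <- nrow(trees)
sigma_fit <- sigma(fit_1)
all.equal(rmse_fit, sigma_fit * sqrt(df.residual(fit_1) / n))  # TRUE
```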
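The RMSE is about 4.11, which means the model's volume predictions deviate from the observed volumes by roughly 4.1 cubic feet in a root-mean-square sense. With observed volumes ranging from 10.2 to 77.0 cubic feet, that is a reasonable fit for a model with a single predictor, and the error could likely be reduced further by adding more predictors, such as Height (a multiple regression model). In summary, the root mean squared error is easy to compute in R with the rmse() function from the Metrics package.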
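To try the suggestion of adding predictors, here is a base-R sketch that compares the in-sample RMSE of the girth-only model with a model that also uses Height; choosing Height as the extra predictor is this sketch's assumption, and rmse_in_sample() is a hypothetical helper, not a library function. Keep in mind that for least-squares fits the in-sample RMSE can never increase when a predictor is added, so a lower number here does not by itself prove better predictions for new trees; held-out data or cross-validation gives a fairer comparison.

```r
fit_1 <- lm(Volume ~ Girth, data = trees)           # girth only
fit_2 <- lm(Volume ~ Girth + Height, data = trees)  # girth and height

# Hypothetical helper: in-sample RMSE from a model's fitted values
rmse_in_sample <- function(fit, actual) sqrt(mean((actual - fitted(fit))^2))

rmse_1 <- rmse_in_sample(fit_1, trees$Volume)
rmse_2 <- rmse_in_sample(fit_2, trees$Volume)
c(girth_only = rmse_1, girth_and_height = rmse_2)
```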