# ML | Mathematical explanation of RMSE and R-squared error

• Last Updated : 06 Jun, 2022

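RMSE: Root Mean Square Error measures how well a regression line fits the data points. It can also be interpreted as the standard deviation of the residuals. This holds exactly when the residuals average to zero, as they do for a least-squares line with an intercept.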
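Written out for $n$ data points with observed values $y_i$ and predictions $\hat{y}_i$ from the regression line, the definition is:

$$
\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

Each difference $y_i - \hat{y}_i$ is a residual. RMSE is therefore the square root of the mean squared residual.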

Consider the given data points: (1, 1), (2, 2), (2, 3), (3, 6).

Let us split these data points into two 1-D lists, one for x and one for y.

Input:

```python
x = [1, 2, 2, 3]
y = [1, 2, 3, 6]
```

Code: Regression Graph

## Python

```python
import matplotlib.pyplot as plt
import math  # used later for the RMSE calculation

# plotting the points (scatter, so the points are not joined by lines)
plt.scatter(x, y)

# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph
plt.title('Regression Graph')

# function to show the plot
plt.show()
```

Code: Mean Calculation

## Python

```python
# In the next step we will find the equation of the best-fit line.
# A straight line is written in slope-intercept form, y = mx + c, where
#   m is the slope: (change in y) / (change in x)
#   c is the intercept: the point where the line crosses the y-axis
# The least-squares slope m is given by:
#
#   m = Σ (x_i - x_mean) * (y_i - y_mean) / Σ (x_i - x_mean)^2
#
# where both sums run over i = 1 .. n.

# calculate x_mean and y_mean
ct = len(x)
sum_x = 0
sum_y = 0

for i in x:
    sum_x = sum_x + i
x_mean = sum_x / ct
print('Value of X mean', x_mean)

for i in y:
    sum_y = sum_y + i
y_mean = sum_y / ct
print('value of Y mean', y_mean)

# we now have the values of x_mean and y_mean
```

Output:

```
Value of X mean 2.0
value of Y mean 3.0
```
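The next step uses a slope of 2.5 without showing where it comes from. The following short sketch evaluates the least-squares slope formula from the comments above on the same data. It uses plain Python and needs no extra libraries:

```python
x = [1, 2, 2, 3]
y = [1, 2, 3, 6]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# numerator: sum of (x_i - x_mean) * (y_i - y_mean)
numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
# denominator: sum of (x_i - x_mean)^2
denominator = sum((xi - x_mean) ** 2 for xi in x)

m = numerator / denominator
print('numerator', numerator)      # 5.0
print('denominator', denominator)  # 2.0
print('slope m', m)                # 2.5
```

Only the points with x different from the mean contribute, which is why the two points at x = 2 drop out of both sums.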

Code: Line Equation

## Python

```python
# Below is the process of finding the line equation in mathematical terms.
# Evaluating the slope formula on our data gives m = 5 / 2 = 2.5.
# The least-squares line always passes through (x_mean, y_mean),
# so the intercept is c = y_mean - m * x_mean.
m = 2.5
c = y_mean - m * x_mean
print('Intercept', c)
```

Output:

`Intercept -2.0`
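As a cross-check, NumPy's least-squares polynomial fit `np.polyfit` with degree 1 should recover the same slope and intercept. Up to floating-point rounding, these are 2.5 and -2.0:

```python
import numpy as np

x = [1, 2, 2, 3]
y = [1, 2, 3, 6]

# np.polyfit with deg=1 fits y = slope * x + intercept by least squares
# and returns the coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print('slope', slope)
print('intercept', intercept)
```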

Code: Mean Squared Error

## Python

```python
# The equation of our regression line comes out to be:
# y_pred = 2.5x - 2.0
# We call the predictions of this line y_pred.
from sklearn.metrics import mean_squared_error

# y_pred for our existing data points is as below
y = [1, 2, 3, 6]
y_pred = [0.5, 3, 3, 5.5]
```
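Rather than typing the predictions by hand, you can generate them from the line equation. Here is a minimal sketch that assumes the slope m = 2.5 and intercept c = -2.0 found above:

```python
x = [1, 2, 2, 3]
m, c = 2.5, -2.0

# evaluate y_pred = m * x + c at every x value
y_pred = [m * xi + c for xi in x]
print(y_pred)  # [0.5, 3.0, 3.0, 5.5]
```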

## Python

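```python
import math
from sklearn.metrics import mean_squared_error, root_mean_squared_error

# RMSE as the square root of the MSE calculated by the sklearn package
rmse1 = math.sqrt(mean_squared_error(y, y_pred))
print('Root mean square error', rmse1)

# Another way to find RMSE: sklearn >= 1.4 provides root_mean_squared_error.
# (Older versions used mean_squared_error(y, y_pred, squared=False);
#  the squared parameter was deprecated in 1.4 and later removed.)
rmse2 = root_mean_squared_error(y, y_pred)
print('Root mean square error', rmse2)
```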

Output:

```
Root mean square error 0.6123724356957945
Root mean square error 0.6123724356957945
```
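You can compute the same number in vectorised form with NumPy. This form also makes it easy to check the opening remark about residuals. The residuals here average to zero, so the RMSE equals their population standard deviation, which is what `np.std` returns with its default `ddof=0`:

```python
import numpy as np

y = np.array([1, 2, 3, 6], dtype=float)
y_pred = np.array([0.5, 3, 3, 5.5])

residuals = y - y_pred                      # [0.5, -1.0, 0.0, 0.5]
rmse = np.sqrt(np.mean(residuals ** 2))     # root of the mean squared residual

print('mean of residuals', residuals.mean())
print('RMSE', rmse)
print('std of residuals', np.std(residuals))  # ddof=0: divides by N
```

If the residuals did not average to zero, `np.std` would subtract their mean first and the two numbers would differ.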

Code: RMSE Calculation

## Python

```python
import math

# Let's check how the root mean square error is calculated mathematically.
# First, a term called residuals: the vertical distance of each data
# point from the regression line.
#   r_i = y_i - y_pred_i,  where y_pred_i = m * x_i + c
# so r_i = y_i - (m * x_i + c)
# For example, for the point (1, 1) we evaluate what the model predicts
# at x = 1 and subtract it from the observed y = 1.

# point (1, 1): r1 = 1 - (2.5 * 1 - 2.0) = 0.5
r1 = 1 - (2.5 * 1 - 2.0)

# point (2, 2): r2 = 2 - (2.5 * 2 - 2.0) = -1.0
r2 = 2 - (2.5 * 2 - 2.0)

# point (2, 3): r3 = 3 - (2.5 * 2 - 2.0) = 0.0
r3 = 3 - (2.5 * 2 - 2.0)

# point (3, 6): r4 = 6 - (2.5 * 3 - 2.0) = 0.5
r4 = 6 - (2.5 * 3 - 2.0)

# the residuals from the calculation above
residuals = [r1, r2, r3, r4]   # [0.5, -1.0, 0.0, 0.5]

# RMSE = sqrt((r1^2 + r2^2 + r3^2 + r4^2) / N), with N = 4 data points
N = 4
rmse = math.sqrt((r1**2 + r2**2 + r3**2 + r4**2) / N)
print('Root Mean square error using maths', rmse)

# both RMSE values (sklearn and by hand) are the same
```

Output:

`Root Mean square error using maths 0.6123724356957945`
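To see the residuals visually, the following matplotlib sketch draws three things: the data points, the line y_pred = 2.5x - 2.0, and each residual as a red vertical segment between a point and the line. The colours and labels are illustrative choices:

```python
import matplotlib.pyplot as plt

x = [1, 2, 2, 3]
y = [1, 2, 3, 6]
m, c = 2.5, -2.0
y_pred = [m * xi + c for xi in x]

fig, ax = plt.subplots()
ax.scatter(x, y, label='data points')
# the regression line, drawn across the range of x
ax.plot([min(x), max(x)], [m * min(x) + c, m * max(x) + c],
        label='y_pred = 2.5x - 2.0')
# each residual is the vertical segment from the line to the data point
ax.vlines(x, y_pred, y, colors='red', label='residuals')
ax.set_xlabel('x - axis')
ax.set_ylabel('y - axis')
ax.set_title('Regression line and residuals')
ax.legend()
plt.show()
```

The point (2, 3) lies exactly on the line, so its residual segment has zero length.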

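R-squared Error or Coefficient of Determination

R² answers the following question: how much of the variation in y is explained by its variation with x? In other words, it gives the fraction (often quoted as a percentage) of the total variation in y that the regression line accounts for.

Code: R-Squared Error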

## Python

```python
# Squared error of the line: the sum of squared residuals
# SE_line = (y1 - (m*x1 + c))**2 + (y2 - (m*x2 + c))**2 + ... + (yn - (m*xn + c))**2
# SE_line = (1 - (2.5*1 + (-2)))**2 + (2 - (2.5*2 + (-2)))**2
#         + (3 - (2.5*2 + (-2)))**2 + (6 - (2.5*3 + (-2)))**2
val1 = (1 - (2.5 * 1 + (-2))) ** 2
val2 = (2 - (2.5 * 2 + (-2))) ** 2
val3 = (3 - (2.5 * 2 + (-2))) ** 2
val4 = (6 - (2.5 * 3 + (-2))) ** 2
SE_line = val1 + val2 + val3 + val4
print('val', val1, val2, val3, val4)

# Next, calculate the total variation in y around its mean value:
# y_var = (y1 - y_mean)**2 + (y2 - y_mean)**2 + ... + (yn - y_mean)**2
y = [1, 2, 3, 6]
y_var = (1 - 3) ** 2 + (2 - 3) ** 2 + (3 - 3) ** 2 + (6 - 3) ** 2
SE_mean = y_var

# y_var measures the spread of the y data points around the mean of y.
# SE_line / SE_mean is the fraction of the variation in y that the
# regression line does NOT describe, so
# 1 - SE_line / SE_mean is the fraction of the variation in y that
# is explained by the variation in x.
r_squared = 1 - (SE_line / SE_mean)
print('Rsquared error', r_squared)
```

Output:

```
val 0.25 1.0 0.0 0.25
Rsquared error 0.8928571428571429
```
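Here $\bar{y}$ is the mean of y and $\hat{y}_i$ is the value on the regression line. In formula form, the computation above reads as follows, and plugging in the numbers reproduces the printed value:

$$
R^2 = 1 - \frac{SE_{\text{line}}}{SE_{\text{mean}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{0.25 + 1 + 0 + 0.25}{4 + 1 + 0 + 9}
    = 1 - \frac{1.5}{14} \approx 0.8929
$$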

Code: R-Squared Error with sklearn

## Python

```python
from sklearn.metrics import r2_score

# the r2 error calculated by sklearn matches
# our mathematically calculated r2 error
print(r2_score(y, y_pred))
```

Output:

`0.8928571428571429 `
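As a final cross-check, scikit-learn's `LinearRegression` can fit the line itself. Its `score` method returns the R-squared of the fit on the given data, so it should agree with both values above. Note that scikit-learn expects a 2-D feature array, hence the `reshape`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 2, 3]).reshape(-1, 1)   # a single feature column
y = np.array([1, 2, 3, 6])

model = LinearRegression().fit(X, y)
print('slope', model.coef_[0])
print('intercept', model.intercept_)
print('R-squared', model.score(X, y))
```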
