# ML | Mathematical explanation of RMSE and R-squared error

**RMSE: Root Mean Square Error** measures how well a regression line fits the data points. It can also be interpreted as the standard deviation of the residuals.
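Written out, and assuming the usual convention of dividing by the number of points $N$ (the notation $\hat{y}_i$ for the predicted value is introduced here for illustration), the definition is:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}
$$

Each term $y_i - \hat{y}_i$ is a residual. For a least-squares line with an intercept the residuals average to zero, which is why RMSE coincides with their (population) standard deviation.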

Consider the given data points: (1, 1), (2, 2), (2, 3), (3, 6).

Let's break the above data points into 1-D lists.

**Input :**

x = [1, 2, 2, 3]

y = [1, 2, 3, 6]

**Code : Regression Graph**

```python
import math

import matplotlib.pyplot as plt

# data points from the example above
x = [1, 2, 2, 3]
y = [1, 2, 3, 6]

# plotting the points
plt.plot(x, y)

# naming the x axis
plt.xlabel('x - axis')

# naming the y axis
plt.ylabel('y - axis')

# giving a title to the graph
plt.title('Regression Graph')

# function to show the plot
plt.show()
```


**Code : Mean Calculation**

```python
# In the next step we will find the equation of the best-fit line.
# We use the slope-intercept form of a line, y = mx + c,
# where m is the slope, i.e. (change in y) / (change in x),
# and c is the constant at which the line intercepts the y-axis.
# The slope m can be formulated as below:
'''
     sum over i = 1..n of (xi - Xmean) * (yi - Ymean)
m = --------------------------------------------------
     sum over i = 1..n of (xi - Xmean)^2
'''
# calculate Xmean and Ymean (x and y are the input lists given above)
ct = len(x)
sum_x = 0
sum_y = 0

for i in x:
    sum_x = sum_x + i
x_mean = sum_x / ct
print('Value of X mean', x_mean)

for i in y:
    sum_y = sum_y + i
y_mean = sum_y / ct
print('value of Y mean', y_mean)

# we have the values of x_mean and y_mean
```


**Output :**

Value of X mean 2.0

value of Y mean 3.0
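The slope formula quoted in the comments can be evaluated directly. The following is a minimal sketch, assuming the same `x` and `y` lists as above; the names `numerator` and `denominator` are illustrative and not part of the original code.

```python
x = [1, 2, 2, 3]
y = [1, 2, 3, 6]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# numerator: sum of (xi - Xmean) * (yi - Ymean)
numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
# denominator: sum of (xi - Xmean)^2
denominator = sum((xi - x_mean) ** 2 for xi in x)

m = numerator / denominator
print('Slope', m)
```

For these four points the numerator is 5 and the denominator is 2, which gives the slope of 2.5 used in the next step.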

**Code : Line Equation**

```python
# below is the process of finding the line equation in mathematical terms
# the slope of our line is 2.5 (from the slope formula: 5 / 2)
# calculate c to find out the equation

m = 2.5
c = y_mean - m * x_mean
print('Intercept', c)
```


**Output :**

Intercept -2.0
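With the slope and intercept known, the fitted values can be generated from the line rather than typed by hand. A small sketch, assuming `m = 2.5` and `c = -2.0` from the steps above (the list comprehension is illustrative, not the original author's code):

```python
x = [1, 2, 2, 3]
m = 2.5
c = -2.0

# evaluate the regression line y_pred = m * x + c at every x
y_pred = [m * xi + c for xi in x]
print('Predictions', y_pred)
```

This reproduces the values 0.5, 3, 3 and 5.5 that are used as `y_pred` in the rest of the article.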

**Code : Mean Squared Error**

```python
# equation of our regression line comes out to be as below:
# y_pred = 2.5x - 2.0
# we call the line y_pred
from sklearn.metrics import mean_squared_error

# y_pred for our existing data points is as below
y = [1, 2, 3, 6]
y_pred = [0.5, 3, 3, 5.5]
```


```python
# root mean square error calculated with the sklearn package
rmse = math.sqrt(mean_squared_error(y, y_pred))
print('Root mean square error', rmse)
```


**Output :**

Root mean square error 0.6123724356957945

**Code : RMSE Calculation**

```python
# let's check how the root mean square error is calculated mathematically
# let's introduce a term called residuals
# a residual is the vertical distance of a data point from the regression line
# root mean square error and residuals are calculated as below
# we have 4 data points
'''
ri = yi - y_pred_i
y_pred_i is m * xi + c
ri = yi - (m * xi + c)
e.g. for x = 1 we have the value of y as 1, and we want to
evaluate what exactly our model has predicted for x = 1
'''
# (1, 1): x = 1, y = 1
# r1 = 1 - (2.5 * 1 - 2.0) = 0.5
r1 = 1 - (2.5 * 1 - 2.0)

# (2, 2): x = 2, y = 2
# r2 = 2 - (2.5 * 2 - 2.0) = -1
r2 = 2 - (2.5 * 2 - 2.0)

# (2, 3): x = 2, y = 3
# r3 = 3 - (2.5 * 2 - 2.0) = 0
r3 = 3 - (2.5 * 2 - 2.0)

# (3, 6): x = 3, y = 6
# r4 = 6 - (2.5 * 3 - 2.0) = 0.5
r4 = 6 - (2.5 * 3 - 2.0)

# from the above calculation we have the values of the residuals
residuals = [0.5, -1, 0, .5]

# now calculate the root mean square error
# N = 4 data points
N = 4
rmse = math.sqrt((r1**2 + r2**2 + r3**2 + r4**2) / N)
print('Root Mean square error using maths', rmse)

# both RMSE values (from sklearn and by hand) are the same
```


**Output :**

Root Mean square error using maths 0.6123724356957945
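The statement that RMSE is the standard deviation of the residuals can be checked on this example. This is a sketch using `statistics.pstdev` from the Python standard library (the population standard deviation, which divides by N); it works here because the residuals of this fitted line average to zero.

```python
import math
import statistics

residuals = [0.5, -1, 0, 0.5]

# RMSE: square root of the mean of the squared residuals
rmse = math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))

# population standard deviation of the same residuals
sd = statistics.pstdev(residuals)

print('Mean of residuals', statistics.mean(residuals))
print('RMSE', rmse)
print('Std dev of residuals', sd)
```

The two values agree up to floating-point rounding. If the residuals did not average to zero (for example, for a line that was not fitted by least squares), the two quantities would differ.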

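**R-squared Error or Coefficient of Determination**

R-squared (often loosely called the R2 error) answers the following question:

How much of the variation in y is explained by the variation in x? In other words, it is the percentage of the total variation in y that the regression line accounts for.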
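In symbols, using the names that appear in the code for this section ($SE_{line}$ for the squared error around the regression line and $SE_{mean}$ for the squared error around the mean of y; $\hat{y}_i$ and $\bar{y}$ are notation introduced here for illustration):

$$
R^2 = 1 - \frac{SE_{line}}{SE_{mean}} = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}
$$

A value close to 1 means the line accounts for most of the variation in y; a value close to 0 means it does little better than always predicting the mean.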

**Code : R-Squared Error**

```python
# b is the intercept (called c earlier)
# SE_line = (y1 - (m*x1 + b))**2 + (y2 - (m*x2 + b))**2 + ... + (yn - (m*xn + b))**2
# SE_line = (1 - (2.5*1 + (-2)))**2 + (2 - (2.5*2 + (-2)))**2
#         + (3 - (2.5*2 + (-2)))**2 + (6 - (2.5*3 + (-2)))**2

val1 = (1 - (2.5 * 1 + (-2)))**2
val2 = (2 - (2.5 * 2 + (-2)))**2
val3 = (3 - (2.5 * 2 + (-2)))**2
val4 = (6 - (2.5 * 3 + (-2)))**2
SE_line = val1 + val2 + val3 + val4
print('val', val1, val2, val3, val4)

# next, calculate the total variation in y from its mean value
# variation in y is calculated as
# y_var = (y1 - ymean)**2 + (y2 - ymean)**2 + ... + (yn - ymean)**2

y = [1, 2, 3, 6]

y_var = (1 - 3)**2 + (2 - 3)**2 + (3 - 3)**2 + (6 - 3)**2
SE_mean = y_var

# by calculating y_var we are calculating the squared distance
# between the y data points and the mean value of y
# so the answer to our question, the % of the total variation
# of y explained by x, is denoted as below:
r_squared = 1 - (SE_line / SE_mean)

# SE_line / SE_mean --> tells us what % of the variation
# in y is not described by the regression line
# 1 - (SE_line / SE_mean) --> gives us how much of the
# variation in y is described by the regression line
print('Rsquared error', r_squared)
```


**Output :**

val 0.25 1.0 0.0 0.25

Rsquared error 0.8928571428571429
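The hand-expanded sums above generalize to lists of any length. The following is a minimal sketch; the helper name `r2_from_lists` is hypothetical and chosen only for this illustration, and the function assumes the y values are not all identical (otherwise `SE_mean` is zero and the division fails).

```python
def r2_from_lists(y_true, y_pred):
    """Return 1 - SE_line / SE_mean for two equal-length sequences."""
    y_mean = sum(y_true) / len(y_true)
    # squared error of the points around the regression line
    se_line = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # squared error of the points around the mean of y
    se_mean = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1 - se_line / se_mean


y = [1, 2, 3, 6]
y_pred = [0.5, 3, 3, 5.5]
print('Rsquared', r2_from_lists(y, y_pred))
```

For the example data `SE_line` is 1.5 and `SE_mean` is 14, so the result is 1 - 1.5 / 14, roughly 0.893.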

**Code : R-Squared Error with sklearn**

```python
from sklearn.metrics import r2_score

# the r2 value calculated by sklearn matches
# the one we calculated mathematically
print(r2_score(y, y_pred))
```


**Output :**

0.8928571428571429
