Linear regression is a common method for modeling the relationship between a dependent variable and one or more independent variables. The parameters of a linear model are estimated from the data. Linear regression is useful in prediction and forecasting, where a model is fit to an observed data set and then used to estimate the response for new inputs. Linear regression models are often fitted using the least-squares approach, where the goal is to minimize the sum of squared errors between the observed and predicted values.
Consider a dataset where the independent attribute is represented by x and the dependent attribute is represented by y.
It is known that the equation of a straight line is y = mx + b where m is the slope and b is the intercept.
In order to prepare a simple regression model of the given dataset, we need to calculate the slope and intercept of the line which best fits the data points.
How to calculate slope and intercept?
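The formulas for the slope and intercept are given below.
Slope = Sxy/Sxx, where Sxy is the sum of the products of the deviations of x and y from their means, and Sxx is the sum of the squared deviations of x from its mean. Dividing both by n – 1 gives the sample covariance and sample variance, and the ratio is unchanged. Intercept = ymean – slope * xmean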
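Written out in summation notation, the formulas read as follows. The symbols \bar{x}, \bar{y}, and n (for xmean, ymean, and the number of data points) are introduced here for illustration, and Sxy and Sxx are written as plain sums of deviations, which matches the values 28 and 10 used in the worked example.

```latex
\[
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\]
\[
\text{slope} = \frac{S_{xy}}{S_{xx}}, \qquad
\text{intercept} = \bar{y} - \text{slope} \cdot \bar{x}
\]
```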
Let us use these relations to determine the linear regression for the above dataset. To do this, we calculate xmean, ymean, Sxy, and Sxx as shown in the table, which gives xmean = 3, ymean = 14.6, Sxy = 28, and Sxx = 10.
As per the above formulae,
Slope = 28/10 = 2.8
Intercept = 14.6 – 2.8 * 3 = 6.2
The desired equation of the regression model is y = 2.8 x + 6.2
We use this equation to predict the values of y for the given values of x. The performance of the model can be assessed by calculating the root mean squared error (RMSE) and the coefficient of determination (R²).
Calculations are shown below.
Sum of squared errors = 10.8, so with five data points the mean squared error = 10.8 / 5 = 2.16 and the root mean squared error = √2.16 ≈ 1.47
Coefficient of Determination (R²) = 1 – 10.8 / 89.2 ≈ 0.879
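For reference, the error metrics can be written out as below. The abbreviations SSE and SST and the symbol \hat{y}_i (the predicted value for the i-th point) are notation assumed here, and n = 5 is inferred from the reported values (10.8 / 2.16) rather than stated in the text.

```latex
\[
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 10.8, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n} = \frac{10.8}{5} = 2.16, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \approx 1.47
\]
\[
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 89.2, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{10.8}{89.2} \approx 0.879
\]
```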
A low error and a high R² indicate that the regression line fits the data well.
Let us look at a Python implementation of linear regression for this dataset.
Code 1: Import all the necessary libraries.
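A plausible set of imports for the steps that follow, assuming NumPy for the arithmetic, Matplotlib for the plot, and scikit-learn for the final cross-check:

```python
# Assumed library set: NumPy, Matplotlib and scikit-learn.
import numpy as np                                   # array arithmetic
import matplotlib.pyplot as plt                      # scatter plot and fitted line
from sklearn.linear_model import LinearRegression    # reference implementation
from sklearn.metrics import mean_squared_error, r2_score  # evaluation metrics
```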
Code 2: Generate the data. Calculate xmean, ymean, Sxx, Sxy to find the value of slope and intercept of regression line.
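A minimal sketch of this step. The x and y values below are an assumption: they were chosen because they reproduce the statistics reported in this article (xmean = 3, ymean = 14.6, Sxx = 10, Sxy = 28), not because they are confirmed to be the original data.

```python
import numpy as np

# Assumed data, chosen to reproduce the article's reported statistics.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([7, 14, 15, 18, 19], dtype=float)

# Means of the independent and dependent variables.
x_mean = x.mean()
y_mean = y.mean()

# Sum of cross-deviations (Sxy) and sum of squared deviations of x (Sxx).
Sxy = np.sum((x - x_mean) * (y - y_mean))
Sxx = np.sum((x - x_mean) ** 2)

# Least-squares estimates of the slope (b1) and intercept (b0).
b1 = Sxy / Sxx
b0 = y_mean - b1 * x_mean

print("slope b1 is", b1)
print("intercept b0 is", b0)
```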
slope b1 is 2.8
intercept b0 is 6.200000000000001
Code 3: Plot the given data points and fit the regression line.
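One way to draw this plot with Matplotlib is sketched below. The data values are assumed (chosen to match the statistics reported in this article), the slope and intercept are taken from the worked example, and `plot_fit` is a hypothetical helper name used only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed data, chosen to reproduce the article's reported statistics.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([7, 14, 15, 18, 19], dtype=float)

# Slope and intercept from the worked example.
b1, b0 = 2.8, 6.2


def plot_fit(x, y, slope, intercept):
    """Scatter the data points and overlay the line y = slope * x + intercept."""
    fig, ax = plt.subplots()
    ax.scatter(x, y, color="red", label="data points")
    ax.plot(x, slope * x + intercept, color="blue", label="regression line")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return fig, ax


fig, ax = plot_fit(x, y, b1, b0)

if __name__ == "__main__":
    plt.show()
```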
Code 4: Analyze the performance of the model by calculating the mean squared error and R².
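A sketch of the metric calculations, again using assumed data that matches the reported statistics and the slope and intercept from the worked example:

```python
import numpy as np

# Assumed data, chosen to reproduce the article's reported statistics.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([7, 14, 15, 18, 19], dtype=float)

# Slope and intercept from the worked example.
b1, b0 = 2.8, 6.2

# Predictions from the fitted line.
y_pred = b1 * x + b0

# Sum of squared residuals, its mean, and the square root of the mean.
squared_error = np.sum((y - y_pred) ** 2)
mse = squared_error / len(y)
rmse = np.sqrt(mse)

# Total sum of squares about the mean of y, used for R squared.
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - squared_error / sst

print("squared error is", squared_error)
print("mean squared error is", mse)
print("root mean square error is", rmse)
print("R square is", r2)
```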
squared error is 10.800000000000004
mean squared error is 2.160000000000001
root mean square error is 1.4696938456699071
R square is 0.8789237668161435
Code 5: Use the scikit-learn library to confirm the above results.
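A minimal cross-check with scikit-learn's `LinearRegression`, using the same assumed data (chosen to match the statistics reported in this article). scikit-learn expects the features as a 2-D array, hence the reshape.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Assumed data, chosen to reproduce the article's reported statistics.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([7, 14, 15, 18, 19], dtype=float)

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = x.reshape(-1, 1)

# Fit ordinary least squares and predict on the training inputs.
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, y_pred)

print("Slope:", model.coef_)
print("Intercept:", model.intercept_)
print("MSE:", mse)
print("Root mean squared error:", rmse)
print("R2 score:", r2)
```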
Slope: [2.8]
Intercept: 6.199999999999999
MSE: 2.160000000000001
Root mean squared error: 1.4696938456699071
R2 score: 0.8789237668161435
Conclusion: This article explains the mathematics behind simple linear regression and shows how to implement it in Python.