Linear regression is a method of predictive analysis in machine learning. It is basically used to check two things:
- If a set of predictor variables (independent) does a good job predicting the outcome variable (dependent).
- Which of the predictor variables are significant in predicting the outcome variable, and in what way: the magnitude of an estimate indicates the strength of its effect, and its sign indicates the direction.
Linear regression is used with one outcome variable and one or more predictor variables. Simple linear regression works with one outcome and one predictor variable. The simple linear regression model is essentially a linear equation of the form y = c + b*x, where y is the dependent variable (outcome), x is the independent variable (predictor), b is the slope of the line (also known as the regression coefficient), and c is the intercept (labeled as the constant).
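As a quick illustration, the equation y = c + b*x can be evaluated directly in R. The coefficient values below (c = 0.2, b = 0.7) are made-up numbers for demonstration only, not estimates from any real dataset:

```r
# Hypothetical coefficients, for illustration only
c_intercept <- 0.2   # intercept (constant)
b_slope     <- 0.7   # slope (regression coefficient)

x <- c(1, 2, 3, 4)              # predictor values
y <- c_intercept + b_slope * x  # predicted outcome values
print(y)  # 0.9 1.6 2.3 3.0
```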
A linear regression line is a line that best fits the graph between the predictor variable (independent) and the predicted variable (dependent).
Regression line (solid green) for income vs happiness dataset
In the diagram above, the green line is the best-fit line, and it is taken as the regression line for the given dataset.
One of the most popular methods of deciding the regression line is the method of least-squares. This method essentially works to find the best-fit line for the data by minimizing the sum of the squares of the vertical deviations from each data point (the deviation of a point residing on the line is 0). As the deviations are squared, there is no cancellation between the positive and negative values of the deviation.
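For simple linear regression, the least-squares estimates have a closed-form solution: the slope is cov(x, y) / var(x), and the intercept is mean(y) - slope * mean(x). The sketch below computes them on a small made-up dataset and checks the result against R's lm():

```r
# Least-squares estimates from the closed-form solution,
# using a small made-up dataset (values are illustrative only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

b  <- cov(x, y) / var(x)    # slope that minimizes the sum of squared deviations
c0 <- mean(y) - b * mean(x) # intercept

# The same estimates from lm(), for comparison
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(c0, b))  # TRUE
```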
- Select a suitable problem statement for linear regression. We will use the income.data dataset (income vs. happiness).
- Install and load the packages for plotting/visualization. You can plot the data points to see whether the data is suitable for linear regression.
- Read the dataset into a data frame. You can also inspect the data frame after reading it (example shown in the code below).
- Create a linear regression model from the data using the lm() function. Store the created model in a variable.
- Explore the model.
Scatter plot of the independent variable (income) against the dependent variable (happiness)
Step 1: Install and load the required packages, then read and explore the dataset. You can also set the working directory of the notebook using the setwd() function, passing the path of the directory (where the dataset is stored) as an argument.
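A sketch of Step 1 is below. The file name "income.data.csv" and the directory path are assumptions based on the dataset named in this article; adjust them to match your setup.

```r
# Step 1 sketch: install/load a plotting package, set the working
# directory, and read the dataset into a data frame.
# NOTE: "path/to/your/data" and "income.data.csv" are placeholders.
install.packages("ggplot2")   # run once; skip if already installed
library(ggplot2)

setwd("path/to/your/data")                # directory containing the dataset
dataFrame <- read.csv("income.data.csv")  # read the data into a data frame

head(dataFrame)     # peek at the first few rows
summary(dataFrame)  # quick numeric summary of each column
```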
Step 2: Separate the variables of the dataset. Visualize the dataset.
x <- dataFrame$income
y <- dataFrame$happiness
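The visualization step can be sketched with base R's plot(); this assumes dataFrame was read in Step 1 and has columns named income and happiness:

```r
# Step 2 sketch: scatter plot of income (x) vs happiness (y).
# Assumes `dataFrame` exists with columns income and happiness.
x <- dataFrame$income
y <- dataFrame$happiness

plot(x, y,
     xlab = "Income", ylab = "Happiness",
     main = "Income vs Happiness")
```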
Graph of X (Income) vs Y (Happiness)
Step 3: Create the linear regression model from the data. Fit and inspect the model.

model <- lm(y ~ x)
print(model)
As you can see, the value of the intercept is 0.2043. But how do we obtain this value in a variable?
Extracting the value of the intercept
We can use the summary of the created model to extract the value of the intercept.

model_summary <- summary(model)
intercept_value <- model_summary$coefficients[1, 1]

If you print the summary of the model (the model_summary variable), you will see a coefficients table. It is a 2D matrix that stores the estimates along with their standard errors, t-values, and p-values; the entry [1, 1] corresponds to the estimated intercept of the regression line.
This is how we extract the value of intercept from a linear regression model in R.
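Putting the pieces together, here is a self-contained sketch that fits a model on synthetic data (so it runs without the income dataset; the coefficients 0.2 and 0.7 are made up) and extracts the intercept both from the summary matrix and via coef():

```r
# End-to-end sketch with synthetic data (values are illustrative only)
set.seed(42)
x <- runif(100, min = 15, max = 75)      # made-up "income" values
y <- 0.2 + 0.7 * x + rnorm(100, sd = 2)  # linear relation plus noise

model <- lm(y ~ x)               # fit the regression
model_summary <- summary(model)  # full summary, incl. the coefficient matrix

# Two equivalent ways to get the estimated intercept
intercept_value <- model_summary$coefficients[1, 1]
intercept_alt   <- unname(coef(model)[1])

all.equal(intercept_value, intercept_alt)  # TRUE
```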