How to Create a Residual Plot in Python
A residual plot is a graph in which the residuals are displayed on the y-axis and the independent variable is displayed on the x-axis. A linear regression model is appropriate for the data if the points in a residual plot are randomly scattered around the horizontal axis. Let's see how to create a residual plot in Python.
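Before reaching for library helpers, the idea can be sketched by hand: fit a line, subtract the fitted values from the observed ones, and scatter the residuals against the predictor. The data below is made up purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative (made-up) data: predictor x and a roughly linear response y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Fit a straight line; residual = observed - fitted
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Residual plot: predictor on the x-axis, residuals on the y-axis
plt.scatter(x, residuals)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('x')
plt.ylabel('residual')
plt.show()
```

If the scatter shows no trend and hugs the zero line, a straight-line fit is reasonable for the data.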
Method 1: Using plot_regress_exog()
plot_regress_exog():
- Compares the regression results against one regressor.
- Plots 'endog vs exog', 'residuals vs exog', 'fitted vs exog', and 'fitted plus residual vs exog' in a 2-by-2 figure.
Syntax: statsmodels.graphics.regressionplots.plot_regress_exog(results, exog_idx, fig=None)
Parameters:
- results: result instance
- exog_idx: index or name of the regressor
- fig: if no figure is provided, one is created
Returns: 2X2 figure
Single Linear Regression
After importing the necessary packages and reading the CSV file, we use ols() from statsmodels.formula.api to fit a linear regression. We then create a figure and pass it, along with the name of the independent variable and the regression model, to the plot_regress_exog() method, which displays a 2x2 figure of residual plots. In the ols() formula, the string before '~' is the dependent variable (the variable we are trying to predict), and the independent variables come after '~'. For simple linear regression there is one dependent variable and one independent variable.
ols('response_variable ~ predictor_variable', data=data)
CSV Used: headbrain3
Python3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Read the data and fit a simple linear regression
data = pd.read_csv('headbrain3.csv')
linear_model = ols('Brain_weight ~ Head_size', data=data).fit()
print(linear_model.summary())

# 2x2 grid of diagnostic plots for the 'Head_size' regressor
fig = plt.figure(figsize=(14, 8))
fig = sm.graphics.plot_regress_exog(linear_model, 'Head_size', fig=fig)
Output:
The points are randomly scattered around the '0' line with no discernible pattern and no clustering on one side, so there is no sign of heteroscedasticity with the predictor variable 'Head_size'.
Multiple Linear Regression
In multiple linear regression, we have more than one independent (predictor) variable and one dependent variable. The code is similar to simple linear regression, except that we make this change in the ols() formula:
ols('response_variable ~ predictor_variable1 + predictor_variable2 + ...', data=data)
'+' is used to add as many predictor variables as we want while creating the model.
CSV Used: homeprices
Example 1:
Python3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Read the data and fit a multiple linear regression
data = pd.read_csv('homeprices.csv')
multi_model = ols('price ~ area + bedrooms', data=data).fit()
print(multi_model.summary())

# Residual plots for the 'area' regressor
fig = plt.figure(figsize=(14, 8))
fig = sm.graphics.plot_regress_exog(multi_model, 'area', fig=fig)
Output:
Again the residuals are randomly scattered around the '0' line with no pattern and no clustering on one side, so there is no sign of heteroscedasticity with the predictor variable 'area'.
Example 2:
Python3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.read_csv('homeprices.csv')
multi_model = ols('price ~ area + bedrooms', data=data).fit()

# Residual plots for the 'bedrooms' regressor
fig = plt.figure(figsize=(14, 8))
fig = sm.graphics.plot_regress_exog(multi_model, 'bedrooms', fig=fig)
Output:
The residuals are again randomly scattered around the '0' line with no pattern and no clustering on one side, so there is no sign of heteroscedasticity with the predictor variable 'bedrooms'.
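Besides inspecting one regressor at a time, a single residuals-versus-fitted plot covers all predictors at once. A minimal sketch, using synthetic stand-in data since homeprices.csv is not bundled here:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols

# Stand-in data shaped like the article's homeprices example
rng = np.random.default_rng(2)
area = rng.uniform(1000, 4000, 100)
bedrooms = rng.integers(1, 6, 100)
price = 120 * area + 15000 * bedrooms + rng.normal(0, 20000, 100)
data = pd.DataFrame({'area': area, 'bedrooms': bedrooms, 'price': price})

multi_model = ols('price ~ area + bedrooms', data=data).fit()

# Residuals vs fitted values: one diagnostic plot for all predictors
plt.scatter(multi_model.fittedvalues, multi_model.resid)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('fitted price')
plt.ylabel('residual')
plt.show()
```

A random band around zero here supports the model as a whole, complementing the per-regressor views from plot_regress_exog().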
Method 2: Using seaborn.residplot()
seaborn.residplot() regresses y on x and then plots the residuals as a scatter plot. You can optionally fit a lowess smoother to the residual plot, which can help detect whether the residuals have structure.
Syntax: seaborn.residplot(*, x=None, y=None, data=None, lowess=False, x_partial=None, y_partial=None, order=1, robust=False, dropna=True, label=None, color=None, scatter_kws=None, line_kws=None, ax=None)
Parameters:
- x : column name of the independent variable (predictor) or a vector.
- y: column name of the dependent variable(response) or a vector.
- data: an optional DataFrame containing the x and y columns.
- lowess: whether to fit a lowess smoother to the residuals; False by default.
Below is an example of a simple residual plot where x (the independent variable) is the Head_size column of the dataset and y (the dependent variable) is the Brain_weight column.
Python3
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Residual plot of Brain_weight regressed on Head_size
data = pd.read_csv('headbrain3.csv')
sns.residplot(x='Head_size', y='Brain_weight', data=data)
plt.show()
Output:
The points are randomly scattered around zero with no pattern and no clustering on one side, so there is no sign of heteroscedasticity.
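As mentioned above, passing lowess=True overlays a smoother on the residual scatter, which helps expose structure. A sketch with synthetic stand-in data (a deliberately curved relationship, so the smoother should bend away from zero):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in data: a curved (quadratic) relationship, which a straight-line
# fit cannot capture, leaving structured residuals
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 150)
y = x ** 2 + rng.normal(0, 3, 150)
data = pd.DataFrame({'Head_size': x, 'Brain_weight': y})

# lowess=True draws a smoothed trend line through the residuals
ax = sns.residplot(x='Head_size', y='Brain_weight', data=data, lowess=True)
plt.show()
```

If the lowess line stays close to zero, a linear fit is adequate; a curved line like the one produced here suggests the relationship is nonlinear.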
Last Updated: 21 Feb, 2022