
How to Create a Residual Plot in Python


A residual plot is a graph that shows the residuals on the y-axis and the independent variable on the x-axis. If the points in a residual plot are scattered randomly around the horizontal axis, with no visible pattern, a linear regression model is appropriate for the data. Let’s see how to create a residual plot in Python.
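
To make the definition concrete, here is a minimal sketch that computes residuals by hand. The data values are made up for illustration and do not come from the article's CSV files. It fits a least-squares line using the closed-form formulas and subtracts the fitted values from the observed ones.

```python
# Illustrative data only; these values are not from the article's datasets.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates for a line y = intercept + slope * x
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

# Residual = observed value - fitted value
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, ri in zip(x, residuals):
    print(f"x={xi:.1f}  residual={ri:+.3f}")
```

Plotting these residuals against x gives the residual plot. When the model includes an intercept, least-squares residuals sum to zero up to rounding, which is why the points sit around the ‘0’ line.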

Method 1: Using plot_regress_exog()

plot_regress_exog():

  • Plots the regression results against one regressor.
  • Draws ‘endog versus exog’, ‘residuals versus exog’, ‘fitted versus exog’ and ‘fitted plus residual versus exog’ in a 2 by 2 figure.

Syntax: statsmodels.graphics.regressionplots.plot_regress_exog(results, exog_idx, fig=None)

Parameters:

  • results: a fitted regression results instance
  • exog_idx: index or name of the regressor
  • fig: figure to draw on; a new figure is created if none is provided

Returns: a figure containing the 2 by 2 grid of plots

Simple Linear Regression

After importing the necessary packages and reading the CSV file, we use ols() from statsmodels.formula.api to fit a linear regression to the data. We then create a figure and pass the fitted model, the name of the independent variable, and the figure to plot_regress_exog(), which draws a 2 by 2 figure of regression plots. In the ols() formula, the name before ‘~’ is the dependent variable (the variable we are trying to predict) and the names after ‘~’ are the independent variables. Simple linear regression has one dependent variable and one independent variable.

ols('response_variable ~ predictor_variable', data=data)
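
As a minimal sketch of what the fitted model exposes, the example below fits the same kind of formula to synthetic data. The column names `response` and `predictor` and the generated values are assumptions for illustration, not the article's dataset. It prints the coefficients and checks that the model's residuals equal the observed values minus the fitted values.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data for illustration; column names are hypothetical
rng = np.random.default_rng(0)
df = pd.DataFrame({"predictor": np.linspace(0, 10, 50)})
df["response"] = 3.0 + 2.0 * df["predictor"] + rng.normal(0, 1, size=50)

# The column before '~' is the response, the column after it the predictor
model = ols("response ~ predictor", data=df).fit()

# Fitted coefficients (Intercept and slope) and the first few residuals
print(model.params)
print(model.resid.head())

# Residuals are the observed values minus the fitted values
manual_resid = df["response"] - model.fittedvalues
print(np.allclose(model.resid, manual_resid))
```

The `resid` and `fittedvalues` attributes are what the residual panels of plot_regress_exog() are built from, so they are also a convenient starting point for a custom plot.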

CSV Used: headbrain3

Python3




# import packages and libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
  
# reading the csv file
data = pd.read_csv('headbrain3.csv')
  
# fit simple linear regression model
linear_model = ols('Brain_weight ~ Head_size',
                   data=data).fit()
  
# display model summary
print(linear_model.summary())
  
# modify figure size
fig = plt.figure(figsize=(14, 8))
  
# creating regression plots
fig = sm.graphics.plot_regress_exog(linear_model,
                                    'Head_size',
                                    fig=fig)

# display the figure (needed when running as a script)
plt.show()


Output:

In the residual plot, the residuals are scattered randomly around the ‘0’ line, with no visible pattern and no clustering on one side, so the plot gives no indication of heteroscedasticity with respect to the predictor variable ‘Head_size’.
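
Heteroscedasticity means that the spread of the residuals changes with the predictor. As a rough numeric complement to reading the plot, the sketch below compares the residual variance in the lower and upper halves of the predictor range. This is a heuristic written for illustration, not a formal statistical test and not part of statsmodels; the function name `spread_ratio` and the residual values are made up.

```python
from statistics import pvariance


def spread_ratio(x, residuals):
    """Ratio of residual variance in the upper half of x to the lower half."""
    # Sort residuals by their x value, then split into two halves
    ordered = [r for _, r in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[-half:]
    return pvariance(high) / pvariance(low)


x = list(range(1, 21))

# Made-up residuals with constant spread (homoscedastic)
even = [0.5 if i % 2 else -0.5 for i in x]

# Made-up residuals whose spread grows with x (heteroscedastic "fan" shape)
fan = [0.1 * i if i % 2 else -0.1 * i for i in x]

print(spread_ratio(x, even))
print(spread_ratio(x, fan))
```

A ratio near 1 is consistent with an even band of points around the ‘0’ line, while a ratio far from 1 corresponds to the funnel shape that signals heteroscedasticity in a residual plot.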

Multiple Linear Regression

In multiple linear regression, we have more than one independent (predictor) variable and one dependent variable. The code is similar to simple linear regression, except that the formula passed to ols() lists several predictors.

ols('response_variable ~ predictor_variable1 + predictor_variable2 + …', data=data)

‘+’ is used to add as many predictor variables as we want while creating the model.
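
When there are many predictors, typing the formula by hand gets tedious. One possible approach, sketched here, is to assemble the string from a list of column names. The variable names `response`, `predictors` and `formula` are illustrative; the column names are the ones used with the homeprices data in this article.

```python
# Column names as used with the homeprices data in this article
response = "price"
predictors = ["area", "bedrooms"]

# Join the predictors with '+' to form the right-hand side of the formula
formula = f"{response} ~ " + " + ".join(predictors)
print(formula)  # price ~ area + bedrooms
```

The resulting string can be passed to ols() in place of a hand-written formula, for example ols(formula, data=data).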

CSV Used: homeprices

Example 1:

Python3




# import packages and libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
  
# reading the csv file and previewing the data
data = pd.read_csv('homeprices.csv')
print(data)
  
# fit multi linear regression model
multi_model = ols('price ~ area + bedrooms', data=data).fit()
  
# display model summary
print(multi_model.summary())
  
# modify figure size
fig = plt.figure(figsize=(14, 8))
  
# creating regression plots
fig = sm.graphics.plot_regress_exog(multi_model, 'area', fig=fig)

# display the figure (needed when running as a script)
plt.show()


Output:

Again, the residuals are scattered randomly around the ‘0’ line, with no visible pattern and no clustering on one side, so the plot gives no indication of heteroscedasticity with respect to the predictor variable ‘area’.

Example 2:

Python3




# import packages and libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
  
# reading the csv file and previewing the data
data = pd.read_csv('homeprices.csv')
print(data)
  
# fit multi linear regression model
multi_model = ols('price ~ area + bedrooms', data=data).fit()
  
# modify figure size
fig = plt.figure(figsize=(14, 8))
  
# creating regression plots
fig = sm.graphics.plot_regress_exog(multi_model, 'bedrooms', fig=fig)

# display the figure (needed when running as a script)
plt.show()


Output:

As before, the residuals are scattered randomly around the ‘0’ line, with no visible pattern and no clustering on one side, so the plot gives no indication of heteroscedasticity with respect to the predictor variable ‘bedrooms’.

Method 2: Using seaborn.residplot()

seaborn.residplot(): This function regresses y on x and then plots the residuals as a scatterplot. Optionally, it can fit a lowess smoother to the residual plot, which helps reveal whether the residuals have structure.

Syntax: seaborn.residplot(*, x=None, y=None, data=None, lowess=False, x_partial=None, y_partial=None, order=1, robust=False, dropna=True, label=None, color=None, scatter_kws=None, line_kws=None, ax=None)

Parameters:

  • x: column name of the independent variable (predictor), or a vector.
  • y: column name of the dependent variable (response), or a vector.
  • data: optional DataFrame containing the x and y columns.
  • lowess: if True, fits a lowess smoother to the residual scatterplot; False by default.

Below is an example of a simple residual plot where x (the independent variable) is the Head_size column and y (the dependent variable) is the Brain_weight column of the dataset.

Python3




# import packages and libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
  
# reading the csv file
data = pd.read_csv('headbrain3.csv')
  
sns.residplot(x='Head_size', y='Brain_weight', data=data)
  
plt.show()


Output:  

The points are spread randomly, with no visible pattern and no clustering on one side, so the plot gives no indication of heteroscedasticity.
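
As a variation on the example above, the following sketch turns on the lowess option. It uses synthetic data with a deliberately curved relationship (the column names `x` and `y` and the values are assumptions for illustration), so that the residuals of a straight-line fit have visible structure for the smoother to pick up. Note that seaborn relies on statsmodels for the lowess smoother.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data with a deliberately curved relationship (illustrative only)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
df = pd.DataFrame({"x": x, "y": 0.5 * x**2 + rng.normal(0, 2, size=x.size)})

fig, ax = plt.subplots(figsize=(8, 5))

# lowess=True overlays a smoothed curve on the residual scatter
sns.residplot(x="x", y="y", data=df, lowess=True,
              line_kws={"color": "red"}, ax=ax)
ax.set_xlabel("x")
ax.set_ylabel("residual")
# plt.show()  # uncomment to display the figure
```

If the smoothed curve stays close to the ‘0’ line, the linear fit is probably adequate; a clearly bent curve, as expected with this curved data, suggests that a straight line is missing part of the relationship.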



Last Updated : 21 Feb, 2022