
How to Perform a Breusch-Pagan Test in Python

Last Updated : 02 Mar, 2022

Heteroscedasticity is a statistical term for the unequal scatter of residuals. More specifically, it means that the spread of the residuals changes across the range of measured values. Heteroscedasticity poses a challenge because ordinary least squares (OLS) regression assumes that the residuals are drawn from a population with constant variance (homoscedasticity). If heteroscedasticity is present in a regression analysis, the standard errors, and therefore the p-values and confidence intervals, reported by the analysis may be unreliable.

The Breusch-Pagan test is a way to check whether heteroscedasticity is present in a regression analysis. The test uses the following hypotheses:

Hypotheses:

  • The null hypothesis (H0): Homoscedasticity is present (the residuals have constant variance).
  • The alternative hypothesis (Ha): Homoscedasticity is not present (i.e. heteroscedasticity exists).
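
For readers who want the mechanics behind these hypotheses, the test statistic can be written out as below. This is a sketch of the studentized (Koenker) form of the test, which we assume is the variant statsmodels computes by default. The symbols are notation introduced here and are not part of the original article: the squared OLS residuals are regressed on the k explanatory variables, n is the number of observations, and R²_aux is the R-squared of that auxiliary regression.

```latex
\hat{e}_i^{\,2} = \gamma_0 + \gamma_1 x_{i1} + \dots + \gamma_k x_{ik} + v_i,
\qquad
H_0:\ \gamma_1 = \dots = \gamma_k = 0,
\qquad
LM = n \, R^2_{\text{aux}} \;\sim\; \chi^2_k
\quad \text{(asymptotically, under } H_0\text{)}
```

A large LM value (small p-value) suggests that the explanatory variables help predict the size of the squared residuals, which is evidence against constant variance.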

Syntax to install the NumPy, pandas and statsmodels libraries:

pip3 install numpy pandas statsmodels

Performing a Breusch-Pagan Test:

Performing a Breusch-Pagan test is a step-by-step process. The steps are discussed below.

Step 1: Import libraries.

The very first step is to import the libraries that we have installed above.

Python3




# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


 

 

Step 2: Create a dataset.

 

Then we need to create a dataset.

 

Python3




# Create a dataset
dataframe = pd.DataFrame({'rating': [92, 84, 87, 82, 98,
                                     94, 75, 80, 83, 89],
                          'points': [27, 30, 15, 26, 27,
                                     20, 16, 18, 19, 20],
                          'runs': [5000, 7000, 5102, 8019,
                                   1200, 7210, 6200, 9214,
                                   4012, 3102],
                          'wickets': [110, 120, 110, 80, 90,
                                      119, 116, 100, 90, 76]})


Step 3: Fit a multiple linear regression model.

The next step is to fit a multiple linear regression model. As an example, we are considering rating as the response variable and points, runs, and wickets as the explanatory variables.

Python3




# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
 
# Create a dataset
dataframe = pd.DataFrame({'rating': [92, 84, 87, 82,
                                     98, 94, 75, 80,
                                     83, 89],
                          'points': [27, 30, 15, 26,
                                     27, 20, 16, 18,
                                     19, 20],
                          'runs': [5000, 7000, 5102,
                                   8019, 1200, 7210,
                                   6200, 9214, 4012,
                                   3102],
                          'wickets': [110, 120, 110,
                                      80, 90, 119,
                                      116, 100, 90,
                                      76]})
 
# fit regression model
fit = smf.ols('rating ~ points+runs+wickets', data=dataframe).fit()
print(fit.summary())


Output:

Step 4: Conduct the Breusch-Pagan test.

The next step is to conduct the Breusch-Pagan test in order to determine whether heteroscedasticity is present.

Python3




# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.compat import lzip
import statsmodels.stats.api as sms
 
# Creating a dataset
dataframe = pd.DataFrame({'rating': [92, 84, 87, 82,
                                     98, 94, 75, 80,
                                     83, 89],
                          'points': [27, 30, 15, 26,
                                     27, 20, 16, 18,
                                     19, 20],
                          'runs': [5000, 7000, 5102,
                                   8019, 1200, 7210,
                                   6200, 9214, 4012,
                                   3102],
                          'wickets': [110, 120, 110,
                                      80, 90, 119,
                                      116, 100, 90,
                                      76]})
 
# Fit the regression model
fit = smf.ols('rating ~ points+runs+wickets', data=dataframe).fit()
 
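# Labels for the four values returned by the test
names = ['Lagrange multiplier statistic', 'p-value',
         'f-value', 'f p-value']
 
# Conduct the Breusch-Pagan test on the residuals and the design matrix
test_result = sms.het_breuschpagan(fit.resid, fit.model.exog)
 
# Print each label alongside its value
print(lzip(names, test_result))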


Output:

Output Interpretation:

Here, the Lagrange multiplier statistic for the test is 4.364 and the corresponding p-value is 0.224. Since the p-value is greater than 0.05, we fail to reject the null hypothesis. Hence, we do not have sufficient evidence to say that heteroscedasticity is present in the regression model.
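
To make the interpretation more concrete, the following is a minimal sketch that recomputes the Lagrange multiplier statistic by hand and compares it with the library result. It assumes the default behaviour of `het_breuschpagan` (the studentized form, where the statistic is the number of observations times the R-squared of a regression of the squared residuals on the explanatory variables). The names `aux_data`, `aux_fit`, `resid_sq`, `lm_manual`, `p_manual`, and `alpha` are illustrative and not part of the original article.

```python
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from scipy import stats

# Same dataset as in the steps above
dataframe = pd.DataFrame({'rating': [92, 84, 87, 82, 98,
                                     94, 75, 80, 83, 89],
                          'points': [27, 30, 15, 26, 27,
                                     20, 16, 18, 19, 20],
                          'runs': [5000, 7000, 5102, 8019,
                                   1200, 7210, 6200, 9214,
                                   4012, 3102],
                          'wickets': [110, 120, 110, 80, 90,
                                      119, 116, 100, 90, 76]})

# Original regression model
fit = smf.ols('rating ~ points + runs + wickets', data=dataframe).fit()

# Auxiliary regression: squared residuals on the same explanatory variables
aux_data = dataframe.assign(resid_sq=fit.resid ** 2)
aux_fit = smf.ols('resid_sq ~ points + runs + wickets', data=aux_data).fit()

# LM statistic = n * R^2 of the auxiliary regression (assumed default form)
n_obs = int(fit.nobs)
n_explanatory = int(aux_fit.df_model)  # explanatory variables, no intercept
lm_manual = n_obs * aux_fit.rsquared
p_manual = stats.chi2.sf(lm_manual, df=n_explanatory)

# Library result for comparison
lm_lib, p_lib, _, _ = sms.het_breuschpagan(fit.resid, fit.model.exog)

print('manual :', lm_manual, p_manual)
print('library:', lm_lib, p_lib)

# Decision rule at the 5% significance level
alpha = 0.05
decision = 'reject H0' if p_manual < alpha else 'fail to reject H0'
print(decision)
```

If the two rows printed by the script agree, the library call is doing nothing more than this auxiliary regression followed by a chi-square lookup.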

How to fix heteroscedasticity:

In the above example, the test found no evidence of heteroscedasticity in the regression model. When heteroscedasticity is present, there are three common ways to address it:

  • Transform the dependent variable: We can transform the dependent variable. For example, we can take the log of the dependent variable.
  • Redefine the dependent variable: We can redefine the dependent variable. For example, we can use a rate for the dependent variable rather than the raw value.
  • Use weighted regression: The last way is to use weighted regression. In this type of regression, each data point is assigned a weight based on the variance of its fitted value, so that points with higher variance receive smaller weights. Using proper weights can eliminate the problem of heteroscedasticity.

