Weighted Least Squares Regression in Python

Last Updated : 12 Apr, 2024

Weighted Least Squares (WLS) regression is a powerful extension of ordinary least squares regression, particularly useful when dealing with data that violates the assumption of constant variance.

In this guide, we give a brief overview of Weighted Least Squares regression and demonstrate how to implement it in Python using the statsmodels library.

What is Least Squares Regression?

Least Squares Regression (LSR) is a fundamental statistical method for finding the best-fitting line or curve that summarizes the relationship between two or more variables. Imagine drawing a line through a scatterplot of data points: LSR calculates the line that minimizes the total squared difference between the observed data points and the values the line predicts.
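As a minimal illustration (with made-up sample data), NumPy's polyfit performs exactly this minimization for a straight line:

```python
import numpy as np

# Hypothetical sample data for illustration only
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)

# Predictions and the quantity least squares minimizes
y_hat = slope * x + intercept
sse = np.sum((y - y_hat) ** 2)  # sum of squared residuals
print(f"slope={slope:.3f}, intercept={intercept:.3f}, SSE={sse:.4f}")
```

Any other line would produce a larger sum of squared residuals than the one polyfit returns.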

What is Weighted Least Squares Regression?

Weighted Least Squares (WLS) Regression fits a regression line to data just as the traditional Least Squares method does, but it gives more importance (or “weight”) to some data points than others. Each observation is weighted according to the variance of its error term: points with lower variability, and hence higher reliability, receive higher weights and therefore exert a stronger influence on the fitted line. This makes WLS well suited to heteroscedastic data, where the variability of the errors differs across observations, and it can yield a noticeably more accurate model than ordinary least squares in that setting.

Formula: \hat{\beta} = (X^T W X)^{-1} X^T W y

Where,

  • \hat{\beta} is the vector of estimated coefficients.
  • X is the matrix of independent variables (with each row representing an observation and each column a different variable).
  • W is a diagonal matrix of weights, where larger weights indicate observations with greater importance or reliability.
  • y is the vector of dependent variable observations.

Weighted Least Squares Regression Implementation in Python

In Python, the statsmodels library is commonly used for statistical modeling tasks, including ordinary least squares (OLS) regression; we will use it here to implement weighted least squares (WLS) regression as well.

Steps for Weighted Least Squares Regression Implementation in Python

  • Define your sample data:
    • Create arrays for the independent variable(s) (X) and dependent variable (y).
    • Ensure that your dependent variable (y) has more variability or heteroscedasticity to justify the use of weighted least squares regression.
  • Calculate weights:
    • Compute the residuals by subtracting the mean of the dependent variable (y) from each observed value.
    • Compute the variance of these residuals.
    • Take the inverse of that variance as the weight. This example uses a single common weight for simplicity; with a shared weight, WLS produces the same coefficients as OLS. In a genuinely heteroscedastic setting each observation gets its own weight (typically the inverse of its estimated error variance), so that noisier observations contribute less to the fit.
  • Add constant term:
    • Include a constant term in the independent variable(s) matrix (X) using sm.add_constant().
  • Fit the model:
    • Use sm.WLS() to specify the weighted least squares regression model.
    • Use .fit() to estimate the parameters of the model.
  • Inspect results:
    • Call results.summary() to review the estimated coefficients, standard errors, and fit statistics.

The code below demonstrates the implementation using the statsmodels library.

Python3

import numpy as np
import statsmodels.api as sm

# Sample data
X = np.array([1, 2, 3, 4, 5])
y = np.array([2.6, 3.7, 4.3, 5.8, 6.2])  # adjusted y values with more variability

# Calculate a weight based on the inverse of the variance of the errors
# (a single scalar weight here, so every observation is weighted equally)
errors = y - np.mean(y)
error_variance = np.var(errors)
weights = 1 / error_variance

# Fit weighted least squares regression model
X = sm.add_constant(X)
model = sm.WLS(y, X, weights=weights)
results = model.fit()

# Print regression results
print(results.summary())

Output:

WLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.975
Model: WLS Adj. R-squared: 0.967
Method: Least Squares F-statistic: 118.5
Date: Wed, 10 Apr 2024 Prob (F-statistic): 0.00166
Time: 12:48:35 Log-Likelihood: 0.72561
No. Observations: 5 AIC: 2.549
Df Residuals: 3 BIC: 1.768
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 1.7300 0.283 6.105 0.009 0.828 2.632
x1 0.9300 0.085 10.885 0.002 0.658 1.202
==============================================================================
Omnibus: nan Durbin-Watson: 3.395
Prob(Omnibus): nan Jarque-Bera (JB): 0.537
Skew: 0.600 Prob(JB): 0.765
Kurtosis: 1.935 Cond. No. 8.37
==============================================================================

  • The R-squared value is 0.975, indicating that 97.5% of the variance in the dependent variable is explained by the independent variable(s). Adjusted R-squared, a version of R-squared that adjusts for the number of predictors in the model, is 0.967.
  • The F-statistic tests the overall significance of the regression model. Its value of 118.5, with a low p-value of 0.00166, indicates that the model is statistically significant.

Ordinary Least Squares Regression Vs Weighted Least Squares Regression

| Aspect | Ordinary Least Squares (OLS) Regression | Weighted Least Squares (WLS) Regression |
| --- | --- | --- |
| Objective | Minimize the sum of squared differences between observed and predicted values. | Minimize the weighted sum of squared differences between observed and predicted values. |
| Assumption | Assumes constant variance (homoscedasticity) of errors. | Allows for varying variance (heteroscedasticity) of errors. |
| Weighting of Observations | Assigns equal weight to each observation. | Assigns weights based on the variance of the error term associated with each observation. |
| Usage | Suitable for datasets with constant variance of errors. | Suitable for datasets with varying variance of errors. |
| Implementation | Implemented using the ordinary least squares method. | Implemented using the weighted least squares method. |
| Model Evaluation | Provides unbiased estimates of coefficients under homoscedasticity. | Provides more accurate estimates of coefficients under heteroscedasticity. |
| Example | Fit a straight line through data points. | Fit a line that adjusts for varying uncertainty in data points. |

Advantages of Weighted Least Squares Regression

  • Handles Varying Data Uncertainty: WLS regression accommodates data where the uncertainty (variance) changes across observations, providing more accurate results compared to OLS regression.
  • Improved Parameter Estimates: By giving more weight to reliable data points, WLS regression offers more precise estimates of coefficients and standard errors, especially in the presence of heteroscedasticity.
  • Robustness: WLS regression can yield more robust estimates, making it suitable for various fields where data exhibit heteroscedasticity.

Disadvantages of Weighted Least Squares Regression

  • Need for Correct Weighting: Correctly specifying weights based on error variance is crucial; incorrect weights can lead to biased results.
  • Complexity in Weight Determination: Determining appropriate weights, especially in complex datasets, can be challenging and may require careful consideration.
  • Computational Overhead: Implementing WLS regression may involve additional computational complexity, especially with large datasets or complex weighting schemes.
  • Sensitivity to Outliers: WLS regression, like OLS, can be sensitive to outliers, which may affect estimation accuracy if not properly addressed.

Conclusion

Weighted Least Squares (WLS) regression offers a valuable enhancement to traditional regression methods by accommodating data with varying levels of uncertainty. By assigning weights based on error variance, WLS regression provides more accurate parameter estimates, making it a powerful tool across diverse fields from finance to healthcare.


