Single Estimator Versus Bagging: Bias-Variance Decomposition in Scikit Learn

Last Updated : 31 Jul, 2023

You can use a bias-variance decomposition to compare how well a single estimator performs against a Bagging ensemble. The bias_variance_decomp function from the mlxtend library reports the average expected loss, average bias, and average variance of each approach. A single estimator illustrates the bias-variance trade-off: making the model more complex usually lowers its bias but raises its variance. Bagging averages the predictions of models trained on different bootstrap samples, which mainly reduces variance and often lowers the total error. The decomposition shows how much of a model’s error comes from bias (systematic error caused by overly simple assumptions, which leads to underfitting) and how much comes from variance (sensitivity to fluctuations in the training data, which leads to overfitting). This helps in choosing the better strategy for a given problem.

Single Estimator

A single estimator is one machine learning model that is trained once on a training set and then used to make predictions on new data. Decision trees, logistic regression, support vector machines, and linear regression are examples of single estimators in Scikit-Learn. Some of them, such as decision trees and support vector machines, have variants for both classification and regression, while others target one task: logistic regression is used for classification and linear regression for regression.
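As a small illustration, the sketch below trains three Scikit-Learn single estimators once each on the same training set and predicts on held-out rows. It assumes a synthetic dataset from make_regression rather than any dataset used later; the point is only that every single estimator shares the same fit/predict interface.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (an assumption for illustration only)
X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)
X_train, y_train, X_new = X[:80], y[:80], X[80:]

predictions = {}
# Each single estimator is trained once, on one training set, then predicts on new data
for model in (DecisionTreeRegressor(random_state=0), LinearRegression(), SVR()):
    model.fit(X_train, y_train)
    predictions[type(model).__name__] = model.predict(X_new)

for name, preds in predictions.items():
    print(name, preds.shape)
```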

Bagging

Bagging (short for bootstrap aggregating) is a technique that improves the performance of machine learning models by combining the predictions of many models, each trained on a different bootstrap sample of the training data. The procedure is:

Draw several bootstrap samples from the training data: random samples drawn with replacement, each usually the same size as the original training set.
Train a separate model on each bootstrap sample, creating an ensemble of models.
For new data, let each model in the ensemble make its own prediction.
Combine these predictions into the final prediction: average them for regression, or average the predicted class probabilities or take a majority vote for classification.
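The four steps above can be sketched from scratch for regression. This is a minimal illustration, not Scikit-Learn’s implementation: bagging_predict is a hypothetical helper, the base model is assumed to be a DecisionTreeRegressor, and the noisy sine data is made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagging_predict(X_train, y_train, X_new, n_estimators=25, seed=0):
    """Hypothetical helper: bagging for regression with decision trees."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    predictions = []
    for _ in range(n_estimators):
        # 1. Bootstrap sample: n row indices drawn with replacement
        idx = rng.integers(0, n, size=n)
        # 2. Train one model on this bootstrap sample
        model = DecisionTreeRegressor(random_state=0).fit(X_train[idx], y_train[idx])
        # 3. Each model predicts on the new data
        predictions.append(model.predict(X_new))
    # 4. Aggregate by averaging across the ensemble
    return np.mean(predictions, axis=0)


# Usage on synthetic data (noisy sine curve, an assumption for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
y_bagged = bagging_predict(X, y, X_new)
print(y_bagged.shape)  # (5,)
```

In practice, BaggingRegressor and BaggingClassifier perform these steps for you and add options such as feature subsampling.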

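Bagged decision trees (BaggingRegressor and BaggingClassifier, whose default base estimator is a decision tree) and random forests (RandomForestRegressor and RandomForestClassifier, which can additionally consider a random subset of features at each split) are two examples of bagged models in Scikit-Learn.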

Bias-Variance Decomposition

Bias:

In statistics and machine learning, bias is the difference between the expected value of a model’s predictions and the true value it is trying to estimate. Put another way, it is the average difference between the predicted and the actual target value.

$\text{Bias}(\hat{y}) = E[\hat{y}] - y$

where

$\hat{y}$ is the model’s prediction,
$E[\hat{y}]$ is the expected value of the prediction (its average over models trained on different training sets),
$y$ is the true value being estimated.

Variance:

Variance measures how much a model’s predictions vary across different training datasets. It evaluates how sensitive the model is to the particular samples in its training data.

$\text{Variance} = E[(\hat{y}-E[\hat{y}])^2]$

where

$\hat{y}$ is the value predicted by the model,
$E[\hat{y}]$ is the mean of the predictions, also known as the expected prediction,
$(\hat{y}-E[\hat{y}])^2$ is the squared difference between a prediction and the expected prediction.
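To make both formulas concrete, the sketch below computes them for a single test point. The four predictions are made-up values, assumed to come from the same model type trained on four different training sets; the true value is assumed to be 3.0.

```python
import numpy as np

# Hypothetical predictions for ONE test point from models trained on 4 training sets
y_hat = np.array([2.9, 3.1, 3.4, 2.8])
y_true = 3.0

expected_pred = y_hat.mean()                       # E[y_hat] = 3.05
bias = expected_pred - y_true                      # E[y_hat] - y = 0.05
variance = np.mean((y_hat - expected_pred) ** 2)   # E[(y_hat - E[y_hat])^2] = 0.0525
mse = np.mean((y_hat - y_true) ** 2)               # 0.055 = bias**2 + variance

print(f"E[y_hat]={expected_pred:.4f} bias={bias:.4f} variance={variance:.4f} mse={mse:.4f}")
# E[y_hat]=3.0500 bias=0.0500 variance=0.0525 mse=0.0550
```

The small bias says the predictions are centred close to the truth, while the variance captures how much they scatter from one training set to the next.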

When building a model, it is ideal to pick one with low bias and low variance.

A high-variance model has usually overfit the training data and is unlikely to generalize well to future data. A high-bias model, by contrast, is underfitting: it is too simple to capture the patterns in the data.

Bias-variance decomposition: A model’s ability to generalize to new data can suffer from either high bias or high variance. A bias-variance decomposition helps identify which problem is present by splitting the model’s expected error into a bias component and a variance component.
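For squared error, the split follows directly from the bias and variance definitions above. Treating the true value $y$ as fixed, add and subtract $E[\hat{y}]$ inside the square. The cross term vanishes because $E[\hat{y}-E[\hat{y}]] = 0$:

$$
\begin{aligned}
E\big[(\hat{y}-y)^2\big] &= E\big[(\hat{y}-E[\hat{y}] + E[\hat{y}]-y)^2\big] \\
&= E\big[(\hat{y}-E[\hat{y}])^2\big] + 2\,(E[\hat{y}]-y)\,E\big[\hat{y}-E[\hat{y}]\big] + (E[\hat{y}]-y)^2 \\
&= \underbrace{E\big[(\hat{y}-E[\hat{y}])^2\big]}_{\text{Variance}} + \underbrace{(E[\hat{y}]-y)^2}_{\text{Bias}(\hat{y})^2}
\end{aligned}
$$

This is why, in the regression example below, the reported expected loss equals the squared bias plus the variance (0.266 + 0.261 = 0.527). When $y$ is a noisy observed label, as in a test set, any label noise ends up inside the bias term of this estimate. For 0-1 loss in classification, no such additive identity holds in general.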

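A decision tree is an example of a single estimator whose bias and variance depend strongly on its complexity. A fully grown tree usually has low bias but high variance, while a shallow tree has higher bias and lower variance.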

Bagging mainly lowers a model’s variance, which can improve its generalization performance. It usually does little to reduce bias.
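The two statements above can be checked with a small experiment. The sketch estimates the squared-error decomposition by refitting a fresh model on bootstrap samples of the training data, following the same general recipe as mlxtend’s bias_variance_decomp for loss='mse' (as far as we understand it). It is not mlxtend’s code: mse_bias_variance is a hypothetical helper, its bootstrap draws differ from mlxtend’s, and the noisy sine data is assumed for illustration. It compares a depth-1 tree (a "stump"), a fully grown tree, and a BaggingRegressor of 50 fully grown trees.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor


def mse_bias_variance(make_model, X_train, y_train, X_test, y_test, num_rounds=20, seed=0):
    """Hypothetical helper: refit on bootstrap samples and decompose the squared error."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_pred = np.empty((num_rounds, len(X_test)))
    for i in range(num_rounds):
        idx = rng.integers(0, n, size=n)  # a new bootstrap "training set" each round
        all_pred[i] = make_model().fit(X_train[idx], y_train[idx]).predict(X_test)
    main_pred = all_pred.mean(axis=0)                # E[y_hat] for each test point
    avg_bias = np.mean((main_pred - y_test) ** 2)    # squared bias, averaged over test points
    avg_var = np.mean((all_pred - main_pred) ** 2)   # variance, averaged over test points
    avg_loss = np.mean((all_pred - y_test) ** 2)     # expected squared error
    return avg_loss, avg_bias, avg_var


# Synthetic noisy sine data (an assumption), split into training and test parts
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

models = {
    "stump (max_depth=1)": lambda: DecisionTreeRegressor(max_depth=1),
    "deep tree": lambda: DecisionTreeRegressor(),
    "bagged deep trees": lambda: BaggingRegressor(n_estimators=50, random_state=0),
}
results = {}
for name, make_model in models.items():
    results[name] = mse_bias_variance(make_model, X_tr, y_tr, X_te, y_te)
    loss, bias2, var = results[name]
    print(f"{name:20s} loss={loss:.3f} bias^2={bias2:.3f} variance={var:.3f}")
```

On data like this, you would expect the stump to show the largest squared bias, the deep tree to show a large variance, and bagging to cut that variance while keeping the bias low. Exact numbers depend on the data and seeds.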

Example 1: Regression

Step 1: Install the mlxtend library, which provides the bias_variance_decomp function (the leading ! runs the command from a notebook cell)

!pip install mlxtend --upgrade

Step 2: Import the necessary packages and load the dataset

Python3

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from mlxtend.evaluate import bias_variance_decomp

# Load the California housing dataset as NumPy arrays
X, y = fetch_california_housing(return_X_y=True)
# Hold out 25% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    random_state=23,
                                                    test_size=0.25,
                                                    shuffle=True)

For Single Estimator

Step 3: Find the bias and variance using a single decision tree

Python3

# Build a decision tree on the training data and predict on the test data
decision_tree = DecisionTreeRegressor(criterion='absolute_error',
                                      min_samples_leaf=3)
decision_tree.fit(X_train, y_train)
y_hat_pop_tree = decision_tree.predict(X_test)

# bias_variance_decomp refits the estimator on num_rounds bootstrap samples of
# the training data and evaluates each fit on the test set; with loss='mse',
# avg_bias is the squared bias
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(decision_tree,
                                                            X_train, y_train,
                                                            X_test, y_test,
                                                            num_rounds=20,
                                                            loss='mse',
                                                            random_seed=23)
print('For single Estimator')
print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)

Output:

For single Estimator
Average expected loss: 0.527
Average bias: 0.266
Average variance: 0.261

Step 4: Plot the bias-variance decomposition for the single estimator

Python3

labels = ['Expected Loss', 'Bias^2', 'Variance']
values = [avg_expected_loss, avg_bias, avg_var]
 
plt.bar(labels, values)
plt.xlabel('Terms')
plt.ylabel('Value')
plt.title('Bias-Variance Decomposition for Single Estimator')
plt.show()

Output:

Bias Variance Decomposition for Single Estimator

For Bagging

Step 5: Find the bias & variance using Bagging

Python3

# Build a Bagging ensemble (10 decision trees by default) on the training data
# and predict on the test data
bagging = BaggingRegressor()
bagging.fit(X_train, y_train)
y_hat_pop_bagging = bagging.predict(X_test)
# Note: num_rounds=10 here, versus 20 for the single decision tree above
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(bagging,
                                                            X_train, y_train,
                                                            X_test, y_test,
                                                            num_rounds=10,
                                                            loss='mse',
                                                            random_seed=23)
print('For bagging model')
print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)

Output:

For bagging model
Average expected loss: 3303.124
Average bias: 2714.084
Average variance: 589.040

Step 6: Plot the bias-variance decomposition for Bagging

Python3

labels = ['Expected Loss', 'Bias^2', 'Variance']
values = [avg_expected_loss, avg_bias, avg_var]
 
plt.bar(labels, values)
plt.xlabel('Terms')
plt.ylabel('Value')
plt.title('Bias-Variance Decomposition for Bagging')
plt.show()

Output:

Bias Variance Decomposition for Bagging

Example 2: Classification

Steps:

Load the necessary packages
Load the Iris dataset
Split the data into training and test sets
Compute the bias and variance of a single decision tree
Compute the bias and variance of a Bagging ensemble
Plot the bias-variance decomposition for both

Python3

# Load the necessary packages
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bias_variance_decomp
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
X, y = load_iris(return_X_y=True)
# Hold out 25% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    random_state=104,
                                                    test_size=0.25,
                                                    shuffle=True)
 
# Build a decision tree on the training data and predict on the test data
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
y_hat_pop_tree = tree.predict(X_test)
y_error, avg_bias, avg_var = bias_variance_decomp(tree, 
                                                  X_train, y_train,
                                                  X_test, y_test, 
                                                  loss='0-1_loss', 
                                                  random_seed=23)
print('Using Single Estimator')
print('Average expected loss: %.3f' % y_error)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)
 
# Build a Bagging ensemble on the training data and predict on the test data
bagging = BaggingClassifier()
bagging.fit(X_train, y_train)
y_hat_pop_bagging = bagging.predict(X_test)
by_error, bavg_bias, bavg_var = bias_variance_decomp(bagging, 
                                                     X_train, y_train, 
                                                     X_test, y_test, 
                                                     loss='0-1_loss',
                                                     random_seed=123)
print('Using Bagging')
print('Average expected loss: %.3f' % by_error)
print('Average bias: %.3f' % bavg_bias)
print('Average variance: %.3f' % bavg_var)
 
# Plot the bias-variance decomposition for both models
# (under 0-1 loss the bias term is not squared)
labels = ['Expected Loss', 'Bias', 'Variance']
tree_values = [y_error, avg_bias, avg_var]
bagging_values = [by_error, bavg_bias, bavg_var]
plt.figure(figsize=(12,5))
plt.subplot(1, 2, 1)
plt.bar(labels, tree_values)
plt.xlabel('Terms')
plt.ylabel('Value')
plt.title('Bias-Variance Decomposition (Decision Tree)')
 
plt.subplot(1, 2, 2)
plt.bar(labels, bagging_values)
plt.xlabel('Terms')
plt.ylabel('Value')
plt.title('Bias-Variance Decomposition (Bagging)')
 
plt.tight_layout()
plt.show()

Output:

Using Single Estimator
Average expected loss: 0.030
Average bias: 0.026
Average variance: 0.023
Using Bagging
Average expected loss: 0.035
Average bias: 0.053
Average variance: 0.020

Bias-Variance Decomposition
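Notice that for Bagging the reported bias (0.053) is larger than the expected loss (0.035). Under 0-1 loss, the three terms do not add up the way they do for squared error. As we understand mlxtend’s definitions, the "main" prediction is the most frequent class across rounds, bias is how often that main prediction is wrong, and variance is how often an individual model disagrees with it. The sketch below reproduces those definitions with NumPy on made-up predictions; zero_one_decomposition is a hypothetical helper.

```python
import numpy as np


def zero_one_decomposition(all_pred, y_test):
    """Hypothetical helper: 0-1 loss terms from integer predictions of shape (rounds, n_test)."""
    # Main prediction = most frequent class per test point (ties go to the smaller label)
    main_pred = np.array([np.bincount(col).argmax() for col in all_pred.T])
    avg_loss = np.mean(all_pred != y_test)     # how often a single model is wrong
    avg_bias = np.mean(main_pred != y_test)    # how often the majority prediction is wrong
    avg_var = np.mean(all_pred != main_pred)   # how often a model disagrees with the majority
    return avg_loss, avg_bias, avg_var


# Made-up predictions from 3 rounds for 2 test points
y_test = np.array([0, 1])
all_pred = np.array([[1, 1],
                     [1, 1],
                     [0, 1]])
loss, bias, var = zero_one_decomposition(all_pred, y_test)
print(f"loss={loss:.3f} bias={bias:.3f} variance={var:.3f}")
# loss=0.333 bias=0.500 variance=0.167
```

For the first test point, two of three models predict the wrong class. The majority prediction is therefore wrong (a bias of 1 for that point), even though a randomly chosen model is wrong only two-thirds of the time. This is how bias can exceed the expected loss.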
