
Bias and Variance in Machine Learning

There are various ways to evaluate a machine learning model: MSE (Mean Squared Error) for regression, and Precision, Recall, and the ROC (Receiver Operating Characteristic) curve, along with absolute error, for classification problems. In a similar way, bias and variance help us tune parameters and decide which of several fitted models is better suited to the data.

Bias is one type of error that occurs due to wrong assumptions about the data, such as assuming the data is linear when in reality it follows a complex function. Variance, on the other hand, is introduced when the model is highly sensitive to variations in the training data; this too is a type of error, since we want the model to be robust against noise. Errors in machine learning fall into two groups: reducible error and irreducible error. Bias and variance are the components of the reducible error.



What is Bias?

Bias is the inability of a model to capture the true relationship in the data, which causes a difference between the model's predicted values and the actual values. This difference between the actual (or expected) values and the predicted values is known as bias error, or error due to bias. Bias is a systematic error that arises from wrong assumptions made in the machine learning process.

Let θ be the true value of a parameter, and let θ̂ be an estimator of θ based on a sample of data. Then the bias of the estimator θ̂ is given by:

Bias(θ̂) = E[θ̂] − θ

where E[θ̂] is the expected value of the estimator θ̂. Bias measures how well the model fits the data.

A high-bias model fails to capture the underlying trend of the dataset. It is considered an underfitting model and has a high error rate, typically because the algorithm is too simple.

For example, a linear regression model may have a high bias if the data has a non-linear relationship.
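As a rough illustration of this point (the synthetic quadratic dataset below is an assumption chosen purely for demonstration), a plain linear regression fitted to a non-linear target leaves a much larger training error than a model whose capacity matches the data:

# Minimal illustrative sketch (synthetic data assumed): a linear model underfits a
# quadratic target, which shows up as a much larger training error than a model
# with the right capacity has.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(200, 1)), axis=0)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)   # non-linear (quadratic) relationship

# High-bias model: assumes a linear relationship
linear = LinearRegression().fit(X, y)
# Better-matched model: quadratic polynomial features
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print('Linear model train MSE   : %.3f' % mean_squared_error(y, linear.predict(X)))
print('Quadratic model train MSE: %.3f' % mean_squared_error(y, quadratic.predict(X)))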

Ways to reduce high bias in machine learning:

- Use a more complex model, for example by adding polynomial features or switching to a non-linear algorithm.
- Add more informative input features.
- Reduce the amount of regularization applied to the model.
- Train longer or increase model capacity where applicable.

What is Variance?

Variance is the measure of spread in data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data. More specifically, it is the variability of the model: how sensitive it is to a different subset of the training data, i.e. how much its predictions change when it is fit on a new subset of the training data.

Let Y be the actual values of the target variable, and Ŷ the predicted values. Then the variance of a model can be measured as the expected value of the squared difference between the predicted values and the expected value of the predicted values:

Variance = E[(Ŷ − E[Ŷ])²]

where E[Ŷ] is the expected value of the predicted values, averaged over all the training data.
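To make this concrete, here is a minimal sketch (the synthetic sine dataset and the unpruned decision tree are illustrative assumptions): refitting the same model class on different bootstrap samples of the training data and measuring how much its predictions on fixed test points spread out gives an empirical picture of its variance.

# Minimal illustrative sketch of variance: refit the same model on different bootstrap
# samples and look at how much its predictions for the same test points move around.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)

preds = []
for seed in range(20):
    # Draw a bootstrap sample (a different subset of the training data each time)
    idx = np.random.RandomState(seed).choice(len(X), size=len(X), replace=True)
    tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
    preds.append(tree.predict(X_test))

preds = np.array(preds)                # shape: (20 models, 50 test points)
variance = preds.var(axis=0).mean()    # average spread of predictions across models
print('Estimated variance of the unpruned tree: %.3f' % variance)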

Variance errors are classified as either low-variance or high-variance errors.

Ways to reduce high variance in machine learning:

- Use a simpler model, or prune and regularize the existing one.
- Gather more training data.
- Use ensembling techniques such as bagging, which average out the fluctuations of individual models.
- Use cross-validation to detect and control overfitting.

Mathematical Derivation for Total Error

Assume the data are generated as Y = f(X) + e, where the noise term e has zero mean and variance σ², and let f̂(X) denote the model's prediction. The total expected error on an unseen point is E[(Y − f̂(X))²]. Applying the expectation on both sides and expanding the square gives:

E[(Y − f̂(X))²] = (E[f̂(X)] − f(X))² + E[(f̂(X) − E[f̂(X)])²] + σ²
               = Bias[f̂(X)]² + Var[f̂(X)] + σ²

The first term is the squared bias, the second is the variance, and σ² is the irreducible error that no model can remove.

Different Combinations of Bias-Variance

There can be four combinations of bias and variance:

- High Bias, Low Variance: the model underfits; predictions are consistent across training sets but consistently far from the true values.
- High Bias, High Variance: predictions are both inaccurate and inconsistent.
- Low Bias, High Variance: the model overfits; predictions are accurate on average but vary a lot with the training data.
- Low Bias, Low Variance: the ideal case; predictions are both accurate and consistent.

Now we know that the ideal case is Low Bias and Low Variance, but in practice it is rarely achievable. So we trade off bias against variance to achieve a balanced model.

A model with balanced bias and variance is said to have optimal generalization performance. This means the model captures the underlying patterns in the data without overfitting or underfitting: it is just complex enough to model the structure of the data, but not so complex that it overfits the training set. This balance is usually reached by carefully tuning the hyperparameters and selecting an appropriate model architecture, as illustrated in the sketch below.
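As a hedged sketch of such tuning (the diabetes dataset, max_depth as the tuned hyperparameter, and the 5-fold cross-validation setup are all assumptions made for illustration), one can pick a decision-tree depth by cross-validation: very shallow trees underfit (high bias), very deep trees overfit (high variance), and the selected depth sits in between.

# Minimal illustrative sketch: tune max_depth of a decision tree by cross-validation.
# Shallow trees tend toward high bias, deep trees toward high variance;
# the depth with the best cross-validated score balances the two.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      param_grid={'max_depth': [1, 2, 3, 5, 8, 12, None]},
                      scoring='neg_mean_squared_error',
                      cv=5)
search.fit(X, y)

print('Best max_depth:', search.best_params_['max_depth'])
for depth, score in zip(search.cv_results_['param_max_depth'],
                        search.cv_results_['mean_test_score']):
    print('max_depth=%-4s  CV MSE=%.1f' % (depth, -score))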

Machine Learning Algorithm    Bias         Variance
Linear Regression             High Bias    Low Variance
Decision Tree                 Low Bias     High Variance
Random Forest                 Low Bias     High Variance
Bagging                       Low Bias     High Variance
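The table's qualitative claims can be checked empirically. The sketch below (the diabetes dataset and num_rounds=50 are assumptions, not part of the original example) runs mlxtend's bias_variance_decomp on a linear regression and on an unpruned decision tree for the same regression task; the linear model typically shows the larger bias term and the tree the larger variance term.

# Minimal illustrative sketch: compare the bias and variance terms of a high-bias
# model (linear regression) and a high-variance model (unpruned decision tree).
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from mlxtend.evaluate import bias_variance_decomp

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=23)

for name, model in [('Linear Regression', LinearRegression()),
                    ('Decision Tree', DecisionTreeRegressor(random_state=23))]:
    loss, bias, var = bias_variance_decomp(model,
                                           X_train, y_train,
                                           X_test, y_test,
                                           loss='mse',
                                           num_rounds=50,
                                           random_seed=23)
    print('%-18s loss=%.1f  bias=%.1f  variance=%.1f' % (name, loss, bias, var))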

Bias Variance Tradeoff

If the algorithm is too simple (a hypothesis with a linear equation), it tends toward high bias and low variance and is thus error-prone from underfitting. If it fits too complex a hypothesis (a high-degree equation), it tends toward high variance and low bias, and in that case it will not perform well on new entries. There is a middle ground between these two conditions, known as the Bias-Variance Trade-off. This tradeoff in complexity is why bias and variance trade off against each other: an algorithm cannot be more complex and less complex at the same time. On a graph of error against model complexity, the best tradeoff sits at the minimum of the total-error curve, where the falling bias and the rising variance balance each other.

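One hedged way to reproduce that curve in code (using polynomial degree as the complexity axis and a synthetic sine dataset, both assumptions for illustration): compare training error and cross-validated error as complexity grows. Training error keeps falling, while validation error falls and then rises once variance starts to dominate.

# Minimal illustrative sketch of the tradeoff curve: training error keeps decreasing
# with model complexity (degree), while validation error falls and then rises again.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(120, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=120)

for degree in [1, 2, 4, 8, 12]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    train_mse = mean_squared_error(y, model.fit(X, y).predict(X))
    cv_mse = -cross_val_score(model, X, y, cv=5,
                              scoring='neg_mean_squared_error').mean()
    print('degree=%-3d train MSE=%.3f  CV MSE=%.3f' % (degree, train_mse, cv_mse))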

The technique by which we analyze how a model's error splits into these components is known as Bias-Variance Decomposition. Below we give one example each of bias-variance decomposition for classification and for regression.

Bias Variance Decomposition for Classification and Regression

As per the formula derived above, the total reducible error is the sum of the squared bias and the variance. We try to make sure that the two are comparable and that neither exceeds the other by too large a margin.

# Import the necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from mlxtend.evaluate import bias_variance_decomp
import warnings
warnings.filterwarnings('ignore')
 
# Load the dataset
X, y = load_iris(return_X_y=True)
 
# Split train and test dataset
X_train, X_test,\
    y_train, y_test = train_test_split(X, y,
                                       test_size=0.25,
                                       random_state=23,
                                       shuffle=True,
                                       stratify=y)
 
# Build the classification model
tree = DecisionTreeClassifier(random_state=123)
# Note: scikit-learn >= 1.2 uses `estimator`; older versions use `base_estimator`
clf = BaggingClassifier(estimator=tree,
                        n_estimators=50,
                        random_state=23)
 
# Bias variance decompositions
avg_expected_loss, avg_bias, \
    avg_var = bias_variance_decomp(clf,
                                   X_train, y_train,
                                   X_test, y_test,
                                   loss='0-1_loss',
                                   random_seed=23)
# Print the value
print('Average expected loss: %.2f' % avg_expected_loss)
print('Average bias: %.2f' % avg_bias)
print('Average variance: %.2f' % avg_var)

Output:

Average expected loss: 0.06
Average bias: 0.05
Average variance: 0.02
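For comparison (this extra run is an addition to the original example and assumes the variables X_train, y_train, X_test, y_test and the imports from the block above are still in scope), the same decomposition can be applied to a single decision tree without bagging; its variance term is usually noticeably larger than that of the bagged ensemble, which is exactly the effect bagging is designed to reduce.

# For comparison: the same decomposition for a single (un-bagged) decision tree.
# Its variance term is typically larger than that of the bagged classifier above.
single_tree = DecisionTreeClassifier(random_state=123)

loss_t, bias_t, var_t = bias_variance_decomp(single_tree,
                                             X_train, y_train,
                                             X_test, y_test,
                                             loss='0-1_loss',
                                             random_seed=23)

print('Single tree - loss: %.2f, bias: %.2f, variance: %.2f'
      % (loss_t, bias_t, var_t))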

Now let's perform the same decomposition on a regression task and check the values of the bias and the variance.

# Load the necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from mlxtend.evaluate import bias_variance_decomp
import warnings
warnings.filterwarnings('ignore')
 
# Load the dataset
X, y = fetch_california_housing(return_X_y=True)
 
# Split train and test dataset
X_train, X_test,\
    y_train, y_test = train_test_split(X, y,
                                       test_size=0.25,
                                       random_state=23,
                                       shuffle=True)
 
# Build the regression model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(1)
])
 
# Set optimizer and loss
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mean_squared_error',
              optimizer=optimizer)
 
# Train the model
model.fit(X_train, y_train, epochs=25, verbose=0)
# Evaluate the model (model.evaluate returns the compiled loss, i.e. the test MSE)
loss = model.evaluate(X_test, y_test)
print('Average: %.2f' % loss)
 
# Bias variance decompositions
avg_expected_loss, avg_bias,\
    avg_var = bias_variance_decomp(model,
                                   X_train, y_train,
                                   X_test, y_test,
                                   loss='mse',
                                   random_seed=23,
                                   epochs=5,
                                   verbose=0)
 
# Print the result
print('Average expected loss: %.2f' % avg_expected_loss)
print('Average bias: %.2f' % avg_bias)
print('Average variance: %.2f' % avg_var)

Output:

162/162 [==============================] - 0s 802us/step - loss: 0.9195
Average: 0.92
Average expected loss: 2.30
Average bias: 0.72
Average variance: 1.58
