How to Mitigate Overfitting by Creating Ensembles

A typical problem in machine learning is called overfitting, which occurs when a model learns the training data too well and performs badly on fresh, untried data. Using ensembles is a useful tactic to reduce overfitting. Ensembles increase robustness and generalization by combining predictions from many models. This tutorial looks at setting up ensembles in Scikit-Learn to deal with overfitting.

What is overfitting?

When a machine learning model learns the training data too well, it becomes overfitted and captures noise and unimportant patterns that do not transfer to fresh, unobserved data. Because the model is unable to generalize outside of the training set, this may result in worse performance on fresh datasets.

Why should we Mitigate Overfitting

Overfitting is a key issue in machine learning models because it has a negative influence on the model’s capacity to generalize to new data. Overfitting mitigation is crucial for a number of reasons:

Generalization: A machine learning model’s ability to generalize successfully to new, unobserved data is its main objective. A model performs poorly on fresh data when it overfits, which happens when it learns the noise in the training set instead of the underlying pattern.
Performance: On training data, overfitted models may perform exceptionally well, but not so well on fresh data. Predictions made by the model become more accurate and dependable when overfitting is reduced, since it enhances its performance on unknown data.
Robustness: Models that are prone to overfitting are sensitive to small changes in the training data, which can lead to unstable predictions. By mitigating overfitting, the model becomes more robust to variations in the data.
Interpretability: Overfit models are more complicated, making them difficult to comprehend. Overfitting may be reduced and the model simplified, making it easier to analyze and interpret the underlying patterns in the data.

What are Ensembles?

Ensembles are machine learning technique where the predictions from various predictors, such as classifiers or regressors, are combined by aggregating the predictions of a set of models to produce outcomes that are superior to those of any individual predictor. An ensemble is a collection of forecasters whose combined forecasts enhance performance. and the term “ensemble method” refers to the general methodology used in this ensemble learning. Bringing together a number of weak learners to become strong learners is the fundamental idea behind ensemble learning.

Types of Ensembles

Bootstrap Aggregating Bagging :Using various subsets of the training data acquired by bootstrapping, several instances of the same base model are trained in the process of bagging (random sampling with replacement). Usually, a vote or average of the various model forecasts results in the final prediction.
Boosting: Boosting is the process of training weak models one after the other and assigning more weight to cases that are misclassified. The final model emphasizes accurate classification of previously misclassified cases and is a weighted sum of the weak models.
Stacking: In stacking, many different models are trained, and then a meta-model is used to combine the predictions of these models. By learning to balance the basis models’ predictions, the meta-model produces an optimal ensemble.
Dropout: During training, random neurons are removed from the model using a regularization approach called dropout, which keeps the model from becoming too dependent on certain characteristics or combinations. A stronger, more universal model is produced with the aid of dropout.
Voting: Voting generates a final judgment by aggregating predictions from several models. There are two types of voting: soft voting takes weighted average probability into account, while hard voting uses a majority vote.
Ensemble of Diverse Models: Diverse base models are advantageous to ensembles. An ensemble may be made more adaptable by experimenting with alternative methods or hyperparameters.

Stepwise Guide of How to apply different Ensemble Methods

Importing neccesary libraries

Python3

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier, VotingClassifier

from sklearn.tree import DecisionTreeClassifier

from sklearn.linear_model import LogisticRegression

from sklearn.svm import SVC

from sklearn.model_selection import train_test_split

from sklearn.datasets import load_iris

from sklearn.metrics import accuracy_score

from sklearn.ensemble import StackingClassifier

from keras.models import Sequential

from keras.layers import Dense, Dropout

from sklearn.ensemble import BaggingClassifier

from xgboost import XGBClassifier

Loading and Splitting the dataset

Python3

# Load dataset

data = load_iris()

X, y = data.data, data.target
 
# Split the data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Implementing Various Ensemble Methods

Bagging with Random Forests: Uses BaggingClassifier with RandomForestClassifier as the base estimator to train an ensemble of decision trees.
Boosting algorithms: Includes AdaBoostClassifier with DecisionTreeClassifier as the base estimator, GradientBoostingClassifier, and XGBClassifier (XGBoost).
Stacking: Uses StackingClassifier to combine predictions from RandomForestClassifier, SVC, and LogisticRegression using LogisticRegression as the final estimator.
Dropout: Implements a neural network model using Sequential from Keras with dropout layers to prevent overfitting.
Voting: Combines predictions from RandomForestClassifier, SVC, and LogisticRegression using hard voting.
Ensemble of Diverse Models: Includes an ensemble of a SVC and a DecisionTreeClassifier.

Python3

# Bagging with Random Forests

bagging_model = BaggingClassifier(base_estimator=RandomForestClassifier(), n_estimators=10)
bagging_model.fit(X_train, y_train)

bagging_predictions = bagging_model.predict(X_test)
 
# Boosting algorithms

adaboost_model = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=50)
adaboost_model.fit(X_train, y_train)

adaboost_predictions = adaboost_model.predict(X_test)
 
gradient_boost_model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=42)
gradient_boost_model.fit(X_train, y_train)

gradient_boost_predictions = gradient_boost_model.predict(X_test)
 
xgboost_model = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
xgboost_model.fit(X_train, y_train)

xgboost_predictions = xgboost_model.predict(X_test)
 
# Stacking

base_models = [('rf', RandomForestClassifier()), ('svc', SVC()), ('lr', LogisticRegression())]

stacking_model = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
stacking_model.fit(X_train, y_train)

stacking_predictions = stacking_model.predict(X_test)
 
# Dropout

dropout_model = Sequential([

    Dense(128, input_dim=4, activation='relu'),

    Dropout(0.5),

    Dense(64, activation='relu'),

    Dense(3, activation='softmax')
])

dropout_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

dropout_model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)

_, dropout_accuracy = dropout_model.evaluate(X_test, y_test)
 
# Voting

voting_model = VotingClassifier(estimators=base_models, voting='hard')
voting_model.fit(X_train, y_train)

voting_predictions = voting_model.predict(X_test)
 
# Ensemble of Diverse Models

svm_model = SVC()
svm_model.fit(X_train, y_train)

svm_predictions = svm_model.predict(X_test)
 
dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)

dt_predictions = dt_model.predict(X_test)

Comparing Accuracy

Python3

# Evaluate models

print("Bagging Accuracy:", accuracy_score(y_test, bagging_predictions))

print("AdaBoost Accuracy:", accuracy_score(y_test, adaboost_predictions))

print("Gradient Boosting Accuracy:", accuracy_score(y_test, gradient_boost_predictions))

print("XGBoost Accuracy:", accuracy_score(y_test, xgboost_predictions))

print("Stacking Accuracy:", accuracy_score(y_test, stacking_predictions))

print("Dropout Accuracy:", dropout_accuracy)

print("Voting Accuracy:", accuracy_score(y_test, voting_predictions))

Output:

Bagging Accuracy: 1.0
AdaBoost Accuracy: 1.0
Gradient Boosting Accuracy: 0.9666666666666667
XGBoost Accuracy: 1.0
Stacking Accuracy: 1.0
Dropout Accuracy: 0.9666666388511658
Voting Accuracy: 1.0

When to Use Which Ensemble Method?

Depending on the nature of the issue, the properties of the data, and the computer resources available, the best ensemble approach will be chosen. Determining the optimal group strategy for a given task requires experimentation and cross-validation.

Ensemble Method	When to use?
Bagging	Works well when the basic model (like Random Forests) is complicated and prone to overfitting. In cases with large volatility, it performs well.
Boosting	Beneficial when there is space for development and the basic model is poor. Boosting can handle high-dimensional data effectively and is helpful in eliminating bias.
Stacking	Stacking works well when different models can provide original insights. When there is sufficient data to train many models, it works well.
Dropout	An effective way to stop overfitting in neural networks. Deep learning situations often employ it.
Voting	A quick and easy way to combine different models. When majority votes are trusted, hard voting is appropriate.
Ensemble of Diverse Models	Suggested for mixing models with various advantages and disadvantages. When working with intricate and diverse datasets, it is helpful.

Conclusion:

Overfitting may be reduced by assembling machine learning models into ensembles, for example, by integrating gradient boosting, random forests, and decision trees. By using the advantages of each individual model, the ensemble technique improves resilience and generalization. This tutorial offers a detailed implementation that makes use of Scikit-Learn and shows how to train individual models, assemble an ensemble, and assess the performance of the model. Using the Iris dataset as an example, the example demonstrates how the ensemble may attain high accuracy without overfitting.

Article Tags :

AI-ML-DS

Machine Learning