Ensemble Methods in Python
Ensemble means a group of elements viewed as a whole rather than individually. An ensemble method creates multiple models and combines them to solve a problem. Ensemble methods help improve the robustness and generalization of a model. In this article, we will discuss some of these methods along with their implementation in Python. For this, we choose a dataset from the UCI repository.
Basic ensemble methods
1. Averaging method: It is mainly used for regression problems. The method consists of building multiple models independently and returning the average of the predictions of all the models. In general, the combined output is better than any individual output because averaging reduces variance.
In the example below, three regression models (linear regression, XGBoost, and random forest) are trained and their predictions are averaged. The final prediction is stored in pred_final.
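A minimal sketch of the averaging method is shown below. It assumes a synthetic dataset from scikit-learn's make_regression in place of the UCI data, and it uses GradientBoostingRegressor as a stand-in for XGBoost so that only scikit-learn is required. The model settings are illustrative, not tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption) stands in for the UCI dataset.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = [
    LinearRegression(),
    GradientBoostingRegressor(random_state=42),  # stand-in for XGBoost (assumption)
    RandomForestRegressor(n_estimators=100, random_state=42),
]

# Train each model independently and collect its test-set predictions.
predictions = [model.fit(X_train, y_train).predict(X_test) for model in models]

# The ensemble prediction is the element-wise mean of the individual predictions.
pred_final = np.mean(predictions, axis=0)

print("MSE of averaged ensemble:", mean_squared_error(y_test, pred_final))
```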
2. Max voting: It is mainly used for classification problems. The method consists of building multiple models independently and collecting each model's individual output, called a ‘vote’. The class with the most votes is returned as the output.
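To show the voting rule itself, here is a small sketch that takes hypothetical class-label predictions from three models and returns the most frequent label for each sample. The helper name max_vote and the vote array are illustrative assumptions, and labels must be non-negative integers because the sketch relies on np.bincount.

```python
import numpy as np

def max_vote(predictions):
    """Return the most common class label in each column of `predictions`.

    `predictions` has shape (n_models, n_samples) and holds non-negative
    integer class labels. Ties go to the smallest label (bincount + argmax).
    """
    predictions = np.asarray(predictions)
    # Each column holds every model's vote for one sample.
    return np.array([np.bincount(column).argmax() for column in predictions.T])

# Hypothetical votes from three models on five samples.
votes = [
    [0, 1, 1, 2, 0],
    [0, 1, 2, 2, 1],
    [1, 1, 2, 0, 1],
]
print(max_vote(votes))  # [0 1 2 2 1]
```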
In the example below, three classification models (logistic regression, XGBoost, and random forest) are combined using scikit-learn's VotingClassifier; the combined model is trained, and the class with the most votes is returned as output. The final prediction is stored in pred_final. Because this is a classification task rather than a regression task, it is evaluated with a classification metric such as accuracy instead of the regression error used for averaging.
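A sketch of this setup with scikit-learn's VotingClassifier, assuming synthetic data from make_classification and GradientBoostingClassifier standing in for XGBoost. With voting="hard", each model contributes one vote per sample and the majority class is returned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("gb", GradientBoostingClassifier(random_state=42)),  # stand-in for XGBoost
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="hard",  # each model casts one vote; the majority class wins
)
model.fit(X_train, y_train)
pred_final = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred_final))
```

Setting voting="soft" instead averages the predicted class probabilities, which requires every estimator to implement predict_proba.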
Let’s now look at some more advanced ensemble methods.
Advanced ensemble methods
Ensemble methods are used extensively in classical machine learning. Examples of algorithms using bagging are random forest and the bagging meta-estimator, and examples of algorithms using boosting are GBM, XGBoost, AdaBoost, etc.
For developers of machine learning models, ensemble methods are highly recommended. They are used extensively in machine learning competitions and research papers.
1. Stacking: It is an ensemble method that combines multiple models (classification or regression) via a meta-model (a meta-classifier or meta-regressor). The base models are trained on the complete training dataset, and the meta-model is then trained on the features returned (as output) by the base models. The base models in stacking are typically of different types. The meta-model learns how to combine the base models' outputs to achieve the best accuracy.
- Split the train dataset into n parts.
- A base model (say, linear regression) is fitted on n-1 parts, and predictions are made for the nth part. This is done for each of the n parts of the train set.
- The base model is then fitted on the whole train dataset.
- This model is used to predict the test dataset.
- The previous three steps are repeated for another base model, which results in another set of predictions for the train and test datasets.
- The predictions on the train dataset are used as features to build a new model.
- This final model makes predictions on the test dataset, using the base models' test-set predictions as its features.
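The steps above can be sketched by hand as follows. This assumes synthetic regression data, two illustrative base models (linear regression and a depth-limited decision tree), a linear-regression meta-model, and n = 5 folds. The helper stacking_features is a hypothetical name, not a library function.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

def stacking_features(model, X_train, y_train, X_test, n_folds=5):
    """Return (out-of-fold train predictions, test predictions) for one base model."""
    oof = np.zeros(len(X_train))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for fit_idx, pred_idx in kf.split(X_train):
        # Fit on n-1 parts and predict the held-out nth part.
        model.fit(X_train[fit_idx], y_train[fit_idx])
        oof[pred_idx] = model.predict(X_train[pred_idx])
    # Refit on the whole train set, then predict the test set.
    model.fit(X_train, y_train)
    return oof, model.predict(X_test)

X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

base_models = [LinearRegression(), DecisionTreeRegressor(max_depth=5, random_state=1)]
train_cols, test_cols = [], []
for base in base_models:
    oof, test_pred = stacking_features(base, X_train, y_train, X_test)
    train_cols.append(oof)
    test_cols.append(test_pred)

# Each base model contributes one feature column for the meta-model.
train_meta = np.column_stack(train_cols)
test_meta = np.column_stack(test_cols)

meta_model = LinearRegression().fit(train_meta, y_train)
pred_final = meta_model.predict(test_meta)
```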
Stacking differs from the basic ensemble methods because it has first-level and second-level models. Stacking features are first extracted by training all the first-level models on the dataset. A second-level model is then trained on the train stacking features, and this model predicts the final output from the test stacking features.
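scikit-learn also packages this two-level scheme as StackingClassifier (and StackingRegressor). The sketch below assumes synthetic data and an illustrative choice of random forest and k-nearest-neighbors first-level models with a logistic-regression second-level model. The cv argument controls the out-of-fold predictions used to train the final estimator, while the first-level models are refit on the full training set for prediction.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

stack = StackingClassifier(
    estimators=[  # first-level models
        ("rf", RandomForestClassifier(n_estimators=100, random_state=7)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),  # second-level model
    cv=5,  # out-of-fold predictions feed the second-level model
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```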
2. Blending: It is similar to the stacking method explained above, but rather than using the whole training dataset to train the base models, a validation dataset is kept separate, and the meta-model is trained on the base models' predictions for it.
- Split the original dataset into train, validation, and test datasets.
- Fit all the base models on the train dataset.
- Make predictions on the validation and test datasets.
- The validation predictions are used as features to build a second-level model.
- This model makes the final predictions on the test dataset, using the base models' test-set predictions (meta-features) as input.
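A minimal blending sketch under stated assumptions: synthetic binary classification data split 60/20/20 into train, validation, and test sets, two illustrative base models, and the predicted probability of the positive class used as each base model's meta-feature.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=3)
# Split into train (60%), validation (20%), and test (20%).
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=3)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=3)

base_models = [
    DecisionTreeClassifier(max_depth=4, random_state=3),
    RandomForestClassifier(n_estimators=50, random_state=3),
]

val_features, test_features = [], []
for model in base_models:
    model.fit(X_train, y_train)  # base models see only the train split
    # Probability of class 1 serves as this model's meta-feature.
    val_features.append(model.predict_proba(X_val)[:, 1])
    test_features.append(model.predict_proba(X_test)[:, 1])

# The second-level model is trained on validation-set predictions only.
meta_model = LogisticRegression()
meta_model.fit(np.column_stack(val_features), y_val)
pred_final = meta_model.predict(np.column_stack(test_features))
```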
3. Bagging: Bagging, short for bootstrap aggregating, runs base models on bags so that each model sees a fair sample of the whole dataset. A bag is a subset of the dataset drawn with replacement, typically the same size as the whole dataset. The final output is formed by combining the outputs of all base models.
- Create multiple datasets from the train dataset by selecting observations with replacement.
- Run a base model on each of the created datasets independently.
- Combine the predictions of all the base models to get the final output.
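The bootstrap-and-aggregate loop can be written directly with NumPy, as a sketch. It assumes synthetic regression data, ten bags, and an unpruned decision tree as the base model; because this is a regression task, the bag predictions are averaged.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

n_bags = 10  # number of bootstrap samples (assumption)
bag_predictions = []
for _ in range(n_bags):
    # Draw a bootstrap sample: same size as the train set, with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    model = DecisionTreeRegressor(random_state=0).fit(X_train[idx], y_train[idx])
    bag_predictions.append(model.predict(X_test))

# Aggregate by averaging (a classification task would use a majority vote).
pred_final = np.mean(bag_predictions, axis=0)
```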
Bagging normally uses a single type of base model, such as an XGBoost regressor.
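The same idea with scikit-learn's BaggingRegressor, as a minimal sketch. A DecisionTreeRegressor is substituted for an XGBoost regressor so that only scikit-learn is needed, and the base model is passed positionally because the keyword was renamed from base_estimator to estimator in scikit-learn 1.2.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption).
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

bagging = BaggingRegressor(
    DecisionTreeRegressor(),  # the single base model type, passed positionally
    n_estimators=10,          # number of bags
    bootstrap=True,           # sample observations with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print("R^2 on test set:", bagging.score(X_test, y_test))
```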
4. Boosting: Boosting is a sequential method that aims to prevent a wrong base model from affecting the final output. Instead of combining independently trained base models, it builds each new model so that it depends on the previous one: a new model tries to correct the errors made by its predecessor. Each of these models is called a weak learner. The final model (also called a strong learner) is formed by taking the weighted mean of all the weak learners.
- Take a subset of the train dataset.
- Initialize all data points with the same weight.
- Train a base model on that dataset.
- Use this model to make predictions on the whole dataset.
- Calculate errors using the predicted values and actual values.
- Assign higher weights to the incorrectly predicted data points.
- Make another model, and make predictions using the new model in such a way that errors made by the previous model are mitigated or corrected.
- Similarly, create multiple models, each successive model correcting the errors of the previous one.
- The final model (strong learner) is the weighted mean of all the previous models (weak learners).
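The steps above resemble AdaBoost, and a compact from-scratch sketch of discrete AdaBoost makes the weight updates concrete. Assumptions: binary labels encoded as -1/+1, depth-1 decision trees (stumps) as weak learners, and 20 rounds. Unlike the first step in the list, this sketch trains every weak learner on the whole train set with per-sample weights, which is how AdaBoost applies the higher weights; the strong learner is the sign of the alpha-weighted sum of weak-learner votes. The helper names adaboost_fit and adaboost_predict are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """Discrete AdaBoost sketch; y must contain labels -1 and +1."""
    n = len(X)
    weights = np.full(n, 1.0 / n)  # all points start with equal weight
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)  # weak learner
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        err = weights[pred != y].sum()  # weighted error rate
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero and log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # this learner's vote weight
        # Raise weights of misclassified points, lower the rest, then renormalize.
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Strong learner: sign of the alpha-weighted sum of weak-learner votes.
    scores = sum(a * m.predict(X) for a, m in zip(alphas, learners))
    return np.where(scores >= 0, 1, -1)

X, y01 = make_classification(n_samples=400, n_features=6, random_state=5)
y = np.where(y01 == 1, 1, -1)  # recode labels to -1/+1
learners, alphas = adaboost_fit(X, y)
accuracy = np.mean(adaboost_predict(X, learners, alphas) == y)
print("Training accuracy:", accuracy)
```

In practice, scikit-learn's AdaBoostClassifier implements this algorithm with more safeguards.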
Note: scikit-learn provides several classes for ensemble methods in its sklearn.ensemble module. The accuracy a method achieves in an example does not show that it is superior to another method. This article aims to give a brief introduction to ensemble methods, not to compare them. Programmers should choose the method that suits their data.