Stacking in Machine Learning

Stacking is a way to ensemble multiple classifications or regression model. There are many ways to ensemble models, the widely known models are Bagging or Boosting. Bagging allows multiple similar models with high variance are averaged to decrease variance. Boosting builds multiple incremental models to decrease the bias, while keeping variance small.

Stacking (sometimes called Stacked Generalization) is a different paradigm. The point of stacking is to explore a space of different models for the same problem. The idea is that you can attack a learning problem with different types of models which are capable to learn some part of the problem, but not the whole space of the problem. So, you can build multiple different learners and you use them to build an intermediate prediction, one prediction for each learned model. Then you add a new model which learns from the intermediate predictions the same target.
This final model is said to be stacked on the top of the others, hence the name. Thus, you might improve your overall performance, and often you end up with a model which is better than any individual intermediate model. Notice however, that it does not give you any guarantee, as is often the case with any machine learning technique.

How stacking works?

  1. We split the training data into K-folds just like K-fold cross-validation.
  2. A base model is fitted on the K-1 parts and predictions are made for Kth part.
  3. We do for each part of the training data.
  4. The base model is then fitted on the whole train data set to calculate its performance on the test set.
  5. We repeat the last 3 steps for other base models.
  6. Predictions from the train set are used as features for the second level model.
  7. Second level model is used to make a prediction on the test set.

Blending –

Blending is a similar approach to stacking.

  • The train set is split into training and validation sets.
  • We train the base models on the training set.
  • We make predictions only on the validation set and the test set.
  • The validation predictions are used as features to build a new model.
  • This model is used to make final predictions on the test set using the prediction values as features.
My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using or mail your article to See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.

Article Tags :
Practice Tags :

Be the First to upvote.

Please write to us at to report any issue with the above content.