Prerequisite: Ensemble Classifier
Bagging and Boosting are two types of ensemble learning. Both combine several estimates from different models, which decreases the variance of a single estimate, so the combined model is typically more stable.
- If the single model's problem is over-fitting, Bagging is usually the better option.
- If the single model's problem is poor performance, Boosting can produce a combined model with lower error, because it builds on the strengths of the single model and reduces its weaknesses.
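The variance claim can be made concrete with a short derivation. This is a sketch under simplifying assumptions that the article does not state: the N estimates f_1, ..., f_N each have variance sigma^2 and a common pairwise correlation rho, and they are combined by a plain average.

```latex
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} f_i\right)
  = \frac{1}{N^2}\left[N\sigma^2 + N(N-1)\,\rho\,\sigma^2\right]
  = \rho\,\sigma^2 + \frac{1-\rho}{N}\,\sigma^2
```

Under these assumptions the second term shrinks as N grows, so averaging helps most when the individual models are only weakly correlated, which is what training each learner on a different random sample tries to achieve.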
Similarities Between Bagging and Boosting –
- Both are ensemble methods that build N learners from one base learner.
- Both generate several training data sets by random sampling.
- Both make the final decision by averaging the N learners (or by taking the majority of them, i.e., majority voting).
- Both are good at reducing variance and provide higher stability.
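The sampling and voting steps listed above can be sketched for Bagging in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the base learner `train_stump` (a one-dimensional threshold classifier), the toy data set `toy_data`, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Hypothetical base learner: pick the threshold (taken from the sample's
    x values) with the fewest training errors, predicting 1 when x >= threshold."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        err = sum((x >= t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(data, n_learners, seed=0):
    """Train n_learners stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_learners)]

def bagging_predict(thresholds, x):
    """Every learner gets one equally weighted vote; the majority wins."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy 1-D data (illustrative only): label 0 below 0.5, label 1 above.
toy_data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
            (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

models = bagging_fit(toy_data, n_learners=25)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))
```

An odd number of learners is used so that a two-class majority vote cannot tie.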
Differences Between Bagging and Boosting –
|No.||Bagging||Boosting|
|1.||Simplest way of combining predictions that belong to the same type.||A way of combining predictions that belong to different types.|
|2.||Aims to decrease variance, not bias.||Aims to decrease bias, not variance.|
|3.||Each model receives equal weight.||Models are weighted according to their performance.|
|4.||Each model is built independently.||New models are influenced by the performance of previously built models.|
|5.||Different training data subsets are randomly drawn with replacement from the entire training dataset.||Every new subset contains the elements that were misclassified by previous models.|
|6.||Bagging tries to solve the over-fitting problem.||Boosting tries to reduce bias.|
|7.||If the classifier is unstable (high variance), then apply Bagging.||If the classifier is stable and simple (high bias), then apply Boosting.|
|8.||Example: Random forest.||Example: Gradient boosting.|
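The Boosting column of the table (performance-based model weights, and new models that focus on previously misclassified elements) can be illustrated with a small AdaBoost-style sketch. AdaBoost is chosen here only for illustration; the article itself names Gradient boosting as its example. The sketch also reweights the training points rather than resampling them, and the stump learner, the data set `points`, and all function names are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """Predict `polarity` when x >= threshold, otherwise the opposite label."""
    return polarity if x >= threshold else -polarity

def fit_weighted_stump(points, weights):
    """Return (threshold, polarity, weighted_error) of the best 1-D stump."""
    best = None
    for t in sorted({x for x, _ in points}):
        for polarity in (1, -1):
            err = sum(w for (x, y), w in zip(points, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[2]:
                best = (t, polarity, err)
    return best

def adaboost_fit(points, n_rounds):
    n = len(points)
    weights = [1.0 / n] * n                      # start with equal weights
    ensemble = []
    for _ in range(n_rounds):
        t, polarity, err = fit_weighted_stump(points, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # better stumps get more say
        ensemble.append((alpha, t, polarity))
        # Misclassified points gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for (x, y), w in zip(points, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps; labels are +1 and -1."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (illustrative only): no single threshold separates the classes.
points = [(1, 1), (2, 1), (3, 1), (4, -1), (5, -1), (6, -1),
          (7, 1), (8, 1), (9, 1)]

one_round = adaboost_fit(points, 1)
three_rounds = adaboost_fit(points, 3)
errors_1 = sum(adaboost_predict(one_round, x) != y for x, y in points)
errors_3 = sum(adaboost_predict(three_rounds, x) != y for x, y in points)
print(errors_1, errors_3)
```

On this toy data a single stump misclassifies three of the nine points, while three boosted stumps classify the whole training set correctly, which is the bias reduction the table describes.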