**Ensemble methods** are among the most successful Machine Learning models; among their many successes are the winning solutions to the Netflix Prize competition. In this article, we will explore how and why ensemble methods work behind the scenes. An ensemble is a group of similar things considered as a whole, so ensemble methods are simply a group of models used collectively to make a prediction. Ensemble methods find their origins in the principle of the Wisdom of the Crowd.

**Wisdom of the Crowd:**

Wisdom of the crowd is the principle that collective knowledge can be better than the knowledge of a few. In simple terms, asking many people who individually know little can be better than asking a few who know a lot. Seems counter-intuitive, right?

Let’s resolve this paradox through mathematics. Suppose we have an expert whose accuracy at solving a problem is 90%, while a non-expert’s accuracy on the same problem is just 51%. Now take the majority vote of 1000 non-experts, assuming their answers are independent. On average, 510 of them will give the correct solution, and the probability that the majority is correct, i.e., that at least 501 of them answer correctly, is roughly 73%. That is still much lower than the expert’s 90%, but much higher than a single non-expert’s 51%. If we increase the number of non-experts from 1000 to 10000 and redo the math, the probability that the majority vote is correct rises to about 97%, which is higher than the expert’s accuracy.
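These majority-vote probabilities follow directly from the binomial distribution, and can be checked with a short sketch in plain Python. The helper name `majority_correct` is my own; the sum is done in log-space because raw binomial terms underflow for n = 10000.

```python
import math

def majority_correct(n, p):
    """Probability that a majority of n independent voters,
    each correct with probability p, votes correctly."""
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        # log of the binomial pmf C(n, k) * p^k * (1-p)^(n-k),
        # computed via lgamma to stay numerically stable
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total

print(majority_correct(1000, 0.51))   # roughly 0.73
print(majority_correct(10000, 0.51))  # roughly 0.97
```

Note how the collective accuracy climbs with the crowd size even though each voter stays at 51%.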

A group of 10000 non-experts, each with an individual accuracy of just 51%, collectively reaches an accuracy of about 97%!

This phenomenon is known as the **Wisdom of the Crowd**, and it is exactly what ensemble methods exploit. Instead of training a single highly accurate model (say, a decision tree with 90% accuracy), we can create many weak models (say, 10000 decision trees that are only slightly better than chance, at 51% accuracy) and simply take their collective prediction, either through hard voting (a majority vote over the predicted classes) or soft voting (averaging the predicted class probabilities).
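As a concrete illustration, scikit-learn's `VotingClassifier` supports both voting schemes. This is only a sketch: it assumes scikit-learn is installed, and the Iris dataset and the three base estimators are arbitrary choices for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
]

# Hard voting: each model casts one vote for a class; majority wins
hard = VotingClassifier(estimators, voting="hard").fit(X_train, y_train)

# Soft voting: average the predicted class probabilities, then argmax
soft = VotingClassifier(estimators, voting="soft").fit(X_train, y_train)

print(hard.score(X_test, y_test), soft.score(X_test, y_test))
```

Soft voting often edges out hard voting when the base models produce well-calibrated probabilities, since confident predictions get more weight.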

Techniques such as Stacking, Bagging, and Pasting are specific ways of building and combining such ensembles.
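Bagging and pasting differ only in how each model's training subset is sampled: with replacement (bagging) or without (pasting). A minimal sketch using scikit-learn's `BaggingClassifier`, again with Iris as a placeholder dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each tree trains on a bootstrap sample (drawn WITH replacement)
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            bootstrap=True, random_state=0).fit(X_train, y_train)

# Pasting: same idea, but subsets are drawn WITHOUT replacement
# (max_samples < 1.0 so the subsets actually differ between trees)
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            bootstrap=False, max_samples=0.8,
                            random_state=0).fit(X_train, y_train)

print(bagging.score(X_test, y_test), pasting.score(X_test, y_test))
```

Both build many slightly-different trees from one dataset, which is what makes the crowd's votes diverse enough for the averaging effect to help.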