Introduction of Repeated Holdout Method

Last Updated : 24 Oct, 2020

Prerequisite: Introduction of Holdout Method

Repeated Holdout Method is an iteration of the holdout method, i.e., it is the repeated execution of the holdout method.
This method can be repeated ‘K’ times (iterations).
In this method, we employ random sampling of the dataset. The dataset is partitioned randomly and not on the basis of any formula.

[Note: Random sampling refers to the selection of ‘n’ individuals from the population, chosen in such a way that every set of ‘n’ individuals has the same chance to be selected. ]
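The random partition described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `holdout_split`, the `test_fraction` parameter and the 70/30 ratio are assumptions made for the example and are not prescribed by the method.

```python
import random

def holdout_split(data, test_fraction=0.3, seed=None):
    """Randomly partition data into (train, test) lists.

    holdout_split and test_fraction are hypothetical names for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(data)      # copy so the caller's data is left untouched
    rng.shuffle(shuffled)      # every ordering is equally likely
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(10), test_fraction=0.3, seed=0)
print(len(train), len(test))   # 7 3
```

Because the items are shuffled before being cut, no rule or formula decides which item goes where; only the sizes of the two sets are fixed.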

Example – Consider a dataset that is randomly partitioned into a training set and a test set. We repeat the holdout method for ‘K’ iterations. Let us assume K=3.

The shaded portions in the above iterations are the test sets and the unshaded portions are the training sets, which are obtained after randomly partitioning the dataset.
In the first iteration ‘ITERATION – 01’, a classifier is constructed on the basis of the data items/examples that belong to the training set. The classifier, once constructed, is applied to the test set. The result obtained is an error estimate, say ‘E1’.
In the second iteration ‘ITERATION – 02’, the dataset is randomly partitioned again, giving a new training set and test set. A classifier is now constructed on the basis of the new training set data items/examples. The classifier, once constructed, is applied to the new test set. The result obtained is an error estimate, say ‘E2’.
In the third iteration ‘ITERATION – 03’, the dataset is randomly partitioned once more. A classifier is now constructed on the basis of the new training set data items/examples. The classifier, once constructed, is applied to the new test set. The result obtained is an error estimate, say ‘E3’.
The iterations are thus repeated ‘K=3’ times.
To find the overall error estimate, we can use the formula –

$E=\frac{1}{K} \sum_{i=1}^{K} E_i \quad \text{or, for } K=3, \quad E=\frac{E_1+E_2+E_3}{3}$
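The whole procedure, including the averaging step, might look like the sketch below. It is not a definitive implementation: the majority-label "classifier", the toy dataset and the names `majority_classifier`, `error_rate` and `repeated_holdout` are all assumptions chosen to keep the example self-contained, and any real classifier could take their place.

```python
import random
from collections import Counter

def majority_classifier(train):
    """Toy stand-in for a real classifier: always predicts the most common training label."""
    most_common_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common_label

def error_rate(classifier, test):
    """Fraction of test examples that the classifier labels incorrectly."""
    wrong = sum(1 for x, label in test if classifier(x) != label)
    return wrong / len(test)

def repeated_holdout(data, k=3, test_fraction=0.3, seed=0):
    """Run the holdout method k times and return (per-iteration errors, mean error)."""
    rng = random.Random(seed)
    n_test = int(round(len(data) * test_fraction))
    errors = []
    for _ in range(k):
        shuffled = list(data)
        rng.shuffle(shuffled)                     # fresh random partition each iteration
        test, train = shuffled[:n_test], shuffled[n_test:]
        clf = majority_classifier(train)          # built from the training set only
        errors.append(error_rate(clf, test))      # this iteration's estimate E_i
    return errors, sum(errors) / k                # overall E = (1/K) * sum of E_i

# Hypothetical toy data: (feature, label) pairs.
data = [(i, "a" if i < 6 else "b") for i in range(10)]
errors, overall = repeated_holdout(data, k=3)
print(errors, overall)
```

As a quick check of the formula with made-up values, error estimates of 0.10, 0.20 and 0.15 give an overall estimate of (0.10 + 0.20 + 0.15) / 3 = 0.15.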

Problem: Overlapping test sets.

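Since we partition the dataset randomly into a training set and a test set in every iteration, the test sets of different iterations can overlap. Some data items/examples may therefore land in the test set in every iteration and never be placed in the training set at all, while others may never be tested.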

Example –