Introduction of Repeated Holdout Method

Last Updated : 24 Oct, 2020

Prerequisite: Introduction of Holdout Method

  • The Repeated Holdout Method is an iteration of the holdout method, i.e. it is the repeated execution of the holdout method.
  • The method is repeated ‘K’ times (iterations).
  • In each iteration, we employ random sampling of the dataset: the dataset is partitioned into a training set and a test set randomly, and not on the basis of any formula.

[Note: Random sampling refers to the selection of ‘n’ individuals from the population, chosen in such a way that every set of ‘n’ individuals has the same chance to be selected. ] 
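A single random partition of this kind can be sketched in a few lines of Python. This is only an illustration: the function name `holdout_split`, the 30% test fraction and the fixed seed are assumptions made for the example, not part of the method's definition.

```python
import random

def holdout_split(items, test_fraction=0.3, seed=None):
    """Randomly partition items into (training set, test set)."""
    rng = random.Random(seed)      # seeded only so the sketch is repeatable
    shuffled = list(items)         # copy, so the caller's data is not modified
    rng.shuffle(shuffled)          # every ordering is equally likely
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))             # ten hypothetical data items
train, test = holdout_split(data, test_fraction=0.3, seed=0)
print("training set:", sorted(train))
print("test set:    ", sorted(test))
```

Every item lands in exactly one of the two sets, and which set it lands in depends only on the shuffle.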

Example: Consider a dataset that is randomly partitioned into a training set and a test set. We repeat the holdout method for ‘K’ iterations. Let us assume K = 3.

  • The shaded portions in the above iterations are the test sets and the unshaded portions are the training sets, which are obtained by randomly partitioning the dataset.
  • In the first iteration, ‘ITERATION – 01’, a classifier is constructed from the data items/examples that belong to the training set. The classifier is then applied to the test set. The result obtained is an error estimate, say ‘E1’.
  • In the second iteration, ‘ITERATION – 02’, the dataset is randomly partitioned again, so the training and test sets differ from those of the first iteration. A new classifier is constructed from the training set and applied to the test set. The result obtained is an error estimate, say ‘E2’.
  • In the third iteration, ‘ITERATION – 03’, the dataset is once more randomly partitioned. A new classifier is constructed from the training set and applied to the test set. The result obtained is an error estimate, say ‘E3’.
  • The procedure is thus repeated K = 3 times.
  • To find the overall error estimate, we average the error estimates of the individual iterations:
E = \frac{1}{K} \sum_{i=1}^{K} E_i \quad \text{or, for } K = 3, \quad E = \frac{E_1 + E_2 + E_3}{3}
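The iterations and the averaging step can be sketched as follows. This is a minimal illustration under stated assumptions: the synthetic two-class dataset, the toy nearest-centroid classifier (`train_centroids`, `predict`), the 30% test fraction and the seeds are all hypothetical choices for the example; any classifier could take their place.

```python
import random

def train_centroids(train):
    """Toy classifier (an assumption): mean feature value of each class."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the class whose centroid lies closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def repeated_holdout(data, k=3, test_fraction=0.3, seed=0):
    """Run the holdout method k times; return (per-iteration errors, mean)."""
    rng = random.Random(seed)
    n_test = round(len(data) * test_fraction)
    errors = []
    for _ in range(k):
        shuffled = data[:]
        rng.shuffle(shuffled)                       # fresh random partition
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = train_centroids(train)              # build on the training set
        wrong = sum(predict(model, x) != y for x, y in test)
        errors.append(wrong / len(test))            # E_i for this iteration
    return errors, sum(errors) / k                  # E = (1/K) * sum of E_i

# Hypothetical 1-D dataset: class "A" centred at 0, class "B" centred at 2.
gen = random.Random(42)
data = [(gen.gauss(0.0, 1.0), "A") for _ in range(20)]
data += [(gen.gauss(2.0, 1.0), "B") for _ in range(20)]

errors, overall = repeated_holdout(data, k=3)
print("E1, E2, E3:", errors)
print("overall E: ", overall)
```

The three values usually differ from one another because each iteration uses a different random partition; the average is the quantity reported as the overall error estimate.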

Problem: Overlapping test set problem.

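  • Since each iteration partitions the dataset randomly and independently into a training set and a test set, the test sets of different iterations can overlap: some data items/examples may appear in several test sets, while others may never be placed in a test set, or never in the training set, at all.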

Example – 

