Mathematics | Hypergeometric Distribution model

Last Updated : 09 Jan, 2024

The Hypergeometric Distribution Model estimates the number of faults initially resident in a program at the beginning of the test or debugging process, based on the hypergeometric distribution. Let $C_{i-1}$ be the cumulative number of errors already detected by test instances $t_1, t_2, \ldots, t_{i-1}$, and let $N_i$ be the number of errors newly detected by $t_i$. Assumptions:

A program initially contains m faults when the test phase starts.
A test is defined as a number of test instances which are couples of input data and output data. In other words, the collection of test operations performed in a day or a week is called a test instance. The test instances are denoted by $t_i$ for $i = 1, 2, \ldots, n$.
Detected faults are not removed between test instances.

It follows from the last assumption that the same fault can be experienced at several test instances. Let $W_i$ be the number of faults experienced by test instance $t_i$. Some of these $W_i$ faults may already be counted in $C_{i-1}$; the remaining ones are the newly detected faults. If $n_i$ is an observed value of $N_i$, then $n_i \leq W_i$. Each fault experienced by $t_i$ can be classified into one of two categories:

Newly discovered faults
Rediscovered faults
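The split into newly discovered and rediscovered faults can be illustrated with a small simulation. This is only a minimal sketch: the function name `simulate_test_instance`, the fault labels `0..m-1`, and the example values are assumptions made for illustration. It assumes each test instance experiences $W_i$ distinct faults drawn uniformly at random from the $m$ faults, which is the sampling scheme a hypergeometric model corresponds to.

```python
import random

def simulate_test_instance(m, detected, w, rng):
    """Draw w distinct faults out of m and split them into new and rediscovered.

    m        -- total number of faults initially in the program
    detected -- set of fault labels already detected (its size is C_{i-1})
    w        -- number of faults experienced by this test instance (W_i)
    rng      -- random.Random instance, passed in for reproducibility
    """
    experienced = set(rng.sample(range(m), w))  # sampling without replacement
    newly = experienced - detected              # newly discovered faults
    rediscovered = experienced & detected       # rediscovered faults
    return newly, rediscovered

rng = random.Random(0)
m = 20                 # hypothetical number of initial faults
detected = set()       # nothing detected before the first test instance
history = []           # (n_i, C_i) after each test instance
for w in [5, 6, 4, 7]: # hypothetical W_i values
    newly, rediscovered = simulate_test_instance(m, detected, w, rng)
    detected |= newly
    history.append((len(newly), len(detected)))
```

Because nothing has been detected before the first test instance, all of its faults are new; in later instances some of the experienced faults are typically rediscoveries, so $n_i$ can be smaller than $W_i$.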

If we assume that the number of newly detected faults $N_i$ follows a hypergeometric distribution, then the probability of obtaining exactly $n_i$ newly detected faults among $W_i$ faults is,

$P(N_i=n_i)=\frac{\binom{m-C_{i-1}}{n_i}\binom{C_{i-1}}{W_i-n_i}}{\binom{m}{W_i}}$

where

$C_{i-1}= \sum_{k=1}^{i-1}n_k, \; C_0=0, \; n_0=0$

and

$\max\{0, W_i-C_{i-1}\}\leq n_i\leq \min\{W_i, m-C_{i-1}\}$

for all i. Since $N_i$ is assumed to be hypergeometrically distributed, the expected number of newly detected faults during the interval $[t_{i-1}, t_i]$ is,

$E(N_i)=\frac{(m-C_{i-1})W_i}{m}$
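As a sanity check, the probability mass function and its mean can be evaluated numerically. The following is a minimal sketch using only Python's standard library; the function names `pmf_new_faults` and `expected_new_faults` and the values $m = 10$, $C_{i-1} = 4$, $W_i = 3$ are illustrative assumptions. The mean is computed as that of a hypergeometric distribution with population size $m$, $m - C_{i-1}$ undetected faults, and sample size $W_i$.

```python
from fractions import Fraction
from math import comb

def pmf_new_faults(n, m, c_prev, w):
    """P(N_i = n) for m total faults, c_prev = C_{i-1} detected so far, w = W_i."""
    lo = max(0, w - c_prev)   # at least this many must be new
    hi = min(w, m - c_prev)   # cannot exceed W_i or the undetected faults
    if not lo <= n <= hi:
        return Fraction(0)
    return Fraction(comb(m - c_prev, n) * comb(c_prev, w - n), comb(m, w))

def expected_new_faults(m, c_prev, w):
    """Mean of the hypergeometric distribution: W_i * (m - C_{i-1}) / m."""
    return Fraction((m - c_prev) * w, m)

m, c_prev, w = 10, 4, 3   # hypothetical example values
probs = {n: pmf_new_faults(n, m, c_prev, w) for n in range(w + 1)}
mean_from_pmf = sum(n * p for n, p in probs.items())
```

With these values the probabilities for $n_i = 0, 1, 2, 3$ are $1/30$, $3/10$, $1/2$ and $1/6$, which sum to 1, and both `mean_from_pmf` and `expected_new_faults(m, c_prev, w)` equal $9/5$.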