Software Engineering | Mills’ Error Seeding Model

Mills’ error seeding model estimates the number of errors in a program by deliberately introducing (seeding) known errors into it. The debugging data then consist of both inherent errors and induced (seeded) errors, and the unknown number of inherent errors can be estimated from them. If inherent and induced errors are equally likely to be detected, then the probability of finding k induced errors among r removed errors follows a hypergeometric distribution, given by

    $$ P(k;N, n_1, r)=\frac{\binom{n_1}{k}\binom{N}{r-k}}{\binom{N+n_1}{r}}, \; k=0, 1, \ldots, r$$

where
N = total number of inherent errors
n1 = total number of induced errors
r = total number of errors removed during debugging
k = number of induced errors among the r removed errors
r – k = number of inherent errors among the r removed errors
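
The distribution above can be evaluated directly. The following is a minimal sketch in Python; the function name `seeded_detection_prob` and the numbers used are illustrative assumptions, not part of the original model description.

```python
from math import comb

def seeded_detection_prob(k, N, n1, r):
    """P(k; N, n1, r): probability that k of the r removed errors are seeded ones."""
    if k < 0 or k > r:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes (e.g. k > n1) get probability zero automatically.
    return comb(n1, k) * comb(N, r - k) / comb(N + n1, r)

# Hypothetical numbers: 20 inherent errors, 10 seeded errors, 12 errors removed.
N, n1, r = 20, 10, 12
probs = [seeded_detection_prob(k, N, n1, r) for k in range(r + 1)]
print(sum(probs))  # the probabilities over all k should add up to about 1
```

In practice N is unknown, so this function is mainly useful as the likelihood that is maximized over N.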

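Since n1, r, and k are known, the maximum likelihood estimate (MLE) of N can be shown to be

    $$\hat{N} = [N_0]+1$$

where

    $$N_0 = \frac{n_1 (r-k)}{k} - 1$$

and $[N_0]$ denotes the integer part of $N_0$. If $N_0$ is an integer, then $N_0$ and $N_0 + 1$ are both MLEs of N.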
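
As a rough illustration, the estimate can be computed as below. This is a sketch under the assumption that $N_0 = n_1(r-k)/k - 1$; the function name `mills_mle` and the sample values are hypothetical. Exact fractions are used so that the "is $N_0$ an integer" test is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import comb, floor

def mills_mle(n1, r, k):
    """Return the maximum likelihood estimate(s) of N as a list of one or two integers."""
    if k <= 0:
        raise ValueError("at least one seeded error must be found (k > 0)")
    n0 = Fraction(n1 * (r - k), k) - 1        # N_0 = n1 (r - k) / k - 1
    if n0.denominator == 1:                   # N_0 is an integer: two candidates
        candidates = [int(n0), int(n0) + 1]
    else:
        candidates = [floor(n0) + 1]          # [N_0] + 1
    # N can never be smaller than the r - k inherent errors already observed.
    return [n for n in candidates if n >= r - k]

# Hypothetical data: 10 seeded errors, 12 removed, 4 of those were seeded.
print(mills_mle(10, 12, 4))   # N_0 = 19 is an integer, so both 19 and 20 are returned
print(mills_mle(10, 11, 3))   # N_0 = 77/3, so the single estimate is 26
```

The final filter is a design choice of this sketch: it drops a candidate only in the edge case where every seeded error has been found (k = n1), in which $N_0$ itself would fall below the number of inherent errors already seen.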


Drawbacks of the model:

  1. Seeding errors and then testing for them is expensive, because it adds to the overall testing effort.
  2. The method has also been criticized because it gives no way to choose the type, location, and difficulty level of the induced errors so that they are as likely to be detected as the inherent errors.

Another realistic method for estimating the residual errors in a program uses two independent programmers (or groups of programmers) who test the program with independent sets of test cases. Suppose that, out of a total of N initial errors, the first programmer detects n1 errors (without removing any of them) and the second independently detects r errors in the same program.

Assume that k errors are found by both programmers. If all errors have an equal chance of being detected, then the fraction of the second programmer's r errors that the first programmer also found (k out of r) should equal the fraction of all N initial errors that the first programmer found (n1 out of N). In other words,

    $$\frac{k}{r} = \frac{n_1}{N}$$

so that an estimate of the total number of initial errors, N, is

    $$\hat{N} = \frac{n_1 r}{k}$$

Given N initial errors, the probability that exactly k of the r errors detected by the second programmer are common to both programmers can be obtained from a hypergeometric distribution as follows:

    $$  P(k;N, n_1, r)=\frac{\binom{n_1}{k}\binom{N-n_1}{r-k}}{\binom{N}{r}} $$

and the MLE of N is

    $$\hat{N} = \frac{n_1 r}{k}$$

(taking the integer part when the ratio is not an integer), which is the same as the estimate above.
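
The two-programmer estimate can be checked numerically. The sketch below assumes the estimate is $n_1 r / k$ and that the second binomial coefficient in the likelihood is $\binom{N - n_1}{r - k}$; the function names and the sample counts are hypothetical. It compares the closed-form value with a brute-force search for the integer N that maximizes the likelihood.

```python
from fractions import Fraction
from math import comb

def common_error_prob(k, N, n1, r):
    """Probability that k of the second tester's r errors are among the first tester's n1."""
    return Fraction(comb(n1, k) * comb(N - n1, r - k), comb(N, r))

def estimate_total_errors(n1, r, k):
    """Point estimate n1 * r / k, kept exact as a Fraction."""
    if k <= 0:
        raise ValueError("no common errors: the estimate is undefined")
    return Fraction(n1 * r, k)

# Hypothetical counts: tester 1 finds 25 errors, tester 2 finds 30, 8 are common.
n1, r, k = 25, 30, 8
n_hat = estimate_total_errors(n1, r, k)

# At least n1 + r - k distinct errors have been seen, so start the search there.
lowest = n1 + r - k
best = max(range(lowest, 500), key=lambda N: common_error_prob(k, N, n1, r))

print(float(n_hat), best)  # closed-form ratio and the integer maximizing the likelihood
```

With these numbers the ratio is not an integer, and the brute-force maximizer coincides with its integer part.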
