# True Error vs Sample Error

• Last Updated : 21 Sep, 2021

### True Error

The true error is the probability that a hypothesis misclassifies a single instance drawn at random from the population. Here, the population means the full distribution of instances the hypothesis could ever encounter, not only the data that has been collected.

Consider a hypothesis h(x) and the true (target) function f(x) over a population P. The true error of h is the probability that h misclassifies an instance x drawn at random from P:

$$\text{error}_P(h) = \Pr_{x \in P}\left[ f(x) \neq h(x) \right]$$
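To make the definition concrete, the sketch below computes the true error exactly for a small population that can be fully enumerated. The population (the integers 0 to 999, each equally likely), the target function `f`, and the hypothesis `h` are all hypothetical choices made for illustration; with a real population such an enumeration is usually impossible.

```python
# Hypothetical finite population: the integers 0..999, each equally likely.
population = range(1000)


def f(x):
    """Assumed target function: the true label of x."""
    return x >= 500


def h(x):
    """Assumed hypothesis: a slightly misplaced decision threshold."""
    return x >= 550


def true_error(hypothesis, target, population):
    """Fraction of the whole population on which hypothesis and target disagree."""
    items = list(population)
    mistakes = sum(1 for x in items if hypothesis(x) != target(x))
    return mistakes / len(items)


error_p = true_error(h, f, population)
# h and f disagree exactly on the instances 500..549, i.e. 50 of 1000.
print(error_p)
```

Because every member of this toy population is visited, the result is the true error itself rather than an estimate of it.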

### Sample Error

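The sample error of h with respect to the target function f and a data sample S is the proportion of examples in S that h misclassifies:

$$\text{error}_S(h) = \frac{1}{n} \sum_{x \in S} \delta\left(f(x) \neq h(x)\right)$$

where n is the number of examples in S, and δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x) and 0 otherwise. Equivalently, in terms of the accuracy of h on S: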

• S.E. = 1 - Accuracy
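A minimal sketch of this calculation, assuming the true labels and the predictions of h are available as two equal-length lists. The function name `sample_error` and the toy labels are illustrative and do not come from the original article.

```python
def sample_error(y_true, y_pred):
    """Proportion of examples in the sample that the hypothesis misclassifies."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("the sample must contain at least one example")
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)


# Toy sample of 10 examples; the hypothesis gets 3 of them wrong.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

error_s = sample_error(y_true, y_pred)
accuracy = 1 - error_s
print(f"sample error = {error_s:.2f}, accuracy = {accuracy:.2f}")
```

The two printed numbers always sum to 1, which is the relation S.E. = 1 - Accuracy stated above.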

Suppose hypothesis h misclassifies 7 of the 33 examples in a sample S. Then the sample error is:

$$\text{S.E.} = \frac{7}{33} \approx 0.21$$

### Bias & Variance

Bias: Bias is the difference between the average prediction of the hypothesis and the correct value. A high-bias hypothesis oversimplifies the problem (the model is too simple to capture the underlying pattern), so it tends to have both high training error and high test error.

Variance: Variance describes how much the predictions of the hypothesis change when it is trained on different samples. A high-variance hypothesis is overly complex: it fits the training data closely but does not generalize well to unseen data.

### Confidence Interval

In general, the true error cannot be computed directly, because the whole population is rarely available. Instead, it can be estimated with a confidence interval that is computed as a function of the sample error.

Below are the steps for computing the confidence interval:

• Randomly draw a sample S of n examples from the population P, independently of each other, where n should be greater than 30.
• Calculate the sample error of h on S.

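Here we assume that the sample error is an unbiased estimator of the true error. With approximately s% confidence, the true error lies in the interval:

$$\text{error}_S(h) \pm z_s \sqrt{\frac{\text{error}_S(h)\,\left(1 - \text{error}_S(h)\right)}{n}}$$

where error_S(h) is the sample error, n is the number of examples in S, and z_s is the z-score corresponding to an s% confidence level (for example, z_s ≈ 1.96 for 95%).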
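The interval can be sketched in a few lines. The helper below is an illustration under the normal approximation described above, not a library routine: the name `true_error_interval` is hypothetical, and the z-score is taken from Python's standard `statistics.NormalDist`. It reuses the earlier example of 7 misclassified examples out of 33.

```python
import math
from statistics import NormalDist


def true_error_interval(n_errors, n_samples, confidence=0.95):
    """Approximate two-sided confidence interval for the true error.

    Uses the normal approximation to the binomial, which is usually
    considered reasonable only for samples of about 30 examples or more.
    """
    if n_samples <= 0 or not 0 <= n_errors <= n_samples:
        raise ValueError("need n_samples > 0 and 0 <= n_errors <= n_samples")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")

    error_s = n_errors / n_samples
    # Two-sided z-score, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(error_s * (1 - error_s) / n_samples)
    # An error rate is a probability, so clip the interval to [0, 1].
    return max(0.0, error_s - margin), min(1.0, error_s + margin)


low, high = true_error_interval(n_errors=7, n_samples=33, confidence=0.95)
print(f"95% interval for the true error: ({low:.3f}, {high:.3f})")
```

For this sample the interval runs from roughly 0.07 to 0.35. It is wide because 33 examples carry little information; a larger sample with the same error rate would narrow it.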

### Implementation:

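In this implementation, we compute confidence intervals for the mean of a sample at several confidence levels using scipy.stats. The same normal approximation is what the confidence interval for the true error relies on.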

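```python
# imports
import numpy as np
import scipy.stats as st

# define sample data
np.random.seed(0)
data = np.random.randint(10, 30, 10000)

# confidence levels to report
confidence_levels = [0.90, 0.95, 0.99, 0.995]
for level in confidence_levels:
    # The level is passed positionally because the keyword was renamed
    # from `alpha` to `confidence` in newer SciPy releases.
    interval = st.norm.interval(level, loc=np.mean(data), scale=st.sem(data))
    print(f"{level * 100:g}%: {interval}")
```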
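Output as listed with the original code. These intervals are wider than 10,000 values would give (the standard error of the mean is then about 0.06, so each interval should be only about 0.2 to 0.3 wide), so the listing most likely comes from a run on a much smaller sample:

```
90%: (17.868667310403545, 19.891332689596453)
95%: (17.67492277275104, 20.08507722724896)
99%: (17.29626006422982, 20.463739935770178)
99.5%: (17.154104780989755, 20.60589521901025)
```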
