**Prerequisites:** t-test, z-test

In simple terms, a confidence interval is a range of values that is likely to contain the true value of a population parameter. The confidence level chosen for an interval is the proportion of such intervals, built from repeated samples, that would contain the true parameter value. Confidence intervals are generally used when working with sample data drawn from a population: they let us report an estimate with a stated amount of confidence, hence the term ‘Confidence Interval’.

**Fig 1.** What a confidence interval generally looks like.

**Confidence Level:**

The confidence level describes the uncertainty associated with a sampling method.

Suppose we drew many samples using the same sampling method and computed an interval estimate (say, around the sample mean) from each sample. Some interval estimates would include the true population parameter and some would not.

A 90% confidence level means that we would expect 90% of the interval estimates to include the population parameter. A 95% confidence level means that 95% of the intervals would include the population parameter.

For example, suppose you were surveying the average height of men in a particular city. You set a 95% confidence level and find that the 95% confidence interval is (168, 182). That means if you repeated the survey over and over, about 95 percent of the intervals computed this way would contain the true average height; for this particular sample, the interval runs from 168 cm to 182 cm.
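The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the population values (`mu=175.0`, `sigma=7.0`), the sample size, the number of trials and the function name `coverage` are all assumptions chosen for the demo, and the standard deviation is treated as known.

```python
import math
import random
from statistics import NormalDist, mean


def coverage(trials=2000, n=40, mu=175.0, sigma=7.0, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / math.sqrt(n)  # sigma is assumed known here
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        centre = mean(sample)
        if centre - margin <= mu <= centre + margin:
            hits += 1
    return hits / trials


print(coverage())  # should land close to 0.95
```

Each simulated sample gives a different interval; roughly 95% of them cover the true mean, which is what the confidence level describes.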

**Constructing a Confidence Interval:**

Constructing a confidence interval involves 4 steps.

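Step 1: Identify the sample statistic. Choose the statistic (e.g., the sample mean) that you will use to estimate the population parameter.

Step 2: Select a confidence level. (Usually, it is 90%, 95% or 99%.)

Step 3: Find the margin of error. (Usually given.) If not given, use the following formula:

Margin of error = Critical value × Standard error of the statistic

Here the standard error is the standard deviation of the chosen statistic; for a sample mean it is the standard deviation divided by the square root of the sample size.

Step 4: Specify the confidence interval. The uncertainty is denoted by the confidence level, and the range of the confidence interval is defined by Eq-1.

$$\text{Confidence Interval} = \text{Sample\_Statistic} \pm \text{Margin\_of\_Error} \qquad \text{(Eq-1)}$$

where Sample_Statistic can be any kind of statistic (e.g., the sample mean) and Margin_of_Error is the half-width of the interval (for example, ± 2.5).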
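The four steps above can be sketched as a tiny helper. This is a minimal illustration, not a library API: the function name `confidence_interval` and the numbers in the usage line are hypothetical, and the margin of error is taken as the critical value times the standard error (the standard deviation of the statistic).

```python
def confidence_interval(sample_statistic, critical_value, standard_error):
    """Return (lower, upper) as sample statistic +/- margin of error."""
    margin_of_error = critical_value * standard_error
    return sample_statistic - margin_of_error, sample_statistic + margin_of_error


# Hypothetical inputs: statistic 50.0, critical value 1.960, standard error 2.0.
low, high = confidence_interval(50.0, 1.960, 2.0)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```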

**Calculating a Confidence Interval**

Calculation of CI requires two statistical parameters.

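**Mean (μ):** The arithmetic mean is the average of the numbers. It is defined as the sum of the n numbers divided by n. **(Eq-2)**

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$

**Standard deviation (σ):** A measure of how far the values spread around the mean. **(Eq-3)**

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2}$$

When the standard deviation is estimated from a sample, the divisor n is replaced by n − 1.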
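As a quick illustration of these two quantities, here is a minimal sketch in plain Python. The data values are made up for the example, and the `sample` flag (divide by n − 1 instead of n) is an assumption of this sketch rather than something the article specifies.

```python
import math


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def std_dev(values, sample=False):
    """Standard deviation; divides by n, or by n - 1 when sample=True."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(squared_deviations / divisor)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
print(mean(data))     # 5.0
print(std_dev(data))  # 2.0
```

Python's standard `statistics` module offers `mean`, `pstdev` and `stdev` for the same calculations.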

**a) Using t-distribution**

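We use the t-distribution when the sample size is small (*n < 30*) and the population standard deviation is unknown, so it has to be estimated from the sample.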

Consider the following example. A random sample of 10 UFC fighters was taken and their weights were measured. The mean weight was found to be 240 kg and the sample standard deviation was 25 kg. Construct a 95% confidence interval estimate for the true mean weight of all UFC fighters.

Step 1 - Subtract 1 from your sample size. [Eq-4] This gives the degrees of freedom (df), required in Step 3.

$$df = n - 1 \qquad \text{(Eq-4)}$$

where df = degrees of freedom and n = sample size.

Using Eq-4, we get **df = 10 – 1 = 9.**

Step 2 - Subtract the confidence level from 1, then divide by two. [Eq-5] This gives the significance level in each tail (α), required in Step 3.

$$\alpha = \frac{1 - CL}{2} \qquad \text{(Eq-5)}$$

where α = significance level (area in one tail) and CL = confidence level.

Using Eq-5, we get **α = (1 – .95) / 2 = 0.025**

Step 3 - Use the values of α and df in the t-distribution table and find the value of t.

| df \ α | 0.1 | 0.05 | 0.025 | . . |
|---|---|---|---|---|
| ∞ | 1.282 | 1.645 | 1.960 | . . |
| 1 | 3.078 | 6.314 | 12.706 | . . |
| 2 | 1.886 | 2.920 | 4.303 | . . |
| : | : | : | : | . . |
| 8 | 1.397 | 1.860 | 2.306 | . . |
| 9 | 1.383 | 1.833 | 2.262 | . . |

Using the values of df and α in the t-distribution table, we get **t = 2.262.**

Step 4 - Use the t-value obtained in Step 3 in the formula given for Confidence Interval with t-distribution. [Eq-6]

$$CI = \mu \pm t\,\frac{\sigma}{\sqrt{n}} \qquad \text{(Eq-6)}$$

where μ = mean, t = chosen t-value from the table above, σ = the standard deviation, and n = number of observations.

So, putting the values in Eq-6, we get

$$CI = 240 \pm 2.262 \times \frac{25}{\sqrt{10}} = 240 \pm 17.883$$

where Lower Limit = 222.117 and Upper Limit = 257.883.

Therefore, we are 95% confident that the true mean weight of the UFC fighters is between 222.117 kg and 257.883 kg.
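The worked example can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the helper name `t_interval` is hypothetical, and the t-value is copied from the table above (row df = 9, column α = 0.025) rather than computed, since Python's standard library has no t-distribution quantile function.

```python
import math


def t_interval(sample_mean, sample_sd, n, t_value):
    """Interval from Eq-6: mean +/- t * sd / sqrt(n)."""
    margin = t_value * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


n = 10
df = n - 1                 # Step 1: degrees of freedom = 9
alpha = (1 - 0.95) / 2     # Step 2: tail area, approximately 0.025
t_value = 2.262            # Step 3: read from the table (df=9, alpha=0.025)
lower, upper = t_interval(240, 25, n, t_value)  # Step 4
print(round(lower, 3), round(upper, 3))  # 222.117 257.883
```

A statistics library such as SciPy could supply the t-value instead of the table lookup.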

**b) Using a z-distribution**

We use the z-distribution when the sample size is large (n ≥ 30). The z-test is most appropriate when the population standard deviation is known.

Consider the following example. A random sample of 50 adult females was taken and their RBC counts were measured. The sample mean is 4.63 and the standard deviation of the RBC count is 0.54. Construct a 95% confidence interval estimate for the true mean RBC count in adult females.

Step 1 - Find the mean. [Eq-2] (If not already given.)

Step 2 - Find the standard deviation. [Eq-3] (If not already given.)

Step 3 - Determine the z-value for the specified confidence level. (Some common values are given in the table below.)

| Confidence Level | z-value |
|---|---|
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |
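The z-values in the table can be recomputed with Python's standard library. The sketch below uses `statistics.NormalDist().inv_cdf`; the helper name `z_value` is an assumption made for this example.

```python
from statistics import NormalDist


def z_value(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    tail = (1 - confidence_level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> {z_value(level):.3f}")
```

The printed values match the table (1.645, 1.960 and 2.576).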

Step 4 - Use the z-value obtained in Step 3 in the formula given for Confidence Interval with z-distribution. [Eq-7]

$$CI = \mu \pm z\,\frac{\sigma}{\sqrt{n}} \qquad \text{(Eq-7)}$$

where μ = mean, z = chosen z-value from the table above, σ = the standard deviation, and n = number of observations.

Putting the values in Eq-7, we get

$$CI = 4.63 \pm 1.960 \times \frac{0.54}{\sqrt{50}} = 4.63 \pm 0.150$$

where Lower Limit = 4.480 and Upper Limit = 4.780.

Therefore, we are 95% confident that the true mean RBC count of adult females is between 4.480 and 4.780.
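The RBC example can be checked the same way. This is a minimal sketch: the function name `z_interval` is hypothetical, and the z-value is computed with `statistics.NormalDist` instead of being read from the table.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, sd, n, confidence_level=0.95):
    """Interval from Eq-7: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = z_interval(4.63, 0.54, 50)
print(f"{lower:.3f} {upper:.3f}")  # 4.480 4.780
```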

The confidence interval is one of the foundational concepts of statistics. It gives a range-based statement about a population parameter rather than a single point estimate. Various sample statistics, such as the mean or median, can be used depending on the data available, and the choice between the t-distribution and the z-distribution depends on the sample size and on whether the standard deviation is known.