How to Calculate Confidence Intervals in Python?

Last Updated : 20 Feb, 2022

In this article, we look at different ways to calculate confidence intervals in the Python programming language using the t distribution and the normal distribution. A confidence interval for a mean is a range of values that is likely to contain the population mean at a stated level of confidence.

Formula:

Confidence Interval = x ± t*(s/√n)
  • x: sample mean
  • t: t-value that corresponds to the confidence level
  • s: sample standard deviation
  • n: sample size
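
The formula above can be evaluated term by term. The following is a minimal sketch, assuming the 20-value sample used in the examples of this article; the helper name mean_confidence_interval is hypothetical and is not part of SciPy. It takes the t-value from st.t.ppf with n - 1 degrees of freedom.

```python
import numpy as np
import scipy.stats as st


def mean_confidence_interval(data, confidence=0.90):
    """Return (lower, upper) computed as x ± t * (s / sqrt(n))."""
    values = np.asarray(data, dtype=float)
    n = values.size                      # n: sample size
    x_bar = values.mean()                # x: sample mean
    s = values.std(ddof=1)               # s: sample standard deviation
    # t: two-sided critical value, leaving (1 - confidence) / 2 in each tail
    t_crit = st.t.ppf((1 + confidence) / 2, df=n - 1)
    margin = t_crit * s / np.sqrt(n)
    return x_bar - margin, x_bar + margin


sample = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3,
          3, 4, 4, 5, 5, 5, 6, 7, 8, 10]
lower, upper = mean_confidence_interval(sample, confidence=0.90)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```

The result should agree with what t.interval() returns for the same data, since that function applies the same formula with loc as the sample mean and scale as s/√n.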

Method 1: Calculate Confidence Intervals using the t Distribution

This approach is typically used for small datasets (n <= 30). Call the t.interval() function from the scipy.stats library to get the confidence interval for the population mean of the given dataset.

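Syntax: st.t.interval(confidence, df, loc, scale)

Parameters:

  • confidence: Probability that an RV will be drawn from the returned range (this argument was named alpha in older SciPy releases and was renamed to confidence in SciPy 1.9)
  • df: degrees of freedom, here the length of the data set minus one (n - 1)
  • loc: location parameter, here the sample mean
  • scale: scale parameter, here the standard error of the mean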

Example 1:

In this example, we use a data set of size n=20 and calculate the 90% confidence interval with the t distribution by calling the t.interval() function with a confidence level of 0.90.

Python




import numpy as np
import scipy.stats as st

# define sample data
gfg_data = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3,
            3, 4, 4, 5, 5, 5, 6, 7, 8, 10]

# create 90% confidence interval
# (the confidence level is passed positionally because its keyword
# was renamed from alpha to confidence in SciPy 1.9)
st.t.interval(0.90, df=len(gfg_data)-1,
              loc=np.mean(gfg_data),
              scale=st.sem(gfg_data))


Output:

(2.962098014195961, 4.837901985804038)

Example 2:

In this example, we use the same data set of size n=20 and calculate the 99% confidence interval with the t distribution by calling the t.interval() function with a confidence level of 0.99.

Python




import numpy as np
import scipy.stats as st

# define sample data
gfg_data = [1, 1, 1, 2, 2, 2, 3, 3, 3,
            3, 3, 4, 4, 5, 5, 5, 6,
            7, 8, 10]

# create 99% confidence interval
# (the confidence level is passed positionally because its keyword
# was renamed from alpha to confidence in SciPy 1.9)
st.t.interval(0.99,
              df=len(gfg_data)-1,
              loc=np.mean(gfg_data),
              scale=st.sem(gfg_data))


Output:

(2.3481954013214263, 5.4518045986785735)

Interpretation from example 1 and example 2:

In example 1, the 90% confidence interval for the population mean is (2.96, 4.84), and in example 2 the 99% confidence interval is (2.35, 5.45). The interval in example 2 is wider than the one in example 1 because a higher confidence level requires a wider range around the same sample mean. A 99% confidence level means that if the sampling were repeated many times, about 99% of the intervals built this way would contain the true population mean; [2.35, 5.45] is one such interval.
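
One way to see what the confidence level refers to is to repeat the sampling many times and count how often the interval contains the true mean. The following is a rough simulation sketch, not part of the original examples: the population parameters (true_mean, true_sd), the number of trials, and the seed are all assumptions chosen for illustration.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable
true_mean, true_sd = 4.0, 2.5        # hypothetical population parameters
n, trials, confidence = 20, 2000, 0.99

# draw `trials` independent samples of size n, one per row
samples = rng.normal(true_mean, true_sd, size=(trials, n))
means = samples.mean(axis=1)
sems = st.sem(samples, axis=1)       # standard error of each row

# one t-based interval per sample (loc and scale are arrays)
low, high = st.t.interval(confidence, df=n - 1, loc=means, scale=sems)

# fraction of intervals that contain the true mean
coverage = np.mean((low <= true_mean) & (true_mean <= high))
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to the chosen confidence level, which is the sense in which an interval is a "99%" interval.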

Method 2: Calculate Confidence Intervals using the Normal Distribution

This approach is typically used for large datasets (n > 30), where the sampling distribution of the mean is approximately normal. Call the norm.interval() function from the scipy.stats library to get the confidence interval for the population mean of the given dataset.

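Syntax: st.norm.interval(confidence, loc, scale)

Parameters:

  • confidence: Probability that an RV will be drawn from the returned range (this argument was named alpha in older SciPy releases and was renamed to confidence in SciPy 1.9)
  • loc: location parameter, here the sample mean
  • scale: scale parameter, here the standard error of the mean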
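
The n > 30 threshold is a rule of thumb: as the sample grows, the t critical value approaches the normal one, so the two methods give nearly the same interval. The sketch below compares the two critical values for a 90% interval; the list of sample sizes is an arbitrary choice for illustration.

```python
import scipy.stats as st

confidence = 0.90
upper_tail = (1 + confidence) / 2        # two-sided interval: 0.95 quantile
sizes = (5, 10, 20, 30, 100, 1000)       # illustrative sample sizes

z_crit = st.norm.ppf(upper_tail)         # normal critical value
t_crits = [st.t.ppf(upper_tail, df=n - 1) for n in sizes]

for n, t_crit in zip(sizes, t_crits):
    print(f"n={n:5d}  t={t_crit:.4f}  z={z_crit:.4f}  ratio={t_crit / z_crit:.3f}")
```

The ratio column shows how much wider the t-based interval is than the normal-based one for the same standard error.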

Example 3:

In this example, we use a random data set of size n=100 and calculate the 90% confidence interval with the normal distribution by calling the norm.interval() function with a confidence level of 0.90.

Python




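import numpy as np
import scipy.stats as st

# define sample data: 100 random integers from 5 to 9
# (no seed is set, so the interval differs on every run)
gfg_data = np.random.randint(5, 10, 100)

# create 90% confidence interval for the population mean
# (the confidence level is passed positionally because its keyword
# was renamed from alpha to confidence in SciPy 1.9)
st.norm.interval(0.90,
                 loc=np.mean(gfg_data),
                 scale=st.sem(gfg_data))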


Output:

(6.920661262464349, 7.3593387375356505)

Example 4:

In this example, we use a random data set of size n=100 and calculate the 99% confidence interval with the normal distribution by calling the norm.interval() function with a confidence level of 0.99.

Python




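import numpy as np
import scipy.stats as st

# define sample data: 100 random integers from 5 to 9
# (no seed is set, so the interval differs on every run)
gfg_data = np.random.randint(5, 10, 100)

# create 99% confidence interval for the population mean
# (the confidence level is passed positionally because its keyword
# was renamed from alpha to confidence in SciPy 1.9)
st.norm.interval(0.99,
                 loc=np.mean(gfg_data),
                 scale=st.sem(gfg_data))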


Output:

(6.689075889330163, 7.450924110669837)

Interpretation from example 3 and example 4:

In example 3, the 90% confidence interval for the population mean is (6.92, 7.36), and in example 4 the 99% confidence interval is (6.69, 7.45). The example 4 interval is wider than the example 3 interval, as expected for a higher confidence level. Because gfg_data is generated randomly, the two examples use different samples, and the exact numbers change from run to run. A 99% confidence level means that if the sampling were repeated many times, about 99% of the intervals built this way would contain the true population mean; [6.69, 7.45] is one such interval.
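
To compare the two confidence levels on exactly the same data, the sample can be generated once with a seeded generator. This is a small sketch under assumptions: the seed value 42 is arbitrary, and rng.integers(5, 10, size=100) is used as a stand-in for np.random.randint(5, 10, 100), so the numbers will not match the outputs shown above.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)          # arbitrary seed for repeatable data
data = rng.integers(5, 10, size=100)     # 100 integers from 5 to 9

mean, sem = np.mean(data), st.sem(data)

# both intervals are centred on the same sample mean
ci_90 = st.norm.interval(0.90, loc=mean, scale=sem)
ci_99 = st.norm.interval(0.99, loc=mean, scale=sem)

width_90 = ci_90[1] - ci_90[0]
width_99 = ci_99[1] - ci_99[0]
print(f"90% CI: {ci_90}, width {width_90:.3f}")
print(f"99% CI: {ci_99}, width {width_99:.3f}")
```

With the data held fixed, the 99% interval fully contains the 90% interval, and the difference in width comes only from the larger critical value.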


