
Python – Central Limit Theorem

Statistics is an important part of data science projects. We use statistical tools whenever we want to infer something about a population from a sample of the dataset, gather information from the dataset, or make an assumption about a parameter of the dataset. In this article, we will talk about one of the most important statistical tools: the central limit theorem.

What is the Central Limit Theorem?

The definition: 



The central limit theorem states that if we repeatedly draw sufficiently large samples from any population with finite mean and variance, the distribution of the sample means will be approximately normal, regardless of the shape of the original distribution. The mean of these sample means will equal the population mean, and the standard error (the standard deviation of the sample means) will decrease as the sample size increases.

Central limit theorem

Suppose we are sampling from a population with a finite mean (mu) and a finite standard deviation (sigma). Then the mean and standard deviation of the sampling distribution of the sample mean are given by:
\qquad \qquad \mu_{\bar{X}}=\mu \qquad \sigma_{\bar{X}}=\frac{\sigma}{\sqrt{n}}   



Where \bar{X} is the mean of a sample of size n, and \mu and \sigma are the mean and standard deviation of the population respectively.
The distribution of the sample mean tends towards the normal distribution as the sample size increases.
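The two formulas above can be checked empirically. The following is a minimal sketch, assuming a population of integers drawn uniformly from -40 to 40; the seed, the sample size n = 50 and the number of repeated samples are arbitrary choices made for illustration.

```python
import numpy as np

# Fixed seed so the run is reproducible (the value is arbitrary).
rng = np.random.default_rng(0)

# Assumed population: integers from -40 to 40, each equally likely.
low, high = -40, 40
values = np.arange(low, high + 1)
mu = values.mean()      # population mean
sigma = values.std()    # population standard deviation

n = 50                  # size of each sample
num_samples = 10_000    # number of repeated samples

# Each row is one sample of size n; integers() excludes the upper bound.
samples = rng.integers(low, high + 1, size=(num_samples, n))
sample_means = samples.mean(axis=1)

print("mean of sample means:", sample_means.mean(), "population mean:", mu)
print("std of sample means: ", sample_means.std(),
      "sigma / sqrt(n):", sigma / np.sqrt(n))
```

The two printed pairs should agree closely, and increasing n should shrink the standard deviation of the sample means in proportion to 1 / sqrt(n).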

Use of Central Limit Theorem (CLT)

We can use the central limit theorem for various purposes in data science projects. Some of the key uses are listed below.

Python Implementation of The Central Limit Theorem  

We will generate random integers from -40 to 40, take their mean, and repeat this 1000 times to collect the sample means in a list. We will iteratively perform this operation for different sample sizes and plot the sampling distribution for each.

import numpy
import matplotlib.pyplot as plt

# sample sizes to compare
num = [1, 10, 50, 100]
# one list of sample means per sample size
means = []

# For each sample size, draw 1000 samples of random integers
# from -40 to 40, take the mean of each sample and append
# the 1000 means to the list means.
for j in num:
    # Fix the seed so that we get the same result
    # every time the loop is run.
    numpy.random.seed(1)
    # randint excludes the upper bound, so 41 includes 40
    x = [numpy.mean(
        numpy.random.randint(
            -40, 41, j)) for _i in range(1000)]
    means.append(x)
k = 0

# plotting all the means in one figure
fig, ax = plt.subplots(2, 2, figsize=(8, 8))
for i in range(0, 2):
    for j in range(0, 2):
        # Histogram of the sample means for each sample size
        ax[i, j].hist(means[k], 10, density=True)
        ax[i, j].set_title(label=num[k])
        k = k + 1
plt.show()

                    

Output:  

Central limit theorem for getting normal distribution

It is evident from the graphs that as we increase the sample size from 1 to 100, the histogram of the sample means takes on the shape of a normal distribution.
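The visual impression from the histograms can also be checked numerically. For a normal distribution, about 95% of values lie within 1.96 standard deviations of the mean, so the sketch below measures what fraction of the sample means fall within 1.96 standard errors of the population mean for each sample size. The seed and the number of repetitions are arbitrary assumptions, and the 95% figure is only one possible check of normality.

```python
import numpy as np

# Fixed seed for reproducibility (the value is arbitrary).
rng = np.random.default_rng(1)

# Population: integers from -40 to 40, each equally likely.
values = np.arange(-40, 41)
mu, sigma = values.mean(), values.std()

coverage = {}
for n in (1, 10, 50, 100):
    # 10,000 samples of size n, one sample mean per row
    sample_means = rng.integers(-40, 41, size=(10_000, n)).mean(axis=1)
    se = sigma / np.sqrt(n)  # standard error for this sample size
    # fraction of sample means within 1.96 standard errors of mu
    coverage[n] = np.mean(np.abs(sample_means - mu) <= 1.96 * se)
    print(n, coverage[n])
```

For n = 1 the fraction is 1.0, because 1.96 standard deviations already exceeds the largest possible value of 40, which shows that the uniform population itself is not normal. For the larger sample sizes the fraction should settle near 0.95.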

Rule of Thumb For Central Limit Theorem

Generally, the central limit theorem is applied when the sample size is fairly large, usually greater than or equal to 30. For sample sizes below 30 the normal approximation can still work well, but only if the population distribution is itself close to normal or at least symmetric.
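To illustrate why the rule of thumb above depends on the shape of the population, the sketch below draws samples from a strongly skewed population and measures the skewness of the sample means, which is 0 for a normal distribution. The exponential population, the sample sizes and the helper function skewness are all assumptions made for this illustration.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment divided by std cubed."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Fixed seed for reproducibility (the value is arbitrary).
rng = np.random.default_rng(0)
num_samples = 20_000

skew_by_n = {}
for n in (2, 30, 200):
    # Assumed skewed population: exponential with scale 1
    sample_means = rng.exponential(scale=1.0,
                                   size=(num_samples, n)).mean(axis=1)
    skew_by_n[n] = skewness(sample_means)
    print(n, skew_by_n[n])
```

The skewness of the sample means shrinks towards 0 as the sample size grows, but for a population this skewed it is still noticeable at n = 30, so the threshold of 30 is a guideline and not a guarantee.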

