Sampling distribution Using Python

Last Updated : 26 Oct, 2022

There are many types of distributions studied in statistics, such as the normal (Gaussian), exponential, and binomial distributions. In this article, we will look at one more: the sampling distribution.

Suppose we have some data. If we draw a sample of a fixed number of data points from it, compute a statistic of that sample (for example, its mean), and repeat this process many times, then the distribution of the resulting sample statistics is known as the sampling distribution of that statistic.
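The procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own code: it assumes a synthetic dataset (`data`, drawn here from a uniform distribution) and uses the sample mean as the statistic; `sample_size` and `n_repeats` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical dataset to sample from (any 1-D array would do)
data = rng.uniform(0, 100, size=10_000)

sample_size = 30   # data points drawn in each sample
n_repeats = 1_000  # number of samples (repetitions)

# Draw a sample, compute its mean, and repeat; the collected means
# form the sampling distribution of the mean
sample_means = np.array([
    rng.choice(data, size=sample_size, replace=True).mean()
    for _ in range(n_repeats)
])

print(f'Mean of the data:           {data.mean():.2f}')
print(f'Mean of the sample means:   {sample_means.mean():.2f}')
print(f'Spread of the sample means: {sample_means.std():.2f}')
```

Plotting `sample_means` as a histogram (for example with `plt.hist`) shows the shape of the sampling distribution.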


An important result about sampling distributions is the Central Limit Theorem (CLT). It states that if we repeatedly draw samples of size n from a population with finite variance (no matter how that population is distributed) and compute the mean of each sample, the distribution of those sample means approaches a normal distribution as n grows.
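A quick simulation makes the theorem concrete. The sketch below assumes an exponential population (strongly right-skewed) and measures the skewness of the sample means for growing sample sizes; `skewness` is a small helper defined here, not a library function, and the seed and sizes are arbitrary. The skewness of the means should shrink toward 0, the value for a normal distribution, as `n` grows.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def skewness(x):
    """Sample skewness: third standardized moment (0 for a symmetric distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.std(x)**3

n_repeats = 20_000
results = {}
for n in (1, 5, 30, 100):
    # Each row is one sample of size n from an exponential population (scale=1)
    samples = rng.exponential(scale=1.0, size=(n_repeats, n))
    means = samples.mean(axis=1)
    results[n] = skewness(means)
    print(f'n = {n:3d}: skewness of sample means = {results[n]:.3f}')
```

For an exponential population the theoretical skewness of the mean is 2/√n, so the printed values should fall roughly from 2 toward 0.2.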

Let’s understand it by using an example:

Let’s take numbers from 1 to 10 and use them as our primary data.

Python3

import numpy as np
num = np.arange(1, 11)
num

Output:

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

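Next, let's take samples of two points and compute the mean of each. To keep the example small, we use only the first four values (1 to 4) and enumerate every possible ordered pair, allowing the same value to be picked twice. We store each sample mean in a dictionary along with the number of times it appears.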

Python3

sample_freq = {}

# Enumerate every ordered pair of the first four values,
# with repetition (4 * 4 = 16 samples of size 2)
for i in range(4):
    for j in range(4):
        # Mean of the two selected points, cast to a plain float
        mean_of_two = float((num[i] + num[j]) / 2)

        if mean_of_two in sample_freq:
            # Increment the count if this mean has been seen before
            sample_freq[mean_of_two] += 1
        else:
            # Otherwise add it to the dictionary with a count of 1
            sample_freq[mean_of_two] = 1

sample_freq

Output:

{1.0: 1, 1.5: 2, 2.0: 3, 2.5: 4, 3.0: 3, 3.5: 2, 4.0: 1}
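The same frequency table can be built more compactly with the standard library. This sketch is an alternative, not the article's method: it uses `itertools.product` to generate every ordered pair (with repetition) and `collections.Counter` to tally the means, assuming, as in the output above, that the pairs are drawn from the four values 1 to 4.

```python
from collections import Counter
from itertools import product

values = [1, 2, 3, 4]  # the four data points behind the table above

# Every ordered pair (a, b), with repetition: 4 * 4 = 16 samples of size 2
pair_means = [(a + b) / 2 for a, b in product(values, repeat=2)]

# Count how many times each mean occurs, sorted by mean
sample_freq = dict(sorted(Counter(pair_means).items()))
print(sample_freq)
# {1.0: 1, 1.5: 2, 2.0: 3, 2.5: 4, 3.0: 3, 3.5: 2, 4.0: 1}
```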

Now, let's plot each sample mean against its frequency to visualize the distribution.

Python3

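import matplotlib.pyplot as plt
# Sample mean on the x-axis, number of occurrences on the y-axis
plt.scatter(list(sample_freq.keys()), list(sample_freq.values()))
plt.xlabel('Sample mean')
plt.ylabel('Frequency')
plt.show()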

Output:

Distribution of the sample statistic

From the graph above, we can see that the distribution of the sample means is symmetric and peaks at 2.5, the mean of the four values. With only two points per sample the shape is triangular; as the number of points averaged in each sample grows, the sampling distribution of the mean gets closer to a normal (Gaussian) distribution, as the Central Limit Theorem predicts.
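To see the shape change without relying on randomness, the sketch below enumerates every possible sample of size `n` (with repetition) from the values 1 to 4 and counts the resulting means. `exact_sampling_distribution` is a hypothetical helper name introduced for illustration. For n = 2 it reproduces the triangular table above; for larger n the text histogram takes on a bell shape.

```python
from collections import Counter
from itertools import product

def exact_sampling_distribution(values, n):
    """Return {mean: count} over all len(values)**n ordered samples of size n."""
    counts = Counter(sum(sample) / n for sample in product(values, repeat=n))
    return dict(sorted(counts.items()))

values = [1, 2, 3, 4]
for n in (2, 3, 4):
    print(f'n = {n}')
    # One '#' per sample that produced this mean
    for mean, count in exact_sampling_distribution(values, n).items():
        print(f'  {mean:5.2f} {"#" * count}')
```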

Some error measures are associated with sampling distributions. The most common one is the standard error.

Standard Error

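The standard error (SE) of a statistic is the standard deviation of its sampling distribution: it measures how much the statistic varies from one sample to the next. For the sample mean, it can be computed from the standard deviation of the population and the size of each sample. When sampling with replacement, as in our example, this relation is exact.

$SE = \frac{\sigma}{\sqrt n}$

$\sigma \rightarrow \;\text{Standard deviation of the population (the individual data points).}$

$n \rightarrow \;\text{Size of each sample (number of data points averaged).}$

Python3

means = []

# Expand the frequency table into a flat list of sample means,
# repeating each mean as many times as it occurred
for key, count in sample_freq.items():
    means.extend([key] * count)

# SE measured directly: the standard deviation of the sampling distribution
se_direct = np.std(means)

# SE from the formula: population standard deviation / sqrt(sample size)
population = num[:4]  # the four values the pairs were drawn from
n = 2                 # each sample contains two points
se_formula = np.std(population) / np.sqrt(n)

print(f'Standard deviation of the sample means: {se_direct:.4f}')
print(f'sigma / sqrt(n): {se_formula:.4f}')

Output:

Standard deviation of the sample means: 0.7906
sigma / sqrt(n): 0.7906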
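The standard error of the mean can also be checked by simulation. The sketch below assumes the four-value population 1 to 4 and draws many random samples of size `n` with replacement; the standard deviation of the simulated means should land close to σ/√n, where σ is the population standard deviation and n is the size of each sample. The seed, `n_repeats`, and the chosen sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

population = np.array([1, 2, 3, 4])
sigma = population.std()  # population standard deviation (ddof=0)

n_repeats = 100_000
results = {}
for n in (2, 8, 32):
    # Each row is one random sample of size n, drawn with replacement
    samples = rng.choice(population, size=(n_repeats, n), replace=True)
    simulated_se = samples.mean(axis=1).std()
    formula_se = sigma / np.sqrt(n)
    results[n] = (simulated_se, formula_se)
    print(f'n = {n:2d}: simulated SE = {simulated_se:.4f}, '
          f'sigma/sqrt(n) = {formula_se:.4f}')
```

Quadrupling the sample size roughly halves the standard error, which is the √n in the denominator at work.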
