How to Calculate Sampling Distributions in R

Last Updated : 26 Dec, 2022

A sampling distribution is the probability distribution of a statistic, obtained from a large number of samples drawn from a specific population. It shows how often each possible value of the statistic, such as the sample mean, would occur across repeated samples from that population.

In statistics, a population is an entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements. A population can thus be said to be an aggregate observation of subjects grouped together by a common feature.

A sampling distribution is the distribution of a statistic, arrived at through repeated sampling from a larger population.
It describes the range of possible outcomes of a statistic, such as the mean or mode of some variable, that serves as an estimate of the value that truly exists in the population.
The majority of data analyzed by researchers are actually drawn from samples, not populations.
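For the sample mean in particular, the points above can be written as a short formula. The symbols here are introduced only for illustration: μ and σ are the population mean and standard deviation, n is the number of observations in each sample, and X̄ is the sample mean. The normal form on the right is exact only under the assumption that the population itself is normal, as it is in the steps that follow (μ = 10, σ = 10, n = 20).

```latex
\[
E[\bar{X}] = \mu, \qquad
\mathrm{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}, \qquad
\bar{X} \sim N\!\left(\mu, \frac{\sigma^{2}}{n}\right)
\]
\[
\mu = 10,\ \sigma = 10,\ n = 20
\;\Rightarrow\;
\mathrm{SD}(\bar{X}) = \frac{10}{\sqrt{20}} \approx 2.236
\]
```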

Steps to Calculate Sampling Distributions in R:

Step 1: First, define the number of samples to draw (n = 1000).

n<-1000

Step 2: Next, create a vector (sample_means) of length n filled with missing values (NA). The rep() function is used to replicate a value a given number of times.

Syntax: rep(value_to_be_replicated,number_of_times)
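As a small illustration of rep(), the sketch below builds a short placeholder vector. The names placeholder and zeros are hypothetical and used only here; numeric() is shown as one possible alternative that fills the vector with zeros rather than NA.

```r
# rep() repeats its first argument the given number of times
placeholder <- rep(NA, 5)
print(placeholder)           # a vector of five NA values
print(length(placeholder))

# numeric() pre-allocates a vector of zeros instead of NA
zeros <- numeric(5)
print(zeros)
```

Filling with NA makes it easy to spot any position that a later loop failed to overwrite.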

Step 3: Then fill the sample_means vector with sample means. Each mean is computed with the mean() function from one sample of 20 observations (n) generated by rnorm(), which draws random values from a normal distribution. Here the population has a mean of 10 (mean) and a standard deviation of 10 (sd). Note that n in rnorm() is the size of a single sample, not the number of samples defined in Step 1.

Syntax: mean(x, trim = 0)

Syntax: rnorm(n, mean, sd)
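Before looping, it can help to look at a single draw. The sketch below generates one sample of 20 observations and takes its mean. The seed value and the names one_sample and one_mean are assumptions added for reproducibility and illustration; they are not part of the original steps.

```r
# arbitrary seed, assumed here only so the draw can be repeated
set.seed(42)

# one sample of 20 observations from a normal population (mean 10, sd 10)
one_sample <- rnorm(20, mean = 10, sd = 10)
print(length(one_sample))

# the statistic of interest: the mean of this single sample
one_mean <- mean(one_sample)
print(one_mean)
```

Repeating this draw many times and storing each mean is what produces the sampling distribution.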

Step 4: To inspect the generated sample means, use head(), which returns the first six elements of an object (vector, list, data frame, etc.).

Syntax: head(data_frame, no_of_rows_to_be_returned) # By default, the second argument is set to 6 in R.
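A quick sketch of head() on a plain vector, using a hypothetical vector x created only for this illustration:

```r
# a simple vector of ten values: 10, 20, ..., 100
x <- seq(10, 100, by = 10)

# default: the first six elements
first_six <- head(x)
print(first_six)     # 10 20 30 40 50 60

# the second argument changes how many elements are returned
first_three <- head(x, 3)
print(first_three)   # 10 20 30
```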

Step 5: To visualize the sample_means vector, plot a histogram using the hist() function in R.

Syntax: hist(v, main, xlab, ylab, col)

where:

v is a vector containing values used in histogram.

main indicates title of the chart.

col is used to set color of the bars.

xlab is used to give description of x-axis.

ylab is used to give description of y-axis.
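The arguments above can be tried on any numeric vector. The sketch below uses hypothetical data (values) and an arbitrary seed, both assumed only for this illustration. It also marks the value 10 with abline(), an optional extra that is not part of the original steps.

```r
# arbitrary seed and illustrative data, assumed only for this example
set.seed(1)
values <- rnorm(500, mean = 10, sd = 10 / sqrt(20))

# hist() draws the plot and also returns an object describing the bins
h <- hist(values, main = "Example Histogram",
          xlab = "Value", ylab = "Frequency", col = "lightblue")

# optional: dashed vertical line at 10 for reference
abline(v = 10, lwd = 2, lty = 2)

# every observation falls into exactly one bin
print(sum(h$counts))
```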

Step 6: Finally, find the proportion of the generated sample means that are greater than or equal to 10, which estimates the corresponding probability.

Code:

In this particular example, we estimate the probability that the sample mean is greater than or equal to 10, given that the population mean is 10, the population standard deviation is 10, and the sample size is 20. The result is 0.506 (approximately 0.50); because the samples are random, the exact value varies slightly from run to run.

R

# define number of samples
n <- 1000

# create empty vector of length n
sample_means <- rep(NA, n)

# fill the vector with the means of n samples of size 20
for (i in seq_len(n)) {
    sample_means[i] <- mean(rnorm(20, mean = 10, sd = 10))
}

# inspect the first six sample means
head(sample_means)

# create histogram to visualize
hist(sample_means, main = "Sampling Distribution",
     xlab = "Sample Means", ylab = "Frequency", col = "blue")

# cross-check: mean and standard deviation of the sample means
mean(sample_means)

sd(sample_means)

# estimate the probability that a sample mean is >= 10
sum(sample_means >= 10) / length(sample_means)
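The simulated values can be compared with what theory predicts for a normal population. The sketch below is one possible cross-check rather than part of the original steps: it uses replicate() instead of a for loop, fixes an arbitrary seed for reproducibility, and introduces hypothetical names (n_samples, sample_size, pop_mean, pop_sd, means_sim) for illustration.

```r
# arbitrary seed, assumed here only so the run can be repeated
set.seed(123)

n_samples   <- 1000   # number of samples to draw
sample_size <- 20     # observations per sample
pop_mean    <- 10     # population mean
pop_sd      <- 10     # population standard deviation

# replicate() evaluates the expression n_samples times and collects the results
means_sim <- replicate(n_samples,
                       mean(rnorm(sample_size, mean = pop_mean, sd = pop_sd)))

# theoretical standard deviation of the sample mean (standard error)
theoretical_se <- pop_sd / sqrt(sample_size)

# theoretical P(sample mean >= 10) when the population is normal
theoretical_prob <- pnorm(10, mean = pop_mean, sd = theoretical_se,
                          lower.tail = FALSE)

# simulated counterparts
simulated_prob <- mean(means_sim >= 10)

print(c(theoretical_se = theoretical_se, simulated_sd = sd(means_sim)))
print(c(theoretical_prob = theoretical_prob, simulated_prob = simulated_prob))
```

The standard deviation of the simulated means should land close to 10 / sqrt(20), about 2.236, and the simulated probability close to the theoretical value of 0.5.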