Stratified Sampling in R

Last Updated : 19 Mar, 2024

In this article, we will discuss what stratified sampling is and how to perform it in the R programming language.

What is Stratified Sampling?

Stratified sampling is a commonly used sampling method in which a population is split into groups (strata), and a chosen number of members is then randomly selected from each group for inclusion in the sample.
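
To make the definition concrete, here is a minimal sketch of the split-then-sample idea using only base R. The data frame `toy`, its column names, the seed, and the per-stratum size of 2 are hypothetical choices made for illustration only.

```r
# Hypothetical toy population: 3 strata with 5 rows each
set.seed(42)  # arbitrary seed so the draw is repeatable
toy <- data.frame(stratum = rep(c("A", "B", "C"), each = 5),
                  value   = 1:15)

# Split the row indices by stratum
rows_by_stratum <- split(seq_len(nrow(toy)), toy$stratum)

# Draw 2 row indices at random (without replacement) from each stratum
picked <- unlist(lapply(rows_by_stratum,
                        function(idx) idx[sample.int(length(idx), 2)]))

# Keep only the sampled rows and count them per stratum
toy_sample <- toy[picked, ]
table(toy_sample$stratum)
```

Every stratum contributes the same number of rows here, regardless of which individual rows happen to be drawn.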

Stratified Sampling Using Number of Rows

Let’s say there are 600 people at a convocation, each of whom is a teacher, a student, a member of the workforce, or a guest. Suppose we’d like to take a stratified sample of 60 people such that 15 people from each group are included in the sample.

Step 1: Install and load the package

install.packages("dplyr")  # only needed once per R installation
library(dplyr)

Step 2: Create the data frame

df <- data.frame(group = rep(c('Teachers', 'Students', 'Workforce', 'Guests'), each = 150),
                 gpa = rnorm(600, mean = 90, sd = 3))

Step 3: Obtain the stratified sample

strat_sample <- df %>%
  group_by(group) %>%
  sample_n(size = 15)

Step 4: Find the frequency of people from each group

table(strat_sample$group)

The complete code is shown below.

R
# Step 1: Install (only needed once) and load dplyr
install.packages("dplyr")
library(dplyr)

# Step 2: Create a data frame of 600 people, 150 in each of the four groups
df <- data.frame(group = rep(c('Teachers', 'Students', 'Workforce', 'Guests'), each = 150),
                 gpa = rnorm(600, mean = 90, sd = 3))
head(df)

# Step 3: Obtain the stratified sample (15 random rows from each group)
strat_sample <- df %>%
  group_by(group) %>%
  sample_n(size = 15)

# Step 4: Find the frequency of people from each group
table(strat_sample$group)

Output (the gpa values will differ from run to run because rnorm() draws random numbers):

     group      gpa
1 Teachers 88.95551
2 Teachers 88.89639
3 Teachers 86.63262
4 Teachers 89.31554
5 Teachers 88.96061
6 Teachers 86.51635

   Guests  Students  Teachers Workforce
       15        15        15        15
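
In dplyr 1.0.0 and later, `sample_n()` is marked as superseded in favour of `slice_sample()`. As a hedged alternative, the sketch below repeats the same draw with `slice_sample(n = 15)`. It assumes dplyr 1.0.0 or newer is installed; the call to `set.seed(123)` is an arbitrary addition that only makes the draw repeatable.

```r
library(dplyr)

set.seed(123)  # arbitrary seed (an assumption, not part of the example above)

# Same population as above: 600 people, 150 in each group
df <- data.frame(group = rep(c('Teachers', 'Students', 'Workforce', 'Guests'), each = 150),
                 gpa   = rnorm(600, mean = 90, sd = 3))

# slice_sample() draws n rows at random within each group
strat_sample <- df %>%
  group_by(group) %>%
  slice_sample(n = 15) %>%
  ungroup()  # drop the grouping so later steps treat the sample as one table

# Frequency of people from each group
table(strat_sample$group)
```

The per-group counts should match those of the `sample_n()` version; only the function used for the draw changes.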

Stratified Sampling Using Fraction of Rows

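Instead of drawing a fixed number of rows, we can draw a fixed fraction of the rows in each group with the sample_frac() function. In the example above, each group has 150 rows, so a fraction of 0.20 selects 30 rows per group.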

R
# Step 1: Install (only needed once) and load dplyr
install.packages("dplyr")
library(dplyr)

# Step 2: Create a data frame of 600 people, 150 in each of the four groups
df <- data.frame(group = rep(c('Teachers', 'Students', 'Workforce', 'Guests'), each = 150),
                 gpa = rnorm(600, mean = 90, sd = 3))

# Step 3: Obtain the stratified sample (20% of the rows in each group)
strat_sample <- df %>%
  group_by(group) %>%
  sample_frac(size = 0.20)

# Step 4: Find the frequency of people from each group
table(strat_sample$group)

Output:

   Guests  Students  Teachers Workforce
       30        30        30        30
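
Because all four groups in the example have 150 rows, a 20% fraction yields the same count for every group. The sketch below uses hypothetical unequal group sizes (60, 300, 200 and 40) to show that fraction-based sampling keeps each group's share of the population. It uses `slice_sample(prop = )`, which supersedes `sample_frac()` in dplyr 1.0.0 and later; the data frame name `df_unequal`, the group sizes, and the seed are assumptions made for illustration.

```r
library(dplyr)

set.seed(2024)  # arbitrary seed so the draw is repeatable

# Hypothetical population with unequal strata (still 600 people in total)
df_unequal <- data.frame(
  group = rep(c('Teachers', 'Students', 'Workforce', 'Guests'),
              times = c(60, 300, 200, 40)),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Draw 20% of the rows within each group
prop_sample <- df_unequal %>%
  group_by(group) %>%
  slice_sample(prop = 0.20) %>%
  ungroup()

# Each group's count is 20% of its own size
table(prop_sample$group)
```

Each group should contribute 20% of its own rows (for example, 60 of the 300 students), so larger groups are represented by more rows in the sample.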

Conclusion

In this article, we learned how to perform stratified sampling in R using either a fixed number of rows (sample_n()) or a fixed fraction of rows (sample_frac()) from each group. In both cases, the population is first split into groups, and members are then randomly selected from each group to form the sample.
