
R Program to Sample from a Population

R is a powerful and widely used programming language for statistical computing and data analysis. It offers a rich ecosystem of packages for analytical tasks and is known for its flexibility and its visualization capabilities, which make it easy to turn data into clear charts and graphs.

Sampling from a population is a fundamental technique in data science, allowing us to learn about large populations by studying smaller, more manageable samples. It’s very important in the era of big data, where it’s often impossible to collect and analyze data from every single individual or item in a population.



Population

In statistics, a “population” is the complete set of individuals or items you want to learn about in a study. For instance, if you’re studying the heights of all adult females in a city, all of those women together are your population.

In statistics, it’s often not feasible to collect data from an entire population, so you usually work with a sample, which is a subset of the population.
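The relationship between a population and a sample can be sketched in a few lines of R. This is a minimal illustration, assuming a simulated population: the heights are generated with rnorm(), and the mean of 162 cm and standard deviation of 7 cm are made-up values, not real measurements.

```r
# Hypothetical population: simulated heights (cm) of 10,000 adult females.
# The mean and sd are illustrative assumptions, not real data.
set.seed(42)
population_heights <- rnorm(10000, mean = 162, sd = 7)

# Draw a sample of 100 heights without replacement
sample_heights <- sample(population_heights, size = 100, replace = FALSE)

# Compare the population mean with the sample mean
population_mean <- mean(population_heights)
sample_mean <- mean(sample_heights)
print(population_mean)
print(sample_mean)
```

The two means should be close but not identical, which is the trade-off of working with a sample instead of the whole population.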

In R, several functions and packages can help you work with populations and samples, depending on the analysis you want to perform. The examples in this article use the base R function sample().

Sample

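In R, a “sample” refers to a subset of data drawn from a larger dataset or population. The subset is selected randomly or according to specific criteria, and it is used for purposes such as statistical analysis, modeling, and hypothesis testing.

The base R function sample() draws a random sample. Its syntax is:

sample(x, size, replace = FALSE, prob = NULL)

Here x is the vector of elements to choose from (or a single positive integer n, meaning 1:n), size is the number of items to draw, replace controls whether an item can be drawn more than once, and prob is an optional vector of probability weights for the elements of x.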
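As a rough sketch of how the arguments in the signature above behave, the snippet below varies replace and prob. The coin vector and the 0.8/0.2 weights are illustrative assumptions.

```r
# Hypothetical data: the two sides of a coin
coin <- c("Heads", "Tails")

# replace = TRUE is needed here because size is larger than length(coin)
flips <- sample(coin, size = 10, replace = TRUE)
print(flips)

# prob gives unequal weights: "Heads" is drawn about 80% of the time
biased_flips <- sample(coin, size = 10, replace = TRUE, prob = c(0.8, 0.2))
print(biased_flips)

# With a single positive integer n, sample() returns a random permutation of 1:n
permutation <- sample(5)
print(permutation)
```

Asking for more items than x contains while leaving replace = FALSE raises an error, so sampling with replacement has to be requested explicitly.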

Simple Random Sampling

Simple Random Sampling is a fundamental technique in statistics for selecting a subset of individuals or items from a larger population.

In simple random sampling:

  1. Each individual or item in the population has an equal chance of being selected.
  2. Items are selected independently, so choosing one item does not make any of the remaining items more or less likely to be chosen.
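The equal-chance property listed above can be checked empirically. The following is a small simulation sketch, assuming a made-up population of five labelled items and 10,000 repeated draws. It is an illustration, not a formal proof.

```r
# Hypothetical population of five labelled items
set.seed(123)
items <- c("A", "B", "C", "D", "E")

# Draw one item at random, 10,000 times
draws <- replicate(10000, sample(items, size = 1))

# Proportion of draws that landed on each item
proportions <- table(draws) / length(draws)
print(proportions)
```

Each proportion should come out near 0.2, since every one of the five items has the same chance of being picked.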

Perform Simple Random Sampling

Example 1: Sampling data from a Vector

Let’s say you have a vector of ages of people in a community and you want to randomly select 5 ages for a survey.




# Create a vector of ages
ages <- c(25, 30, 35, 40, 45, 50, 55, 60, 65, 70)
 
# Perform simple random sampling of 5 ages
sampled_ages <- sample(ages, size = 5, replace = FALSE)
 
# Print the sampled ages
print(sampled_ages)

Output:

[1] 35 30 60 65 70

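This code randomly selects 5 ages from the vector ages without replacement, so no age appears twice. Because the selection is random, your output will usually differ from the one shown here.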
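To make a random draw repeatable, one option is to fix the random number generator's seed with set.seed() before calling sample(). The sketch below reuses the ages vector; the seed value 2024 is an arbitrary choice.

```r
# Same ages vector as in Example 1
ages <- c(25, 30, 35, 40, 45, 50, 55, 60, 65, 70)

# Setting the same seed before each call reproduces the same sample
set.seed(2024)  # arbitrary seed value, chosen for illustration
first_draw <- sample(ages, size = 5, replace = FALSE)

set.seed(2024)
second_draw <- sample(ages, size = 5, replace = FALSE)

# Both draws contain the same ages in the same order
print(identical(first_draw, second_draw))  # TRUE
```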

Example 2: Sampling data from a List




# Create a list of favorite colors
favorite_colors <- list("Red", "Blue", "Green", "Yellow", "Purple", "Orange")
 
# Convert the list into a vector
colors_vector <- unlist(favorite_colors)
 
# Perform simple random sampling of 2 colors
sampled_colors <- sample(colors_vector, size = 2, replace = FALSE)
 
# Print the sampled colors
print(sampled_colors)

Output:

[1] "Yellow" "Blue"  

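We start with a list called favorite_colors containing various color names, convert it into a character vector with unlist(), and then randomly select 2 colors from that vector without replacement.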
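As an alternative sketch, sample() can also be applied to the list directly, in which case the result is itself a list. The variable name sampled_list is an assumption introduced for this illustration.

```r
# Same list of favorite colors as in Example 2
favorite_colors <- list("Red", "Blue", "Green", "Yellow", "Purple", "Orange")

# Sampling the list directly returns a list of 2 elements
sampled_list <- sample(favorite_colors, size = 2, replace = FALSE)
print(sampled_list)

# unlist() turns the sampled list into a character vector if needed
print(unlist(sampled_list))
```

Converting with unlist() first, as in the example above, is convenient when a plain vector is wanted. Sampling the list directly keeps the list structure, which matters when the elements are not all of the same type.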

Example 3: Sampling rows from a Data Frame




# Create a sample data frame
employees <- data.frame(
  Name = c("John", "Jane", "Jim", "Jill", "Joe", "Janet"),
  Age = c(30, 35, 25, 40, 45, 28),
  Salary = c(50000, 60000, 45000, 70000, 80000, 55000)
)
 
# Perform simple random sampling of 2 employees
sampled_employees <- employees[sample(nrow(employees), size = 2,
                                      replace = FALSE), ]
 
# Print the sampled employees
print(sampled_employees)

Output:

   Name Age Salary
6 Janet  28  55000
5   Joe  45  80000
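The example above samples whole rows by drawing random row indices. As a complementary sketch, values can also be drawn from a single column, or a fixed fraction of the rows can be selected. The 50% fraction and the names sampled_salaries and half_sample are illustrative assumptions.

```r
# Same data frame as in Example 3
employees <- data.frame(
  Name = c("John", "Jane", "Jim", "Jill", "Joe", "Janet"),
  Age = c(30, 35, 25, 40, 45, 28),
  Salary = c(50000, 60000, 45000, 70000, 80000, 55000)
)

# Draw 3 values from the Salary column only
sampled_salaries <- sample(employees$Salary, size = 3, replace = FALSE)
print(sampled_salaries)

# Select a fixed fraction of the rows (50% here, chosen for illustration)
n_rows <- floor(0.5 * nrow(employees))
half_sample <- employees[sample(nrow(employees), size = n_rows), ]
print(half_sample)
```

Sampling row indices keeps each employee's Name, Age, and Salary together, whereas sampling a column returns only the values of that column.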
