How to Extract random sample of rows in R DataFrame with nested condition

  • Last Updated : 24 Jun, 2021

In this article, we will learn how to extract a random sample of rows from a DataFrame in the R programming language, keeping only the rows that satisfy a condition, which may be a nested (compound) condition.

Method 1: Using sample()

We will use the sample() function for this task. sample() draws a random sample, without replacement by default, from its first argument. That argument is either a vector to draw from or a single positive integer x, in which case the draw is made from 1:x.
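
As a quick illustration of the two input forms, the sketch below calls sample() once on a vector and once on a single positive integer. The variable names are illustrative only, and the seed is set just to make the draws repeatable.

```r
# Fix the random number generator so repeated runs give the same draws
set.seed(42)

# Vector input: draw 3 distinct elements from the vector
from_vector <- sample(c(10, 51, 19, 126, 99), 3)

# Integer input: sample(5, 2) draws 2 distinct values from 1:5
from_integer <- sample(5, 2)

from_vector
from_integer
```

Because a single number x is treated as 1:x, sample(7, 1) can return any value from 1 to 7, not necessarily 7. This matters when a condition matches exactly one row.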

The other function we will use is which(). It takes a logical vector, such as the result of a comparison, and returns the indices of the elements that are TRUE; elements that are FALSE or NA are skipped. Here it supplies the row numbers that satisfy the condition.
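
A minimal sketch of what which() returns, using the values of the length column from the data frame shown later in this article. The variable names are illustrative.

```r
length_col <- c(40, NA, NA, 100, 95)

# The comparison yields a logical vector; NA inputs give NA results
length_col > 50
# [1] FALSE    NA    NA  TRUE  TRUE

# which() keeps only the positions that are TRUE and drops NA
matching_rows <- which(length_col > 50)
matching_rows
# [1] 4 5
```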

Syntax: df[sample(which(conditions), n), ]

Parameters:



  • df: DataFrame
  • n: number of rows to sample; it must not exceed the number of rows that satisfy the condition
  • conditions: logical condition that selects the candidate rows. Ex: df$year > 5
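
The examples in this article each use a single comparison. A nested condition combines several comparisons with & (and), | (or) and parentheses. The sketch below is one possible such condition, chosen only for illustration; it rebuilds the article's data frame so that it runs on its own.

```r
df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# Nested condition: year above 5 AND (education is "yes" OR length above 50)
matches <- which(df$year > 5 & (df$education == "yes" | df$length > 50))
matches
# [1] 1 2 4 5

set.seed(1)  # only to make the draw repeatable
picked <- df[sample(matches, 2), ]
picked
```

Row 3 is excluded because its education is "no" and its length is NA, so the condition evaluates to NA there and which() skips it.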

DataFrame in Use:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes

To apply this approach, first create the data frame, then pass the row indices returned by which() to sample() and use the result to subset the data frame. The examples below use the data frame above. Because the rows are drawn at random, the sampled rows can differ from run to run.

Example 1:

R




# Create the sample data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Sample 2 of the rows where year is greater than 5
print("2 samples")
df[sample(which(df$year > 5), 2), ]

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes

Example 2:

R




# Create the sample data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Sample 3 of the rows where education is not "no"
print("3 samples")
df[sample(which(df$education != "no"), 3), ]

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "3 samples"
     name year length education
5   Geeks   99     95       yes
1 Welcome   10     40       yes
2      to   51     NA       yes
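
Two edge cases are worth guarding against with this method: sample() raises an error when n is larger than the number of matching rows, and when exactly one row matches, sample() treats that single index x as 1:x. The helper below is a hypothetical sketch (sample_rows_where is not a base R function) that handles both cases.

```r
# Hypothetical helper (not part of base R): sample up to n rows of df
# at the positions where the logical vector `condition` is TRUE.
sample_rows_where <- function(df, condition, n) {
  idx <- which(condition)            # row positions where the condition is TRUE
  if (length(idx) == 0) {
    return(df[0, , drop = FALSE])    # no match: return an empty data frame
  }
  n <- min(n, length(idx))           # never ask for more rows than matched
  # Sampling positions within idx avoids sample()'s 1:x behaviour
  # when idx holds a single number
  picked <- idx[sample.int(length(idx), n)]
  df[picked, , drop = FALSE]
}

df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# Only row 4 has year > 100, so one row is returned even though 3 were requested
one_row <- sample_rows_where(df, df$year > 100, 3)
one_row
```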

Method 2: Using sample_n() function

sample_n() is a function from the dplyr package that selects a random sample of rows from a data frame.



Syntax: sample_n(x, n)

Parameters:

  • x: Data Frame
  • n: size/number of items to select

Along with sample_n(), we use dplyr's filter() function. filter() keeps the rows of a data frame for which the filtering expression evaluates to TRUE and drops the rows where it is FALSE or NA.

Syntax: filter(x, expr)

Parameters:

  • x: Object to be filtered
  • expr: logical expression that decides which rows are kept
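
The short sketch below shows filter() on its own, assuming the dplyr package is installed. It rebuilds the article's data frame; long_rows and educated_rows are illustrative names.

```r
library(dplyr)

df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# Rows with NA in length are dropped because NA > 50 is NA, not TRUE
long_rows <- filter(df, length > 50)
long_rows

# Several conditions separated by commas are combined with AND
educated_rows <- filter(df, year > 5, education == "yes")
educated_rows
```

Inside filter(), columns can be referenced by their bare names, so the df$ prefix is not needed.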

Both filter() and sample_n() come from the dplyr package, so it must be loaded first. filter() receives the sample data frame df and the (possibly nested) condition as arguments, and sample_n() then draws n rows at random from the rows that satisfy the condition.

Syntax: filter(df, condition) %>% sample_n(., n)

Parameters:

  • df: DataFrame object
  • condition: logical condition, which may be nested. Ex: name != "to" (columns can be referenced by bare name inside filter())
  • n: number of rows to sample
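
A nested condition works the same way in this pipeline. The sketch below is one possible version, with a condition chosen only for illustration. It assumes dplyr 1.0.0 or later, where sample_n() is superseded by slice_sample(); the article's own examples keep sample_n().

```r
library(dplyr)

df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

set.seed(1)  # only to make the draw repeatable

# Keep rows with year above 5 AND (education "yes" OR length above 50),
# then draw 2 of the remaining rows at random
picked <- df %>%
  filter(year > 5 & (education == "yes" | length > 50)) %>%
  slice_sample(n = 2)

picked
```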

Example 1:

R




# dplyr provides filter(), sample_n() and the %>% pipe
library(dplyr)

# Create the sample data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Sample 2 of the rows whose name is not "to"
print("2 samples")

filter(df, name != "to") %>% sample_n(2)

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
     name year length education
1 Welcome   10     40       yes
2   Geeks   99     95       yes

Example 2:

R




# dplyr provides filter(), sample_n() and the %>% pipe
library(dplyr)

# Create the sample data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Sample 2 of the rows where year is greater than 20
print("2 samples")

filter(df, year > 20) %>% sample_n(2)

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
  name year length education
1  for  126    100        no
2   to   51     NA       yes
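
The sampled rows shown in the outputs of this article come from random draws, so a rerun may pick different rows. When a repeatable result is needed, set.seed() can be called before sampling. The sketch below uses an arbitrary seed value of 123 and illustrative variable names.

```r
df <- data.frame(
  name = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year = c(10, 51, 19, 126, 99),
  length = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# Resetting the same seed before each draw reproduces the same sample
set.seed(123)
first_draw <- sample(which(df$year > 20), 2)

set.seed(123)
second_draw <- sample(which(df$year > 20), 2)

identical(first_draw, second_draw)
# [1] TRUE
```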


