In this article, we will learn how to extract a random sample of rows from a DataFrame in the R programming language when the rows must satisfy a nested condition.
Method 1: Using sample()
We will use the sample() function to carry out this task. The sample() function in R draws a random sample from the values supplied in the function call. Its first argument is either a vector, in which case the sample is drawn from the vector's elements, or a single positive integer k, in which case the sample is drawn from 1:k.
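The sketch below shows both forms of the first argument using base R only. It also shows a pitfall that matters for this article: if which() matches exactly one row, its result is a single integer, and sample() then draws from 1 up to that integer instead. The resample() helper is not a built-in function; it is the wrapper suggested in the examples section of R's own ?sample help page.

```r
# A vector argument: sample() draws from the elements themselves
idx <- c(2L, 5L, 9L)
s <- sample(idx, 2)        # two distinct elements of idx (replace = FALSE by default)
print(s)

# A single positive integer k: sample() draws from 1:k instead
p <- sample(5)             # a random permutation of 1:5
print(p)

# Pitfall: a length-one vector is treated as an integer, so
# sample(9L, 1) can return any value from 1 to 9, not necessarily 9.
# Wrapper from the ?sample examples that always samples from x itself:
resample <- function(x, ...) x[sample.int(length(x), ...)]
r <- resample(9L, 1)       # always 9
print(r)
```

Using resample(which(conditions), n) in place of sample(which(conditions), n) avoids the surprise when only one row satisfies the condition.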
The other function we will use is which(), which lets us state the condition that rows must satisfy before they can be sampled. which() takes a logical vector and returns the indices (positions) of the elements that are TRUE; elements that are FALSE or NA are left out.
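As a quick illustration using the article's DataFrame (the threshold 50 is arbitrary), the condition df$length > 50 is NA for the two rows with a missing length, and which() simply leaves those rows out:

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

cond <- df$length > 50
print(cond)            # FALSE NA NA TRUE TRUE
idx <- which(cond)     # indices of the TRUE entries only; NA is not TRUE
print(idx)             # 4 5

# Indexing with the raw logical vector turns each NA into an all-NA row,
# which is one reason to wrap the condition in which()
print(nrow(df[cond, ]))   # 4 (2 real rows + 2 rows of NA)
print(nrow(df[idx, ]))    # 2
```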
Syntax: df[sample(which(conditions), n), ]
Parameters:
- df: DataFrame
- n: number of rows to sample
- conditions: logical condition on the columns of df; only rows for which it is TRUE can be sampled. Ex: df$year > 5
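The examples that follow each use a single condition. A nested condition is written the same way, by combining comparisons with & (and), | (or) and parentheses inside which(). The condition in this sketch is only illustrative:

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Rows with year > 20 that either have education "yes" or a missing length
idx <- which(df$year > 20 & (df$education == "yes" | is.na(df$length)))
print(idx)                       # 2 5

# Draw one random row from the rows that satisfy the nested condition
picked <- df[sample(idx, 1), ]
print(picked)
```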
DataFrame in Use:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
To apply this approach, we first create the DataFrame. which() then turns the condition into a vector of matching row indices, sample() draws n of those indices at random, and the result is used to select rows of the DataFrame. The examples below use the DataFrame shown above; because the rows are chosen at random, your output may differ from the output shown.
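To make the sampling reproducible, set the random number seed immediately before calling sample(). In this sketch the seed value 123 is arbitrary; any fixed value makes the same script select the same rows each time it runs:

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

set.seed(123)                                   # fix the random number stream
first <- df[sample(which(df$year > 5), 2), ]

set.seed(123)                                   # reset to the same seed
second <- df[sample(which(df$year > 5), 2), ]

print(identical(first, second))                 # TRUE
```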
Example 1:
R
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df
print("2 samples")
# which() gives the row indices with year > 5; sample() picks 2 of them at random
df[sample(which(df$year > 5), 2), ]
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
Example 2:
R
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df
print("3 samples")
# Only rows 1, 2 and 5 satisfy the condition, so sampling 3 of them
# returns all three rows in random order
df[sample(which(df$education != "no"), 3), ]
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "3 samples"
name year length education
5 Geeks 99 95 yes
1 Welcome 10 40 yes
2 to 51 NA yes
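In Example 2 only three rows satisfy the condition, so asking for three samples simply returns those rows in a random order. Asking for more rows than match makes sample() stop with an error when replace = FALSE. One way to guard against that, shown here as a sketch, is to cap n at the number of matching indices (safe_n is just an illustrative variable name):

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

idx <- which(df$education != "no")   # rows 1, 2 and 5
n <- 4                               # more rows than match

# sample(idx, n) fails: it cannot take a sample larger than the population
res <- tryCatch(sample(idx, n), error = function(e) "error")
print(res)                           # "error"

safe_n <- min(n, length(idx))        # cap the sample size at the matches
out <- df[sample(idx, safe_n), ]
print(nrow(out))                     # 3
```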
Method 2: Using sample_n() function
The sample_n() function from the dplyr package takes a random sample of rows from a data frame. In dplyr 1.0.0 and later it is superseded by slice_sample(), but it continues to work.
Syntax: sample_n(tbl, size)
Parameters:
- tbl: data frame
- size: number of rows to select
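For comparison, slice_sample(), the successor of sample_n() in dplyr 1.0.0 and later, takes the number of rows through its named n argument. A minimal sketch, assuming dplyr 1.0.0 or newer is installed:

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

a <- sample_n(df, 2)            # superseded interface, still works
b <- slice_sample(df, n = 2)    # current equivalent
print(a)
print(b)
```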
Along with sample_n(), we also use the filter() function, which keeps the rows of a data frame for which the filtering expression is TRUE and drops the rest (rows where the expression evaluates to NA are dropped as well).
Syntax: filter(x, expr)
Parameters:
- x: data frame to be filtered
- expr: logical expression evaluated on the columns of x
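A short sketch of how filter() treats missing values and multiple conditions, using the article's DataFrame (the thresholds are arbitrary). Inside filter(), columns are referred to by name without df$:

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Rows where the condition is NA (missing length) are dropped, not kept
long <- filter(df, length > 50)
print(long$name)     # "for" "Geeks"

# Conditions separated by commas are combined with AND
both <- filter(df, year > 20, education == "yes")
print(both$name)     # "to" "Geeks"
```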
Both filter() and sample_n() come from the dplyr package, so we load it first. We pass our DataFrame df and the (possibly nested) condition to filter(), which keeps only the matching rows; the result is then piped with %>% into sample_n(), which draws n rows at random from the filtered DataFrame.
Syntax: filter(df, condition) %>% sample_n(., n)
Parameters:
- df: Dataframe Object
- condition: condition(s) the rows must satisfy; several can be nested with &, | and parentheses. Ex: df$name != "to"
- n: Number of samples
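A nested condition fits into the same pipeline. In this sketch the condition is only illustrative, and slice_sample() is used in place of sample_n() (assuming dplyr 1.0.0 or newer); sample_n(2) would work the same way here:

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Name is not "to", AND either (year > 50 with a known length) or education "no".
# Rows 3, 4 and 5 match; two of them are drawn at random.
res <- df %>%
  filter(name != "to", (year > 50 & !is.na(length)) | education == "no") %>%
  slice_sample(n = 2)
print(res)
```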
Example 1:
R
library (dplyr)
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df
print("2 samples")
# Keep the rows whose name is not "to", then draw 2 of them at random
filter(df, name != "to") %>% sample_n(2)
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 Welcome 10 40 yes
2 Geeks 99 95 yes
Example 2:
R
library (dplyr)
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df
print("2 samples")
# Keep the rows with year greater than 20, then draw 2 of them at random
filter(df, year > 20) %>% sample_n(2)
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 for 126 100 no
2 to 51 NA yes
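If the filter leaves fewer rows than requested, the two functions behave differently: sample_n() stops with an error (unless replace = TRUE), while slice_sample() silently returns only the rows available. A small sketch, assuming dplyr 1.0.0 or newer:

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

few <- filter(df, year > 100)    # only one row ("for") matches

# sample_n() errors because size exceeds the number of rows
err <- tryCatch(sample_n(few, 2), error = function(e) "error")
print(err)                       # "error"

# slice_sample() truncates to the rows available instead
res <- slice_sample(few, n = 2)
print(nrow(res))                 # 1
```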
Last Updated: 24 Jun, 2021