Filtering or subsetting rows in R using dplyr
Last Updated: 28 Jul, 2021
In this article, we are going to filter the rows of a dataframe in the R programming language using the dplyr package.
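Dataframe in use: an eight-row dataframe with the columns id, department and salary, created at the start of each example below.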
Method 1: Subset or filter a row using filter()
To filter or subset rows, we are going to use the filter() function.
Syntax:
filter(dataframe,condition)
Here, dataframe is the input dataframe, and condition is a logical expression on its columns. Only the rows for which the condition is TRUE are kept.
Example: R program to filter the data frame
R
# Load dplyr for filter()
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# Keep only the rows where department is "sales"
print(filter(data, department == "sales"))
Output:
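The later methods in this article use the pipe operator %>%, so it may help to see the same filter written in that form. The following is a minimal sketch; the names sales_piped and sales_base are chosen here for illustration, and the base R line is included only for comparison.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# filter() written with the pipe: data becomes the first argument
sales_piped <- data %>% filter(department == "sales")

# Base R subsetting with the same condition, for comparison only
sales_base <- data[data$department == "sales", ]

print(sales_piped)
```

Both forms select the two rows whose department is "sales".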
Method 2: Filter dataframe with multiple conditions
The filter() function also accepts more than one condition. Conditions can be combined with the logical operators & (AND) and | (OR).
Syntax:
filter(dataframe, condition1, condition2, ..., condition n)
Here, dataframe is the input dataframe and the conditions are logical expressions used to filter the rows. Conditions separated by commas are combined with AND.
Example: R program to filter rows by AND operator
R
# Load dplyr for filter()
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# Keep rows where department is "sales" AND salary is above 27000
print(filter(data, department == "sales" & salary > 27000))
Output:
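In filter(), conditions passed as separate arguments are combined with AND, so the & form used in this example can also be written with a comma. A small sketch of that equivalence follows; the names with_and and with_comma are illustrative only.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Conditions joined explicitly with &
with_and <- filter(data, department == "sales" & salary > 27000)

# Conditions passed as separate arguments (implicit AND)
with_comma <- filter(data, department == "sales", salary > 27000)

print(with_and)
print(with_comma)
```

With this data, both calls keep the single sales row whose salary is above 27000.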
Example: Filter rows by OR operator
R
# Load dplyr for filter()
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# Keep rows where department is "IT" OR salary is above 27000
print(filter(data, department == "IT" | salary > 27000))
Output:
Example: R program to filter using and, or
R
# Load dplyr for filter()
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# & is evaluated before |, so this reads as
# (department == "sales" & salary > 27000) | salary < 5000
print(filter(data, department == "sales" & salary > 27000 | salary < 5000))
Output:
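When & and | are mixed, R evaluates & before |, so parentheses are needed whenever the OR part should be applied first. The sketch below illustrates the difference; the threshold 30000 is picked here purely so that the two groupings give different results on this data, and the names default_grouping and or_first are illustrative.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# No parentheses: read as (sales AND salary > 27000) OR salary < 30000
default_grouping <- filter(
  data, department == "sales" & salary > 27000 | salary < 30000
)

# Parentheses force the OR first: sales AND (salary > 27000 OR salary < 30000)
or_first <- filter(
  data, department == "sales" & (salary > 27000 | salary < 30000)
)

print(default_grouping)
print(or_first)
```

The first call also returns the low-salary IT row, while the second is restricted to the sales department.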
Method 3: Using slice_head() function
The slice_head() function returns the first n rows of the dataframe.
Syntax:
dataframe %>% slice_head(n = number_of_rows)
where dataframe is the input dataframe, %>% is the pipe operator, which passes the dataframe to the function as its first argument, and n is the number of rows to be returned. The argument n must be passed by name.
Example: R program that uses slice_head() to filter rows
R
# Load dplyr for slice_head() and the pipe operator
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# First 3 rows
print(data %>% slice_head(n = 3))
print("==========================")
# First 5 rows
print(data %>% slice_head(n = 5))
print("==========================")
# First row only
print(data %>% slice_head(n = 1))
Output:
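Besides n, slice_head() also takes a prop argument that selects a fraction of the rows instead of a fixed count. The following sketch compares slice_head() with base R's head() and shows prop; the names first_three and first_half are illustrative.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# First three rows; same rows as head(data, 3)
first_three <- data %>% slice_head(n = 3)

# First half of the rows, selected by proportion instead of count
first_half <- data %>% slice_head(prop = 0.5)

print(first_three)
print(first_half)
```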
Method 4: Using slice_tail() function
The slice_tail() function returns the last n rows of the dataframe.
Syntax:
dataframe %>% slice_tail(n = number_of_rows)
where dataframe is the input dataframe, %>% is the pipe operator, which passes the dataframe to the function as its first argument, and n is the number of rows to be returned, counted from the end. The argument n must be passed by name.
Example: R program to filter the last rows by using the slice_tail() function
R
# Load dplyr for slice_tail() and the pipe operator
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# Last 3 rows
print(data %>% slice_tail(n = 3))
print("==========================")
# Last 5 rows
print(data %>% slice_tail(n = 5))
print("==========================")
# Last row only
print(data %>% slice_tail(n = 1))
Output:
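For readers who know base R, slice_tail() selects the same rows as tail(). A short sketch of that comparison is given below; the name last_three is illustrative, and only the id column is compared because the two functions may label the row names differently.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Last three rows with dplyr
last_three <- data %>% slice_tail(n = 3)

# Same rows with base R, for comparison
last_three_base <- tail(data, 3)

print(last_three)
```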
Method 5: Using top_n() function
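The top_n() function returns the n rows with the largest values of an ordering column (wt). When wt is omitted, the last column of the dataframe is used, which is salary in our dataframe. Since dplyr 1.0.0, top_n() is superseded by slice_max() and slice_min().
Syntax:
dataframe %>% top_n(n, wt)
Example: R program that filters rows using top_n() function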
R
# Load dplyr for top_n() and the pipe operator
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# wt is not given, so the last column (salary) is used for ranking
print(data %>% top_n(n = 3))
print("==========================")
print(data %>% top_n(n = 5))
print("==========================")
print(data %>% top_n(n = 1))
Output:
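Because top_n() is superseded, it can be useful to see it next to slice_max(). The sketch below names the ordering column explicitly in both calls; the names top3 and max3 are illustrative. Both calls select the same three rows, but only the row order of slice_max() (largest value first) is relied on here.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Three highest salaries with top_n(), ordering column given explicitly
top3 <- data %>% top_n(n = 3, wt = salary)

# Same selection with slice_max(), which returns rows sorted by salary
max3 <- data %>% slice_max(salary, n = 3)

print(top3)
print(max3)
```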
Method 6: Using slice_sample() function
Here, we are going to filter rows using the slice_sample() function, which returns n rows selected at random from the dataframe. The selected rows can change on every run.
Syntax:
dataframe %>% slice_sample(n = number_of_rows)
Example: R program to filter rows using slice_sample() function
R
# Load dplyr for slice_sample() and the pipe operator
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# 3 random rows
print(data %>% slice_sample(n = 3))
print("==========================")
# 5 random rows
print(data %>% slice_sample(n = 5))
print("==========================")
# 1 random row
print(data %>% slice_sample(n = 1))
Output:
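Since slice_sample() draws rows at random, repeated runs normally give different rows. One common way to make the result repeatable is to call set.seed() before sampling. The sketch below assumes a seed value of 42, which is arbitrary, and the names sample_a and sample_b are illustrative.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Fix the random seed (42 is an arbitrary choice), then sample 3 rows
set.seed(42)
sample_a <- data %>% slice_sample(n = 3)

# Resetting the same seed reproduces the same sample
set.seed(42)
sample_b <- data %>% slice_sample(n = 3)

print(sample_a)
print(identical(sample_a$id, sample_b$id))
```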
Method 7: Using slice_max() function
The slice_max() function returns the n rows with the largest values of a given column.
Syntax:
dataframe %>% slice_max(column, n = number_of_rows)
where dataframe is the input dataframe, column is the dataframe column whose values are used to rank the rows, and n is the number of rows to be returned. The argument n must be passed by name.
Example: R program to filter using slice_max() function
R
# Load dplyr for slice_max() and the pipe operator
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# 3 rows with the highest salary
print(data %>% slice_max(salary, n = 3))
print("==========================")
# Rank by department name; ties are kept by default,
# so more than n rows can be returned
print(data %>% slice_max(department, n = 5))
print("==========================")
Output:
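slice_max() also works on grouped data, in which case the n rows are taken within each group. As a hedged illustration that goes slightly beyond the example above, the following sketch picks the highest-paid row in each department; the name top_per_dept is illustrative.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Highest salary within each department
top_per_dept <- data %>%
  group_by(department) %>%
  slice_max(salary, n = 1) %>%
  ungroup()

print(top_per_dept)
```

The result has one row per department, because no department in this data has a tie for its highest salary.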
Method 8: Using slice_min() function
The slice_min() function returns the n rows with the smallest values of a given column.
Syntax:
dataframe %>% slice_min(column, n = number_of_rows)
where dataframe is the input dataframe, column is the dataframe column whose values are used to rank the rows, and n is the number of rows to be returned. The argument n must be passed by name.
Example: R program to filter using slice_min()
R
# Load dplyr for slice_min() and the pipe operator
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# 3 rows with the lowest salary
print(data %>% slice_min(salary, n = 3))
print("==========================")
# Rank by department name; ties are kept by default,
# so more than n rows can be returned
print(data %>% slice_min(department, n = 5))
print("==========================")
Output:
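By default, slice_min() and slice_max() keep tied rows, so the result can contain more than n rows. In our dataframe two employees share the lowest salary of 25000, which makes this easy to see. The sketch below uses the with_ties argument to control the behaviour; the names lowest_with_ties and lowest_single are illustrative.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# n = 1, but two rows share the minimum salary, so both are returned
lowest_with_ties <- data %>% slice_min(salary, n = 1)

# with_ties = FALSE returns exactly n rows
lowest_single <- data %>% slice_min(salary, n = 1, with_ties = FALSE)

print(lowest_with_ties)
print(lowest_single)
```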
Method 9: Using sample_frac() function
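The sample_frac() function selects a random fraction of the rows of a dataframe (or table). The first parameter is the dataframe and the second parameter is the fraction of rows to select. Since dplyr 1.0.0, sample_frac() is superseded by slice_sample() with the prop argument.
Syntax:
sample_frac(dataframe, n)
where dataframe is the input dataframe and n is the fraction of rows to select, given as a value between 0 and 1 (for example, 0.2 for 20% of the rows).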
Example: R program to filter data using sample_frac() function
R
# Load dplyr for sample_frac()
library(dplyr)
# Create the sample dataframe
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)
print(data)
print("==========================")
# Random 20% of the rows
print(sample_frac(data, 0.2))
print("==========================")
# Random 40% of the rows
print(sample_frac(data, 0.4))
print("==========================")
# Random 70% of the rows
print(sample_frac(data, 0.7))
print("==========================")
Output:
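As sample_frac() is superseded, the same kind of sampling can be written with slice_sample(prop = ...). The sketch below uses a fraction of 0.5, chosen here because half of eight rows is exactly four, so rounding does not come into play; the two functions may round differently for other fractions. The names half_old and half_new and the seed value are illustrative.

```r
library(dplyr)

# Same sample dataframe as in the example above
data <- data.frame(
  id = c(7058, 7059, 7060, 7089, 7072, 7078, 7093, 7034),
  department = c("IT", "sales", "finance", "IT", "finance",
                 "sales", "HR", "HR"),
  salary = c(34500.00, 560890.78, 67000.78, 25000.00,
             78900.00, 25000.00, 45000.00, 90000)
)

# Arbitrary seed so the run is repeatable
set.seed(1)

# Random half of the rows with sample_frac()
half_old <- sample_frac(data, 0.5)

# Random half of the rows with slice_sample()
half_new <- data %>% slice_sample(prop = 0.5)

print(half_old)
print(half_new)
```

Both results contain four rows, although the rows themselves will usually differ because each call draws a fresh sample.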