Filter data by multiple conditions in R using Dplyr

In this article, we will learn how to filter a data frame by multiple conditions in the R programming language using the dplyr package.

The filter() function returns a subset of a data frame, retaining all rows that satisfy the specified conditions. It can be applied to both grouped and ungrouped data. Conditions can be built from comparison operators (==, !=, >, >=, <, <=), logical operators (&, |, !, xor()), helper functions such as between() for range checks and near() for tolerance-based comparison of floating-point numbers, and NA checks with is.na(). filter() does not modify the original data frame, so the result has to be assigned to a variable if it is needed later.
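The operators listed above can be tried out on a small data frame. The following is a minimal sketch; the scores data frame and its column names are made up for illustration and are not part of the examples in this article.

```r
library(dplyr)

# Hypothetical sample data (assumption, for illustration only)
scores <- data.frame(
  id    = 1:6,
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(0.1 + 0.2, 5, 12, NA, 7, 20)
)

# between(): inclusive range check, same as value >= 5 & value <= 12
in_range <- filter(scores, between(value, 5, 12))

# near(): compares floating-point numbers with a small tolerance,
# so 0.1 + 0.2 is treated as equal to 0.3
close_to <- filter(scores, near(value, 0.3))

# xor(): TRUE when exactly one of the two conditions holds
one_only <- filter(scores, xor(value > 6, group == "c"))

# Grouped data: the condition is evaluated within each group,
# here keeping rows above their own group's mean
above_mean <- scores %>%
  group_by(group) %>%
  filter(value > mean(value, na.rm = TRUE)) %>%
  ungroup()
```

Each result is stored in its own variable, and scores itself still has all six rows afterwards.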

Method 1: Using filter() directly

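Pass the data frame and the conditions directly to filter(). The function evaluates the conditions against each row and returns the rows for which all of them are TRUE. Multiple conditions can be joined with & or passed as separate, comma-separated arguments, which filter() combines with a logical AND.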

Syntax: filter(df, condition)

Parameters:

df: the data frame object

condition: a logical expression on the columns of df; rows for which it evaluates to TRUE are kept

Example: R program to filter rows using the filter() function

R

library(dplyr)

# Sample data
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep rows where x is below 50 and z is TRUE
filter(df, x < 50 & z == TRUE)


Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
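As a further sketch on the same sample data, the condition above can also be written with comma-separated arguments, and an OR condition can be built with |. The variable names and_result, comma_result and or_result are chosen here for illustration only.

```r
library(dplyr)

# Same sample data as in the example above
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Conditions joined with &
and_result <- filter(df, x < 50 & z == TRUE)

# Comma-separated conditions are combined with AND as well;
# z is already logical, so it can be used without "== TRUE"
comma_result <- filter(df, x < 50, z)

# OR: keep rows where at least one condition holds
or_result <- filter(df, x < 10 | y > 90)

print(and_result)
print(comma_result)
print(or_result)
```

The first two calls return the same two rows, while the OR condition keeps the rows with x equal to 4 and 78.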

Method 2: Using %>% with filter()

This approach reads more cleanly when many conditions or several operations are involved: the pipe operator %>% passes the data frame on its left as the first argument to filter(), so only the conditions need to be written inside the call.

Syntax: df %>% filter(condition)

Parameters:

df: the data frame object

condition: a logical expression on the columns of df; rows for which it evaluates to TRUE are kept

Example: R program to filter using %>%

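R

library(dplyr)

# Sample data
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Pipe df into filter(); the comma-separated conditions are combined with AND
df %>%
  filter(y < 45, z != FALSE)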


Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
3 66 43.1 TRUE
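The benefit of the pipe becomes clearer when several steps are chained. The sketch below reuses the sample data and adds a second filter() call and an arrange() step; the extra condition x > 20 and the name result are assumptions made for illustration.

```r
library(dplyr)

# Same sample data as in the example above
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

result <- df %>%
  filter(y < 45, z != FALSE) %>%  # first set of conditions
  filter(x > 20) %>%              # a further condition applied to the result
  arrange(desc(y))                # sort the remaining rows by y, largest first

print(result)
```

Chaining two filter() calls gives the same rows as one call with all three conditions separated by commas; splitting them is a readability choice.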

Method 3: Using is.na() with filter()

The is.na() function returns TRUE for each value that is NA and FALSE otherwise. Negating it with ! inside filter() keeps only the rows where the column is not NA.

Syntax: df %>% filter(!is.na(x))

Parameters:

is.na(): checks whether each value is NA

x: a column of the data frame

Example: R program to filter a data frame on NA values

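R

library(dplyr)

# Sample data with missing values in column x
df <- data.frame(x = c(12, 31, NA, NA, NA),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep only the rows where x is not NA
df %>% filter(!is.na(x))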


Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
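NA checks can be combined with other conditions. One point worth keeping in mind is that filter() keeps only rows where the condition is TRUE, so rows where a comparison evaluates to NA are dropped. The following sketch on the same data illustrates this; the thresholds and variable names are picked for illustration.

```r
library(dplyr)

# Same sample data as in the example above
df <- data.frame(x = c(12, 31, NA, NA, NA),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# x > 20 is NA for the missing values, so those rows are dropped
above_20 <- df %>% filter(x > 20)

# Keep the NA rows explicitly alongside the rows that pass the comparison
above_20_or_na <- df %>% filter(is.na(x) | x > 20)

# Combine the NA check with a condition on another column
complete_and_large_y <- df %>% filter(!is.na(x), y > 30)

print(above_20)
print(above_20_or_na)
print(complete_and_large_y)
```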

Method 4: Using the %in% operator with filter()

The %in% operator keeps only the rows whose value in the given column matches one of the values in the supplied vector.

Syntax: df %>% filter(column %in% c("data1", "data2", ..., "dataN"))

Parameters:

column: column name of the data frame

c("data1", "data2", ..., "dataN"): a vector containing the values to match against the column

Example: R program to filter a data frame using %in%

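R

library(dplyr)

# Sample data with a character column
df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))

# Keep rows where z is either "Apple" or "Mango"
df %>%
  filter(z %in% c("Apple", "Mango"))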


Output:

   x    y     z
1 12 22.1 Apple
2 10  6.1 Mango
3  2 10.0 Apple
4 99 99.0 Mango
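The %in% test can also be negated or combined with other conditions. The sketch below reuses the sample data; the vector name wanted and the threshold x > 5 are assumptions made for illustration.

```r
library(dplyr)

# Same sample data as in the example above
df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))

wanted <- c("Apple", "Mango")

# Negate the test to keep rows whose z is NOT in the vector
not_wanted <- df %>% filter(!(z %in% wanted))

# Combine the membership test with a numeric condition
wanted_and_large <- df %>% filter(z %in% wanted, x > 5)

print(not_wanted)
print(wanted_and_large)
```

Storing the values in a named vector keeps the filter() call short when the list of values grows.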


Last Updated : 25 Jan, 2022