Filter data by multiple conditions in R using Dplyr

Last Updated : 25 Jan, 2022

In this article, we will learn how to filter a data frame by multiple conditions in the R programming language using the dplyr package.

The filter() function produces a subset of a data frame, retaining all rows that satisfy the specified conditions. It can be applied to both grouped and ungrouped data. The conditions can be built from comparison operators (==, !=, >, >=, <, <=), logical operators (&, |, !, xor()), helper functions such as between() for range checks and near() for tolerant floating-point comparison, and NA checks with is.na() against the column values. filter() does not modify the original data frame, so the result has to be assigned to a variable if the subset is to be kept.
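The kinds of conditions listed above can be sketched in one short script. This is a minimal illustration, assuming a hypothetical data frame named scores whose columns (id, team, score, ratio) are invented for the example and do not come from the article.

```r
library(dplyr)

# Hypothetical sample data, invented for this sketch
scores <- data.frame(id    = 1:6,
                     team  = c("a", "a", "b", "b", "c", "c"),
                     score = c(10, 25, NA, 40, 55, 70),
                     ratio = c(0.1 + 0.2, 0.5, 0.3, 0.9, 0.3, 1.0))

# between(): inclusive range check, same as score >= 20 & score <= 60
in_range <- filter(scores, between(score, 20, 60))

# near(): tolerant floating-point equality (0.1 + 0.2 == 0.3 is FALSE in R)
close_to_03 <- filter(scores, near(ratio, 0.3))

# xor(): keep rows where exactly one of the two conditions holds
one_side <- filter(scores, xor(score > 30, team == "c"))

# On grouped data, the condition is evaluated within each group
top_per_team <- scores %>%
  group_by(team) %>%
  filter(score == max(score, na.rm = TRUE)) %>%
  ungroup()
```

Each result is stored in its own variable, and scores itself still has all six rows afterwards. Rows where a condition evaluates to NA (here the row with a missing score) are dropped by filter().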

Method 1: Using filter() directly

In this method, the conditions are passed directly to filter() together with the data frame. The function evaluates the conditions for every row and returns only the rows that satisfy them.

Syntax: filter(df, condition)

Parameters:

df: The data frame object

condition: a logical expression on the columns; only rows for which it is TRUE are kept

Example : R program to filter rows using filter() function

R

library(dplyr)
 
# sample data
df=data.frame(x=c(12,31,4,66,78),
              y=c(22.1,44.5,6.1,43.1,99),
              z=c(TRUE,TRUE,FALSE,TRUE,TRUE))
 
# condition
filter(df, x<50 & z==TRUE)

Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
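The same direct call also works with the OR and NOT operators. The following is a small sketch that reuses the sample data above; the variable names either and not_both are chosen only for illustration.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# OR: keep rows where at least one condition holds
either <- filter(df, x < 10 | y > 90)

# NOT around a combined condition; parentheses make the grouping explicit
not_both <- filter(df, !(x < 50 & z))

print(either)    # rows with x = 4 and x = 78
print(not_both)  # rows with x = 4, 66 and 78
```

Because z is already a logical column, writing z alone is equivalent to z == TRUE.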

Method 2: Using %>% with filter()

This approach is often considered cleaner when working with a large set of conditions. The pipe operator %>% passes the data frame on its left as the first argument of filter(), so only the conditions need to be written inside the call. Conditions separated by commas are combined with a logical AND.

Syntax: df %>% filter(condition)

Parameters:

df: The data frame object

condition: a logical expression on the columns; only rows for which it is TRUE are kept

Example : R program to filter using %>%

R

library(dplyr)
 
df=data.frame(x=c(12,31,4,66,78),
              y=c(22.1,44.5,6.1,43.1,99),
              z=c(TRUE,TRUE,FALSE,TRUE,TRUE))
 
df %>%
  filter(y < 45, z != FALSE)

Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
3 66 43.1 TRUE
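To make the meaning of the comma explicit, the sketch below compares the comma-separated form with a single condition joined by &. The variable names with_commas and with_and are only illustrative.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Comma-separated conditions are combined with AND
with_commas <- df %>% filter(y < 45, z != FALSE)

# The same filter written as one expression
with_and <- df %>% filter(y < 45 & z != FALSE)

# Both calls keep the same three rows (x = 12, 31, 66)
print(all.equal(with_commas, with_and))
```

With many conditions, placing each one on its own line after a comma tends to be easier to read than one long expression chained with &.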

Method 3: Using NA with filter()

The is.na() function returns TRUE for each value that is NA and FALSE otherwise. Negating it with ! inside filter() keeps only the rows where the column is not NA.

Syntax: df %>% filter(!is.na(x))

Parameters:

is.na(): checks whether each value is NA

x: column of dataframe object.

Example: R program to filter dataframe using NA

R

library(dplyr)
 
df=data.frame(x=c(12,31,NA,NA,NA),
              y=c(22.1,44.5,6.1,10,99),
              z=c(TRUE,TRUE,FALSE,TRUE,TRUE))
 
df %>% filter(!is.na(x))

Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
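NA values also affect ordinary comparisons, because a comparison with NA evaluates to NA and filter() drops such rows. The sketch below, using the same sample data with illustrative variable names, shows how is.na() can be combined with other conditions.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, NA, NA, NA),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# NA > 20 is NA, so the three NA rows are dropped without a warning
gt_20 <- df %>% filter(x > 20)

# Keep the NA rows explicitly by adding an is.na() alternative
gt_20_or_na <- df %>% filter(x > 20 | is.na(x))

# Combine the NA check with a second condition (AND)
na_and_true <- df %>% filter(is.na(x), z)

print(nrow(gt_20))        # 1
print(nrow(gt_20_or_na))  # 4
print(na_and_true$y)      # 10 99
```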

Method 4: Using ‘%in%’ operator with filter()

The %in% operator checks, for each value in a column, whether it matches any of the values in a given vector. Used inside filter(), it keeps only the rows whose column value appears in that vector.

Syntax: df %>% filter(column %in% c("data1", "data2", ..., "dataN"))

Parameters:

column: column name of the dataframe

c("data1", "data2", ..., "dataN"): A vector containing the values to match against the column.

Example: R program to filter dataframe using %in%

R

library(dplyr)
 
df=data.frame(x=c(12,31,10,2,99),
              y=c(22.1,44.5,6.1,10,99),
              z=c("Apple","Guava", "Mango", "Apple","Mango"))
 
df %>% 
 filter(z %in% c("Apple", "Mango"))

Output:

   x    y     z
1 12 22.1 Apple
2 10  6.1 Mango
3  2 10.0 Apple
4 99 99.0 Mango
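The membership test can also be negated or combined with further conditions. The following sketch reuses the sample data above; the variable names not_listed and listed_and_small are chosen only for illustration.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))

# Negate the membership test to exclude the listed values
not_listed <- df %>% filter(!(z %in% c("Apple", "Mango")))

# %in% combined with a numeric condition (AND)
listed_and_small <- df %>% filter(z %in% c("Apple", "Mango"), x < 20)

print(not_listed$z)        # "Guava"
print(listed_and_small$x)  # 12 10 2
```

Compared with chaining z == "Apple" | z == "Mango", a single %in% stays short as the list of accepted values grows.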
