Filter data by multiple conditions in R using Dplyr

  • Last Updated : 28 Jul, 2021

In this article, we will learn how to filter a data frame by multiple conditions in the R programming language using the dplyr package.

The filter() function returns a subset of a data frame, keeping every row for which all of the specified conditions evaluate to TRUE; rows where a condition evaluates to NA are dropped. It can be applied to both grouped and ungrouped data. Conditions can be built from comparison operators (==, !=, >, >=, <, <=), logical operators (&, |, !, xor()), helper functions such as between() for range checks and near() for tolerance-based comparison of floating-point numbers, and NA checks with is.na(). filter() does not modify the original data frame, so the result must be assigned to a variable if it is needed later.
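As a rough illustration of the operators listed above, the sketch below uses a small made-up data frame (the name scores and its columns id, grp and val are assumptions for this example only) to show between(), the | operator, and filtering on grouped data.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
scores <- data.frame(id  = 1:6,
                     grp = c("a", "a", "b", "b", "c", "c"),
                     val = c(5, 12, 20, NA, 8, 15))

# between() keeps values in the closed range [8, 15]; the NA row is dropped
in_range <- filter(scores, between(val, 8, 15))

# | keeps rows where at least one of the conditions is TRUE
either <- filter(scores, val < 8 | grp == "b")

# On grouped data the condition is evaluated within each group,
# here keeping the row with the largest val per group
top_per_group <- scores %>%
  group_by(grp) %>%
  filter(val == max(val, na.rm = TRUE))

in_range$id       # ids 2, 5, 6
either$id         # ids 1, 3, 4
top_per_group$id  # ids 2, 3, 6
```

Note that the row with id 4 survives the second filter even though its val is NA, because NA | TRUE evaluates to TRUE.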

Method 1: Using filter() directly

Here the conditions are passed directly to filter() along with the data frame. The function evaluates the conditions for every row and returns the rows that satisfy them.

Syntax: filter(df, condition)

Parameters:

df: the data frame object

condition: a logical expression; rows for which it evaluates to TRUE are kept

Example: R program to filter rows using the filter() function

R

library(dplyr)

# Sample data
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep rows where x is below 50 and z is TRUE
filter(df, x < 50 & z == TRUE)

Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
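The same data frame can be used to sketch two further points: the result has to be assigned to keep it, and conditions can also be negated or combined with xor(). The variable names small_true, not_z and one_only are only illustrative.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Assign the result; df itself is left unchanged
small_true <- filter(df, x < 50 & z == TRUE)
nrow(df)          # 5
nrow(small_true)  # 2

# ! negates a condition: keeps the single row where z is FALSE
not_z <- filter(df, !z)

# xor() keeps rows where exactly one of the two conditions holds
one_only <- filter(df, xor(x > 50, y > 40))
```

With this data, not_z holds the row with x = 4, and one_only holds the row with x = 31, since it is the only row where y exceeds 40 while x does not exceed 50.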

Method 2: Using %>% with filter()

This approach is often cleaner when there are many conditions or several processing steps: the pipe operator %>% passes the data frame on its left as the first argument to filter(), so only the conditions need to be written inside the call. Multiple conditions separated by commas are combined with a logical AND.

Syntax: df %>% filter(condition)

Parameters:

df: the data frame object

condition: the condition used for filtering

Example: R program to filter using %>%

R

library(dplyr)

# Sample data
df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep rows where y is below 45 and z is not FALSE
df %>%
  filter(y < 45, z != FALSE)

Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
3 66 43.1 TRUE
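As a small check of how comma-separated conditions behave, the sketch below compares three ways of writing the same filter: with a comma, with &, and as two chained filter() calls. The result names are hypothetical.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 4, 66, 78),
                 y = c(22.1, 44.5, 6.1, 43.1, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Comma-separated conditions are combined with a logical AND
with_commas <- df %>% filter(y < 45, z != FALSE)

# The same filter written with an explicit &
with_and <- df %>% filter(y < 45 & z != FALSE)

# The same filter as two chained steps
chained <- df %>%
  filter(y < 45) %>%
  filter(z != FALSE)

# All three select the rows with x = 12, 31 and 66
with_commas$x
with_and$x
chained$x
```

Which form to use is mostly a matter of readability; chaining can help when each step deserves its own comment.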

Method 3: Using NA with filter()

The is.na() function returns TRUE for each value that is NA and FALSE otherwise. Negating it with ! inside filter() keeps only the rows where the column is not NA.

Syntax: df %>% filter(!is.na(x))

Parameters:

is.na(): checks whether each value is NA

x: a column of the data frame
Example: R program to filter a data frame using NA

R

library(dplyr)

# Sample data with missing values in x
df <- data.frame(x = c(12, 31, NA, NA, NA),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Keep rows where x is not NA
df %>% filter(!is.na(x))

Output:

   x    y    z
1 12 22.1 TRUE
2 31 44.5 TRUE
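NA checks are often combined with other conditions. The sketch below, using the same data frame, shows that a plain comparison silently drops NA rows, how to keep them explicitly, and how to combine the NA check with a second condition. The result names are illustrative only.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, NA, NA, NA),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# A comparison with NA yields NA, and filter() drops those rows
above_20 <- df %>% filter(x > 20)

# To keep the NA rows as well, ask for them explicitly
above_20_or_na <- df %>% filter(is.na(x) | x > 20)

# NA check combined with a second condition on another column
complete_and_large <- df %>% filter(!is.na(x), y > 30)

nrow(above_20)            # 1
nrow(above_20_or_na)      # 4
nrow(complete_and_large)  # 1
```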

Method 4: Using ‘%in%’ operator with filter()

The %in% operator checks whether each value of a column matches any element of a given vector. Used inside filter(), it keeps only the rows whose column value appears in that vector.

Syntax: filter(column %in% c("data1", "data2", ..., "dataN"))

Parameters:

column: column name of the data frame

c("data1", "data2", ..., "dataN"): a vector of the values to match against the column

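Example: R program to filter a data frame using %in%

R

library(dplyr)

# Sample data with a character column
df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))

# Keep rows where z is either "Apple" or "Mango"
df %>%
  filter(z %in% c("Apple", "Mango"))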

Output:

   x    y     z
1 12 22.1 Apple
2 10  6.1 Mango
3  2 10.0 Apple
4 99 99.0 Mango
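The %in% test can also be negated or combined with other conditions. The following sketch reuses the data frame above; the result names are hypothetical.

```r
library(dplyr)

df <- data.frame(x = c(12, 31, 10, 2, 99),
                 y = c(22.1, 44.5, 6.1, 10, 99),
                 z = c("Apple", "Guava", "Mango", "Apple", "Mango"))

# Negate %in% to keep rows whose z is NOT in the vector
others <- df %>%
  filter(!(z %in% c("Apple", "Mango")))

# Combine %in% with a numeric condition
large_fruit <- df %>%
  filter(z %in% c("Apple", "Mango"), x > 10)

others$z       # "Guava"
large_fruit$x  # 12 99
```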


