How to filter R dataframe by multiple conditions?

  • Difficulty Level : Easy
  • Last Updated : 23 May, 2021

In the R programming language, conditions can be applied to the columns of a data frame to produce a smaller subset of its rows. While the conditions are applied, the following properties hold:

  • Rows are considered to be a subset of the input.
  • Rows in the subset appear in the same order as the original data frame.
  • Columns remain unmodified.
  • For grouped data, the number of groups may be reduced, depending on the conditions.
  • Data frame attributes are preserved during filtering.
  • Row numbers may not be retained in the final output.

Data frame rows can be filtered on multiple conditions by combining them with the logical operators AND (&) and OR (|). Rows for which the combined condition evaluates to TRUE are retained in the final output.
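As a small sketch of how these operators combine (the vectors x and y are made up for illustration and do not come from the examples in this article), & and | work element by element and return one logical value per position:

```r
# Hypothetical vectors, used only for illustration
x <- c(1, 5, 8, 3)
y <- c("a", "b", "a", "b")

# Each comparison yields one TRUE/FALSE value per element
both   <- x > 2 & y == "a"   # AND: both conditions must hold
either <- x > 2 | y == "a"   # OR: at least one condition must hold

print(both)
print(either)
```

The resulting logical vector is what gets used as the row index when filtering a data frame.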

Method 1: Using indexing method and which() function

Any data frame column in R can be referenced either by name, df$col_name, or by its index position, df[[col_index]] (df[col_index] returns a one-column data frame rather than a vector). A comparison or logical test on a column yields a logical vector with one value per row, and placing that vector in the row position of df[rows, ] returns the rows for which it is TRUE. Several tests can be combined with & and | before indexing. The which() function converts the combined logical vector into the integer positions of its TRUE values, which are then used as the row index.

Syntax: which(vec, arr.ind = FALSE)

Parameter:

vec – The logical vector (typically the result of a condition) whose TRUE positions are returned

The %in% operator checks, for each element on its left-hand side, whether it occurs in the vector on its right-hand side, and returns a logical vector.

Syntax:

val %in% vec
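A minimal sketch of the two building blocks in isolation, using the same values as col1 in the examples of this article (the variable names vec, is_match and positions are chosen here for illustration):

```r
# The same values as col1 in the article's data frame
vec <- c("b", "b", "d", "e", "d")

# %in% returns one logical value per element of vec
is_match <- vec %in% c("b", "e")
print(is_match)

# which() turns the logical vector into the positions of its TRUE values
positions <- which(is_match)
print(positions)
```

Either is_match or positions can be placed in the row position of df[rows, ] to select the matching rows.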

Example:

R




# declaring a data frame
data_frame = data.frame(col1 = c("b","b","d","e","d") ,
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking which values of col1 are 
# equivalent to b or e or the col2 
# value is greater than 4
data_frame_mod <- data_frame[which(data_frame$col1 %in% c("b","e")
                                   | data_frame$col2 > 4),]
  
print ("Modified dataframe")
print (data_frame_mod)

Output

[1] "Original dataframe"
 col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
 col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
4    e    4  TRUE
5    d    5  TRUE

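The conditions can also be combined directly inside the row index, without which(). In the example below the two conditions are joined with AND (&) instead of OR, and no row satisfies both, so the result is empty.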



Example:

R




# declaring a data frame
data_frame = data.frame(col1 = c("b","b","d","e","d") ,
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# keeping rows where col1 is b or e
# and col2 is greater than 4
# (no row satisfies both conditions)
data_frame_mod <- data_frame[data_frame$col1 %in% c("b","e")
                             & data_frame$col2 > 4,]
  
print ("Modified dataframe")
print (data_frame_mod)

Output

[1] "Original dataframe"
 col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
[1] col1 col2 col3
<0 rows> (or 0-length row.names)
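One practical difference between indexing with and without which() appears when a column contains missing values. The following is a sketch on a hypothetical data frame (df, id and score are illustrative names, not part of the examples above): which() skips positions where the condition is NA, while a plain logical index returns an all-NA row for each of them.

```r
# Hypothetical data frame with one missing score
df <- data.frame(id = 1:4, score = c(10, NA, 30, 5))

# which() keeps only the positions where the condition is TRUE
with_which <- df[which(df$score > 8), ]

# a plain logical index turns each NA in the condition into an all-NA row
without_which <- df[df$score > 8, ]

print(with_which)
print(without_which)
```

When a column may contain NA values, wrapping the condition in which() or adding an explicit !is.na() check avoids the extra rows.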

Method 2: Using dplyr package

The dplyr package provides functions for data manipulation. It must be installed with install.packages("dplyr") and loaded with library() before use. Its filter() function returns a subset of the data frame, keeping all rows that satisfy the specified conditions, and works on both grouped and ungrouped data. Conditions can be built from comparison operators (==, >, >=), logical operators (&, |, !, xor()), helper functions such as between() and near(), and missing-value checks with is.na(). filter() does not modify its input, so the result must be assigned to a variable to be kept.

Syntax: filter(df , cond)

Parameter :

df – The data frame object

cond – The condition to filter the data upon

Unlike base R indexing, filter() does not retain the original row numbers of the data frame: the rows of the result are renumbered from 1.

Example:



R




library ("dplyr")
  
# declaring a data frame
data_frame = data.frame(col1 = c("b","b","d","e","d") , 
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking which values of col1 are
# equivalent to b and col3 is not 
# TRUE
data_frame_mod <- filter(
  data_frame,col1 == "b" & col3!=TRUE)
  
print ("Modified dataframe")
print (data_frame_mod)

Output

[1] "Original dataframe"
 col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
 col1 col2  col3 
1    b    2 FALSE
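filter() also accepts several conditions as separate, comma-separated arguments, which it combines with AND. The sketch below assumes dplyr is installed and uses illustrative variable names (df, with_comma, with_and, in_range) on the same data as above:

```r
library(dplyr)

df <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                 col2 = c(0, 2, 1, 4, 5),
                 col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# comma-separated conditions are combined with AND
with_comma <- filter(df, col1 %in% c("b", "e"), col2 >= 2)

# the same filter written with an explicit &
with_and <- filter(df, col1 %in% c("b", "e") & col2 >= 2)

print(with_comma)
print(all.equal(with_comma, with_and))

# between() keeps values inside a closed interval, here 1 <= col2 <= 4
in_range <- filter(df, between(col2, 1, 4))
print(in_range)
```

The comma form tends to read more easily when many AND conditions are involved, while | still has to be written explicitly inside a single argument.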

Method 3: Using subset method

The subset() function in base R returns the subset of a vector, matrix, or data frame that satisfies the given condition. For a data frame, the condition selects rows, and column names can be used in it directly, without the df$ prefix. The original row numbers are retained in the result.

Syntax: subset(df , cond)

Arguments :

df – The data frame object

cond – The condition to filter the data upon

Example:

R




# declaring a data frame
data_frame = data.frame(col1 = c("b","b","d","e","d") , 
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking which values of col1 are
# equivalent to b or col2 value is 
# greater than 4
data_frame_mod <- subset(data_frame, col1=="b" | col2 > 4)
print ("Modified dataframe")
print (data_frame_mod)

Output

[1] "Original dataframe"
 col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
 col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
5    d    5  TRUE
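As a quick check of the row-number behaviour, the following sketch (variable names df, by_subset and by_index are illustrative) compares subset() with the which()-based indexing from Method 1 on the same data:

```r
df <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                 col2 = c(0, 2, 1, 4, 5),
                 col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# the same condition, once with subset() and once with which()-based indexing
by_subset <- subset(df, col1 == "b" | col2 > 4)
by_index  <- df[which(df$col1 == "b" | df$col2 > 4), ]

# both keep the original row numbers of the selected rows
print(rownames(by_subset))
print(all.equal(by_subset, by_index))
```

subset() is mainly a convenience for interactive use; the indexing form is more explicit about where each column comes from.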


