Open In App

How to filter R DataFrame by values in a column?

Improve
Improve
Improve
Like Article
Like
Save Article
Save
Share
Report issue
Report

In R Programming Language, dataframe columns can be subjected to constraints, and produce smaller subsets. However, while the conditions are applied, the following properties are maintained :

  • Rows are considered to be a subset of the input.
  • Rows in the subset appear in the same order as the original dataframe.
  • Columns remain unmodified.
  • The number of groups may be reduced, based on conditions.
  • dataframe attributes are preserved during data filter.

Method 1 : Using dataframe indexing 

Any dataframe column in the R programming language can be referenced either through its name df$col-name or using its index position in the dataframe df[col-index]. The cell values of this column can then be subjected to constraints, logical or comparative conditions, and then a dataframe subset can be obtained. These conditions are applied to the row index of the dataframe so that the satisfied rows are returned. 

  • Selection based on a check of missing values or NA

Cells in dataframe can contain missing values or NA as its elements, and they can be verified using is.na() method in R language. 

Example:

R




# declaring a dataframe
data_frame = data.frame(col1 = c(NA,"b",NA,"e","e") , 
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking which values are not NA
data_frame_mod <- data_frame[!is.na(data_frame$col1),]
  
print ("Modified dataframe")
print (data_frame_mod)


Output

[1] “Original dataframe”

 col1 col2  col3

1 <NA>    0  TRUE

2    b    2 FALSE

3 <NA>    1 FALSE

4    e    4  TRUE

5    e    5  TRUE

[1] “Modified dataframe”

 col1 col2  col3

2    b    2 FALSE

4    e    4  TRUE

5    e    5  TRUE

  • Selection based on a single comparative condition on a column

Column values can be subjected to constraints to filter and subset the data. The values can be mapped to specific occurrences or within a range. 

Example:

R




# declaring a dataframe
data_frame = data.frame(col1 = c("b","b","e","e","e") , 
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking which columns have col3
# value equivalent to true
data_frame_mod <- data_frame[data_frame$col3==TRUE,]
  
print ("Modified dataframe")
print (data_frame_mod)


Output

[1] “Original dataframe”

 col1 col2  col3

1    b    0  TRUE

2    b    2 FALSE

3    e    1 FALSE

4    e    4  TRUE

5    e    5  TRUE

[1] “Modified dataframe”

 col1 col2 col3

1    b    0 TRUE

4    e    4 TRUE

5    e    5 TRUE

  • Selection based on multiple comparative conditions on a column

Column values can be subjected to constraints to filter and subset the data. The conditions can be combined by logical & or | operators. The %in% operator is used here, in order to check values that match to any of the values within a specified vector. 

Example:

R




# declaring a dataframe
data_frame = data.frame(col1 = c("b","b","d","e","e") , 
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking which values of col1 
# are equivalent to b or e
data_frame_mod <- data_frame[data_frame$col1 %in% c("b","e"),]
print ("Modified dataframe")
print (data_frame_mod)


Output

[1] “Original dataframe”

 col1 col2  col3

1    b    0  TRUE

2    b    2 FALSE

3    d    1 FALSE

4    e    4  TRUE

5    e    5  TRUE

[1] “Modified dataframe”

 col1 col2  col3

1    b    0  TRUE

2    b    2 FALSE

4    e    4  TRUE

5    e    5  TRUE

Method 2 : Using dplyr library

The dplyr library can be installed and loaded into the working space which is used to perform data manipulation.

The filter() function is used to produce a subset of the dataframe, retaining all rows that satisfy the specified conditions. The filter() method in R can be applied to both grouped and ungrouped data. The expressions include comparison operators (==, >, >= ) , logical operators (&, |, !, xor()) , range operators (between(), near()) as well as NA value check against the column values. The subset dataframe has to be retained in a separate variable. 

Syntax:

filter(df , cond)

Parameter : 

df – The dataframe object 

cond – The condition to filter the data upon

Example:

R




library ("dplyr")
  
# declaring a dataframe
data_frame = data.frame(col1 = c("b","b","d","e","e") , 
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking which values of col1 
# are equivalent to b or e
data_frame_mod <- filter(data_frame,col2>1)
  
print ("Modified dataframe")
print (data_frame_mod)


Output

[1] “Original dataframe” 

  col1 col2  col3 

1    b    0  TRUE 

2    b    2 FALSE 

3    d    1 FALSE 

4    e    4  TRUE 

5    e    5  TRUE 

[1] “Modified dataframe” 

col1 col2  col3 

1    b    2 FALSE 

2    e    4  TRUE 

3    e    5  TRUE

Also, the values can be checked using the %in% operator to match the column cell values with the elements contained in the input specified vector. 

Example:

R




library ("dplyr")
  
# declaring a dataframe
data_frame = data.frame(col1 = c("b","b","d","e","e") , 
                        col2 = c(0,2,1,4,5), 
                        col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking which values of col1 
# are equivalent to b or e
data_frame_mod <- filter(data_frame,col1 %in% c("b","e"))
print ("Modified dataframe")
print (data_frame_mod)


Output

[1] “Original dataframe”

col1 col2  col3

1    b    0  TRUE

2    b    2 FALSE

3    d    1 FALSE

4    e    4  TRUE

5    e    5  TRUE

[1] “Modified dataframe”

col1 col2  col3

1    b    0  TRUE

2    b    2 FALSE

4    e    4  TRUE

5    e    5  TRUE



Last Updated : 30 May, 2021
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads