Select rows from a DataFrame based on values in a vector in R

Last Updated : 09 May, 2021

In this article, we will discuss how to select rows from a DataFrame based on values in a vector in R Programming Language.

Method 1: Using %in% operator

The %in% operator in R checks whether each element of its left operand is present in its right operand (usually a vector). It is used to select the elements that satisfy this membership condition: for each value, it checks whether that value exists in the specified object.

Syntax:

val %in% vec

It returns a logical vector with one TRUE or FALSE per element of the left operand, depending on whether that element is found. This logical vector is then used to index the rows of the DataFrame. The approach creates a subset of the DataFrame without making any changes to the existing DataFrame. Any column can be accessed using df$colname and then matched against the vector with %in%. Note that matching is case-sensitive, so in the example below 'a' does not match 'A'.

Example:

R




# declare a DataFrame
data_frame <- data.frame(col1 = c(1:7),col2 = LETTERS[1:7])
  
print ("Original DataFrame")
print (data_frame)
  
# declaring the vector
vec <- c('A','a','C')
  
# getting the subset DataFrame after 
# checking values if belonging to vector
sub_df <- data_frame[data_frame$col2 %in% vec,]
  
print ("Resultant DataFrame")
print (sub_df)


Output

[1] "Original DataFrame"
 col1 col2
1    1    A
2    2    B
3    3    C
4    4    D
5    5    E
6    6    F
7    7    G
[1] "Resultant DataFrame"
 col1 col2
1    1    A
3    3    C
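To see what %in% actually produces, the logical mask from the example above can be stored and inspected. The sketch below reuses the same data; the names mask, kept and dropped are illustrative only. Negating the mask with ! is a common way to keep the rows whose value is not in the vector.

```r
# rebuild the DataFrame and vector from Method 1
data_frame <- data.frame(col1 = 1:7, col2 = LETTERS[1:7])
vec <- c('A', 'a', 'C')

# one TRUE/FALSE per row; 'a' matches nothing because matching is case-sensitive
mask <- data_frame$col2 %in% vec
print(mask)
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE

# rows whose col2 is in vec, and rows whose col2 is not in vec
kept <- data_frame[mask, ]
dropped <- data_frame[!mask, ]
print(kept)
print(dropped)
```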

Method 2: Using the is.element() function

is.element(x, y) is identical to x %in% y: it checks whether each element of x exists in the vector y and returns a logical vector, TRUE where the value is found and FALSE otherwise.

Syntax:

is.element(val,vec)

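Here, rbind() combines two subsets of the DataFrame: the first contains the rows whose col2 value is in the vector, and the second contains the rows whose col3 value is in the vector. Because the subsets are stacked, the result follows the order of the subsets rather than the original row order, and a row that matched in both columns would appear twice.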

Example:

R




# declare a DataFrame
data_frame <- data.frame(
  col1 = c(1:7),col2 = LETTERS[1:7],col3 = letters[1:7])
  
print ("Original DataFrame")
print (data_frame)
  
# declaring the vector
vec <- c('a','C','D')
  
# getting the subset DataFrame after checking 
# values if belonging to vector of the 
# corresponding columns
sub_df <- rbind(data_frame[is.element(data_frame$col2, vec),],
                data_frame[is.element(data_frame$col3, vec),])
  
print ("Resultant DataFrame")
print (sub_df)


Output

[1] "Original DataFrame"
 col1 col2 col3
1    1    A    a
2    2    B    b
3    3    C    c
4    4    D    d
5    5    E    e
6    6    F    f
7    7    G    g
[1] "Resultant DataFrame"
 col1 col2 col3
3    3    C    c
4    4    D    d
1    1    A    a
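When the goal is simply "rows whose col2 or col3 value is in the vector", the two is.element() results can also be combined with the element-wise OR operator | before subsetting. This is a swapped-in alternative to the rbind() approach, not the method above: it keeps the original row order and cannot return a row twice. The second vector, vec2, is a hypothetical input chosen so that one row matches in both columns.

```r
data_frame <- data.frame(
  col1 = 1:7, col2 = LETTERS[1:7], col3 = letters[1:7])
vec <- c('a', 'C', 'D')

# is.element(x, y) gives the same logical vector as x %in% y
print(identical(is.element(data_frame$col2, vec), data_frame$col2 %in% vec))

# one mask: TRUE if col2 OR col3 is in the vector
mask <- is.element(data_frame$col2, vec) | is.element(data_frame$col3, vec)
sub_df <- data_frame[mask, ]
print(sub_df)   # rows 1, 3 and 4, in their original order

# with rbind(), a row matching in both columns is returned twice
vec2 <- c('C', 'c')
dup <- rbind(data_frame[is.element(data_frame$col2, vec2), ],
             data_frame[is.element(data_frame$col3, vec2), ])
print(nrow(dup))   # row 3 appears twice
```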

Method 3: Using the data.table package

The data.table package provides the data.table class, an enhanced version of the DataFrame. It is not part of base R, so it must be installed and loaded with library(data.table). The setDT() function converts a DataFrame to a data.table by reference, that is, in place and without making a copy.

Syntax: setDT(x, keep.rownames=FALSE, key=NULL, check.names=FALSE)

Parameters:

  • x – The DataFrame (or list) to convert.
  • key – A character vector of one or more column names, passed to setkeyv() to set the key of the resulting data.table.
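Because setDT() works by reference, there is no need to assign its result; the object passed in is itself converted. A minimal sketch, using a smaller hypothetical DataFrame df:

```r
library(data.table)

df <- data.frame(col1 = 6:9, col2 = c(4.5, 6.7, 89.0, 6.2))
print(class(df))   # "data.frame"

# no assignment: df itself becomes a keyed data.table
setDT(df, key = "col1")
print(class(df))   # "data.table" "data.frame"
print(key(df))     # "col1"
```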

The call J(vec) is then placed inside the square brackets. Within a data.table's [ ], J() is an alias for list(), so J(vec) builds a one-column table from the vector, which is joined against the key column set by the key argument of setDT() (col1 here). In other words, each value of vec is looked up in col1.

The following points should be noted when using this approach:

  • The result is a data.table, so each printed row is preceded by a row number followed by ":".
  • Each value of the vector is looked up in the key column, and the rows of the result follow the order of the values in the vector.
  • By default, a value of the vector that does not occur in the key column still produces a row, with NA in the other columns. In the example below, 4 does not occur in col1, so the result contains a row of NAs that is not present in the original data.

Example:

R




# load data.table, then declare a DataFrame
# with a different data type in each column
library("data.table")
  
data_frame <- data.frame(
  col1 = c(6:9), 
  col2 = c(4.5,6.7,89.0,6.2), 
  col3 = factor(letters[1:4])
)
  
print("Original DataFrame")
print (data_frame)
  
# declaring the vector 
vec <- c(4,6)
data_frame <- setDT(data_frame, key = "col1")[J(vec)]
  
print ("Modified Dataframe")
print (data_frame)


Output

[1] "Original DataFrame"
  col1 col2 col3
1    6  4.5    a
2    7  6.7    b
3    8 89.0    c
4    9  6.2    d
[1] "Modified Dataframe"
   col1 col2 col3
1:    4   NA <NA>
2:    6  4.5    a
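The NA row for the value 4 in the output above comes from the join's default, nomatch = NA. Below is a sketch of two ways to return only rows that actually exist: passing nomatch = 0, which data.table accepts for dropping unmatched values (recent versions also accept nomatch = NULL), or filtering with %in% directly. The names dt, right_join, inner_join and filtered are illustrative, and integer literals are used for the vector so that its type matches the integer key column.

```r
library(data.table)

dt <- data.table(
  col1 = 6:9,
  col2 = c(4.5, 6.7, 89.0, 6.2),
  col3 = factor(letters[1:4]),
  key = "col1"
)
vec <- c(4L, 6L)

# default join (nomatch = NA): one row per value of vec, NA where 4 is missing
right_join <- dt[J(vec)]

# nomatch = 0 drops values of vec that are not in the key column
inner_join <- dt[J(vec), nomatch = 0]

# a plain filter with %in% returns the same rows and needs no key
filtered <- dt[col1 %in% vec]

print(right_join)
print(inner_join)
print(filtered)
```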

Method 4: Using the dplyr package

The dplyr package provides a set of functions for data manipulation. It is not part of base R, so it must be installed and loaded into the workspace with library(dplyr). Its filter() function produces a subset of the original DataFrame in which the columns are unchanged and only the rows for which the conditions evaluate to TRUE are kept. If a condition evaluates to NA for a row, filter() treats it like FALSE and drops that row from the result.

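Syntax: filter(.data, ...)

Parameters:

  • .data – A DataFrame.
  • ... – One or more conditions, written in terms of the columns of .data, that evaluate to a logical value for each row. Columns can be referred to by their bare names (data masking), and multiple conditions are combined with AND.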

This method is used in combination with the %in% operator to select rows satisfying the indicated conditions. 

Example:

R




# load dplyr, then declare a DataFrame
# with a different data type in each column
library(dplyr)
  
data_frame <- data.frame(
  "col1" = as.character(6:9), 
  "col2" = c(4.5,6.7,89.0,6.2), 
  "col3" = factor(letters[1:4])
)
  
print("Original DataFrame")
print (data_frame)
  
# declaring the vector; col1 holds character strings,
# so %in% compares both sides as character ("8" matches 8)
vec <- (8:11)
data_frame <- filter(data_frame, col1 %in% vec) 
  
print("Modified DataFrame")
print (data_frame)


Output

[1] "Original DataFrame"
  col1 col2 col3
1    6  4.5    a
2    7  6.7    b
3    8 89.0    c
4    9  6.2    d
[1] "Modified DataFrame"
  col1 col2 col3
1    8 89.0    c
2    9  6.2    d
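The NA rule described for filter() differs from base R's [ ] indexing, which turns an NA condition into a row of NAs. The sketch below uses a hypothetical DataFrame df containing a missing value to show the difference. It also shows that %in% itself only ever returns TRUE or FALSE, so a missing value is selected by %in% only when the vector contains NA.

```r
library(dplyr)

df <- data.frame(col1 = c(6, NA, 8, 9), col3 = c("a", "b", "c", "d"))

# a comparison with NA yields NA for that row
cond <- df$col1 > 7
print(cond)   # FALSE NA TRUE TRUE

# base [ ]: the NA condition produces an all-NA row
base_rows <- df[cond, ]

# filter(): the NA condition is treated like FALSE and the row is dropped
dplyr_rows <- filter(df, col1 > 7)

print(nrow(base_rows))    # 3
print(nrow(dplyr_rows))   # 2

# %in% returns TRUE/FALSE only, never NA
print(df$col1 %in% c(8, 9))   # FALSE FALSE TRUE TRUE
```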

