Select rows from a DataFrame based on values in a vector in R

Last Updated : 09 May, 2021

In this article, we will discuss how to select rows from a DataFrame based on values in a vector in R Programming Language.

Method 1: Using %in% operator

The %in% operator in R tests whether each element of its left operand occurs in the vector on its right. Applied to a DataFrame column, it yields the condition used to select the matching rows.

Syntax:

val %in% vec

It returns a logical vector with one TRUE or FALSE per element of val, depending on whether that element is found in vec. Using this vector as the row index selects the corresponding rows of the DataFrame. The result is a new subset; the original DataFrame is left unchanged. Any column can be accessed with df$colname and matched against the vector with this operator. Matching is exact and case-sensitive, so in the example below 'a' in the vector does not match 'A' in col2.

Example:

R

# declare a DataFrame 
data_frame <- data.frame(col1 = c(1:7),col2 = LETTERS[1:7]) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# declaring the vector 
vec <- c('A','a','C') 
  
# getting the subset DataFrame after  
# checking values if belonging to vector 
sub_df <- data_frame[data_frame$col2 %in% vec,] 
  
print ("Resultant DataFrame") 
print (sub_df)

Output

[1] "Original DataFrame"
  col1 col2
1    1    A
2    2    B
3    3    C
4    4    D
5    5    E
6    6    F
7    7    G
[1] "Resultant DataFrame"
  col1 col2
1    1    A
3    3    C
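
To make the intermediate step visible, the following sketch stores the logical vector produced by %in% before using it as a row index. The names mask, matched, unmatched, lower_vec and ci_matched are illustrative and not part of the original example. Negating the mask and matching case-insensitively with toupper() are shown only as possible variations.

```r
# same data as in the example above
data_frame <- data.frame(col1 = 1:7, col2 = LETTERS[1:7])
vec <- c("A", "a", "C")

# one TRUE/FALSE per row; "a" matches nothing because
# the comparison is case-sensitive
mask <- data_frame$col2 %in% vec
print(mask)

# rows whose col2 value is in vec, and the remaining rows
matched <- data_frame[mask, ]
unmatched <- data_frame[!mask, ]

# illustrative variation: ignore case by upper-casing both sides
lower_vec <- c("a", "b")
ci_matched <- data_frame[toupper(data_frame$col2) %in% toupper(lower_vec), ]
print(ci_matched)
```

Keeping the mask in its own variable makes it easy to inspect how many rows matched, for example with sum(mask).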

Method 2 : Using is.element() function

is.element() is a base R function that checks whether elements exist in a vector. is.element(x, y) is identical to x %in% y: it returns a logical vector that is TRUE where a value of x is found in y and FALSE otherwise.

Syntax:

is.element(val,vec)

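Here rbind() combines two subsets of the DataFrame: the first contains the rows whose col2 value is in the vector, and the second the rows whose col3 value is in the vector. The two sub-DataFrames are stacked in that order, so the result does not follow the original row order, and a row that matches on both columns would appear twice.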

Example:

R

# declare a DataFrame 
data_frame <- data.frame( 
  col1 = c(1:7),col2 = LETTERS[1:7],col3 = letters[1:7]) 
  
print ("Original DataFrame") 
print (data_frame) 
  
# declaring the vector 
vec <- c('a','C','D') 
  
# getting the subset DataFrame after checking  
# values if belonging to vector of the  
# corresponding columns 
sub_df <- rbind(data_frame[is.element(data_frame$col2, vec),], 
                data_frame[is.element(data_frame$col3, vec),]) 
  
print ("Resultant DataFrame") 
print (sub_df) 

Output

[1] "Original DataFrame"
  col1 col2 col3
1    1    A    a
2    2    B    b
3    3    C    c
4    4    D    d
5    5    E    e
6    6    F    f
7    7    G    g
[1] "Resultant DataFrame"
  col1 col2 col3
3    3    C    c
4    4    D    d
1    1    A    a
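
As an alternative to stacking two subsets, the two membership tests can be combined into a single condition with the | operator. The sketch below is one possible variation, not the method used above; the names keep, sub_df_or, dup_vec, stacked and combined are illustrative. It keeps the original row order and returns each row at most once.

```r
# same data as in the example above
data_frame <- data.frame(col1 = 1:7, col2 = LETTERS[1:7], col3 = letters[1:7])
vec <- c("a", "C", "D")

# TRUE where either col2 or col3 holds a value from vec
keep <- is.element(data_frame$col2, vec) | is.element(data_frame$col3, vec)
sub_df_or <- data_frame[keep, ]
print(sub_df_or)

# a vector that matches row 1 through both columns
dup_vec <- c("A", "a")

# rbind() stacks the same row twice
stacked <- rbind(data_frame[is.element(data_frame$col2, dup_vec), ],
                 data_frame[is.element(data_frame$col3, dup_vec), ])

# the combined condition returns it once
combined <- data_frame[is.element(data_frame$col2, dup_vec) |
                         is.element(data_frame$col3, dup_vec), ]
```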

Method 3 : Using data.table package

The data.table package provides data.table, an enhanced version of the DataFrame. It is not part of base R and must be installed and loaded with library(data.table). The setDT() function converts a DataFrame to a data.table by reference, that is, without copying it.

Syntax: setDT(df, keep.rownames=FALSE, key=NULL, check.names=FALSE)

Parameter:

df – DataFrame

key – A character vector of one or more column names, which is passed to setkeyv() to set the key of the table.

The function J(vec) is then used inside [ ] to look up the elements of vec in the key column set through the key argument of setDT(). This performs a join between the table and the vector and returns one row per element of vec.

The following key points should be noted while using this approach:

The DataFrame is converted to a data.table, so each row of the printed result is preceded by a row number followed by ":".
Each value of the vector is looked up in the key column, and the output rows follow the order of the vector.
A vector value with no match in the key column still produces a row, filled with NA in the other columns (the value 4 in the example below), so the result can contain rows that do not exist in the original data.

Example:

R

# load the data.table package
library("data.table") 
  
# declare a DataFrame; different data types 
# have been used for different columns 
data_frame <- data.frame( 
  col1 = c(6:9),  
  col2 = c(4.5,6.7,89.0,6.2),  
  col3 = factor(letters[1:4]) 
) 
  
print("Original DataFrame") 
print (data_frame) 
  
# declaring the vector  
vec <- c(4,6) 
data_frame <- setDT(data_frame, key = "col1")[J(vec)] 
  
print ("Modified Dataframe") 
print (data_frame) 

Output

[1] "Original DataFrame"
  col1 col2 col3
1    6  4.5    a
2    7  6.7    b
3    8 89.0    c
4    9  6.2    d
[1] "Modified Dataframe"
   col1 col2 col3
1:    4   NA <NA>
2:    6  4.5    a
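
If the NA row for the unmatched value is not wanted, the join can be told to drop unmatched values. The following is a minimal sketch, assuming a reasonably recent data.table version in which the nomatch = NULL argument of [ is available. The names dt, joined, inner and filtered are illustrative, and the vector is written with integer literals to match the integer key column.

```r
library(data.table)

# same data as in the example above, built directly as a data.table
dt <- data.table(
  col1 = 6:9,
  col2 = c(4.5, 6.7, 89.0, 6.2),
  col3 = factor(letters[1:4])
)
setkey(dt, col1)

vec <- c(4L, 6L)

# default join: one row per element of vec, NA where nothing matches
joined <- dt[J(vec)]

# nomatch = NULL drops the elements of vec that have no match
inner <- dt[J(vec), nomatch = NULL]

# %in% also works inside a data.table and never adds NA rows
filtered <- dt[col1 %in% vec]

print(inner)
```

The %in% form keeps rows in table order, while the J() join follows the order of the vector.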

Method 4 : Using dplyr package

The dplyr package provides a set of functions for data manipulation. It is not part of base R and must be installed and loaded with library(dplyr) before use. Its filter() function produces a subset of the original DataFrame in which the columns are unchanged and only the rows satisfying the given conditions are kept. Rows for which the condition evaluates to TRUE are retained. Unlike base subsetting with [, rows for which the condition evaluates to NA are treated like FALSE and dropped from the result.

Syntax : filter(df, condition)

Parameter :

df – A DataFrame,

condition – An expression written in terms of the df columns, which evaluates to a logical value for each row

This method is used in combination with the %in% operator to select rows satisfying the indicated conditions.

Example:

R

# load the dplyr package
library(dplyr) 
  
# declare a DataFrame; different data types 
# have been used for different columns 
data_frame <- data.frame( 
  "col1" = as.character(6:9),  
  "col2" = c(4.5,6.7,89.0,6.2),  
  "col3" = factor(letters[1:4]) 
) 
  
print("Original DataFrame") 
print (data_frame) 
  
# declaring the vector; col1 is character, so 
# %in% compares the values as strings 
vec <- (8:11) 
data_frame <- filter(data_frame, col1 %in% vec)  
  
print("Modified DataFrame") 
print (data_frame) 

Output

[1] "Original DataFrame"
  col1 col2 col3
1    6  4.5    a
2    7  6.7    b
3    8 89.0    c
4    9  6.2    d
[1] "Modified DataFrame"
  col1 col2 col3
1    8 89.0    c
2    9  6.2    d
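
The handling of NA conditions can be illustrated with a small sketch. The data and the names scores, kept_dplyr, kept_base and kept_in are hypothetical and chosen only for this illustration. It compares filter() with base subsetting on a column that contains a missing value.

```r
library(dplyr)

# hypothetical data with a missing value in the filtered column
scores <- data.frame(id = 1:4, score = c(10, NA, 30, 40))

# filter() keeps only rows where the condition is TRUE;
# the row whose condition is NA is dropped
kept_dplyr <- filter(scores, score > 20)

# base subsetting returns an extra all-NA row for the NA condition
kept_base <- scores[scores$score > 20, ]

# %in% returns FALSE rather than NA for a missing value,
# so no NA condition arises here
kept_in <- filter(scores, score %in% c(30, 40))

print(kept_dplyr)
print(kept_base)
```

Because %in% never produces NA, the difference between filter() and [ only shows up with comparisons such as > or ==.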
