Extract rows from R DataFrame based on factors

Last Updated : 23 May, 2021

In this article, we will discuss how to extract rows from a data frame based on the values of its factor columns in the R programming language.

Method 1: Using indexing methods

A data frame column can be accessed by its name (df$col_name) or by its index (df[[col_index]]). A column holds factor values when it has been explicitly converted with the factor() function. Rows can then be selected by placing a condition on that column in the row position of the indexing brackets.

Syntax:

df[df$col_name == val, ]

The rows for which the condition on this column is TRUE are returned as the output.
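To see why this syntax works, it can help to look at the intermediate logical vector. The following is a minimal sketch; the names `df`, `mask`, and `result` are illustrative assumptions and are not part of the article's examples.

```r
# Illustrative data frame (names are assumptions for this sketch)
df <- data.frame(col1 = factor(c("A", "z", "z", "c", "e")),
                 col2 = 4:8)

# Comparing a factor column with a value yields one TRUE/FALSE per row
mask <- df$col1 == "z"
print(mask)

# The same column can also be reached by position with [[ ]]
same_column <- identical(df[[1]], df$col1)
print(same_column)

# A logical vector in the row position keeps only the TRUE rows
result <- df[mask, ]
print(result)
```

The comma after the condition matters: leaving the column position empty keeps every column of the selected rows.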

Example:

R

# declaring a data frame with two factor columns
data_frame <- data.frame(col1 = factor(c("A", "z", "z", "c", "e")),
                         col2 = factor(4:8))

print("Original dataframe")
print(data_frame)

# class of each column
sapply(data_frame, class)

# keep only the rows where col1 equals "z"
data_frame_mod <- data_frame[data_frame$col1 == "z", ]

print("Modified dataframe")
print(data_frame_mod)

sapply(data_frame_mod, class)

Output

[1] "Original dataframe"
  col1 col2
1    A    4
2    z    5
3    z    6
4    c    7
5    e    8
    col1     col2
"factor" "factor"
[1] "Modified dataframe"
  col1 col2
2    z    5
3    z    6
    col1     col2
"factor" "factor"

Rows for several factor levels can be selected with the same indexing approach. The %in% operator checks each value of the factor column against a vector of candidate values. It returns the logical value TRUE where the value is contained in the vector and FALSE otherwise.

Syntax:

val %in% vec
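As a small sketch of what %in% returns on its own, the snippet below applies it to a standalone factor. The variable names are illustrative assumptions. It also shows one practical difference from ==, namely how missing values are treated.

```r
# A factor with levels "4" to "8" (illustrative)
col2 <- factor(4:8)

# %in% returns one logical value per element of the left-hand side.
# The factor is matched by its labels, so numeric 4 and 6 match "4" and "6".
keep <- col2 %in% c(4, 6)
print(keep)

# With a missing value, %in% gives FALSE while == gives NA
with_na <- factor(c("a", NA, "b"))
in_result <- with_na %in% "a"
eq_result <- with_na == "a"
print(in_result)
print(eq_result)
```

Because %in% never produces NA, a row with a missing factor value is left out of the result rather than appearing as a row of NA values.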

Example:

R

# declaring a data frame with two factor columns
data_frame <- data.frame(col1 = factor(letters[1:5]),
                         col2 = factor(4:8))

print("Original dataframe")
print(data_frame)

# class of each column
sapply(data_frame, class)

# keep only the rows where col2 is 4 or 6
data_frame_mod <- data_frame[data_frame$col2 %in% c(4, 6), ]

print("Modified dataframe")
print(data_frame_mod)
sapply(data_frame_mod, class)

Output

[1] "Original dataframe"
  col1 col2
1    a    4
2    b    5
3    c    6
4    d    7
5    e    8
    col1     col2
"factor" "factor"
[1] "Modified dataframe"
  col1 col2
1    a    4
3    c    6
    col1     col2
"factor" "factor"
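One detail the output above does not show is that a factor column keeps its full set of levels after rows are extracted. The sketch below illustrates this and uses base R's droplevels() to discard unused levels. Whether to drop them depends on the use case, and the variable names are illustrative assumptions.

```r
data_frame <- data.frame(col1 = factor(letters[1:5]),
                         col2 = factor(4:8))

subset_rows <- data_frame[data_frame$col2 %in% c(4, 6), ]

# All five original levels are still attached to the factor column
levels_before <- levels(subset_rows$col2)
print(levels_before)

# droplevels() removes the levels that no longer occur in the data
trimmed <- droplevels(subset_rows)
levels_after <- levels(trimmed$col2)
print(levels_after)
```

Keeping the unused levels can be useful when later tables or plots should still list every category, so dropping them is optional.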

Method 2: Using subset() method

The subset() function in R returns the rows that satisfy a given condition. It can select rows for a single factor level or for several levels. The row names of the original data frame are retained, in their original order. Because subset() returns a new data frame and leaves the original unchanged, the result has to be assigned to a variable in order to be kept.

Syntax:

subset(df, condition)

Conditions can use the comparison operators == and != on any factor column. The ordering operators > and < are meaningful only for ordered factors; on an unordered factor they return NA with a warning.
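The operators behave differently depending on whether the factor is ordered. The following sketch uses a hypothetical data frame `sizes` with an ordered factor column, which is an assumption made for illustration and does not appear in the article's examples.

```r
# Illustrative data: 'size' is an ordered factor, so its levels have a rank
sizes <- data.frame(
  item = c("p", "q", "r", "s"),
  size = factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
)

# == and != work on any factor, ordered or not
not_small <- subset(sizes, size != "small")
print(not_small)

# < and > compare level positions, which is defined for ordered factors
below_large <- subset(sizes, size < "large")
print(below_large)
```

The level order comes from the levels argument, not from alphabetical order, so "medium" ranks below "large" here.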

Example:

R

# declaring a data frame with two factor columns
data_frame <- data.frame(col1 = factor(letters[1:5]),
                         col2 = factor(4:8))

print("Original dataframe")
print(data_frame)

# class of each column
sapply(data_frame, class)

# keep only the rows where col2 is 4 or 6
data_frame_mod <- subset(data_frame, col2 %in% c(4, 6))
print("Modified dataframe")
print(data_frame_mod)
sapply(data_frame_mod, class)

Output

[1] "Original dataframe"
  col1 col2
1    a    4
2    b    5
3    c    6
4    d    7
5    e    8
    col1     col2
"factor" "factor"
[1] "Modified dataframe"
  col1 col2
1    a    4
3    c    6
    col1     col2
"factor" "factor"
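The points made earlier about subset(), that the result must be assigned and that the original row names are retained, can be checked with a short sketch. The variable name `kept` is an illustrative assumption.

```r
data_frame <- data.frame(col1 = factor(letters[1:5]),
                         col2 = factor(4:8))

# Calling subset() without assignment only displays the result
subset(data_frame, col1 == "b")

# The original data frame still has all five rows
rows_in_original <- nrow(data_frame)
print(rows_in_original)

# Assigning the result keeps it; row names 2 and 4 are carried over
kept <- subset(data_frame, col1 %in% c("b", "d"))
print(rownames(kept))
```

If consecutive row names are preferred, they can be reset afterwards with `rownames(kept) <- NULL`.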
