Extract rows from R DataFrame based on factors

Last Updated : 23 May, 2021

In this article, we will discuss how to extract rows from a data frame based on the levels of a factor column in the R programming language.

Method 1: Using indexing methods

A data frame column can be accessed by its name (df$col_name) or by its index (df[[col_index]]). A column holds factor values when it is converted explicitly with the factor() function. Specific rows can then be selected by indexing the data frame with a logical condition on that column.

Syntax:

df[df$col_name == val, ]

Only the rows in which the column equals the given value are returned.

Example: 

R




# declaring a data frame
data_frame = data.frame(col1 = factor(c("A","z","z","c","e")), 
                        col2 = factor(c(4:8)))
  
print ("Original dataframe")
print (data_frame)
  
sapply(data_frame , class)
  
# keep only the rows where col1 has the level "z"
data_frame_mod <- data_frame[data_frame$col1=="z",]
  
print ("Modified dataframe")
print (data_frame_mod)
  
sapply(data_frame_mod , class)


Output

[1] "Original dataframe"
  col1 col2
1    A    4
2    z    5
3    z    6
4    c    7
5    e    8
    col1     col2
"factor" "factor"
[1] "Modified dataframe"
  col1 col2
2    z    5
3    z    6
    col1     col2
"factor" "factor"

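One detail worth knowing after this kind of filtering: the resulting factor column still carries all the levels of the original column, even those that no longer occur. The following is a minimal sketch, reusing the data from the example above, of how base R's droplevels() can remove the unused levels. The name data_frame_clean is introduced here only for illustration.

```r
# Same data as in the example above
data_frame <- data.frame(col1 = factor(c("A", "z", "z", "c", "e")),
                         col2 = factor(4:8))

# keep only the rows where col1 has the level "z"
data_frame_mod <- data_frame[data_frame$col1 == "z", ]

# The subset still remembers every level of the original factor
print(levels(data_frame_mod$col1))

# droplevels() discards the levels that no longer occur in any factor column
data_frame_clean <- droplevels(data_frame_mod)
print(levels(data_frame_clean$col1))
```

Dropping unused levels is optional, but it usually matters when the filtered data is later passed to table(), a plot, or a model, where empty levels would otherwise still show up.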
Rows matching any of several factor levels can also be selected by indexing. The %in% operator checks each value of the factor column against a vector of candidate values, returning TRUE where the value is contained in that vector and FALSE otherwise.

Syntax:

val %in% vec
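As a quick illustration of the operator on its own, before it is used for row selection, here is a small sketch. The variable names col1, wanted and keep are made up for this illustration.

```r
# A factor and the set of levels we want to keep
col1 <- factor(c("a", "b", "c", "d", "e"))
wanted <- c("a", "c")

# %in% is vectorised: it yields one TRUE/FALSE per element of the left-hand side
keep <- col1 %in% wanted
print(keep)

# The logical vector can be used directly to pick the matching elements
print(col1[keep])
```

Because the result has one entry per row, it can be placed directly in the row position of df[ , ].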

Example:

R




# declaring a data frame
data_frame = data.frame(col1 = factor(letters[1:5]), 
                        col2 = factor(c(4:8)))
  
print ("Original dataframe")
print (data_frame)
  
sapply(data_frame , class)
  
# keep only the rows where col2 has the level 4 or 6
data_frame_mod <- data_frame[data_frame$col2 %in% c(4 , 6),]
  
print ("Modified dataframe")
print (data_frame_mod)
sapply(data_frame_mod , class)


Output

[1] "Original dataframe"
  col1 col2
1    a    4
2    b    5
3    c    6
4    d    7
5    e    8
    col1     col2
"factor" "factor"
[1] "Modified dataframe"
  col1 col2
1    a    4
3    c    6
    col1     col2
"factor" "factor"

Method 2: Using subset() method

The subset() function in R returns the rows of a data frame that satisfy a given condition. Both single and multiple factor levels can be selected with it. The returned rows keep their original row numbers and order. subset() does not modify the original data frame, so the result has to be stored in a variable in order to keep it.

Syntax:

subset ( df , condition )

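Conditions can use the comparison operators == and != to test the factor levels contained in a column, and %in% to test membership in a set of levels. The ordering operators > and < are meaningful only for ordered factors; applied to an unordered factor they return NA with a warning.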
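The behaviour of these operators on factors can be checked with a short sketch. It assumes a factor whose labels look like numbers, as in the examples of this article. The names res, col2_num and col2_ord are illustrative only, and suppressWarnings() is used just to keep the output quiet.

```r
# Unordered factor whose labels happen to look like numbers
col2 <- factor(4:8)

# == compares against the level labels and works as expected
print(col2 == "6")

# > is not meaningful for an unordered factor: R warns and returns NA
res <- suppressWarnings(col2 > 5)
print(res)

# Option 1: convert the labels back to numbers before comparing
col2_num <- as.numeric(as.character(col2))
print(col2_num > 5)

# Option 2: declare the factor as ordered, so > follows the level order
col2_ord <- factor(4:8, levels = 4:8, ordered = TRUE)
print(col2_ord > "5")
```

Note that as.numeric() applied directly to a factor returns the internal integer codes rather than the labels, which is why the conversion goes through as.character() first.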

Example:

R




# declaring a data frame
data_frame = data.frame(col1 = factor(letters[1:5]), 
                        col2 = factor(c(4:8)))
  
print ("Original dataframe")
print (data_frame)
  
sapply(data_frame , class)
  
# keep only the rows where col2 has the level 4 or 6
data_frame_mod <- subset(data_frame, col2 %in% c(4 , 6))
print ("Modified dataframe")
print (data_frame_mod)
sapply(data_frame_mod , class)


Output

[1] "Original dataframe"
  col1 col2
1    a    4
2    b    5
3    c    6
4    d    7
5    e    8
    col1     col2
"factor" "factor"
[1] "Modified dataframe"
  col1 col2
1    a    4
3    c    6
    col1     col2
"factor" "factor"

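subset() can also combine several factor conditions and restrict the returned columns through its select argument. The following is a minimal sketch on the same data. The names both and only_col1 are made up for this illustration.

```r
data_frame <- data.frame(col1 = factor(letters[1:5]),
                         col2 = factor(4:8))

# Combine two factor conditions with & (both must hold for a row to be kept)
both <- subset(data_frame, col2 %in% c(4, 6) & col1 != "a")
print(both)

# Keep the matching rows but return only col1; the result is still a data frame
only_col1 <- subset(data_frame, col2 %in% c(4, 6), select = col1)
print(only_col1)
```

Inside subset() the column names are written without the data_frame$ prefix, which keeps longer conditions readable.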
