In this article, we will discuss how to extract rows from a data frame based on factor levels in the R programming language.
Method 1: Using indexing methods
A data frame column can be accessed by its name (df$col-name) or by its index (df[[ col-indx ]]). A column can be converted to a factor explicitly using the factor() function. Rows whose factor value matches a given level can then be selected using logical indexing.
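As a minimal sketch of the two access styles and the explicit factor conversion described above (the data frame `df` and its values are hypothetical, chosen only for illustration):

```r
# Hypothetical data frame used only for this illustration
df <- data.frame(col1 = c("A", "z", "z"), col2 = 4:6)

# Convert the column to a factor explicitly
df$col1 <- factor(df$col1)

# Access the same column by name and by position
by_name  <- df$col1
by_index <- df[[1]]

print(identical(by_name, by_index))  # TRUE
print(class(df$col1))                # "factor"
print(levels(df$col1))               # "A" "z"
```

Both forms return the column as a vector, so either one can be used inside a row condition.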
Syntax:
df[ df$col-name == val , ]
The rows for which the condition on the column evaluates to TRUE are returned.
Example:
R
data_frame <- data.frame(col1 = factor(c("A", "z", "z", "c", "e")),
                         col2 = factor(4:8))
print("Original dataframe")
print(data_frame)
sapply(data_frame, class)
data_frame_mod <- data_frame[data_frame$col1 == "z", ]
print("Modified dataframe")
print(data_frame_mod)
sapply(data_frame_mod, class)
Output
[1] "Original dataframe"
col1 col2
1 A 4
2 z 5
3 z 6
4 c 7
5 e 8
col1 col2
"factor" "factor"
[1] "Modified dataframe"
col1 col2
2 z 5
3 z 6
col1 col2
"factor" "factor"
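One detail the output above does not show: a filtered factor column still carries all of its original levels, even those that no longer occur in any row. The sketch below rebuilds the same data frame and, as one possible way to handle this, uses base R's droplevels() to discard the unused levels. The name `data_frame_clean` is introduced here only for illustration.

```r
data_frame <- data.frame(col1 = factor(c("A", "z", "z", "c", "e")),
                         col2 = factor(4:8))
data_frame_mod <- data_frame[data_frame$col1 == "z", ]

# The subset keeps every level of the original factor
print(nlevels(data_frame_mod$col1))    # 4

# droplevels() removes levels that no longer appear in the data
data_frame_clean <- droplevels(data_frame_mod)
print(levels(data_frame_clean$col1))   # "z"
print(levels(data_frame_clean$col2))   # "5" "6"
```

Whether to drop unused levels depends on the later analysis; summaries such as table() list empty levels unless they are dropped.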
Rows matching any of several factor levels can also be selected using indexing. The factor column can be checked against a vector of values using the %in% operator, which tests whether each element on its left-hand side occurs in the vector on its right-hand side. It returns a logical vector that is TRUE for every element found in the vector and FALSE otherwise.
Syntax:
val %in% vec
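To see what %in% returns on its own, here is a small sketch on a standalone factor (the names `vals`, `keep` and `found` are hypothetical). It shows the element-wise logical result that is later used as a row index.

```r
# Hypothetical factor and lookup vector
vals <- factor(c("a", "b", "c", "d", "e"))
keep <- c("b", "d", "x")   # "x" is not a level of vals

# One logical value per element of vals
found <- vals %in% keep
print(found)               # FALSE  TRUE FALSE  TRUE FALSE

# The logical vector can index the factor directly
print(as.character(vals[found]))  # "b" "d"
```

Values in the lookup vector that match no element, such as "x" here, are simply ignored.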
Example:
R
data_frame <- data.frame(col1 = factor(letters[1:5]),
                         col2 = factor(4:8))
print("Original dataframe")
print(data_frame)
sapply(data_frame, class)
# keep the rows whose col2 level is 4 or 6
data_frame_mod <- data_frame[data_frame$col2 %in% c(4, 6), ]
print("Modified dataframe")
print(data_frame_mod)
sapply(data_frame_mod, class)
Output
[1] "Original dataframe"
col1 col2
1 a 4
2 b 5
3 c 6
4 d 7
5 e 8
col1 col2
"factor" "factor"
[1] "Modified dataframe"
col1 col2
1 a 4
3 c 6
col1 col2
"factor" "factor"
Method 2: Using subset() method
The subset() function in R returns the rows that satisfy a given condition. Rows matching a single factor level or several factor levels can be selected with it. The selected rows keep their original row names and order. subset() does not modify the original data frame, so the result has to be assigned to a variable in order to keep it.
Syntax:
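subset(df, condition)
Conditions may use comparison operators such as == and != to compare the column values against factor levels. The ordering operators > and < are meaningful only for ordered factors; on an unordered factor they return NA with a warning.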
Example:
R
data_frame <- data.frame(col1 = factor(letters[1:5]),
                         col2 = factor(4:8))
print("Original dataframe")
print(data_frame)
sapply(data_frame, class)
# keep the rows whose col2 level is 4 or 6
data_frame_mod <- subset(data_frame, col2 %in% c(4, 6))
print("Modified dataframe")
print(data_frame_mod)
sapply(data_frame_mod, class)
Output
[1] "Original dataframe"
col1 col2
1 a 4
2 b 5
3 c 6
4 d 7
5 e 8
col1 col2
"factor" "factor"
[1] "Modified dataframe"
col1 col2
1 a 4
3 c 6
col1 col2
"factor" "factor"
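The other comparison operators can be used with subset() in the same way. The sketch below is an illustration under assumptions: the `size` column, its level order, and the names `not_c` and `at_least_mid` are hypothetical and not part of the example above. It uses != on a plain factor and >= on an ordered factor, where ordering comparisons are well defined.

```r
data_frame <- data.frame(col1 = factor(letters[1:5]),
                         col2 = factor(4:8))

# Exclude one factor level with !=
not_c <- subset(data_frame, col1 != "c")
print(not_c)

# Hypothetical ordered factor: low < mid < high
data_frame$size <- factor(c("low", "mid", "high", "mid", "low"),
                          levels = c("low", "mid", "high"),
                          ordered = TRUE)

# Ordering comparisons follow the declared level order
at_least_mid <- subset(data_frame, size >= "mid")
print(at_least_mid)
```

The first call keeps rows 1, 2, 4 and 5; the second keeps rows 2, 3 and 4, since only "mid" and "high" are at or above "mid" in the declared order.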
Last Updated :
23 May, 2021