In R, the rows of a data frame can be filtered by applying constraints to its column values, which produces smaller subsets. While the conditions are applied, the following properties are maintained:
- Rows are considered to be a subset of the input.
- Rows in the subset appear in the same order as the original data frame.
- Columns remain unmodified.
- For grouped data, the number of groups may be reduced if no rows of a group satisfy the conditions.
- Data frame attributes are preserved during the data filter.
- Row numbers may not be retained in the final output, depending on the method used.
The data frame rows can be subjected to multiple conditions by combining them with the element-wise logical operators AND (&) and OR (|). The rows for which the combined condition is TRUE are retained in the final output.
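As a minimal sketch of how such conditions combine, the snippet below builds two logical vectors from a small illustrative data frame and joins them with `&` and `|`. The name `df`, its values, and the helper variables are assumptions made only for this example.

```r
# Illustrative data; `df` and its values are made up for this sketch
df <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                 col2 = c(0, 2, 1, 4, 5))

cond_a <- df$col1 == "b"   # TRUE for rows 1 and 2
cond_b <- df$col2 > 1      # TRUE for rows 2, 4 and 5

both   <- cond_a & cond_b  # element-wise AND: only row 2
either <- cond_a | cond_b  # element-wise OR: rows 1, 2, 4 and 5

df[both, ]    # rows where both conditions hold
df[either, ]  # rows where at least one condition holds
```

Note that the single-character operators `&` and `|` work element by element, which is what row filtering needs; `&&` and `||` are meant for single logical values in control flow.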
Method 1: Using indexing method and which() function
Any data frame column in R can be referenced either by name, as df$col_name, or by its index position, as df[[col_index]] (df[col_index] returns a one-column data frame rather than a vector). The values of this column can then be compared against constraints, which produces a logical vector with one entry per row. Placing this vector in the row position of df[rows, ] returns the rows that satisfy the conditions. The combined condition can also be wrapped in the which() function, which returns the integer positions of the elements that are TRUE and ignores NA values.
Syntax: which(x, arr.ind = FALSE)
Parameters :
x – The logical vector (or array) whose TRUE positions are returned
arr.ind – Whether to return array indices when x is an array
The %in% operator checks whether each value on its left-hand side is present in the vector on its right-hand side, and returns a logical vector.
Syntax:
val %in% vec
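To make the behaviour of the two building blocks concrete, here is a small sketch on a standalone vector. The names `vals`, `is_match`, and `idx` are hypothetical and used only for illustration.

```r
# Hypothetical vector used only for illustration
vals <- c("b", "b", "d", "e", "d")

# %in% returns a logical vector with one entry per element of vals
is_match <- vals %in% c("b", "e")
is_match
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# which() converts the logical vector into the positions of its TRUE entries
idx <- which(is_match)
idx
# [1] 1 2 4
```

Either `is_match` or `idx` can then be used in the row position of a data frame index.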
Example:
R
# Create a sample data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)

# Keep rows where col1 is "b" or "e", OR col2 is greater than 4
data_frame_mod <- data_frame[which(data_frame$col1 %in% c("b", "e")
                                   | data_frame$col2 > 4), ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
4    e    4  TRUE
5    d    5  TRUE
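The conditions can also be combined directly as a logical index, without calling which(). In the example below the conditions are joined with AND (&); since no row satisfies both of them, the result is an empty data frame.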
Example:
R
# Create a sample data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)

# Keep rows where col1 is "b" or "e" AND col2 is greater than 4
data_frame_mod <- data_frame[data_frame$col1 %in% c("b", "e")
                             & data_frame$col2 > 4, ]
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
[1] col1 col2 col3
<0 rows> (or 0-length row.names)
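One practical difference between indexing with which() and indexing with the bare logical vector shows up when a column contains missing values. The sketch below uses a hypothetical data frame `df_na` (not part of the examples above) to illustrate it: a condition that evaluates to NA yields an all-NA row under plain logical indexing, while which() skips it.

```r
# Hypothetical data frame with a missing value in col2
df_na <- data.frame(col1 = c("b", "d", "e"),
                    col2 = c(5, NA, 7))

cond <- df_na$col2 > 4               # TRUE, NA, TRUE

by_logical <- df_na[cond, ]          # the NA condition produces an all-NA row
by_which   <- df_na[which(cond), ]   # which() ignores NA: rows 1 and 3 only

nrow(by_logical)  # 3
nrow(by_which)    # 2
```

When the data may contain NA values, wrapping the condition in which() is therefore often the safer choice for base R indexing.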
Method 2: Using dplyr package
The dplyr package, once installed and loaded into the workspace, provides functions for data manipulation. Its filter() function produces a subset of the data frame, retaining all rows that satisfy the specified conditions. filter() can be applied to both grouped and ungrouped data. The conditions can use comparison operators (==, >, >=), logical operators (&, |, !, xor()), helper functions such as between() for ranges and near() for tolerant comparison of floating-point numbers, and is.na() to check for NA values. filter() does not modify the data frame in place, so the subset has to be assigned to a variable.
Syntax: filter(df , cond)
Parameter :
df – The data frame object
cond – The condition to filter the data upon
Unlike the base R approaches, filter() does not retain the original row numbers of the data frame: the rows of the result are renumbered starting from 1.
Example:
R
# Load dplyr for filter()
library("dplyr")

# Create a sample data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)

# Keep rows where col1 is "b" AND col3 is not TRUE
data_frame_mod <- filter(data_frame, col1 == "b" & col3 != TRUE)
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    2 FALSE
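The other kinds of expression that filter() accepts can be sketched in the same way. The example below assumes dplyr is installed and uses a hypothetical data frame `df` with one missing value; it shows comma-separated conditions (which are combined with AND), between(), and is.na().

```r
library(dplyr)

# Hypothetical data with a missing value in col2
df <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                 col2 = c(0, NA, 1, 4, 5))

# Comma-separated conditions are combined with AND;
# between(x, left, right) is inclusive on both ends
res_and <- filter(df, col1 %in% c("b", "d"), between(col2, 1, 5))

# Missing values must be tested with is.na(), not with == NA
res_na <- filter(df, is.na(col2))

# Rows whose condition evaluates to NA are dropped by filter()
res_gt <- filter(df, col2 > 0)

nrow(res_and)  # 2
nrow(res_na)   # 1
nrow(res_gt)   # 3
```

Because filter() drops rows whose condition is NA, no which() wrapper is needed here, in contrast to base R logical indexing.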
Method 3: Using subset method
The subset() function in base R returns the subsets of vectors, matrices, or data frames that satisfy the applied conditions. For a data frame, the condition selects rows (an optional select argument chooses columns), and rows for which the condition evaluates to NA are dropped. The original row numbers are retained when this method is applied.
Syntax: subset(df , cond)
Arguments :
df – The data frame object
cond – The condition to filter the data upon
Example:
R
# Create a sample data frame
data_frame <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                         col2 = c(0, 2, 1, 4, 5),
                         col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))
print("Original dataframe")
print(data_frame)

# Keep rows where col1 is "b" OR col2 is greater than 4
data_frame_mod <- subset(data_frame, col1 == "b" | col2 > 4)
print("Modified dataframe")
print(data_frame_mod)
Output
[1] "Original dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
3    d    1 FALSE
4    e    4  TRUE
5    d    5  TRUE
[1] "Modified dataframe"
  col1 col2  col3
1    b    0  TRUE
2    b    2 FALSE
5    d    5  TRUE
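As a further sketch, subset() can also restrict the columns through its select argument, and the retained row names can be reset afterwards if sequential numbering is preferred. The variable names `df`, `res`, and `kept_names` below are illustrative assumptions.

```r
# Illustrative data frame
df <- data.frame(col1 = c("b", "b", "d", "e", "d"),
                 col2 = c(0, 2, 1, 4, 5),
                 col3 = c(TRUE, FALSE, FALSE, TRUE, TRUE))

# Keep rows where col2 > 1 and only the columns col1 and col2
res <- subset(df, col2 > 1, select = c(col1, col2))

# The original row names are kept by subset()
kept_names <- rownames(res)
kept_names
# [1] "2" "4" "5"

# Optional: renumber the rows from 1
rownames(res) <- NULL
rownames(res)
# [1] "1" "2" "3"
```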