Filtering rows that contain a certain string using dplyr in R

  • Last Updated : 28 Jul, 2021

In this article, we will learn how to filter rows that contain a certain string using the dplyr package in the R programming language.

Functions Used

The two main functions used to carry out this task are:

  • filter(): dplyr's filter() function keeps the rows of a data frame that satisfy a condition.

Syntax: filter(df, condition)

Parameters:

  • df: the data frame object
  • condition: the condition to filter the data on

  • grepl(): base R's grepl() function returns TRUE for each element of a character vector in which the specified pattern is found, and FALSE otherwise.

Syntax: grepl(pattern, x, ignore.case = FALSE)

Parameters:

  • pattern: regular expression pattern
  • x: character vector to be searched
  • ignore.case: whether to ignore case in the search. This parameter is optional and is set to FALSE by default.
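To see what grepl() returns on its own before combining it with filter(), here is a minimal sketch. The roles vector is a hypothetical sample introduced only for illustration.

```r
# Hypothetical sample vector, used only for illustration
roles <- c("Software Eng.", "Software Dev", "Data Analyst")

# Case-sensitive search (the default): lowercase "dev" is not found
case_sensitive <- grepl("dev", roles)

# Case-insensitive search: "dev" now matches "Dev"
case_insensitive <- grepl("dev", roles, ignore.case = TRUE)

print(case_sensitive)    # [1] FALSE FALSE FALSE
print(case_insensitive)  # [1] FALSE  TRUE FALSE
```

grepl() returns one logical value per element, which is exactly the kind of condition filter() expects.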

Dataframe in Use:

  marks age         roles
1  20.1  21 Software Eng.
2  30.2  22  Software Dev
3  40.3  23  Data Analyst
4  50.4  24     Data Eng.
5  60.5  25  FrontEnd Dev

 

Filtering rows that contain the given string

Pass the string to search for, and the column to search in, to the grepl() function. It returns TRUE or FALSE for each row, and filter() keeps only the rows for which the result is TRUE.

Syntax: df %>% filter(grepl('Pattern', column_name))

Parameters:

  • df: data frame object
  • grepl(): finds the pattern string
  • 'Pattern': pattern (string) to be found
  • column_name: column in which the pattern (string) will be searched

Example: 

R




library(dplyr)

# Sample data frame
df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age = c(21:25),
                 roles = c('Software Eng.', 'Software Dev',
                           'Data Analyst', 'Data Eng.',
                           'FrontEnd Dev'))

# Keep only the rows whose roles value contains "Dev"
df %>% filter(grepl('Dev', roles))

Output:



  marks age        roles
1  30.2  22 Software Dev
2  60.5  25 FrontEnd Dev

Filtering rows that do not contain the given string

The only difference from the previous approach is the ! (NOT) operator, which inverts the output of grepl() by converting TRUE to FALSE and vice versa. As a result, filter() keeps only the rows that do not contain the pattern and filters out the rows that do.

Syntax: df %>% filter(!grepl('Pattern', column_name))

Parameters:

  • df: data frame object
  • grepl(): finds the pattern string
  • 'Pattern': pattern (string) to be found
  • column_name: column in which the pattern (string) will be searched

Example: 

R




library(dplyr)
  
df <- data.frame( marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                   
                 age = c(21:25),
  
                 roles = c('Software Eng.', 'Software Dev',
                           'Data Analyst', 'Data Eng.',
                           'FrontEnd Dev'))
  
df %>% filter(!grepl('Eng.', roles))

Output:

  marks age        roles
1  30.2  22 Software Dev
2  40.3  23 Data Analyst
3  60.5  25 FrontEnd Dev
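One detail worth noting about the pattern 'Eng.': grepl() treats it as a regular expression, in which the dot matches any single character. The sketch below illustrates the difference using grepl()'s fixed = TRUE argument, which matches the pattern as a literal string. The data frame df2 and its "Engine Tester" value are hypothetical and not part of the article's data.

```r
library(dplyr)

# Hypothetical data: "Engine Tester" is not in the article's data frame
df2 <- data.frame(roles = c("Software Eng.", "Engine Tester", "Data Analyst"))

# As a regular expression, "Eng." means "Eng" followed by ANY character,
# so "Engine Tester" is removed as well
regex_result <- df2 %>% filter(!grepl("Eng.", roles))

# fixed = TRUE treats the pattern as a literal string, so only the
# role that really contains "Eng." is removed
literal_result <- df2 %>% filter(!grepl("Eng.", roles, fixed = TRUE))

print(regex_result)
print(literal_result)
```

With the article's data both forms give the same rows, since no role contains "Eng" followed by anything other than a dot. The distinction only matters when such values can occur.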

Filtering rows containing multiple patterns (strings)

This code is similar to the previous approaches. The only difference is that multiple patterns (strings) are passed to grepl() as a single string, separated by the OR (|) operator. This returns all the rows containing at least one of the specified patterns. Do not put spaces around |, because any space becomes part of the pattern to be matched.

Syntax

df %>% filter(grepl('Patt1|Patt2', column_name))

Example:



R




library(dplyr)

# Sample data frame
df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age = c(21:25),
                 roles = c('Software Eng.', 'Software Dev',
                           'Data Analyst', 'Data Eng.',
                           'FrontEnd Dev'))

# Keep the rows whose roles value contains "Dev" or "Eng."
df %>% filter(grepl('Dev|Eng.', roles))

Output:

  marks age         roles
1  20.1  21 Software Eng.
2  30.2  22  Software Dev
3  50.4  24     Data Eng.
4  60.5  25  FrontEnd Dev
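When the search strings are stored in a vector, the combined pattern can be built with paste() instead of being typed by hand. This is a minimal sketch; the keywords vector is a hypothetical example, and it assumes the strings contain no special regular expression characters.

```r
library(dplyr)

df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age = c(21:25),
                 roles = c('Software Eng.', 'Software Dev',
                           'Data Analyst', 'Data Eng.',
                           'FrontEnd Dev'))

# Hypothetical vector of strings to search for
keywords <- c('Dev', 'Analyst')

# Join the strings with "|" and no surrounding spaces: "Dev|Analyst"
pattern <- paste(keywords, collapse = '|')

# Keep the rows whose roles value contains any of the keywords
result <- df %>% filter(grepl(pattern, roles))
print(result)
```

Here the rows for Software Dev, Data Analyst and FrontEnd Dev are kept.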

Filtering rows that do not contain multiple patterns (strings)

This code is similar to the previous approach. The only difference is the ! (NOT) operator, which inverts the output of grepl() by converting TRUE to FALSE and vice versa. As a result, filter() keeps only the rows that contain none of the specified patterns and filters out the rows that contain any of them.

Syntax

df %>% filter(!grepl('Patt1|Patt2', column_name))

Example

R




library(dplyr)

# Sample data frame
df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age = c(21:25),
                 roles = c('Software Eng.', 'Software Dev',
                           'Data Analyst', 'Data Eng.',
                           'FrontEnd Dev'))

# Drop the rows whose roles value contains "Data" or "Front"
df %>% filter(!grepl('Data|Front', roles))

Output:

  marks age         roles
1  20.1  21 Software Eng.
2  30.2  22  Software Dev
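Excluding rows that match 'Data|Front' is logically the same as requiring that a row matches neither 'Data' nor 'Front'. The sketch below writes both forms for comparison, relying on the fact that filter() combines comma-separated conditions with AND. The variable names combined and separate are introduced here only for illustration.

```r
library(dplyr)

df <- data.frame(marks = c(20.1, 30.2, 40.3, 50.4, 60.5),
                 age = c(21:25),
                 roles = c('Software Eng.', 'Software Dev',
                           'Data Analyst', 'Data Eng.',
                           'FrontEnd Dev'))

# One negated pattern that uses the OR operator
combined <- df %>% filter(!grepl('Data|Front', roles))

# Two negated conditions; filter() keeps rows where both are TRUE
separate <- df %>% filter(!grepl('Data', roles), !grepl('Front', roles))

print(combined)
print(separate)
```

Both calls should keep the same two rows, Software Eng. and Software Dev. The single-pattern form is shorter, while separate conditions can be easier to read when the patterns are long.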


