String Matching in R Programming
  • Last Updated : 22 Jun, 2020

String matching is an important part of any language. It is useful for finding, replacing, and removing strings. To understand string matching in R, we first need to know which functions are available. These functions can match either literal strings or regular expressions. A regular expression is a string of characters and special symbols that describes a pattern, which is used to find and extract the needed information from the given data.
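The difference between literal and regular-expression matching can be sketched with grepl(). The vector x below is made up for illustration, and the fixed = TRUE argument (a standard base R option not otherwise covered in this article) switches the pattern to literal matching.

```r
x <- c("a.b", "axb", "ab")

# As a regular expression, "." matches any single character.
regex_hits <- grepl("a.b", x)

# With fixed = TRUE the pattern is taken literally, so "." is just a dot.
literal_hits <- grepl("a.b", x, fixed = TRUE)

print(regex_hits)    # TRUE TRUE FALSE
print(literal_hits)  # TRUE FALSE FALSE
```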

Operations on String Matching

Finding a String

R provides several functions for searching for a pattern in a character vector. If we need to know which elements contain the pattern, we can use grep(). If we only need to know whether the pattern is present, we can use the logical function grepl(), which returns TRUE or FALSE for each element. To find where inside each element the pattern occurs, we can use regexpr(). Let us look at each of these functions.

  • grep() function: It returns the indices of the vector elements in which the pattern is found. If several elements match, it returns an integer vector containing all of their indices. This is useful because it tells us not only that the pattern occurs but also which elements of the vector contain it.

    Syntax:
    grep(pattern, string, ignore.case=FALSE)

    Parameters:
    pattern: A regular expression pattern.
    string: The character vector to be searched.
    ignore.case: Whether to ignore case in the search. This parameter is optional and defaults to FALSE.

    Example 1: To find all elements of the vector that contain ‘he’.






    str <- c("Hello", "hello", "hi", "hey")
    grep('he', str)

    Output:

    [1] 2 4
    

    In the above example, ‘Hello’ was not matched because ‘H’ and ‘h’ differ in case. To ignore case, set the parameter ignore.case to TRUE; it is FALSE by default.

    Example 2: To find all instances of ‘he’ in the string irrespective of case




    str <- c("Hello", "hello", "hi", "hey")
    grep('he', str, ignore.case = TRUE)

    Output:

    [1] 1 2 4
    
  • grepl() function: It is a logical function that returns TRUE for each element of the vector in which the pattern is found and FALSE for each element in which it is not. The result has the same length as the input vector.

    Syntax:
    grepl(pattern, string, ignore.case=FALSE)

    Parameters:
    pattern: A regular expression pattern.
    string: The character vector to be searched.
    ignore.case: Whether to ignore case in the search. This parameter is optional and defaults to FALSE.

    Example 1: To check, for each element of the vector, whether ‘the’ is present.




    str <- c("Hello", "hello", "hi", "hey")
    grepl('the', str)

    Output:



    [1] FALSE FALSE FALSE FALSE
    

    Example 2: To check, for each element of the vector, whether ‘he’ is present.




    str <- c("Hello", "hello", "hi", "hey")
    grepl('he', str)

    Output:

    [1] FALSE  TRUE FALSE  TRUE
    
  • regexpr() function: It searches every element of the vector for the pattern. For example, if a vector consists of ‘n’ strings, all ‘n’ strings are searched. For each element, it returns the character position at which the first match starts, or -1 if there is no match. The output vector therefore has the same length as the input. The result also carries attributes such as match.length, which are omitted from the outputs shown below.

    Syntax:
    regexpr(pattern, string, ignore.case = FALSE)

    Parameters:
    pattern: A regular expression pattern.
    string: The character vector to be searched, where each element is searched separately.
    ignore.case: Whether to ignore case in the search. This parameter is optional and defaults to FALSE.

    Example 1: To find the position of the first ‘he’ in each string of the vector.




    str <- c("Hello", "hello", "hi", "ahey")
    regexpr('he', str)

    Output:

    [1] -1  1 -1  2
    

    Example 2: To find whether each string of the vector starts with a vowel.




    str <- c("abra", "Ubra", "hunt", "quirky")
    regexpr('^[aeiouAEIOU]', str)

    Output:

    [1]  1  1 -1 -1

    Example 3: To find whether each string of the vector ends with the pattern ‘10+1’ (a ‘1’, then one or more ‘0’s, then a ‘1’).




    str <- c("1001", "11", "10012", "101")
    regexpr('10+1$', str)

    Output:



    [1]  1 -1 -1  1
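The three functions above can be compared side by side on the same input. The following is a minimal sketch; the variable names are illustrative, and value = TRUE and regmatches() are standard base R features that this article does not otherwise discuss.

```r
words <- c("Hello", "hello", "hi", "ahey")

idx   <- grep("he", words)                # indices of matching elements: 2 4
vals  <- grep("he", words, value = TRUE)  # the matching elements themselves
flags <- grepl("he", words)               # one TRUE/FALSE per element
pos   <- regexpr("he", words)             # start of first match, -1 if none

# regmatches() uses the regexpr() result to extract the matched text;
# elements without a match are dropped from the result.
matched <- regmatches(words, pos)

print(idx)
print(vals)
print(flags)
print(as.integer(pos))  # as.integer() drops the match.length attributes
print(matched)
```

grep() answers "which elements match", grepl() answers "does each element match", and regexpr() answers "where in each element does the first match start".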

Finding and Replacing Strings

To search for and replace a particular string, we can use two functions: sub() and gsub(). sub() replaces only the first occurrence of the pattern in each string and returns the modified string. gsub(), on the other hand, replaces all occurrences of the pattern and returns the modified string.

Syntax:
sub(pattern, replacement, string, ignore.case = FALSE)
gsub(pattern, replacement, string, ignore.case = FALSE)

Parameters:
pattern: A regular expression pattern.
replacement: The string that replaces each match of the pattern.
string: The vector to be searched for instance(s) of the pattern to be replaced.
ignore.case: Whether to ignore case in the search. This parameter is optional and defaults to FALSE.

Example 1: To replace the first occurrence of ‘he’ with ‘aa’




str = "heutabhe"
sub('he', 'aa', str)

Output:

[1] "aautabhe"

Example 2: To replace all occurrences of ‘he’ with ‘aa’




str = "heutabhe"
gsub('he', 'aa', str)

Output:

[1] "aautabaa"
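Both functions are vectorised and accept ignore.case, which the examples above do not show. The following sketch illustrates this on a made-up two-element vector s.

```r
s <- c("heutabhe", "Hehe")

# sub() replaces only the first case-sensitive match in each element.
first_only <- sub("he", "aa", s)

# gsub() replaces every case-sensitive match in each element.
all_hits <- gsub("he", "aa", s)

# With ignore.case = TRUE, "He" is replaced as well.
any_case <- gsub("he", "aa", s, ignore.case = TRUE)

print(first_only)  # "aautabhe" "Heaa"
print(all_hits)    # "aautabaa" "Heaa"
print(any_case)    # "aautabaa" "aaaa"
```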

Finding and Removing Strings

To search for and remove a particular string or pattern, we can use two functions from the stringr package: str_remove() and str_remove_all(). str_remove() removes only the first occurrence of the pattern in each string and returns the modified string. str_remove_all(), on the other hand, removes all occurrences of the pattern and returns the modified string.

Syntax:
str_remove(string, pattern)
str_remove_all(string, pattern)

Parameters:
string: The character vector to be searched for instance(s) of the pattern to be removed.
pattern: A regular expression pattern. Matching is case-sensitive by default. These functions have no ignore.case parameter; to ignore case, wrap the pattern as regex(pattern, ignore_case = TRUE).

Example 1: Removing the first vowel from each string in the vector




library(stringr)
x <- c("apple", "pear", "banana")
str_remove(x, "[aeiou]")

Output:

[1] "pple"  "par"   "bnana"

Example 2: Removing all vowels from each string in the vector




library(stringr)
x <- c("apple", "pear", "banana")
str_remove_all(x, "[aeiou]")

Output:

[1] "ppl" "pr"  "bnn"
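If the stringr package is not available, the same results can be obtained in base R by replacing matches with an empty string. This is a sketch of an equivalent approach using sub() and gsub(), not how stringr is implemented internally; the variable names are illustrative.

```r
x <- c("apple", "pear", "banana")

# Equivalent of str_remove(): drop only the first vowel in each element.
first_removed <- sub("[aeiou]", "", x)

# Equivalent of str_remove_all(): drop every vowel in each element.
all_removed <- gsub("[aeiou]", "", x)

# Case-insensitive removal, so the capital "A" is dropped too.
case_insensitive <- gsub("[aeiou]", "", "Apple", ignore.case = TRUE)

print(first_removed)     # "pple" "par" "bnana"
print(all_removed)       # "ppl" "pr" "bnn"
print(case_insensitive)  # "ppl"
```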