String Matching in R Programming

String matching is an important part of working with text in any language. It is useful for finding, replacing, and removing strings. To understand string matching in R, we first need to know which related functions are available. These functions accept either a literal string to match or a regular expression. A regular expression is a string of ordinary characters and special symbols that describes a pattern, and it is used to find and extract the needed information from the given data.
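As a minimal sketch of the difference between literal and regular-expression matching, the snippet below uses base R's grepl() (described in the next section) on a made-up vector x. The fixed = TRUE argument asks grepl() to treat the pattern as a plain string instead of a regular expression.

```r
# Illustrative data (not from the original article)
x <- c("abc", "a.c", "axc")

# As a regular expression, "." matches any single character
regex_match <- grepl("a.c", x)

# With fixed = TRUE, "." is matched literally
literal_match <- grepl("a.c", x, fixed = TRUE)

print(regex_match)    # TRUE TRUE TRUE
print(literal_match)  # FALSE TRUE FALSE
```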

Operations on String Matching

Finding a String

To search for a particular pattern in a string, R provides several functions. To find the location of the required string/pattern, use grep(), which returns the indices of the elements of a character vector that contain a match. To know only whether the pattern exists, use the logical function grepl(), which returns TRUE or FALSE for each element.
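The behaviour of grep() and grepl() can be sketched as follows. The vector fruits is illustrative data chosen for this example, not part of the original article.

```r
# Illustrative data (not from the original article)
fruits <- c("apple", "banana", "cherry", "pineapple")

# grep() returns the indices of the elements that contain "apple"
idx <- grep("apple", fruits)
print(idx)    # 1 4

# value = TRUE returns the matching elements themselves
hits <- grep("apple", fruits, value = TRUE)
print(hits)   # "apple" "pineapple"

# grepl() returns one logical value per element
found <- grepl("apple", fruits)
print(found)  # TRUE FALSE FALSE TRUE
```

Use grep() when the positions or values of the matches are needed, and grepl() when a logical mask is more convenient, for example to subset a vector with fruits[found].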



Finding and Replacing Strings

To search for and replace a particular string, we can use two functions, sub() and gsub(). sub() replaces only the first occurrence of the pattern and returns the modified string. gsub(), on the other hand, replaces all occurrences of the pattern and returns the modified string.

Syntax:
sub(pattern, replacement, x, ignore.case = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE)

Parameters:
pattern: A regular expression pattern.
replacement: The string that replaces each match of the pattern.
x: The character vector to be searched for instance(s) of the pattern to be replaced.
ignore.case: Whether to ignore case in the search. ignore.case is an optional parameter and is set to FALSE by default.

Example 1: To replace the first occurrence of ‘he’ with ‘aa’




# Replace only the first "he" with "aa"
text <- "heutabhe"
sub("he", "aa", text)

Output:

[1] "aautabhe"

Example 2: To replace all occurrences of ‘he’ with ‘aa’




# Replace every "he" with "aa"
text <- "heutabhe"
gsub("he", "aa", text)

Output:

[1] "aautabaa"
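The effect of the optional ignore.case parameter can be sketched as below. The input string is made up for illustration; only base R's sub() and gsub() are used.

```r
# Illustrative string (not from the original article)
text <- "Hello hello"

# Default (case-sensitive): the capitalised "He" is skipped
case_sensitive <- sub("he", "aa", text)
print(case_sensitive)  # "Hello aallo"

# ignore.case = TRUE: the first match is now the leading "He"
first_any_case <- sub("he", "aa", text, ignore.case = TRUE)
print(first_any_case)  # "aallo hello"

# gsub() with ignore.case = TRUE replaces both occurrences
all_any_case <- gsub("he", "aa", text, ignore.case = TRUE)
print(all_any_case)    # "aallo aallo"
```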

Finding and Removing Strings

To search for and remove a particular string/pattern, we can use two functions from the stringr package, str_remove() and str_remove_all(). str_remove() removes only the first occurrence of the string/pattern and returns the modified string. str_remove_all(), on the other hand, removes all occurrences of the string/pattern and returns the modified string.

Syntax:
str_remove(string, pattern)
str_remove_all(string, pattern)

Parameters:
string: The character vector to be searched for instance(s) of the pattern to be removed.
pattern: A regular expression pattern. These functions have no ignore.case argument; to ignore case, wrap the pattern in stringr's regex() helper, for example regex("a", ignore_case = TRUE).

Example 1: Removing the first occurrence of vowels in the vector




library(stringr)
x <- c("apple", "pear", "banana")
str_remove(x, "[aeiou]")

Output:

[1] "pple"  "par"   "bnana"

Example 2: Removing all occurrences of vowels in the vector




library(stringr)
x <- c("apple", "pear", "banana")
str_remove_all(x, "[aeiou]")

Output:

[1] "ppl" "pr"  "bnn"
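When stringr is not installed, a similar result can be sketched in base R by replacing each match with an empty string. This is offered as an alternative approach under that assumption, not as a description of how stringr works internally.

```r
x <- c("apple", "pear", "banana")

# Remove the first vowel in each element (comparable to str_remove)
first_removed <- sub("[aeiou]", "", x)
print(first_removed)  # "pple" "par" "bnana"

# Remove every vowel in each element (comparable to str_remove_all)
all_removed <- gsub("[aeiou]", "", x)
print(all_removed)    # "ppl" "pr" "bnn"
```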
