
Stringr Package in R Programming

Last Updated : 11 Jul, 2023

Character data plays a vital role in data analysis and manipulation in R. To make these tasks easier, Hadley Wickham developed the stringr package, which offers a set of functions for working with character strings, including extracting, replacing, and otherwise manipulating substrings. In this article, we will walk through the main functions of stringr and see how to apply them in R.

What is the stringr Package?

Have you ever found yourself working with character strings in R and struggling with complex string manipulation tasks? If yes, then the Stringr package might be the solution you need. As a part of the tidyverse ecosystem, the stringr package offers a range of functions for character string processing in R. The package is built on top of the stringi package, which provides efficient and dependable string processing functions.

One of the most significant advantages of stringr is its consistent, user-friendly interface: every function name starts with str_, and the string (or vector of strings) being processed is always the first argument. This makes extracting, replacing, and manipulating substrings easier in the R programming language.

Installation of the Stringr Package:

The Stringr package can be installed from the CRAN repository using the install.packages() function.

install.packages("stringr")

Once installed, the package can be loaded into the R session using the library() function.

library(stringr)

Basic String Operations using Stringr Package

String Concatenation using str_c( )

String concatenation is the process of joining two or more strings together. The str_c() function in the stringr package concatenates strings. It takes any number of strings or character vectors, plus two optional arguments that control how they are joined:

Syntax:

str_c(..., sep = "", collapse = NULL)

where,

  • ... – The strings or character vectors to be concatenated, separated by commas. NULL inputs are dropped, and length-1 inputs are recycled to match longer vector inputs.
  • sep – The string inserted between the inputs. The default is an empty string, "".
  • collapse – An optional string used to join the elements of the result into a single string. The default, NULL, returns one string per element.

Missing values are contagious: if any input is NA, the corresponding result is NA. Use str_replace_na() to turn NA into the string "NA" first. str_c() has no trimming argument; use str_trim() for that.

R




str_c("Geeks", " ", "For", " ", "Geeks")


Output:

"Geeks For Geeks"

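The example above joins scalars. As a minimal sketch of how sep and collapse behave with vectors (the input vectors are made-up examples), the following shows element-wise joining, collapsing into one string, and the NA behaviour described above:

```r
library(stringr)

first <- c("Geeks", "Hello")
second <- c("For Geeks", "World")

# sep is placed between the inputs, element by element
pairs <- str_c(first, second, sep = " ")                  # c("Geeks For Geeks", "Hello World")

# collapse then joins the resulting vector into a single string
joined <- str_c(first, second, sep = " ", collapse = " | ")  # "Geeks For Geeks | Hello World"

# A missing value makes the result missing...
with_na <- str_c("a", NA_character_)                     # NA
# ...unless it is converted to the string "NA" first
safe <- str_c("a", str_replace_na(NA_character_))        # "aNA"
```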
Substring Extraction using str_sub( )

Substring extraction involves extracting a portion of a string. The str_sub() function in the Stringr package can be used to extract substrings. Here are some of the parameters you can use with the str_sub() function:

Syntax:

str_sub(string, start = 1L, end = -1L)

where,

  • string – The character string or vector of strings from which you want to extract substrings.
  • start – The position of the first character to extract. The default is 1. Negative values count backwards from the end of the string, so -1 is the last character.
  • end – The position of the last character to extract (inclusive). The default is -1, the end of the string.

start and end are vectorised, and str_sub() can also be used on the left-hand side of an assignment to replace the selected characters.

R




str_sub("Geeks For Geeks", 1, 5)


Output:

"Geeks"

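As a minimal sketch of the other features listed above (the strings are made-up examples), the following shows negative positions, vectorised extraction, and the assignment form of str_sub():

```r
library(stringr)

x <- "Geeks For Geeks"

# Negative positions count back from the end of the string
last_word <- str_sub(x, -5, -1)    # "Geeks"
middle <- str_sub(x, 7, 9)         # "For"

# Applied to a vector, str_sub() returns one substring per element
words <- c("apple", "banana", "cherry")
firsts <- str_sub(words, 1, 3)     # c("app", "ban", "che")

# The assignment form replaces characters 7 to 9 of a copy of x
y <- x
str_sub(y, 7, 9) <- "4"            # y is now "Geeks 4 Geeks"
```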
Character Replacement using str_replace( )

Character replacement involves replacing a substring in a string with another substring. The str_replace() function in the Stringr package can be used to replace substrings. Here are some of the parameters you can use with the str_replace() function:

Syntax:

str_replace(string, pattern, replacement)

where,

  • string – The character string or vector of strings in which you want to replace the pattern.
  • pattern – The pattern to look for. It is interpreted as a regular expression by default; wrap it in fixed() to match it literally, or in regex(..., ignore_case = TRUE) to ignore case.
  • replacement – The replacement string. It can refer to capture groups in the pattern with \\1, \\2, and so on.

str_replace() replaces only the first match in each string; str_replace_all() replaces every match.

R




str_replace("Hello World", "World", "Universe")


Output:

"Hello Universe"

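Because the pattern is a regular expression by default, characters such as "." have a special meaning. A minimal sketch with made-up inputs contrasting regex and fixed() matching, and using backreferences in the replacement:

```r
library(stringr)

s <- "a.b.c"

# "." is a regex wildcard, so it matches the first character, "a"
regex_hit <- str_replace(s, ".", "-")          # "-.b.c"

# fixed() matches a literal dot; only the first one is replaced
literal_hit <- str_replace(s, fixed("."), "-") # "a-b.c"

# \\1, \\2, \\3 reuse the text captured by each pair of parentheses
reordered <- str_replace("2023-07-11", "(\\d+)-(\\d+)-(\\d+)", "\\3/\\2/\\1")  # "11/07/2023"
```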
Pattern Matching using str_detect( )

Pattern matching involves finding a substring that matches a specific pattern. The str_detect() function in the Stringr package can be used to detect patterns in strings. Here are some of the parameters you can use with the str_detect() function:

Syntax:

str_detect(string, pattern, negate = FALSE)

where,

  • string – The character string or vector of strings in which you want to detect the pattern.
  • pattern – The pattern to detect. It is interpreted as a regular expression by default; use fixed() or regex() to change how it is matched.
  • negate – If TRUE, returns TRUE for strings that do not match the pattern. The default is FALSE.

str_detect() returns a logical vector with one value per input string.

R




str_detect("Geeks for Geeks", "Geek")


Output:

TRUE

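Since str_detect() is vectorised, it is often used to filter a vector. A minimal sketch with a made-up vector of fruit names, showing negate and case-insensitive matching via regex():

```r
library(stringr)

fruits <- c("apple", "Banana", "cherry")

has_a <- str_detect(fruits, "a")                                 # c(TRUE, TRUE, FALSE)
no_a <- str_detect(fruits, "a", negate = TRUE)                   # c(FALSE, FALSE, TRUE)
starts_b <- str_detect(fruits, regex("^b", ignore_case = TRUE))  # c(FALSE, TRUE, FALSE)

# The logical result can be used directly for subsetting
with_a <- fruits[has_a]                                          # c("apple", "Banana")
```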
Regular Expressions using str_extract( ) and str_replace_all( )

Regular expressions are a powerful way to search and manipulate strings. The Stringr package provides a set of functions for working with regular expressions, including str_extract() and str_replace_all().

The str_extract() function extracts the first occurrence of a pattern from each element of a character string or vector of strings. It returns a character vector containing the extracted substrings, with NA wherever there is no match. Here are the parameters you can use with the str_extract() function:

Syntax:

str_extract(string, pattern, group = NULL)

where,

  • string – The character string or vector of strings from which you want to extract the pattern.
  • pattern – The pattern or regular expression to be extracted. For case-insensitive matching, use regex(pattern, ignore_case = TRUE).
  • group – If supplied, returns the given capture group instead of the whole match. This argument was added in stringr 1.5.0.

To extract every match rather than only the first, use str_extract_all(string, pattern, simplify = FALSE), which returns a list of character vectors, or a character matrix when simplify = TRUE.

R




str_extract("Hello 123 World", "\\d+")


Output:

"123"

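A minimal sketch with made-up inputs contrasting str_extract() and str_extract_all(), including what each returns when a string has no match:

```r
library(stringr)

x <- c("Hello 123 World 456", "no digits here")

# First match only; NA where nothing matches
first <- str_extract(x, "\\d+")                     # c("123", NA)

# All matches, as a list (character(0) where nothing matches)
every <- str_extract_all(x, "\\d+")                 # list(c("123", "456"), character(0))

# simplify = TRUE returns a character matrix, padded with ""
mat <- str_extract_all(x, "\\d+", simplify = TRUE)  # 2 x 2 matrix; second row is c("", "")
```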
The str_replace_all() function replaces all occurrences of a pattern within a character string or vector of strings with a specified replacement. It returns a character vector containing the modified string(s). Here are the parameters you can use with the str_replace_all() function:

Syntax:

str_replace_all(string, pattern, replacement)

where,

  • string – The character string or vector of strings in which you want to replace the pattern.
  • pattern – The pattern or regular expression to be replaced. It can also be a named vector such as c("old" = "new"), in which case each name is replaced by its value and replacement is omitted.
  • replacement – The replacement string. Backreferences such as \\1 refer to capture groups in the pattern.

R




str_replace_all("Hello 123 World",
                "\\d+", "999")


Output:

"Hello 999 World"

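A minimal sketch with made-up strings showing the difference between str_replace() and str_replace_all(), and the named-vector form described in the parameter list:

```r
library(stringr)

s <- "Hello 123 World 456"

first_only <- str_replace(s, "\\d+", "999")   # "Hello 999 World 456"
every <- str_replace_all(s, "\\d+", "999")    # "Hello 999 World 999"

# A named vector applies several pattern/replacement pairs in one call
swapped <- str_replace_all("one two", c("one" = "1", "two" = "2"))  # "1 2"
```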
Advanced String Operations using Stringr Package:

String Splitting using str_split( )

String splitting involves splitting a string into multiple substrings based on a delimiter. The str_split() function in the Stringr package can be used to split strings. The str_split() function returns a list of character vectors containing the split substrings. Here are some of the parameters you can use with the str_split() function:

Syntax:

str_split(string, pattern, n = Inf, simplify = FALSE)

where,

  • string – The character string or vector of strings to be split.
  • pattern – The delimiter to split on. It is interpreted as a regular expression by default.
  • n – The maximum number of pieces to return. The default, Inf, returns all pieces; with a finite n, the last piece contains the unsplit remainder of the string.
  • simplify – If FALSE (the default), returns a list of character vectors; if TRUE, returns a character matrix.

R




str_split("apple, orange, banana", ", ")


Output:

[[1]]
[1] "apple"  "orange" "banana"

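A minimal sketch with made-up inputs showing how n and simplify change the result, and how to get a plain character vector when splitting a single string:

```r
library(stringr)

x <- c("a-b-c", "d-e")

pieces <- str_split(x, "-")                 # list(c("a", "b", "c"), c("d", "e"))

# n = 2: at most two pieces; the remainder stays unsplit
two <- str_split(x, "-", n = 2)             # list(c("a", "b-c"), c("d", "e"))

# simplify = TRUE: a 2 x 3 matrix, with the short row padded with ""
mat <- str_split(x, "-", simplify = TRUE)

# For a single string, take the first list element
fruit_vec <- str_split("apple, orange, banana", ", ")[[1]]  # c("apple", "orange", "banana")
```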
String Padding using str_pad( )

String padding involves adding characters to a string to make it a certain length. The str_pad() function in the Stringr package can be used to pad strings. The str_pad() function returns a character vector containing the padded string(s). Here are some of the parameters you can use with the str_pad() function:

Syntax:

str_pad(string, width, side = c("left", "right", "both"), pad = " ", use_width = TRUE)

where,

  • string – The character string or vector of strings to be padded.
  • width – The minimum width of the padded string(s). Strings that are already at least this wide are returned unchanged.
  • side – The side on which to add padding: "left", "right", or "both". The default is "left".
  • pad – A single character used for padding. The default is a space, " ".
  • use_width – If TRUE (the default), padding is based on the display width of the string; if FALSE, it is based on the number of characters. This argument was added in stringr 1.5.0.

str_pad() never truncates; use str_trunc() to shorten strings that are too long.

R




str_pad("123", width = 5,
        side = "left", pad = "0")


Output:

"00123"

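A minimal sketch with made-up IDs and labels showing the default left padding, the other values of side, and str_trunc() for the opposite operation:

```r
library(stringr)

ids <- c("7", "42", "123456")

# side defaults to "left"; strings already wider than width are unchanged
left <- str_pad(ids, width = 5, pad = "0")                     # c("00007", "00042", "123456")
right <- str_pad("GFG", width = 7, side = "right", pad = "*")  # "GFG****"
both <- str_pad("GFG", width = 7, side = "both", pad = "-")    # "--GFG--"

# str_pad() never shortens a string; str_trunc() does, adding "..."
short <- str_trunc("Geeks For Geeks", width = 8)               # "Geeks..."
```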
Case Conversion using str_to_upper( ) and str_to_lower( )

Case conversion involves converting the case of a string to upper or lower case. The str_to_upper() and str_to_lower() functions in the Stringr package can be used to convert case. The str_to_upper() function converts all letters in the input string(s) to uppercase, while the str_to_lower() function converts all letters in the input string(s) to lowercase. Both functions return a character vector containing the converted string(s). Here are the parameters you can use with these functions:

Syntax:

str_to_upper(string, locale = "en") or str_to_lower(string, locale = "en")

where,

  • string – The character string or vector of strings to be converted to upper or lower case.
  • locale – The locale whose case-conversion rules are applied, given as a language code such as "en" or "tr". The default is "en" (English).

The package also provides str_to_title() and str_to_sentence() for title case and sentence case.

R




# For conversion to upper case
str_to_upper("hello world")
 
# For conversion to lower case
str_to_lower("HELLO WORLD")


Output:

"HELLO WORLD"

"hello world"

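A minimal sketch with made-up strings showing title and sentence case, and why the locale argument matters: Turkish uppercases a dotted "i" to a dotted capital "İ" (U+0130), while English rules give "I":

```r
library(stringr)

title <- str_to_title("geeks for geeks")          # "Geeks For Geeks"
sentence <- str_to_sentence("geeks FOR geeks")    # "Geeks for geeks"

# Same input, different locale rules
default_upper <- str_to_upper("i")                # "I"
turkish_upper <- str_to_upper("i", locale = "tr") # "\u0130" (dotted capital I)
```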
String Trimming using str_trim( )

String trimming involves removing whitespace characters from the beginning and/or end of a string. The str_trim() function in the Stringr package can be used to trim strings. The str_trim() function returns a character vector containing the trimmed string(s). Here are the parameters you can use with the str_trim() function:

Syntax:

str_trim(string, side = c("both", "left", "right"))

where,

  • string – The character string or vector of strings to be trimmed.
  • side – Which side(s) of the string to trim: "both", "left", or "right". The default is "both".

str_trim() removes only leading and trailing whitespace; str_squish() additionally reduces runs of internal whitespace to a single space.

R




str_trim("   GFG   ")


Output:

"GFG"

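A minimal sketch with a made-up string containing extra spaces, comparing the three values of side with str_squish():

```r
library(stringr)

x <- "   GFG   is   great   "

both <- str_trim(x)                   # "GFG   is   great"
left <- str_trim(x, side = "left")    # "GFG   is   great   "
right <- str_trim(x, side = "right")  # "   GFG   is   great"

# str_squish() trims both ends and collapses internal runs of whitespace
squished <- str_squish(x)             # "GFG is great"
```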
Conclusion

The stringr package is a powerful tool for working with character strings in R. It provides a consistent set of functions that make it easier to manipulate and transform strings for data analysis. We have covered some of the most commonly used functions in the package: str_c(), str_sub(), str_replace(), str_detect(), str_extract(), str_replace_all(), str_split(), str_pad(), str_to_upper(), str_to_lower(), and str_trim(). Together they cover concatenation, substring extraction, pattern matching and replacement, splitting, padding, trimming, and case conversion. By using stringr, you can save time and effort when working with character strings in R.


