R – Strings
A string in R is a sequence of characters stored as a single element of a character vector; R has no separate type for individual characters. Any text enclosed in a pair of matching single or double quotes is a string. Strings represent textual content and can contain letters, digits, spaces, and special characters. An empty string is written as "". R treats single and double quotes the same way when reading code, and print() always displays strings with double quotes. A double-quoted string can contain single quotes, and a single-quoted string can contain double quotes. To place the same kind of quote inside a string, escape it with a backslash (for example, 'It\'s' or "a \"b\""); an unescaped quote of the same kind ends the string early and usually causes a syntax error.
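The quoting rules can be checked with a short sketch in base R. The variable names and sample texts are illustrative only and do not come from the original article.

```r
# Single and double quotes create the same kind of value.
a <- 'text'
b <- "text"
same_value <- identical(a, b)        # TRUE

# One kind of quote can sit inside the other without escaping.
s1 <- "It's allowed"
s2 <- 'She said "hi"'

# The same kind of quote has to be escaped with a backslash.
s3 <- 'It\'s allowed'
s4 <- "She said \"hi\""
same_single <- identical(s1, s3)     # TRUE
same_double <- identical(s2, s4)     # TRUE

# The empty string contains zero characters.
empty <- ""
empty_len <- nchar(empty)            # 0

# print() shows the quoted, escaped form; writeLines() shows the raw text.
print(s2)
writeLines(s2)
```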
Creation of String
Strings are created by assigning a quoted character value to a variable. They can then be concatenated with functions such as paste() and paste0() to form longer strings.
String 1 is: OK1
String 2 is: OK2
String 3 is: This is 'acceptable and 'allowed' in R
String 4 is: Hi, Wondering "if this "works"
Error: unexpected symbol in " str5 <- 'hi, ' this"
Execution halted
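A minimal sketch of string creation and concatenation, loosely modeled on the output above. The exact string contents and variable names are assumptions, since the original code is not shown. The invalid assignment is passed to parse() inside tryCatch() so that the syntax error can be observed without halting the script.

```r
# Strings created with either kind of quote.
str1 <- "OK1"
str2 <- 'OK2'
str3 <- "This is 'acceptable' in R"      # single quotes inside double quotes
str4 <- 'Hi, wondering "if this works"'  # double quotes inside single quotes

cat("String 1 is:", str1, "\n")
cat("String 2 is:", str2, "\n")
cat("String 3 is:", str3, "\n")
cat("String 4 is:", str4, "\n")

# Concatenation with paste() (default separator is a space) and paste0().
joined       <- paste(str1, str2)             # "OK1 OK2"
joined_nosep <- paste0(str1, str2)            # "OK1OK2"
joined_dash  <- paste(str1, str2, sep = "-")  # "OK1-OK2"

# An unescaped single quote inside a single-quoted string is a syntax error.
bad_code <- "str5 <- 'hi, ' this is not allowed'"
result <- tryCatch(parse(text = bad_code), error = function(e) e)
is_error <- inherits(result, "error")         # TRUE
```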
Length of String
The length of a string is the number of characters it contains. It can be determined with the str_length() function from the stringr package or with R's built-in nchar() function.
Example 1: Using the str_length() function
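A possible version of this example, assuming the sample string "Learn Code" (the original code is not shown). Because stringr is an add-on package that may not be installed, the sketch falls back to base R's nchar() when it is missing.

```r
s <- "Learn Code"  # assumed sample string

if (requireNamespace("stringr", quietly = TRUE)) {
  # str_length() comes from the stringr package.
  len <- stringr::str_length(s)
} else {
  # Fallback so the sketch still runs without stringr.
  len <- nchar(s)
}
print(len)
```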
Example 2: Using nchar() function
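A short sketch with nchar(), again assuming the sample string "Learn Code". It also shows that nchar() is vectorized and that length() counts vector elements rather than characters, which is a common source of confusion.

```r
s <- "Learn Code"  # assumed sample string

n_chars <- nchar(s)      # 10: number of characters, including the space
n_elems <- length(s)     # 1: length() counts vector elements, not characters

# nchar() works element-wise on a character vector.
lens <- nchar(c("a", "abc", ""))   # 1 3 0

print(n_chars)
print(lens)
```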
Accessing portions of a string
Individual characters and substrings can be extracted from a string by position, with indexing starting at 1. R provides two built-in functions for this, substr() and substring().
substr() and substring() extract the part of a string that begins at the start index and ends at the end index, both inclusive. Used on the left-hand side of an assignment, they replace that part of the string with new characters.
substr(x, start, stop) or substring(text, first, last)
Example 1: Using substr() function
If the starting index is equal to the ending index, the single character at that position is returned. In this case, the first character, ‘L’, is printed.
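The single-character case can be sketched as follows. The sample string "Learn Code" is an assumption chosen to be consistent with the outputs described in this section.

```r
s <- "Learn Code"  # assumed sample string

# Equal start and stop select exactly one character.
first_char <- substr(s, 1, 1)   # "L"
third_char <- substr(s, 3, 3)   # "a"

print(first_char)
print(third_char)
```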
Example 2: Using substring() function
The number of characters in the string is 10. The first print statement prints the last character of the string, “e”, at index 10. The second print statement requests the 11th character, which doesn’t exist; the code doesn’t throw an error and prints "", an empty string.
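The behavior at and beyond the end of the string can be sketched like this, again assuming the sample string "Learn Code". The last lines use the fact that the last argument of substring() is optional and defaults to a very large value, so the result runs to the end of the string.

```r
s <- "Learn Code"  # assumed sample string, 10 characters

# The last character sits at index nchar(s).
last_char <- substring(s, nchar(s), nchar(s))   # "e"

# An index past the end yields an empty string instead of an error.
beyond <- substring(s, 11, 11)                  # ""

# With only a start position, substring() runs to the end of the string.
tail_part <- substring(s, 7)                    # "Code"

print(last_char)
print(beyond)
print(tail_part)
```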
The following R code illustrates string slicing, wherein substrings of a string are extracted:
The first print statement prints the first four characters of the string. The second print statement prints the substring from the indexes 8 to 10, which is “ode”.
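A sketch of the slicing described above, assuming the sample string "Learn Code". The last statement is an addition not described in the text: substring() accepts vectors of positions and returns one slice per pair.

```r
s <- "Learn Code"  # assumed sample string

first_four <- substr(s, 1, 4)    # "Lear"
last_three <- substr(s, 8, 10)   # "ode"

print(first_four)
print(last_three)

# substring() is vectorized over its position arguments.
windows <- substring(s, 1:3, 3:5)   # "Lea" "ear" "arn"
print(windows)
```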
Characters in a string can be converted to upper or lower case with R’s built-in functions: toupper() converts all characters to upper case, tolower() converts all characters to lower case, and casefold(…, upper=TRUE/FALSE) converts according to the value of the upper argument. All of these functions are vectorized, so they also accept a character vector containing several strings. Each runs in time proportional to the number of characters in the input.
[1] "HI LEARN CODING"
[1] "hi learn coding"
[1] "HI LEARN CODING"
By default, the value of upper in casefold() function is set to FALSE. If we set it to TRUE, the string gets printed in upper case.
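The case conversions can be sketched as follows. The mixed-case input is an assumption; only the converted outputs appear in the original.

```r
s <- "Hi LeArn CodiNG"  # assumed mixed-case input

up          <- toupper(s)                 # "HI LEARN CODING"
low         <- tolower(s)                 # "hi learn coding"
fold_up     <- casefold(s, upper = TRUE)  # "HI LEARN CODING"
fold_default <- casefold(s)               # "hi learn coding" (upper = FALSE)

print(up)
print(low)
print(fold_up)

# The functions work element-wise on a character vector.
many <- toupper(c("hi", "learn", "coding"))   # "HI" "LEARN" "CODING"
print(many)
```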
Individual characters and substrings of a string can be replaced with new values. The change is applied to the variable that holds the string. In R, string values can be updated in the following way:
substr(x, start, stop) <- newstring
substring(text, first, last) <- newstring
Several strings in a character vector can be updated at once. A replacement takes place only when the start index does not exceed the end index.
- If the selected substring is longer than the new string, only as many characters as the new string contains are replaced.
- If the selected substring is shorter than the new string, only the leading characters of the new string that fit in the selected range are used. The length of the string never changes.
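The replacement rules above can be sketched in base R. The sample strings and variable names are illustrative assumptions.

```r
# Replacement of the same length as the selected range.
s <- "Learn Code"
substr(s, 1, 5) <- "Teach"
print(s)                       # "Teach Code"

# New string shorter than the selected range: only two characters change.
short_new <- "Learn Code"
substr(short_new, 1, 5) <- "Hi"
print(short_new)               # "Hiarn Code"

# New string longer than the selected range: only "WX" is used.
long_new <- "Learn Code"
substr(long_new, 1, 2) <- "WXYZ"
print(long_new)                # "WXarn Code"

# Several strings in a character vector updated at once.
v <- c("abc", "def")
substr(v, 1, 1) <- "Z"
print(v)                       # "Zbc" "Zef"
```

In every case the number of characters stays the same; to insert or delete characters, a new string would have to be built with a function such as paste0().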