String Manipulation in R

Last Updated : 12 Jan, 2023

String manipulation is the process of handling and analyzing strings. It covers operations that modify and parse strings in order to use or change their contents. R offers a number of built-in functions for this. In this article, we will study the functions used to manipulate strings in R.

Concatenation of Strings

String concatenation is the technique of combining two or more strings. It can be done in several ways:

  • paste() function: Any number of strings can be concatenated using the paste() function to form a larger string. The sep argument gives the separator placed between the corresponding elements of the inputs. The collapse argument, if not NULL, gives a separator used to join the elements of the result into a single string. By default, the value of collapse is NULL, so the result remains a character vector. Syntax:
paste(..., sep=" ", collapse = NULL)
  • Example: 

R

# R program for string concatenation

# Concatenation using paste() function
str <- paste("Learn", "Code")
print(str)


  • Output:
[1] "Learn Code"
  • Because no separator is specified, the default separator " " is inserted between the strings. A different separator can be passed through sep. Example: 

R

# Concatenate a sequence with a single string
str <- paste(1:3, "4", sep = ":")
print(str)


  • Output: 
[1] "1:4" "2:4" "3:4"
  • Because the arguments have different lengths, the shorter one is recycled to match the longer one. The first argument is the sequence 1, 2, 3, and each of its elements is concatenated with the string "4" using the separator ":". 

R

# Concatenate two vectors of equal length element by element
str <- paste(1:4, 5:8, sep = "--")
print(str)


  • Output:
[1] "1--5" "2--6" "3--7" "4--8"
  • Because both vectors have the same length, their corresponding elements are concatenated: the first element of the first vector is joined with the first element of the second vector using the separator "--", and so on.
  • cat() function: Objects of different types can be concatenated and printed using the cat() function in R, where sep specifies the separator placed between the items and file names a file to write to, in case we wish to write the contents to a file instead of the console. Syntax:
cat(..., sep = " ", file = "")
  • Example: 

R

# R program for string concatenation

# Concatenation using cat() function
# cat() prints its arguments and returns NULL, so str is NULL
str <- cat("learn", "code", "tech", sep = ":")
print(str)


  • Output:
learn:code:techNULL
  • cat() prints the concatenated text without quotes, using the separator ":" given in sep. cat() returns NULL, so str holds NULL, and print(str) prints NULL directly after the text. cat() can also write to a file. Example: 

R

# Write the numbers 1 to 5 to a file instead of the console
cat(1:5, file = "sample.txt")


  • Contents of sample.txt:
1 2 3 4 5

Nothing is printed to the console. The output is written to the text file sample.txt in the current working directory.
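The collapse argument of paste() and the file argument of cat() described above can be sketched together as follows. This is a minimal illustration, not part of the original examples: the variable names are made up, and tempfile() is used only so the sketch does not leave a file in the working directory.

```r
# Element-wise concatenation: the result is a character vector of length 3
pairs <- paste(c("a", "b", "c"), 1:3, sep = "-")
print(pairs)

# collapse joins the elements of that result into one single string
joined <- paste(c("a", "b", "c"), 1:3, sep = "-", collapse = ", ")
print(joined)

# cat() with file= writes to a file; tempfile() is an illustrative choice
out_file <- tempfile(fileext = ".txt")
cat("learn", "code", sep = ":", file = out_file)

# Read the file back to check what was written
file_text <- readLines(out_file, warn = FALSE)
print(file_text)
```

With sep alone the result keeps one element per input position; adding collapse reduces it to a single string.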

Calculating Length of strings

  • length() function: The length() function returns the number of elements in a vector. For a character vector this is the number of strings, not the number of characters in them. Example: 

R

# R program to calculate length

print(length(c("Learn to", "Code")))


  • Output:
[1] 2
  • The vector contains two strings, so its length is 2.
  • nchar() function: nchar() counts the number of characters in each of the strings passed to it, individually. Example: 

R

print(nchar(c("Learn", "Code")))


  • Output: 
[1] 5 4
  • The output gives the number of characters in "Learn" and then in "Code".
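The difference between length() and nchar() can be seen side by side in a short sketch. The vector and variable names below are illustrative only.

```r
words <- c("Learn", "Code", "")

# length() counts the elements of the vector, including the empty string
n_elements <- length(words)

# nchar() counts the characters in each element separately
n_chars <- nchar(words)

# The total number of characters across all strings
total_chars <- sum(n_chars)

print(n_elements)
print(n_chars)
print(total_chars)
```

An empty string still counts as one element for length(), while nchar() reports 0 characters for it.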

Case Conversion of strings

  • Conversion to upper case: toupper() converts all the characters of the specified strings to upper case. Example: 

R

print(toupper(c("Learn Code", "hI")))


  • Output:
[1] "LEARN CODE" "HI"
  • Conversion to lower case: tolower() converts all the characters of the specified strings to lower case. Example: 

R

print(tolower(c("Learn Code", "hI")))


  • Output: 
[1] "learn code" "hi"
  • casefold() function: casefold(x, upper = FALSE) converts all the characters of the specified strings to lower case by default, or to upper case when upper = TRUE is passed. Examples: 

R

print(casefold(c("Learn Code", "hI")))


  • Output: 
[1] "learn code" "hi"
  • By default, the strings are converted to lower case. Passing upper = TRUE converts them to upper case instead. 

R

print(casefold(c("Learn Code", "hI"), upper = TRUE))


  • Output: 
[1] "LEARN CODE" "HI"

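One common use of case conversion is comparing strings without regard to case. The sketch below is an illustration under that assumption; the variable names are made up.

```r
a <- "Learn Code"
b <- "LEARN code"

# A direct comparison is case-sensitive
same_exact <- a == b

# Converting both sides to the same case first ignores case differences
same_ignoring_case <- tolower(a) == tolower(b)

# casefold(x, upper = TRUE) gives the same result as toupper(x)
upper_matches <- identical(casefold(a, upper = TRUE), toupper(a))

print(same_exact)
print(same_ignoring_case)
print(upper_matches)
```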
Character replacement

Characters can be translated using the chartr(old, new, x) function in R, where every instance of each character in old is replaced by the character at the same position in new, in every string of x. Example 1: 

R

chartr("a", "A", "An honest man gave that")


Output:

[1] "An honest mAn gAve thAt"

Every instance of ‘a’ is replaced by ‘A’. Example 2: 

R

chartr("is", "#@", c("This is it", "It is great"))


Output: 

[1] "Th#@ #@ #t"  "It #@ great"

Each character of the old string is replaced by the character at the same position in the new string: every "i" becomes "#" and every "s" becomes "@". The translation is case-sensitive, so the "I" in "It" is left unchanged. Example 3: 

R

chartr("ate", "#@", "I hate ate")


Output:

Error in chartr("ate", "#@", "I hate ate") : 'old' is longer than 'new'
Execution halted

The old string must not be longer than the new string.
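The length rule can be checked before calling chartr(). The sketch below uses safe_chartr(), a hypothetical helper written for this illustration and not part of base R, which returns the input unchanged when old is longer than new instead of stopping with an error.

```r
# Hypothetical helper (not part of base R): call chartr() only when
# 'old' is no longer than 'new'; otherwise warn and return x unchanged.
safe_chartr <- function(old, new, x) {
  if (nchar(old) > nchar(new)) {
    warning("'old' is longer than 'new'; returning input unchanged")
    return(x)
  }
  chartr(old, new, x)
}

# Equal lengths: a -> x, b -> y, c -> z
translated <- safe_chartr("abc", "xyz", "aabbcc")

# 'old' is longer than 'new': the input comes back as it was
unchanged <- suppressWarnings(safe_chartr("ate", "#@", "I hate ate"))

print(translated)
print(unchanged)
```

Whether to warn, stop, or silently skip in this situation is a design choice; the helper warns so the mismatch is not hidden.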

Splitting the string

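The strsplit(x, split) function splits each string in x at every match of split and returns a list with one character vector per input string. The separator has no default and must be supplied; here it is " ". Example: 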

R

strsplit("Learn Code Teach !", " ")


Output:

[[1]]
[1] "Learn" "Code"  "Teach" "!"
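Because strsplit() returns a list, the pieces of a single string are usually extracted with [[1]]. The following sketch illustrates this along with splitting several strings at once; the variable names are illustrative.

```r
# strsplit() returns a list, even for a single input string
parts <- strsplit("Learn Code Teach !", " ")
result_class <- class(parts)

# [[1]] extracts the character vector of pieces for the first string
first <- parts[[1]]

# With several input strings, the list has one element per string
many <- strsplit(c("a,b", "c,d,e"), ",")
piece_counts <- lengths(many)

# An empty separator splits a string into its individual characters
chars <- strsplit("Code", "")[[1]]

print(result_class)
print(first)
print(piece_counts)
print(chars)
```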

Working with substrings

The substr(x, start, stop) and substring(text, first, last) functions in R extract the substring of a string that begins at the start index and ends at the end index, both inclusive and counted from 1. Used on the left-hand side of an assignment, they also replace the specified substring with a new set of characters. Example: 

R

substr("Learn Code Tech", 1, 4)


Output: 

[1] "Lear"

This extracts the first four characters of the string. 

R

str <- c("program", "with", "new", "language")
substr(str, 3, 3) <- "%"
print(str)


Output: 

[1] "pr%gram"  "wi%h"     "ne%"      "la%guage"

This replaces the third character of every string with the % sign. 

R

str <- c("program", "with", "new", "language")
substr(str, 3, 3) <- c("%", "@")
print(str)


Output: 

[1] "pr%gram"  "wi@h"     "ne%"      "la@guage"

The replacement values are recycled across the strings, so the third character of each string is replaced alternately with "%" and "@".
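The two extraction functions differ mainly in their defaults: substring() can omit the end index and accepts vectors of positions. The sketch below illustrates this; the index values were chosen by hand for this particular string.

```r
x <- "Learn Code Tech"

# substr() needs both a start and a stop index
word <- substr(x, 7, 10)

# substring() may omit the end index and then runs to the end of the string
rest <- substring(x, 7)

# Vectors of start and end positions extract several pieces in one call
pieces <- substring(x, c(1, 7, 12), c(5, 10, 15))

print(word)
print(rest)
print(pieces)
```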


