String Manipulation in R

  • Last Updated : 22 Apr, 2020

String manipulation refers to the process of handling and analyzing strings. It covers the operations used to modify and parse strings so that their contents can be used or changed. R offers a range of built-in functions for manipulating the contents of a string. This article covers the functions used for string manipulation in R.

Concatenation of Strings

String concatenation is the technique of combining two or more strings. It can be done in several ways:

  • paste() function
    Any number of strings can be concatenated using the paste() function to form a larger string. The sep argument is the separator placed between the individual strings. The collapse argument, if not NULL, is the string used to join the elements of the resulting vector into a single string. By default, the value of collapse is NULL, so the result is not collapsed.
    Syntax:
    paste(..., sep=" ", collapse = NULL)

    Example:






    # R program for String concatenation
      
    # Concatenation using paste() function
    str <- paste("Learn", "Code")
    print (str)

    Output:

     "Learn Code"

    If no separator is specified, the default separator " " (a single space) is inserted between the individual strings.

    Example:




    str <- paste(c(1:3), "4", sep = ":")
    print (str)

    Output:

    "1:4" "2:4" "3:4"

    Since the objects to be concatenated are of different lengths, the shorter one is recycled to match the longer one. The first argument is the sequence 1, 2, 3, each element of which is concatenated with the string "4" using the separator ':'.




    str <- paste(c(1:4), c(5:8), sep = "--")
    print (str)

    Output:

    "1--5" "2--6" "3--7" "4--8"

    Since both vectors are of the same length, their corresponding elements are concatenated, that is, the first element of the first vector is joined to the first element of the second vector using the separator "--", and so on.

  • cat() function
    Strings and other atomic values can be concatenated and printed using the cat() function in R, where sep specifies the separator placed between the items and file optionally names a file to which the output is written instead of the console.
    Syntax:

    cat(..., file = "", sep = " ")

    Example:




    # R program for string concatenation
      
    # Concatenation using cat() function
    str <- cat("learn", "code", "tech", sep = ":")
    print (str)

    Output:

    learn:code:techNULL

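    The output string is printed without any quotes, using the specified separator ':'. Because cat() only prints its arguments and returns NULL, str holds NULL, and print(str) prints NULL directly after the text (cat() does not end its output with a newline).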
    Example:




    cat(c(1:5), file ='sample.txt')

    Output:

    1 2 3 4 5

Nothing is printed to the console; the values shown above are written to the text file sample.txt in the current working directory.
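
The collapse argument of paste() is described above but not exercised by the examples. The following is a minimal sketch (the variable names are illustrative, not from the article) of how collapse folds a vector into one string, and of building a string with paste() before displaying it with cat(), since cat() itself returns NULL:

```r
# Illustrative vector; the names used here are assumptions for the example
words <- c("Learn", "Code", "Teach")

# collapse joins the elements of the result into a single string
joined <- paste(words, collapse = " ")
print(joined)

# sep is applied element-wise first, then collapse joins the pieces
labelled <- paste("item", 1:3, sep = "_", collapse = ", ")
print(labelled)

# cat() only prints and returns NULL, so keep the string from paste()
# and pass it to cat() for display
msg <- paste("learn", "code", "tech", sep = ":")
cat(msg, "\n")
```

Keeping the value from paste() and printing it separately avoids the stray NULL seen when the result of cat() is assigned and printed.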

Calculating Length of strings

  • length() function
    The length() function returns the number of elements in a vector, so for a character vector it gives the number of strings, not the number of characters.
    Example:




    # R program to calculate length
      
    print (length(c("Learn to", "Code")))

    Output:

    2

    There are two strings specified in the function.

  • nchar() function
    nchar() counts the number of characters in each of the strings specified as arguments to the function individually.
    Example:






    print (nchar(c("Learn", "Code")))

    Output:

    5 4

    The output gives the number of characters in "Learn" and then in "Code".
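
The difference between the two functions is easy to mix up, so here is a short sketch that applies both to the same vector (the variable names are illustrative):

```r
strings <- c("Learn to", "Code")

n_elements <- length(strings)  # number of elements in the vector
n_chars <- nchar(strings)      # characters in each element (spaces count)

print(n_elements)
print(n_chars)

# length() of a single string is 1, however long the string is
single_len <- length("Learn to Code")
print(single_len)
```

To measure the size of a string itself, nchar() is the function to reach for; length() only counts vector elements.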

Case Conversion of strings

  • Conversion to upper case
    All the characters of the strings specified are converted to upper case.
    Example:




    print (toupper(c("Learn Code", "hI")))

    Output :

    "LEARN CODE" "HI"

  • Conversion to lower case
    All the characters of the strings specified are converted to lower case.
    Example:




    print (tolower(c("Learn Code", "hI")))

    Output :

    "learn code" "hi"

  • casefold() function
    All the characters of the strings specified are converted to lower case or upper case according to the upper argument of casefold(x, upper = FALSE).
    Examples:




    print (casefold(c("Learn Code", "hI")))

    Output:



    "learn code" "hi"

    By default, the strings get converted to lower case.




    print (casefold(c("Learn Code", "hI"), upper = TRUE))

    Output:

    "LEARN CODE" "HI"

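One common use of case conversion is comparing strings without regard to case. The sketch below is only an illustration of that idea (the variable names are assumptions); it also checks that casefold() with upper = TRUE matches toupper():

```r
a <- "Learn Code"
b <- "LEARN code"

# A direct comparison is case-sensitive
same_exact <- a == b
print(same_exact)

# Converting both sides to one case makes the comparison case-insensitive
same_ignoring_case <- tolower(a) == tolower(b)
print(same_ignoring_case)

# casefold(x, upper = TRUE) gives the same result as toupper(x)
same_as_toupper <- identical(casefold(a, upper = TRUE), toupper(a))
print(same_as_toupper)
```
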
Character replacement

Characters can be translated using the chartr(old, new, x) function in R, where every instance of each character in old is replaced by the corresponding character in new across the specified set of strings.
Example 1:




chartr("a", "A", "An honest man gave that")

Output:

"An honest mAn gAve thAt"

Every instance of ‘a’ is replaced by ‘A’.
Example 2:




chartr("is", "#@", c("This is it", "It is great"))

Output:

"Th#@ #@ #t"  "It #@ great"

Each character of the old string is replaced by the character at the same position in the new string: "i" is replaced by "#" and "s" by "@". The translation is case-sensitive, so the "I" in "It" is left unchanged.
Example 3:




chartr("ate", "#@", "I hate ate")

Output:

Error in chartr("ate", "#@", "I hate ate") : 'old' is longer than 'new'
         Execution halted 

The old string must not be longer than the new string; every character in old needs a corresponding character in new.
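
As a sketch of how the failing call can be repaired, the example below supplies one replacement character for each character in old ("!" is an arbitrary choice for this illustration) and uses tryCatch() to capture the error message instead of halting:

```r
# One replacement character for each character in 'old'
fixed <- chartr("ate", "#@!", "I hate ate")
print(fixed)

# Capture the error raised when 'old' is longer than 'new'
msg <- tryCatch(
  chartr("ate", "#@", "I hate ate"),
  error = function(e) conditionMessage(e)
)
print(msg)
```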

Splitting the string

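A string can be split into its individual pieces using strsplit(x, split). The separator has no default and must be supplied; here it is a single space, " ". The result is a list with one character vector per input string.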
Example:






strsplit("Learn Code Teach !", " ")

Output:

[[1]]
[1] "Learn" "Code"  "Teach" "!"
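
Because strsplit() returns a list, the pieces of a single string are usually pulled out with [[1]]. The following sketch (variable names are illustrative) shows that, along with splitting several strings at once and using fixed = TRUE so that a separator such as "." is taken literally rather than as a regular expression:

```r
# The result is a list, even for a single input string
parts <- strsplit("Learn Code Teach !", " ")
print(class(parts))

# Extract the character vector for the first (and only) input
words <- parts[[1]]
print(words)

# One list element is returned per input string
multi <- strsplit(c("Learn Code", "Teach !"), " ")
print(length(multi))

# split is a regular expression by default; fixed = TRUE treats "." literally
dotted <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
print(dotted)
```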

Working with substrings

The substr(x, start, stop) and substring(text, first, last) functions in R extract substrings from a string, beginning at the start index and ending at the end index (both inclusive, counting from 1). Used on the left-hand side of an assignment, they replace the specified substring with new characters.
Example:




substr("Learn Code Tech", 1, 4)

Output:

"Lear"

Extracts the first four characters from the string.




str <- c("program", "with", "new", "language")
substring(str, 3, 3) <- "%"
print(str)

Output:

"pr%gram"  "wi%h"     "ne%"      "la%guage"

Replaces the third character of every string with % sign.




str <- c("program", "with", "new", "language")
substr(str, 3, 3) <- c("%", "@")
print(str)

Output:

"pr%gram"  "wi@h"     "ne%"      "la@guage"

Replaces the third character of each string alternately with the specified symbols, since the replacement vector is recycled across the strings.
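
The two functions differ slightly in use: substring() has a default end position, so it can take everything from a given index onward, and it accepts vectors of start and end positions. A brief sketch with illustrative variable names:

```r
text <- "Learn Code Tech"

# substr() needs both a start and a stop position
first_word <- substr(text, 1, 5)
print(first_word)

# substring() defaults to the end of the string when 'last' is omitted
tail_part <- substring(text, 7)
print(tail_part)

# substring() is vectorised over its start and end positions
windows <- substring("Learn", 1:3, 3:5)
print(windows)
```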



