Working with Text in R
  • Last Updated : 10 May, 2020

R is a programming language for statistical computing. Many data miners and statisticians use R for developing statistical software and for data analysis. R and its libraries implement a wide variety of statistical and graphical techniques, including machine learning algorithms, linear and non-linear modeling, classical statistical tests, statistical inference, time-series analysis, classification, and clustering.

Any value written inside single or double quotes is treated as a string in R. A string is a sequence of characters, and it can be stored in a variable. R stores and prints every string within double quotes, even when you create it with single quotes.

Syntax:

Variable_name <- "String"

Example:




# R program to demonstrate
# creation of a string
a <- "hello world"
print(a)

Output:



[1] "hello world"

The following rules apply when working with strings:

  • The quotes at the beginning and end of a string must be both double quotes or both single quotes. They cannot be mixed.
  • Double quotes can be inserted into a string that starts and ends with single quotes.
  • A single quote can be inserted into a string that starts and ends with double quotes.
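
The quoting rules above can be illustrated with a short sketch. The variable names are only for illustration, and the last string uses a backslash escape, which the rules above do not mention but which base R supports for placing the delimiter itself inside a string.

```r
# Double quotes inside a string delimited by single quotes
s1 <- 'She said "hi"'

# A single quote inside a string delimited by double quotes
s2 <- "It's fine"

# The same delimiter inside the string must be escaped with a backslash
s3 <- "She said \"hi\""

print(s1)      # print() shows the string in double quotes, with escapes
cat(s1, "\n")  # cat() writes the raw characters
print(identical(s1, s3))
```

Both s1 and s3 hold the same characters; only the way they were typed differs.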

String Manipulation

String manipulation is the process of examining a given string and using or changing its contents. R provides several functions for manipulating strings:

  • Concatenating strings – paste() function:

    The paste() function is used to combine strings in R. It accepts any number of arguments to combine.

    Syntax:

    paste(..., sep = " ", collapse = NULL)

    Parameters:
    ...: One or more arguments (vectors) to combine.
    sep: The separator placed between the arguments. It is optional and defaults to a single space.
    collapse: An optional string used to join the elements of the result into a single string. With the default, NULL, the result is not collapsed.

    Example:




    # concatenate two strings
    str1 <- "hello" 
    str2 <- "how are you?" 
    print(paste(str1, str2, sep = " ", collapse = NULL))

    Output:

    [1] "hello how are you?"
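
The difference between sep and collapse is easier to see when paste() receives vectors rather than single strings. The sketch below uses made-up values; paste0() is a base R shorthand for paste() with sep = "".

```r
parts <- c("a", "b", "c")

# sep joins the arguments element by element; the result is still a vector
pairs <- paste(parts, 1:3, sep = "-")
print(pairs)

# collapse then joins the elements of that vector into one string
joined <- paste(parts, 1:3, sep = "-", collapse = "+")
print(joined)

# paste0() concatenates without any separator
ids <- paste0("x", 1:2)
print(ids)
```

With single strings, as in the example above, collapse has no visible effect because the result already has only one element.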
    

     



  • Formatting numbers and strings – format() function:

    The format() function is used to format numbers and strings in a specified style. It always returns character strings.

    Syntax:

    format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none")) 

    Parameters:

    x is the vector input.
    digits is the number of significant digits displayed.
    nsmall is the minimum number of digits to the right of the decimal point.
    scientific is set to TRUE to display scientific notation.
    width is the minimum width of the result; shorter values are padded with blanks (numbers on the left).
    justify aligns strings to the left, right, or centre.

    Example:




    # formatting numbers and strings
      
    # Total number of digits displayed.
    # Last digit rounded off.
    result <- format(69.145656789, digits = 9)
    print(result)
      
    # Display numbers in scientific notation.
    result <- format(c(3, 132.84521), 
                     scientific = TRUE)
    print(result)
      
    # The minimum number of digits 
    # to the right of the decimal point.
    result <- format(96.47, nsmall = 5)
    print(result)
      
    # Format treats everything as a string.
    result <- format(8)
    print(result)
      
    # Numbers are padded with blank
    # in the beginning for width.
    result <- format(67.7, width = 6)
    print(result)
      
    # Left justify strings.
    result <- format("Hello", width = 8,
                              justify = "left")
    print(result)

    Output:

    [1] "69.1456568"
    [1] "3.000000e+00" "1.328452e+02"
    [1] "96.47000"
    [1] "8"
    [1] "  67.7"
    [1] "Hello   "
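
When format() receives a vector, all elements are formatted to a common width, which is useful for aligned output. The following is a minimal sketch with made-up prices and labels, not part of the original examples.

```r
prices <- c(3.5, 12, 149.99)

# At least two decimals; every element is padded to the same width
formatted <- format(prices, nsmall = 2)
print(formatted)

# Strings are padded on the right by default (left-justified)
labels <- format(c("apple", "fig"), width = 6)
print(labels)

# justify = "right" pads on the left instead
right <- format(c("apple", "fig"), width = 6, justify = "right")
print(right)
```

Because the results are character vectors, they can be passed straight to paste() or cat() to build aligned rows.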
    

     

  • Counting the number of characters in the string – nchar() function:

    The nchar() function is used to count the number of characters in a string, including spaces.

    Syntax: nchar(x)

    Parameter:
    x is the vector input here.



    Example:




    # to count the number of characters
    # in the string
    a <- nchar("hello world")
    print(a)

    Output:

    [1] 11
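
Since x is a vector, nchar() returns one count per element. A small sketch with illustrative values:

```r
words <- c("R", "text", "")

# One count per element; the empty string has length 0
counts <- nchar(words)
print(counts)

# Non-character input is converted to a string first
digits <- nchar(12345)
print(digits)
```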
    

     

  • Changing the case of the string – toupper() & tolower() function:

    The toupper() and tolower() functions are used to change the case of a string.

    Syntax:
    toupper(x)
    and
    tolower(x)

    Parameter:

    x is the vector input

    Example:




    # Changing to Upper case.
    a <- toupper("hello world")
    print(a)
      
    # Changing to lower case.
    b <- tolower("HELLO WORLD")
    print(b)

    Output:

    [1] "HELLO WORLD"
    [1] "hello world"
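
A common use of these functions is comparing strings without regard to case. The sketch below also capitalizes the first letter of a word by combining toupper() with substring() and paste0(); the variable names are hypothetical.

```r
a <- "Hello World"
b <- "HELLO world"

# Convert both sides to the same case before comparing
same <- tolower(a) == tolower(b)
print(same)

# Capitalize only the first character of a word
word <- "hello"
capitalized <- paste0(toupper(substring(word, 1, 1)), substring(word, 2))
print(capitalized)
```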
    

     

  • Extracting parts of the string – substring() function:
    substring() function is used to extract parts of the string.

    Syntax: substring(x, first, last)

    Parameters:
    x is the character vector input.
    first is the position of the first character to be extracted.
    last is the position of the last character to be extracted.

    Example:




    # Extract characters from the 1st to the 3rd position.
    part <- substring("Programming", 1, 3)
    print(part)

    Output:

    [1] "Pro"
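
Two further behaviours of substring() are worth a quick sketch: last can be omitted, in which case extraction runs to the end of the string, and first and last can be vectors, which yields several substrings at once. The values below are illustrative.

```r
word <- "Programming"

# Omitting last extracts up to the end of the string
tail_part <- substring(word, 4)
print(tail_part)

# Vector positions give one substring per (first, last) pair
windows <- substring(word, 1:3, 3:5)
print(windows)
```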
    