
Count Number of Words in String using R

Last Updated : 03 May, 2023

In this article, we will see how to count the number of words in a character string in the R programming language.

Method 1: Using strsplit and sapply methods

The strsplit() function in R splits each input string at the matches of a given regular expression and returns a list with one character vector of substrings per input string. When the string is split on spaces, each element of that vector is a word, so the length of the vector is the number of words.

Syntax: strsplit(str, regex)

Arguments :

  • str – The string (or character vector) to split into words
  • regex – A character vector (or an object that can be coerced to one) containing the regular expression to split on. For counting words, the pattern is simply a single space, " ".

sapply() method: sapply() applies a function to each element of a vector or list and simplifies the results. When the function is length, it returns the length of each split vector, that is, the word count of each string.

Syntax: sapply(x, FUN)
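As a minimal sketch of how the two functions fit together, the snippet below splits two made-up sentences and counts the pieces. The variable names (sentences, pieces, word_counts) are illustrative and not part of the original article.

```r
# Two example sentences (made up for illustration)
sentences <- c("R is fun", "Counting words in R")

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(sentences, " ")
print(pieces)

# sapply() applies length() to each list element and simplifies to a vector
word_counts <- sapply(pieces, length)
print(word_counts)      # 3 4

# lengths() is a base R shortcut for the same computation
print(lengths(pieces))  # 3 4
```

Because strsplit() is vectorised, the same call works for one string or many.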

The two calls are combined to count the words in a string with the following syntax in R :

sapply(strsplit(str, " "), length)

Code:

R




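# declaring string (kept on one line so that no newline ends up inside it)
str <- "Counting the words in this R sentence? Try this approach in GFG! "
 
print("Original string")
print(str)
print("Total number of words")
 
# splitting the string at single spaces gives a list of word vectors
words <- strsplit(str, " ")
# counting the elements of each word vector
sapply(words, length)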


Output

[1] "Original string"
[1] "Counting the words in this R sentence? Try this approach in GFG! "
[1] "Total number of words"
[1] 12

The time complexity of the given code is roughly O(n), where n is the length of the input string in characters, because the string is scanned once to find the separators.

The auxiliary space is also O(n), because the split words are stored in a new vector.
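Splitting on a single space assumes that words are separated by exactly one space. The sketch below is one possible variation, not part of the original method: it trims the string and splits on runs of whitespace instead. The helper name count_words_ws and the sample string are hypothetical.

```r
# Hypothetical helper: count whitespace-separated words in each string
count_words_ws <- function(x) {
  # Remove leading/trailing whitespace, then split on runs of whitespace
  pieces <- strsplit(trimws(x), "\\s+")
  lengths(pieces)
}

# Made-up input with repeated spaces, a tab, and padding at both ends
messy <- "  Counting   the words\tin this R sentence?  "

# Splitting on " " also counts the empty strings between repeated spaces
print(sapply(strsplit(messy, " "), length))

# Splitting on whitespace runs counts only the words
print(count_words_ws(messy))  # 7
```

An empty string gives 0 with this helper, since strsplit() returns an empty vector for it.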

Method 2: Using the gregexpr method

This method uses base R functions to count the separators between words. The gregexpr() function returns a list with one element per input string; each element holds the starting positions of all matches of the pattern in that string. The matching is case-sensitive by default. The pattern in our case is \\W+

Syntax: gregexpr(pattern, text)

The lengths() function is then applied to return the length of each element of that list, that is, the number of matches in each string.

Syntax: lengths(x)

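This method uses the regular expression symbol \\W to match non-word characters, with + indicating one or more in a row. It returns the number of separators between the words, so the number of words is separators + 1 in most cases. The rule overcounts when the string starts or ends with non-word characters: the example string below ends with "! ", which is counted as one more separator, so the result is 13 even though the string contains 12 words.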

Code:

R




# declaring string (kept on one line so that no newline ends up inside it)
str <- "Counting the words in this R sentence? Try this approach in GFG! "
 
print("Original string")
print(str)
print("Total number of words")
 
# counting runs of non-word characters (separators) and adding 1
lengths(gregexpr("\\W+", str)) + 1


Output

[1] "Original string"
[1] "Counting the words in this R sentence? Try this approach in GFG! "
[1] "Total number of words"
[1] 13
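The result above is 13 because the trailing "! " is counted as a separator. One way to avoid the off-by-one, sketched here as an alternative rather than the article's method, is to match the words themselves with \\w+ and count the matches. The helper name count_word_runs is hypothetical.

```r
# Hypothetical helper: count runs of word characters in each string
count_word_runs <- function(x) {
  # regmatches() extracts the matched substrings; lengths() counts them.
  # Counting extracted matches gives 0 when nothing matches, whereas
  # lengths(gregexpr(...)) gives 1, because "no match" is reported as -1.
  lengths(regmatches(x, gregexpr("\\w+", x)))
}

s <- "Counting the words in this R sentence? Try this approach in GFG! "

print(lengths(gregexpr("\\W+", s)) + 1)  # 13: the trailing "! " adds a separator
print(count_word_runs(s))                # 12
print(count_word_runs(""))               # 0
```

Note that \\w treats an apostrophe as a separator, so a contraction such as "don't" counts as two words with this pattern.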

Method 3: Using stringr package 

The stringr package in R is used to perform string manipulations. It is not part of base R, so it must be installed once and then loaded with library() before its functions can be used.

install.packages("stringr")

The stringr package provides a str_count() function, which counts the number of matches of a pattern in a string. The pattern is a regular expression and may be a single character or a group of characters. The function is vectorised: given a vector of strings, it returns a vector of counts, one per string, and 0 where no match is found. Note that the pattern \\W+ used here counts runs of separators, not words. The count equals the number of words only when the string ends with a separator and does not start with one, as in the example below.

Syntax: str_count(str, pattern = "")

Arguments :

  • str – The string (or character vector) in which to count matches
  • pattern – The regular expression to match

Code:

R




library("stringr")
 
# declaring string
str <- "Counting the words in this R sentence? Try this approach in GFG! "
print ("Original string")
print (str)
print ("Total number of words")
 
# counting runs of non-word characters (separators) in the string
str_count(str ,"\\W+")


Output:

[1] "Original string"
[1] "Counting the words in this R sentence? Try this approach in GFG! "
[1] "Total number of words"
[1] 12
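The count of 12 is correct here only because the string ends with a separator. As a hedged sketch, the snippet below counts runs of word characters with \\w+ instead, which does not depend on how the string ends. It assumes the stringr package is installed and skips the demonstration otherwise; the variables s and s2 are made up for illustration.

```r
# Assumes stringr is installed; the guard skips the demo otherwise
if (requireNamespace("stringr", quietly = TRUE)) {
  s <- "Counting the words in this R sentence? Try this approach in GFG! "

  # Count runs of word characters rather than runs of separators
  print(stringr::str_count(s, "\\w+"))   # 12

  # Without a trailing separator, the two patterns disagree
  s2 <- "Counting the words"
  print(stringr::str_count(s2, "\\W+"))  # 2 separators
  print(stringr::str_count(s2, "\\w+"))  # 3 words
}
```

Matching words directly keeps the count correct whether or not the string ends with punctuation or a space.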

