Count Number of Words in String using R

  • Last Updated : 23 May, 2021

In this article, we will see how to count the number of words in a character string in the R programming language.

Method 1: Using strsplit and sapply methods

The strsplit() function in R splits a string into substrings wherever the given regular expression matches. It returns a list with one character vector per input string. When the string is split on spaces, each element of that vector is a word, so the length of the vector is the number of words.

Syntax: strsplit( str , regex )

Arguments :

  • str – The string (or character vector) to be split
  • regex – A character vector (or an object that can be coerced to one) containing the regular expression to split on. To count words, the pattern is simply a single space, " ".

sapply() method: The sapply() function applies a function to each element of a vector or list and returns the results, simplified to a vector where possible. When the function passed as the second argument is length, the length of each split vector, that is, the word count of each string, is returned.

Syntax: sapply( x , FUN )

Combining the two functions, the number of words is obtained with the following expression in R:

sapply(strsplit(str, " "), length)
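As a small illustration of how the two functions fit together, the sketch below applies the same expression to a vector of two strings. The variable names sentences, pieces, and word_counts are made up for this example, and it assumes that words are separated by exactly one space.

```r
# hypothetical input: a character vector with two strings
sentences <- c("Hello world", "R is fun")

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(sentences, " ")

# sapply() applies length() to each list element, giving one count per string
word_counts <- sapply(pieces, length)
print(word_counts)
```

Each input string gets its own count, so the result has the same length as the input vector.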

Code:

R




# declaring string
str <- "Counting the words in this R sentence? Try this approach in GFG! "
  
print ("Original string")
print (str)
print ("Total number of words")
  
# splitting the string on spaces; strsplit() returns a list
split <- strsplit(str, " ")
# the length of each split vector is the number of words
sapply( split , length)

Output

[1] "Original string"
[1] "Counting the words in this R sentence? Try this approach in GFG! "
[1] "Total number of words"
[1] 12
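Splitting on a single space produces an empty piece for every extra space, so strings with repeated or leading spaces are overcounted. A minimal sketch of one way around this is shown below, assuming that words are runs of characters separated by any whitespace. The helper name count_words_ws is hypothetical and not part of base R.

```r
# hypothetical helper: count whitespace-separated words in each string
count_words_ws <- function(x) {
  # trimws() drops leading and trailing whitespace;
  # "\\s+" treats any run of whitespace as a single separator
  lengths(strsplit(trimws(x), "\\s+"))
}

print(count_words_ws(c("two  spaces here", "  padded string  ", "")))
# [1] 3 2 0
```

Punctuation attached to a word, such as "sentence?", still counts as part of that word under this approach.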

Method 2: Using gregexpr method

This method uses functions available in base R to count the separators between words. The gregexpr() function searches a string for every match of a pattern and returns a list with one element per input string, each holding the starting positions of the matches. The pattern matching is case-sensitive by default. The pattern in our case is \\W+

Syntax: gregexpr(pattern, text)



The lengths() function is then applied to return the length of each element of that list, that is, the number of matches found in each string.

Syntax: lengths(x)

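This method uses the regular expression symbol \\W to match non-word characters, with + indicating one or more in a row. The match count is the number of separators, so the number of words is separators + 1 when the string starts and ends with a word character. If the string ends with non-word characters, as in the example below where it ends with "! ", that trailing run is counted as a separator too, and the result is one more than the actual number of words.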

Code:

R




# declaring string
str <- "Counting the words in this R sentence? Try this approach in GFG! "
  
print ("Original string")
print (str)
print ("Total number of words")
  
# counting the runs of non-word characters and adding 1
lengths(gregexpr("\\W+", str)) + 1  

Output

[1] "Original string"
[1] "Counting the words in this R sentence? Try this approach in GFG! "
[1] "Total number of words"
[1] 13
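Because the count above depends on where the non-word characters fall, an alternative is to match the words themselves with \\w+ and count the matches. The sketch below takes that approach. The helper name count_words_re is hypothetical, and it assumes a word is a run of letters, digits, or underscores, so a contraction such as "don't" would count as two words.

```r
# hypothetical helper: count runs of word characters in each string
count_words_re <- function(x) {
  matches <- gregexpr("\\w+", x)
  # gregexpr() reports -1 when nothing matches, so count only real positions
  vapply(matches, function(pos) sum(pos > 0), integer(1))
}

str <- "Counting the words in this R sentence? Try this approach in GFG! "
print(count_words_re(str))
# [1] 12
```

Counting matches rather than separators also returns 0 for an empty string instead of 1.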

Method 3: Using stringr package 

The stringr package in R is used to perform string manipulations. It is not part of base R, so it must be installed before its functions can be used.

install.packages("stringr")

The stringr package provides the str_count() function, which counts the number of matches of a pattern in a string. The pattern may be a single character, a group of characters, or a regular expression, and every match increments the count. The function is vectorized: given a vector of strings, it returns a vector holding the count for each string. If no matches are found, 0 is returned. Note that with the pattern \\W+ the function counts runs of non-word characters rather than the words themselves, so the result only approximates the word count. In the example below it equals the number of words only because the string ends with "! ", which supplies the twelfth separator.

Syntax: str_count(str, pattern = "")

Arguments :

  • str – The string (or character vector) in which to count matches
  • pattern – The pattern to look for

Code:

R




library("stringr")
  
# declaring string
str <- "Counting the words in this R sentence? Try this approach in GFG! "
print ("Original string")
print (str)
print ("Total number of words")
  
# counting the runs of non-word characters
str_count(str, "\\W+")

Output:

[1] "Original string"
[1] "Counting the words in this R sentence? Try this approach in GFG! "
[1] "Total number of words"
[1] 12
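stringr also provides boundary("word"), which lets str_count() count the words directly instead of the separators. The sketch below is one possible use, assuming stringr is installed. The variable names sentences and word_counts are illustrative.

```r
library(stringr)

# hypothetical input: the article's sentence plus a string with a double space
sentences <- c(
  "Counting the words in this R sentence? Try this approach in GFG! ",
  "two  spaces here"
)

# boundary("word") splits on word boundaries, so each word is counted once
# regardless of the punctuation or spacing around it
word_counts <- str_count(sentences, boundary("word"))
print(word_counts)
```

This avoids relying on a trailing separator to make the count come out right.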


