
How to count the number of sentences in a text in R

Counting the number of sentences in a text is a basic task in text analysis and natural language processing. Many applications rely on it, including language modelling, sentiment analysis, and text summarization. In this article, we'll look at several techniques and R packages for counting the number of sentences in a given text: base R's strsplit(), stringr's str_count(), openNLP, and tokenizers.

Related Concepts :

Steps Required For Counting Sentences in R :

Code for Counting Sentences in Text using strsplit() (Base R)




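# Sample text containing three sentences
text <- "This is an R program for counting the number of sentences in text.
This program is for a GFG article. And it is using strsplit for counting."

# Split the text at every sentence-ending character (., ! or ?)
sentences <- unlist(strsplit(text, "[.!?]"))

# Each resulting piece is treated as one sentence
num_sentences <- length(sentences)

cat("Number of sentences using unlist and strsplit :", num_sentences)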

Output:



Number of sentences using unlist and strsplit : 3

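Finally, cat() displays the sentence count. The text contains three sentences, each ending with a full stop (.), so the output is 3. Note that strsplit() does not return an empty piece for a match at the very end of the string, which is why the final full stop does not add a fourth element.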
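The split-and-count approach treats every terminator character as a boundary, so runs such as "?!" or "..." and trailing whitespace can produce empty pieces that inflate the count. A minimal sketch of one way to guard against that, assuming a hypothetical helper named split_sentences (not part of the original article or of any package):

```r
# Hypothetical helper (an assumption for illustration): split on runs of
# sentence-ending punctuation and drop empty or whitespace-only pieces.
split_sentences <- function(text) {
  pieces <- unlist(strsplit(text, "[.!?]+"))  # "+" treats "?!" or "..." as one boundary
  pieces <- trimws(pieces)                    # remove surrounding whitespace and newlines
  pieces[nzchar(pieces)]                      # keep only non-empty pieces
}

text <- "First sentence.  Second one?! Third...   "
sentences <- split_sentences(text)
cat("Number of sentences:", length(sentences), "\n")
```

This is still a purely punctuation-based heuristic, so abbreviations such as "Dr." would still be split; it only removes the empty-piece overcounting.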

Counting Sentences in Text using R and str_count() from stringr




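# Load stringr, installing it first if it is not available
if (!require(stringr)) {
  install.packages("stringr")
  library(stringr)
}

# Sample text: three sentences end with "." and one ends with "?"
text <- "This is an R program for counting the number of sentences in text.
This program is for a GFG article.
And it is using the stringr package for counting. And is it working?"

# Character class matching any single sentence-ending character
sentence_pattern <- "[.!?]"

# str_count() returns the number of matches of the pattern in the text
num_sentences <- str_count(text, sentence_pattern)

cat("Number of sentences using stringr :", num_sentences, "\n")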

Output:



Number of sentences using stringr : 4 

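Finally, cat() displays the sentence count. The text contains four sentences in total: three ending with a full stop (.) and one ending with a question mark (?). Hence the output is 4. Because str_count() counts every matching character, this approach assumes each sentence ends with exactly one terminator.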
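Counting single terminator characters overcounts text that contains ellipses or combined marks such as "?!". The sketch below illustrates the difference using base R's gregexpr() rather than stringr, so that it runs without extra packages; count_terminators is a hypothetical helper name, and the second pattern is only one possible refinement, not the article's method.

```r
# Hypothetical helper (an assumption for illustration): count regex matches
# in a single string using base R.
count_terminators <- function(text, pattern = "[.!?]+(\\s|$)") {
  matches <- gregexpr(pattern, text, perl = TRUE)[[1]]
  # gregexpr() returns -1 when there is no match at all
  if (matches[1] == -1) 0L else length(matches)
}

text <- "Wait... Is it working?! Yes."

# Every terminator character is counted separately
cat("Single-character pattern:", count_terminators(text, "[.!?]"), "\n")

# A run of terminators followed by whitespace or end of text counts once
cat("Run-based pattern:", count_terminators(text), "\n")
```

The run-based pattern should also be usable as the pattern argument of str_count(), though abbreviations such as "e.g." would still be miscounted by either pattern.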

Code for Counting Sentences in Text using openNLP Package




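# Load openNLP, installing it first if it is not available (requires Java via rJava)
if (!require(openNLP)) {
  install.packages("openNLP")
  library(openNLP)
}
library(NLP) # provides as.String() and annotate(); installed as an openNLP dependency

text <- "This is gfg sentence. Another sentence from gfg ! And a third one?"

# The annotator operates on an NLP String object
s <- as.String(text)

# Detect sentence boundaries with the Maxent sentence annotator
sent_token_annotator <- Maxent_Sent_Token_Annotator()
sentences <- NLP::annotate(s, sent_token_annotator)

# One annotation is returned per detected sentence
num_sentences <- length(sentences)

cat("Number of sentences using openNLP:", num_sentences, "\n")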

Output:

Number of sentences using openNLP: 3

Here the text contains three sentences, ending with a full stop (.), an exclamation mark (!), and a question mark (?) respectively. Hence the output is 3.

Code for Counting Sentences in Text using tokenizers Package




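# Load tokenizers, installing it first if it is not available
if (!require(tokenizers)) {
  install.packages("tokenizers")
  library(tokenizers)
}

text <- "This is an example gfg sentence. Another gfg sentence! This is the last example."

# tokenize_sentences() returns a list with one character vector per input text
sentences <- unlist(tokenize_sentences(text))

# The number of tokens is the number of sentences
num_sentences <- length(sentences)

cat("Number of sentences using tokenizers:", num_sentences, "\n")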

Output:

Number of sentences using tokenizers: 3 

The text variable contains three sentences: two end with a full stop (.) and one ends with an exclamation mark (!). Hence the count is 3.
