
How to count the number of sentences in a text in R

Last Updated : 26 Sep, 2023

Counting the number of sentences in a text is a common task in text analysis and natural language processing (NLP). Sentence counts feed into many applications, including language modelling, sentiment analysis, and text summarization. In this article, we look at several techniques, using base R and the stringr, openNLP, and tokenizers packages, for counting the sentences in a given text.

Related Concepts:

  • Regular Expressions: A regular expression specifies a pattern; here it describes the punctuation (., !, ?) that marks the end of a sentence.
  • Functions in R: Several string-related functions, from base R and from add-on packages, are used to count sentences.
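To see what the pattern [.!?] used in this article actually matches, the base R functions gregexpr() and regmatches() can extract every match. Inside square brackets, . is a literal full stop rather than "any character". The sample text below is made up for illustration:

```r
text <- "Hello there. How are you? Great!"

# Inside a character class, "." is a literal full stop, not "any character"
pattern <- "[.!?]"

# gregexpr() locates every match; regmatches() pulls out the matched text
terminators <- regmatches(text, gregexpr(pattern, text))[[1]]
print(terminators)

# grepl() only reports whether the text contains at least one match
has_terminator <- grepl(pattern, text)
print(has_terminator)
```

Here terminators holds the three matched characters ".", "?" and "!", one per sentence in the sample.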

Steps Required for Counting Sentences in R:

  • First, write an R script (for example, in RStudio) that will count the sentences.
  • Store the text in a variable as a string.
  • Match the text against a regular expression (or pass it to a sentence tokenizer) to find the sentences.
  • Count the sentences found, as in the examples below.
  • Finally, display the count on the console.
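The steps above can be sketched as a small base R function. count_sentences() is a hypothetical helper written for this illustration, not a function from base R or any package; it counts matches of the same [.!?] pattern with gregexpr(). One detail worth handling: gregexpr() reports "no match" as -1, so a naive length() would return 1 for text that contains no terminators.

```r
# count_sentences() is a hypothetical helper, not part of base R or any package.
# It expects a single string.
count_sentences <- function(text, pattern = "[.!?]") {
  # Match the pattern: gregexpr() returns a list with one integer
  # vector of match positions per input string
  positions <- gregexpr(pattern, text)[[1]]
  # A lone -1 means "no match", which must count as 0, not 1
  if (positions[1] == -1L) return(0L)
  length(positions)
}

# Store the text in a variable
text <- "R is fun. Is it fast? Yes!"
num_sentences <- count_sentences(text)

# Display the count on the console
cat("Number of sentences:", num_sentences, "\n")
```

For example, count_sentences("no punctuation here") returns 0 rather than 1.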

Code for Counting Sentences in Text using Base R strsplit()

R




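# Sample text: three sentences, each ending with a full stop
text <- "This is R program for counting number of sentences in text.
This program is for GFG article . And it is using stringr package for counting."

# Split the text wherever a sentence terminator (. ! ?) occurs;
# strsplit() returns a list, so unlist() flattens it into a character vector
sentences <- unlist(strsplit(text, "[.!?]"))

# Each piece left after splitting is treated as one sentence
num_sentences <- length(sentences)

cat("Number of sentences using unlist and strsplit :", num_sentences, "\n")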


Output:

Number of sentences using unlist and strsplit : 3

  • First, we store the text in the text variable.
  • Then strsplit() splits the text wherever the regular expression [.!?] matches.
  • strsplit() returns a list, so unlist() flattens it into a character vector, which we store in the sentences variable.
  • length() gives the number of elements in sentences, i.e. the number of sentences.

Finally, cat() prints the count. The text contains three sentences, each ending with a full stop (.), so the output is 3.
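Because strsplit() keeps whatever lies between two matches, punctuation runs such as "..." or "?!" leave empty pieces behind, and length() counts them as sentences. A minimal sketch of one fix, assuming that dropping blank pieces is acceptable for your text, trims each piece with trimws() and keeps only non-empty ones with nzchar(). The sample string is made up for illustration:

```r
text <- "Wait... What?!"

# Naive split: each ".", "?" and "!" is a separate match, so empty strings appear
naive <- unlist(strsplit(text, "[.!?]"))
print(length(naive))

# Trim surrounding whitespace, then keep only pieces that still contain text
pieces <- trimws(naive)
sentences <- pieces[nzchar(pieces)]
print(sentences)

cat("Number of sentences after dropping empty pieces:", length(sentences), "\n")
```

After filtering, sentences contains just "Wait" and "What", so the count is 2, while the naive split reports more pieces than that.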

Code for Counting Sentences in Text using str_count() from stringr

R




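# Install stringr only if it is missing, then load it
if (!requireNamespace("stringr", quietly = TRUE)) {
  install.packages("stringr")
}
library(stringr)

# Sample text: three sentences end with "." and one with "?"
text <- "This is R program for counting number of sentences in text.
This program is for GFG article .
And it is using stringr package for counting. And is it working ?"

# Regular expression matching any single sentence terminator
sentence_pattern <- "[.!?]"

# str_count() returns how many times the pattern matches in the text
num_sentences <- str_count(text, sentence_pattern)

cat("Number of sentences using stringr :", num_sentences, "\n")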


Output:

Number of sentences using stringr : 4 

  • First, we install the stringr package if it is missing, load it, and store the text in the text variable as before.
  • Then we store the regular expression in the sentence_pattern variable.
  • str_count() counts how many times sentence_pattern matches in the text, i.e. the number of sentence terminators.

Finally, we display the sentence count using cat(). The text contains four sentences in total: three ending with a full stop (.) and one ending with a question mark (?). Hence the output is 4.
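Since str_count() counts every match of [.!?], it really counts punctuation characters rather than sentences: an ellipsis contributes three. It is also vectorised, so it can count several texts at once. A short sketch, assuming the stringr package is installed and using made-up sample texts; the + quantifier in the second pattern is our addition and makes a run of terminators count once:

```r
library(stringr)

texts <- c("One. Two.", "Three? Four! Five.", "Wait... what?!")

# str_count() is vectorised: one count per element of texts
per_char <- str_count(texts, "[.!?]")

# Adding + treats a run such as "..." or "?!" as a single terminator
per_run <- str_count(texts, "[.!?]+")

print(per_char)
print(per_run)
```

The two patterns agree on the first two texts (2 and 3) but differ on the third, where per_char gives 5 and per_run gives 2.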

Code for Counting Sentences in Text using openNLP Package

R




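# openNLP needs a working Java installation (it calls Apache OpenNLP via rJava)
if (!requireNamespace("openNLP", quietly = TRUE)) {
  install.packages("openNLP")  # installs the package only if it is missing
}
library(openNLP)

text <- "This is gfg sentence. Another sentence from gfg ! And a third one?"

# Create a sentence annotator backed by the pre-trained English
# maximum-entropy sentence model
sent_token_annotator <- Maxent_Sent_Token_Annotator()

# Applying the annotator returns one annotation (a character span) per sentence
sentences <- sent_token_annotator(text)

num_sentences <- length(sentences)

cat("Number of sentences using openNLP:", num_sentences, "\n")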


Output:

Number of sentences using openNLP: 3

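  • We store the text in the text variable.
  • Maxent_Sent_Token_Annotator() creates a sentence annotator that uses a pre-trained maximum-entropy (maxent) English sentence model from Apache OpenNLP.
  • Applying the annotator to the text returns one annotation per detected sentence.
  • Finally, length() gives the number of sentences, which we display using cat().
  • Make sure Java is installed and configured for R (openNLP calls Apache OpenNLP through the rJava package); otherwise this code will not run.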

Here there are three sentences, ending with a full stop (.), an exclamation mark (!), and a question mark (?) respectively. Hence the output is 3.

Code for Counting Sentences in Text using tokenizers Package

R




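# Install tokenizers only if it is missing, then load it
if (!requireNamespace("tokenizers", quietly = TRUE)) {
  install.packages("tokenizers")
}
library(tokenizers)

text <- "This is an example gfg sentence. Another gfg sentence! this is last example."

# tokenize_sentences() returns a list with one character vector per input
# string; unlist() flattens it into a single vector of sentences
sentences <- unlist(tokenize_sentences(text))

num_sentences <- length(sentences)

cat("Number of sentences using tokenizers:", num_sentences, "\n")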


Output:

Number of sentences using tokenizers: 3 

  • We store the text in the text variable.
  • tokenize_sentences() splits the text into sentences and returns a list with one character vector per input string.
  • unlist() flattens that list into a character vector, which we store in sentences.
  • length() counts the sentences, and cat() displays the count.

The text contains three sentences: two ending with a full stop (.) and one ending with an exclamation mark (!). Hence the count is 3.
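tokenize_sentences() also accepts a character vector holding several documents. A minimal sketch, assuming tokenizers is installed and using made-up sample documents, that gets one sentence count per document with base R's lengths() instead of unlist():

```r
library(tokenizers)

docs <- c("One sentence here. Two sentences now!", "Just one?")

# tokenize_sentences() returns a list with one character vector per document
sentence_list <- tokenize_sentences(docs)

# lengths() counts the elements of each list entry, i.e. sentences per document
counts <- lengths(sentence_list)
print(counts)
```

Keeping the list, rather than flattening it, preserves which sentences belong to which document.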


