How to calculate the number of occurrences of a character in each row of an R DataFrame?

Last Updated : 18 Apr, 2022

In this article, we will discuss how to calculate the number of occurrences of a given character in each row of a DataFrame in the R Programming Language.

Method 1: Using stringr package

The stringr package in R provides functions for string manipulation and extraction. It has to be installed (for example, with install.packages("stringr")) and loaded into the session before use.

The str_count() method counts the matches of the specified pattern in each element of a vector of strings. It returns an integer vector with one count per element of the input vector. The str_count() method is case-sensitive by default.

Syntax:

str_count(str, pattern = "")

Parameters:

  • str – The vector of strings, or a single string, in which to search for the pattern
  • pattern – The pattern to be searched for. It is interpreted as a regular expression by default.

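The pattern may be a single character or a sequence of characters, and it may contain special symbols or digits. Because the pattern is interpreted as a regular expression by default, regex metacharacters such as "." or "?" have to be escaped or wrapped in fixed() to be matched literally. If the pattern is not found in an element, 0 is returned for that element.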
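The following is a minimal sketch of how the pattern type changes the count. The vector name `words` is chosen here only for illustration; `regex()` and `fixed()` are the stringr pattern modifiers.

```r
library(stringr)

# illustrative input vector (same strings as the example in this article)
words <- c("Geeks", "for", "geeks", "CSE", "portal")

# default behaviour: case-sensitive regular expression match
str_count(words, "e")
# [1] 2 0 2 0 0

# case-insensitive count using the regex() modifier
str_count(words, regex("e", ignore_case = TRUE))
# [1] 2 0 2 1 0

# "." is a regex metacharacter, so it matches every character
str_count("a.b.c", ".")
# [1] 5

# fixed() makes the pattern a literal string, so only the dots are counted
str_count("a.b.c", fixed("."))
# [1] 2
```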

Example:

R




# loading the reqd library
library ("stringr")
 
# creating a data frame
data_frame <- data.frame(
  col1 = c(1:5), col2 = c("Geeks","for","geeks","CSE","portal"))
 
# character to search for
ch <- "e"
 
# counting the occurrences of character
count <- str_count(data_frame$col2, ch)
print ("Count of e :")
print (count)


Output

[1] "Count of e :"
[1] 2 0 2 0 0
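Since str_count() returns one value per row, the result can also be stored as a new column next to the original data. This is a small sketch of that idea; the column name `count_e` is a hypothetical name chosen for this illustration.

```r
library(stringr)

# same data frame as in the example above
data_frame <- data.frame(
  col1 = 1:5,
  col2 = c("Geeks", "for", "geeks", "CSE", "portal"),
  stringsAsFactors = FALSE
)

# store the per-row count in a new column (count_e is an illustrative name)
data_frame$count_e <- str_count(data_frame$col2, "e")

print(data_frame)
```

Each row now carries its own count, which makes it easy to filter or sort the rows by the number of occurrences.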

Method 2: Using gregexpr method

The gregexpr() method of base R locates every occurrence of a pattern within each element of a character vector. It returns a list of integer vectors holding the starting positions of the matches for each element of the input (or -1 if an element has no match). The length of the returned list is equal to the length of the input character vector.

Syntax:

gregexpr(pattern, str, ignore.case=FALSE)

Parameter :

  • str – The vector of strings or a single string to search for the pattern
  • pattern – The pattern to be searched for. Usually a regular expression.
  • ignore.case – Indicator to ignore case or not

Here, the pattern is the character to search for and str is the column of strings to search in. The regmatches() method is then applied to the output of this function to extract the matched substrings. If no match is found for an element, an empty character vector (character(0)) is returned for that element.

Syntax:

regmatches(str, m)

Parameter : 

  • str – The vector of strings that was searched
  • m – The match data for str, as returned by gregexpr()

This is followed by the application of the lengths() method, which returns the number of matches for each element of the list returned by regmatches().
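To make the three steps easier to follow, the sketch below runs them one at a time on a two-element vector. The variable names `x`, `positions`, `matches` and `counts` are illustrative, and fixed = TRUE is used here so that the pattern is treated as a literal string.

```r
# illustrative input: one string with matches, one without
x <- c("do!es!nt", "Contain")

# step 1: starting positions of every match, one list element per string
positions <- gregexpr("!", x, fixed = TRUE)
as.integer(positions[[1]])
# [1] 3 6
as.integer(positions[[2]])
# [1] -1

# step 2: extract the matched substrings; no match gives character(0)
matches <- regmatches(x, positions)

# step 3: number of matches per string
counts <- lengths(matches)
counts
# [1] 2 0
```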

Example:

R




# creating a data frame
data_frame <- data.frame(
  col1 = c(1:5), col2 = c("!?contains","do!es!nt",
                          "Contain","cs!!!e","circus?"))
 
print ("Original DataFrame")
print (data_frame)
 
# character to search for
ch <- "!"
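# fixed = TRUE treats ch as a literal string rather than a
# regular expression, which is safer for characters such as "?"
count <- regmatches(
  data_frame$col2, gregexpr(ch, data_frame$col2, fixed = TRUE))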
 
print ("Count of !")
 
# returning the number of occurrences
lengths(count)


Output

[1] "Original DataFrame"
  col1       col2
1    1 !?contains
2    2   do!es!nt
3    3    Contain
4    4     cs!!!e
5    5    circus?
[1] "Count of !"
[1] 1 2 0 3 0

Method 3: Using sapply method

The sapply() method in R applies a function to each element of the input vector, which is passed as the first argument, and collects the results into a vector.

Syntax:

sapply ( x , fun)

The user-defined function, in this case, performs the following steps:

  • The strsplit() method is applied with the empty string "" as the delimiter, which splits the input string into its individual characters. It returns a list containing one character vector.
  • The unlist() method then flattens this list into a plain vector of letters, and each letter is compared with the character we wish to search for. The sum() method is applied to the resulting logical vector, so every match adds one to the count.

Syntax:

sum ( unlist( str) == ch)
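The steps above can be traced on a single string as follows. This is only an illustrative walk-through; the intermediate names `parts`, `letters_vec` and `is_match` do not appear in the original example.

```r
x <- "cs!!!e"
ch <- "!"

# split the string into single characters; the result is a list of length 1
parts <- strsplit(x, split = "")

# flatten the list into a character vector
letters_vec <- unlist(parts)
letters_vec
# [1] "c" "s" "!" "!" "!" "e"

# compare every character with ch; TRUE marks a match
is_match <- letters_vec == ch

# TRUE counts as 1 and FALSE as 0, so the sum is the number of matches
sum(is_match)
# [1] 3
```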

Example:

R




# creating a data frame
data_frame <- data.frame(
  col1 = c(1:5), col2 = c("!?contains","do!es!nt",
                          "Contain","cs!!!e","circus?"))
 
print ("Original DataFrame")
print (data_frame)
 
# character to search for
ch <- "!"
count <- sapply(as.character(data_frame$col2),
                function(x, letter = ch){
  str <- strsplit(x, split = "")
  sum(unlist(str) == letter)
})
print ("Count of !")
 
# returning the number of occurrences
print(count)


Output

[1] "Original DataFrame"
  col1       col2
1    1 !?contains
2    2   do!es!nt
3    3    Contain
4    4     cs!!!e
5    5    circus?
[1] "Count of !"
!?contains   do!es!nt    Contain     cs!!!e    circus? 
         1          2          0          3          0 
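All three methods above count within a single column. If a row total across several character columns is needed, one possible approach is to count per column and then add the counts row by row. The sketch below uses base R only; the data frame `df`, its columns `first` and `second`, the helper `count_char()` and the column `total` are all hypothetical names made up for this illustration.

```r
# hypothetical data frame with two character columns
df <- data.frame(
  id = 1:3,
  first = c("a!b", "c", "!!!"),
  second = c("!", "d!e!", "f"),
  stringsAsFactors = FALSE
)

ch <- "!"

# helper: number of literal occurrences of ch in each element of x
count_char <- function(x, ch) {
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}

# count per column; with several rows this gives a matrix (rows x columns)
text_cols <- c("first", "second")
per_column <- sapply(df[text_cols], count_char, ch = ch)

# add the per-column counts to get one total per row
df$total <- rowSums(per_column)

print(df)
```

Note that sapply() simplifies to a matrix only when the data frame has more than one row; for a single-row data frame the per-column counts would need to be summed directly.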


