Open In App

Count of occurrences of each prefix in a string using modified KMP algorithm

Last Updated : 13 Dec, 2021
Improve
Improve
Like Article
Like
Save
Share
Report

Given a string S of size N, the task is to count the occurrences of all the prefixes of the given string S.

Examples:  

Input: S = “AAAA” 
Output: 
A occurs 4 times 
AA occurs 3 times. 
AAA occurs 2 times. 
AAAA occurs 1 times. 
Explanation: 
Below is the illustration of all the prefix: 
 

Input: S = “ABACABA” 
Output: 
A occurs 4 times 
AB occurs 2 times 
ABA occurs 2 times 
ABAC occurs 1 times 
ABACA occurs 1 times 
ABACAB occurs 1 times 
ABACABA occurs 1 times 
 

Naive Approach:  

  1. Traverse over all the prefixes in set P. Let the x be the prefix.
  2. Do a sliding window approach of size |x|.
  3. Check if the current sliding window on S is equal to x. If yes then increase the count[x] by 1.

Time complexity: O(N3
Auxiliary Space: O(N)

Efficient Approach: 
Use the LPS array (also called prefix_function) from the KMP algorithm
The prefix function for this string is defined as an array LPS of length N, where LPS[i] is the length of the longest proper prefix of the substring S[0…i] which is also a suffix of this substring. Let occ[i] denote the number of occurrences of the prefix of length i.

Below are the steps to implement this approach:  

  1. Compute the LPS array or prefix_function.
  2. For each value of the prefix function, first, count how many times it occurs in the LPS array.
  3. The length prefix i appears exactly ans[i] times, then this number must be added to the number of occurrences of its longest suffix that is also a prefix.
  4. In the end, add 1 to all the values of occ array, because of the original prefix that should be counted as well.

For example: 
LPS[i] denotes that in position i, a prefix of length = LPS[i] appears. And this is the longest prefix possible. But shorter prefixes can occur. 
For String S = “AAAA”, following are the prefixes: 
 

S[0..0] = A 
S[0..1] = AA 
S[0..2] = AAA 
S[0..3] = AAAA 

Initially:  

occ[A] = 0 
occ[AA] = 0 
occ[AAA] = 0 
occ[AAAA] = 0 

Step1: LPS Array of the following string denotes the length of the longest prefix which is also a suffix: 

LPS[1] denotes in string AA, A is a suffix and also a prefix as LPS[1] = 1 
LPS[2] denotes in string AAA, AA is a suffix and also a prefix as LPS[2] = 2 
LPS[3] denotes in string AAAA, AAA is a suffix and also a prefix as LPS[3] = 3

Step 2:Add these occurrences of prefixes as suffixes to the answer in the occ[] array:  

Values : Counted substrings 
occ[A] = 1 : S[1] 
occ[AA] = 1 : S[1..2] 
occ[AAA] = 1 : S[1..3] 
occ[AAAA] = 0 : NULL(as there is not a prefix “AAAA” which is also a suffix. 
 

Step 3: Now traverse the string in reverse order starting from “AAA” (as the last value will always be 0 since the complete string is not a proper prefix). 

Since, string “AAA” S[1..3] contains “AA” S[2..3] as well, which was not counted yet, therefore increment the occurrence of string “AA” in occ[“AA”] as occ[“AA”] += occ[“AAA”]. Below is the count for the same: 
Values : Counted substrings 
occ[A] = 1 : S[1] 
occ[AA] = 2 : S[1..2], S[2..3] 
occ[AAA] = 1 : S[1..3] 
occ[AAAA] = 0 : NULL 

Now string “AA” contains “A” as well, which was not counted yet, therefore increment the occurrence of string “A” in occ[“A”] as occ[“A”] += occ[“AA”]. Below is the count for the same: 
 

Values : Counted substrings 
occ[A] = 3 : S[1], S[2], S[3] 
occ[AA] = 2 : S[1..2], S[2..3] 
occ[AAA] = 1 : S[1..3] 
occ[AAAA] = 0 : NULL 

Step 4: At last add one to all occurrences for the original prefixes, which are not counted yet.  

Values : Counted substrings 
occ[A] = 4 : S[1], S[2], S[3], S[0] 
occ[AA] = 3 : S[1..2], S[2..3], S[0..1] 
occ[AAA] = 2 : S[1..3], S[0..2] 
occ[AAAA] = 1 : S[0..3]  

Below is the implementation of the above approach:

C++




// C++ program for the above approach
#include <bits/stdc++.h>
using namespace std;
 
// Function to print the count of all
// prefix in the given string
void print(vector<int>& occ, string& s)
{
    // Iterate over string s
    for (int i = 1; i <= int(s.size());
         i++) {
 
        // Print the prefix and their
        // frequency
        cout << s.substr(0, i)
             << " occurs "
             << occ[i]
             << " times."
             << endl;
    }
}
 
// Function to implement the LPS
// array to store the longest prefix
// which is also a suffix for every
// substring of the string S
vector<int> prefix_function(string& s)
{
    // Array to store LPS values
    vector<int> LPS(s.size());
 
    // Value of lps[0] is 0
    // by definition
    LPS[0] = 0;
 
    // Find the values of LPS[i] for
    // the rest of the string using
    // two pointers and DP
    for (int i = 1;
         i < int(s.size());
         i++) {
 
        // Initially set the value
        // of j as the longest
        // prefix that is also a
        // suffix for i as LPS[i-1]
        int j = LPS[i - 1];
 
        // Check if the suffix of
        // length j+1 is also a prefix
        while (j > 0 && s[i] != s[j]) {
            j = LPS[j - 1];
        }
 
        // If s[i] = s[j] then, assign
        // LPS[i] as j+1
        if (s[i] == s[j]) {
            LPS[i] = j + 1;
        }
 
        // If we reached j = 0, assign
        // LPS[i] as 0 as there was no
        // prefix equal to suffix
        else {
            LPS[i] = 0;
        }
    }
 
    // Return the calculated
    // LPS array
    return LPS;
}
 
// Function to count the occurrence
// of all the prefix in the string S
void count_occurrence(string& s)
{
    int n = s.size();
 
    // Call the prefix_function
    // to get LPS
    vector<int> LPS
        = prefix_function(s);
 
    // To store the occurrence of
    // all the prefix
    vector<int> occ(n + 1);
 
    // Count all the suffixes that
    // are also prefix
    for (int i = 0; i < n; i++) {
        occ[LPS[i]]++;
    }
 
    // Add the occurrences of
    // i to smaller prefixes
    for (int i = n - 1;
         i > 0; i--) {
        occ[LPS[i - 1]] += occ[i];
    }
 
    // Adding 1 to all occ[i] for all
    // the original prefix
    for (int i = 0; i <= n; i++)
        occ[i]++;
 
    // Function Call to print the
    // occurrence of all the prefix
    print(occ, s);
}
 
// Driver Code
int main()
{
    // Given String
    string A = "ABACABA";
 
    // Function Call
    count_occurrence(A);
    return 0;
}


Java




// Java program for
// the above approach
import java.util.*;
class GFG{
 
// Function to print the count
// of all prefix in the
// given String
static void print(int[] occ,
                  String s)
{
  // Iterate over String s
  for (int i = 1;
           i <= s.length() - 1; i++)
  {
    // Print the prefix and their
    // frequency
    System.out.print(s.substring(0, i) +
                     " occurs " + occ[i] +
                     " times." + "\n");
  }
}
 
// Function to implement the LPS
// array to store the longest prefix
// which is also a suffix for every
// subString of the String S
static int[] prefix_function(String s)
{
  // Array to store LPS values
  int []LPS = new int[s.length()];
 
  // Value of lps[0] is 0
  // by definition
  LPS[0] = 0;
 
  // Find the values of LPS[i] for
  // the rest of the String using
  // two pointers and DP
  for (int i = 1;
       i < s.length(); i++)
  {
    // Initially set the value
    // of j as the longest
    // prefix that is also a
    // suffix for i as LPS[i-1]
    int j = LPS[i - 1];
 
    // Check if the suffix of
    // length j+1 is also a prefix
    while (j > 0 &&
           s.charAt(i) != s.charAt(j))
    {
      j = LPS[j - 1];
    }
 
    // If s[i] = s[j] then, assign
    // LPS[i] as j+1
    if (s.charAt(i) == s.charAt(j))
    {
      LPS[i] = j + 1;
    }
 
    // If we reached j = 0, assign
    // LPS[i] as 0 as there was no
    // prefix equal to suffix
    else
    {
      LPS[i] = 0;
    }
  }
 
  // Return the calculated
  // LPS array
  return LPS;
}
 
// Function to count the occurrence
// of all the prefix in the String S
static void count_occurrence(String s)
{
  int n = s.length();
 
  // Call the prefix_function
  // to get LPS
  int[] LPS = prefix_function(s);
 
  // To store the occurrence of
  // all the prefix
  int []occ = new int[n + 1];
 
  // Count all the suffixes that
  // are also prefix
  for (int i = 0; i < n; i++)
  {
    occ[LPS[i]]++;
  }
 
  // Add the occurrences of
  // i to smaller prefixes
  for (int i = n - 1;
           i > 0; i--)
  {
    occ[LPS[i - 1]] += occ[i];
  }
 
  // Adding 1 to all occ[i] for all
  // the original prefix
  for (int i = 0; i <= n; i++)
    occ[i]++;
 
  // Function Call to print the
  // occurrence of all the prefix
  print(occ, s);
}
 
// Driver Code
public static void main(String[] args)
{
  // Given String
  String A = "ABACABA";
 
  // Function Call
  count_occurrence(A);
}
}
 
// This code is contributed by Princi Singh


Python3




# Python3 program for the above approach
 
# Function to print the count of all
# prefix in the given string
def Print(occ, s):
     
    # Iterate over string s
    for i in range(1, len(s) + 1):
 
        # Print the prefix and their
        # frequency
        print(s[0 : i], "occur", occ[i], "times.")
 
# Function to implement the LPS
# array to store the longest prefix
# which is also a suffix for every
# substring of the string S
def prefix_function(s):
 
    # Array to store LPS values
    # Value of lps[0] is 0
    # by definition
    LPS = [0 for i in range(len(s))]
     
    # Find the values of LPS[i] for
    # the rest of the string using
    # two pointers and DP
    for i in range(1, len(s)):
 
        # Initially set the value
        # of j as the longest
        # prefix that is also a
        # suffix for i as LPS[i-1]
        j = LPS[i - 1]
 
        # Check if the suffix of
        # length j+1 is also a prefix
        while (j > 0 and s[i] != s[j]):
            j = LPS[j - 1]
 
        # If s[i] = s[j] then, assign
        # LPS[i] as j+1
        if (s[i] == s[j]):
            LPS[i] = j + 1
             
        # If we reached j = 0, assign
        # LPS[i] as 0 as there was no
        # prefix equal to suffix
        else:
            LPS[i] = 0
 
    # Return the calculated
    # LPS array
    return LPS
 
# Function to count the occurrence
# of all the prefix in the string S
def count_occurrence(s):
     
    n = len(s)
 
    # Call the prefix_function
    # to get LPS
    LPS = prefix_function(s)
 
    # To store the occurrence of
    # all the prefix
    occ = [0 for i in range(n + 1)]
 
    # Count all the suffixes that
    # are also prefix
    for i in range(n):
        occ[LPS[i]] += 1
 
    # Add the occurrences of
    # i to smaller prefixes
    for i in range(n - 1, 0, -1):
        occ[LPS[i - 1]] += occ[i]
     
    # Adding 1 to all occ[i] for all
    # the original prefix
    for i in range(n + 1):
        occ[i] += 1
         
    # Function Call to print the
    # occurrence of all the prefix
    Print(occ, s)
 
# Driver Code
 
# Given String
A = "ABACABA"
 
# Function Call
count_occurrence(A)
 
# This code is contributed by avanitrachhadiya2155


C#




// C# program for
// the above approach
using System;
class GFG{
 
// Function to print the
// count of all prefix
// in the given String
static void print(int[] occ,
                  String s)
{
  // Iterate over String s
  for (int i = 1;
           i <= s.Length - 1; i++)
  {
    // Print the prefix and their
    // frequency
    Console.Write(s.Substring(0, i) + 
                  " occurs " + occ[i] + 
                  " times." + "\n");
  }
}
 
// Function to implement the LPS
// array to store the longest prefix
// which is also a suffix for every
// subString of the String S
static int[] prefix_function(String s)
{
  // Array to store LPS values
  int []LPS = new int[s.Length];
 
  // Value of lps[0] is 0
  // by definition
  LPS[0] = 0;
 
  // Find the values of LPS[i] for
  // the rest of the String using
  // two pointers and DP
  for (int i = 1;
           i < s.Length; i++)
  {
    // Initially set the value
    // of j as the longest
    // prefix that is also a
    // suffix for i as LPS[i-1]
    int j = LPS[i - 1];
 
    // Check if the suffix of
    // length j+1 is also a prefix
    while (j > 0 && s[i] != s[j])
    {
      j = LPS[j - 1];
    }
 
    // If s[i] = s[j] then,
    // assign LPS[i] as j+1
    if (s[i] == s[j])
    {
      LPS[i] = j + 1;
    }
 
    // If we reached j = 0, assign
    // LPS[i] as 0 as there was no
    // prefix equal to suffix
    else
    {
      LPS[i] = 0;
    }
  }
 
  // Return the calculated
  // LPS array
  return LPS;
}
 
// Function to count the occurrence
// of all the prefix in the String S
static void count_occurrence(String s)
{
  int n = s.Length;
 
  // Call the prefix_function
  // to get LPS
  int[] LPS = prefix_function(s);
 
  // To store the occurrence of
  // all the prefix
  int []occ = new int[n + 1];
 
  // Count all the suffixes that
  // are also prefix
  for (int i = 0; i < n; i++)
  {
    occ[LPS[i]]++;
  }
 
  // Add the occurrences of
  // i to smaller prefixes
  for (int i = n - 1;
           i > 0; i--)
  {
    occ[LPS[i - 1]] += occ[i];
  }
 
  // Adding 1 to all occ[i] for all
  // the original prefix
  for (int i = 0; i <= n; i++)
    occ[i]++;
 
  // Function Call to print the
  // occurrence of all the prefix
  print(occ, s);
}
 
// Driver Code
public static void Main(String[] args)
{
  // Given String
  String A = "ABACABA";
 
  // Function Call
  count_occurrence(A);
}
}
 
// This code is contributed by Amit Katiyar


Javascript




<script>
    // JavaScript program for the above approach
 
    // Function to print the count of all
    // prefix in the given string
    const print = (occ, s) => {
        // Iterate over string s
        for (let i = 1; i <= s.length; i++) {
 
            // Print the prefix and their
            // frequency
            document.write(`${s.substr(0, i)} occurs ${occ[i]} times.<br/>`);
        }
    }
 
    // Function to implement the LPS
    // array to store the longest prefix
    // which is also a suffix for every
    // substring of the string S
    const prefix_function = (s) => {
        // Array to store LPS values
        let LPS = new Array(s.length).fill(0);
 
        // Value of lps[0] is 0
        // by definition
        LPS[0] = 0;
 
        // Find the values of LPS[i] for
        // the rest of the string using
        // two pointers and DP
        for (let i = 1; i < s.length; i++) {
 
            // Initially set the value
            // of j as the longest
            // prefix that is also a
            // suffix for i as LPS[i-1]
            let j = LPS[i - 1];
 
            // Check if the suffix of
            // length j+1 is also a prefix
            while (j > 0 && s[i] != s[j]) {
                j = LPS[j - 1];
            }
 
            // If s[i] = s[j] then, assign
            // LPS[i] as j+1
            if (s[i] == s[j]) {
                LPS[i] = j + 1;
            }
 
            // If we reached j = 0, assign
            // LPS[i] as 0 as there was no
            // prefix equal to suffix
            else {
                LPS[i] = 0;
            }
        }
 
        // Return the calculated
        // LPS array
        return LPS;
    }
 
    // Function to count the occurrence
    // of all the prefix in the string S
    const count_occurrence = (s) => {
        let n = s.length;
 
        // Call the prefix_function
        // to get LPS
        let LPS = prefix_function(s);
 
        // To store the occurrence of
        // all the prefix
        let occ = new Array(n + 1).fill(0);
 
        // Count all the suffixes that
        // are also prefix
        for (let i = 0; i < n; i++) {
            occ[LPS[i]]++;
        }
 
        // Add the occurrences of
        // i to smaller prefixes
        for (let i = n - 1;
            i > 0; i--) {
            occ[LPS[i - 1]] += occ[i];
        }
 
        // Adding 1 to all occ[i] for all
        // the original prefix
        for (let i = 0; i <= n; i++)
            occ[i]++;
 
        // Function Call to print the
        // occurrence of all the prefix
        print(occ, s);
    }
 
    // Driver Code
 
    // Given String
    let A = "ABACABA";
 
    // Function Call
    count_occurrence(A);
 
    // This code is contributed by rakeshsahni
 
</script>


Output: 

A occurs 4 times.
AB occurs 2 times.
ABA occurs 2 times.
ABAC occurs 1 times.
ABACA occurs 1 times.
ABACAB occurs 1 times.
ABACABA occurs 1 times.

 

Time Complexity: O(N2) 
Auxiliary Space: O(N)



Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads