Count of occurrences of each prefix in a string using modified KMP algorithm

Given a string S of size N, the task is to count the occurrences of all the prefixes of the given string S.

Examples:

Input: S = “AAAA”
Output:
A occurs 4 times.
AA occurs 3 times.
AAA occurs 2 times.
AAAA occurs 1 times.
Explanation:
In “AAAA”, the prefix of length k matches at every starting index from 0 to 4 - k, so it occurs 5 - k times.

Input: S = “ABACABA”
Output:
A occurs 4 times.
AB occurs 2 times.
ABA occurs 2 times.
ABAC occurs 1 times.
ABACA occurs 1 times.
ABACAB occurs 1 times.
ABACABA occurs 1 times.

Naive Approach:

  1. Traverse all the prefixes of S. Let x be the current prefix.
  2. Slide a window of size |x| over S.
  3. If the current window on S is equal to x, increase count[x] by 1.

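Time Complexity: O(N^3), since each of the N prefixes is compared against up to N windows and each comparison costs up to O(N).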
Auxiliary Space: O(N)
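
The three naive steps above can be sketched as follows. This is only an illustrative brute-force version; the function name naive_prefix_counts and the choice to return the counts in a vector indexed by prefix length (with index 0 unused) are assumptions made for this sketch, not part of the original approach.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Brute-force sketch: result[len] holds the number of occurrences of
// the prefix of length len in s. result[0] is unused and stays 0.
std::vector<int> naive_prefix_counts(const std::string& s)
{
    const std::size_t n = s.size();
    std::vector<int> result(n + 1, 0);

    // Step 1: take every prefix x = s[0..len-1]
    for (std::size_t len = 1; len <= n; ++len) {

        // Step 2: slide a window of size len over s
        for (std::size_t start = 0; start + len <= n; ++start) {

            // Step 3: compare the window s[start..start+len-1]
            // with the prefix; compare() returns 0 when equal
            if (s.compare(start, len, s, 0, len) == 0) {
                ++result[len];
            }
        }
    }
    return result;
}
```

A vector such as this one is handy as a reference answer when checking a faster implementation on small inputs.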

Efficient Approach:
Use the LPS array (also called the prefix function) from the KMP algorithm.

The prefix function of the string S is defined as an array LPS of length N, where LPS[i] is the length of the longest proper prefix of the substring S[0…i] which is also a suffix of this substring. Let occ[i] denote the number of occurrences of the prefix of length i.
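
To make the definition concrete, here is a deliberately slow sketch that evaluates LPS[i] directly from the definition, trying every proper prefix length from longest to shortest. The name lps_by_definition is hypothetical, and this is not the linear-time KMP construction; it is only meant for checking small examples by hand.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Evaluates the prefix function straight from its definition.
// Cubic in the worst case, so suitable for small inputs only.
std::vector<int> lps_by_definition(const std::string& s)
{
    const std::size_t n = s.size();
    std::vector<int> lps(n, 0);

    for (std::size_t i = 0; i < n; ++i) {

        // S[0..i] has length i + 1, so a proper prefix has
        // length at most i. Try the longest candidates first.
        for (std::size_t len = i; len > 0; --len) {

            // Prefix S[0..len-1] versus suffix S[i-len+1..i]
            if (s.compare(0, len, s, i - len + 1, len) == 0) {
                lps[i] = static_cast<int>(len);
                break;
            }
        }
    }
    return lps;
}
```

For S = "ABACABA" this yields 0 0 1 0 1 2 3, and for S = "AAAA" it yields 0 1 2 3.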

Below are the steps to implement this approach:

  1. Compute the LPS array (the prefix function) of S.
  2. For each value of the prefix function, count how many times it occurs in the LPS array: for every index i, increment occ[LPS[i]].
  3. Traverse the lengths i from N - 1 down to 1. At that point the prefix of length i has been counted occ[i] times, and each of those occurrences also ends with its longest suffix that is also a prefix, which has length LPS[i - 1]. So add occ[i] to occ[LPS[i - 1]].
  4. At the end, add 1 to all the values of the occ array, because the original prefix at the start of the string should be counted as well.

For example:
LPS[i] = k means that a prefix of length k ends at position i, and that this is the longest such prefix. Shorter prefixes may end at the same position as well.
For the string S = “AAAA”, the prefixes are:

S[0..0] = A
S[0..1] = AA
S[0..2] = AAA
S[0..3] = AAAA

Initially:

occ[A] = 0
occ[AA] = 0
occ[AAA] = 0
occ[AAAA] = 0



Step 1: Compute the LPS array, where LPS[i] is the length of the longest proper prefix of S[0..i] that is also a suffix of it:

LPS[0] = 0: the single character A has no non-empty proper prefix
LPS[1] = 1: in the string AA, A is both a prefix and a suffix
LPS[2] = 2: in the string AAA, AA is both a prefix and a suffix
LPS[3] = 3: in the string AAAA, AAA is both a prefix and a suffix

Step 2: Add these occurrences of prefixes as suffixes to the answer in the occ[] array:

Values : Counted substrings
occ[A] = 1 : S[1]
occ[AA] = 1 : S[1..2]
occ[AAA] = 1 : S[1..3]
occ[AAAA] = 0 : NULL (the whole string “AAAA” is not a proper prefix, so it never appears as an LPS value)

Step 3: Now traverse the prefixes in decreasing order of length, starting from “AAA” (the count for the complete string is always 0 at this point, since the complete string is not a proper prefix).

The occurrence of “AAA” at S[1..3] ends with “AA” at S[2..3], which has not been counted yet. Therefore add the count of “AAA” to that of “AA”: occ[“AA”] += occ[“AAA”]. Below is the count for the same:
Values : Counted substrings
occ[A] = 1 : S[1]
occ[AA] = 2 : S[1..2], S[2..3]
occ[AAA] = 1 : S[1..3]
occ[AAAA] = 0 : NULL

Similarly, every counted occurrence of “AA” ends with “A”, and those occurrences of “A” (S[2] and S[3]) have not been counted yet. Therefore add the count of “AA” to that of “A”: occ[“A”] += occ[“AA”]. Below is the count for the same:

Values : Counted substrings
occ[A] = 3 : S[1], S[2], S[3]
occ[AA] = 2 : S[1..2], S[2..3]
occ[AAA] = 1 : S[1..3]
occ[AAAA] = 0 : NULL

Step 4: Finally, add 1 to every count for the original prefix at the start of the string, which has not been counted yet.

Values : Counted substrings
occ[A] = 4 : S[1], S[2], S[3], S[0]
occ[AA] = 3 : S[1..2], S[2..3], S[0..1]
occ[AAA] = 2 : S[1..3], S[0..2]
occ[AAAA] = 1 : S[0..3]
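
The walkthrough above can be reproduced with a small tracing sketch that records the occ array after Steps 2, 3 and 4. The helper names compute_lps and occ_snapshots are hypothetical and exist only for this illustration. Index 0 of each snapshot corresponds to the empty prefix and should be ignored; indices 1 to N line up with occ[A], occ[AA], and so on.

```cpp
#include <string>
#include <vector>

// Standard linear-time prefix function (LPS array) from KMP.
std::vector<int> compute_lps(const std::string& s)
{
    const int n = static_cast<int>(s.size());
    std::vector<int> lps(n, 0);
    for (int i = 1; i < n; ++i) {
        int j = lps[i - 1];
        // Fall back to shorter borders until s[i] extends one
        while (j > 0 && s[i] != s[j]) {
            j = lps[j - 1];
        }
        if (s[i] == s[j]) {
            ++j;
        }
        lps[i] = j;
    }
    return lps;
}

// Returns three snapshots of occ: after Step 2, Step 3 and Step 4.
std::vector<std::vector<int>> occ_snapshots(const std::string& s)
{
    const int n = static_cast<int>(s.size());
    const std::vector<int> lps = compute_lps(s);
    std::vector<std::vector<int>> snapshots;
    std::vector<int> occ(n + 1, 0);

    // Step 2: count each LPS value
    for (int i = 0; i < n; ++i) {
        ++occ[lps[i]];
    }
    snapshots.push_back(occ);

    // Step 3: push counts down to shorter prefixes,
    // longest prefix first
    for (int i = n - 1; i > 0; --i) {
        occ[lps[i - 1]] += occ[i];
    }
    snapshots.push_back(occ);

    // Step 4: count each prefix's own occurrence at index 0
    for (int& c : occ) {
        ++c;
    }
    snapshots.push_back(occ);

    return snapshots;
}
```

For S = "AAAA", entries 1 to 4 of the three snapshots are 1 1 1 0, then 3 2 1 0, then 4 3 2 1, matching the tables in Steps 2, 3 and 4.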

Below is the implementation of the above approach:

C++


// C++ program for the above approach
#include <iostream>
#include <string>
#include <vector>
using namespace std;
  
// Function to print the count of all
// prefix in the given string
void print(vector<int>& occ, string& s)
{
    // Iterate over string s
    for (int i = 1; i <= int(s.size());
         i++) {
  
        // Print the prefix and their
        // frequency
        cout << s.substr(0, i)
             << " occurs "
             << occ[i]
             << " times."
             << endl;
    }
}
  
// Function to implement the LPS
// array to store the longest prefix
// which is also a suffix for every
// substring of the string S
vector<int> prefix_function(string& s)
{
    // Array to store LPS values; every
    // entry starts at 0, so LPS[0] is 0
    // by definition and an empty string
    // is handled safely
    vector<int> LPS(s.size(), 0);
  
    // Find the values of LPS[i] for
    // the rest of the string using
    // two pointers and DP
    for (int i = 1;
         i < int(s.size());
         i++) {
  
        // Initially set the value
        // of j as the longest
        // prefix that is also a
        // suffix for i as LPS[i-1]
        int j = LPS[i - 1];
  
        // Check if the suffix of
        // length j+1 is also a prefix
        while (j > 0 && s[i] != s[j]) {
            j = LPS[j - 1];
        }
  
        // If s[i] = s[j] then, assign
        // LPS[i] as j+1
        if (s[i] == s[j]) {
            LPS[i] = j + 1;
        }
  
        // If we reached j = 0, assign
        // LPS[i] as 0 as there was no
        // prefix equal to suffix
        else {
            LPS[i] = 0;
        }
    }
  
    // Return the calculated
    // LPS array
    return LPS;
}
  
// Function to count the occurrence
// of all the prefix in the string S
void count_occurence(string& s)
{
    int n = s.size();
  
    // Call the prefix_function
    // to get LPS
    vector<int> LPS
        = prefix_function(s);
  
    // To store the occurrence of
    // all the prefix
    vector<int> occ(n + 1);
  
    // Count all the suffixes that
    // are also prefix
    for (int i = 0; i < n; i++) {
        occ[LPS[i]]++;
    }
  
    // Add the occurrences of the prefix
    // of length i to the shorter prefix
    // of length LPS[i - 1]
    for (int i = n - 1;
         i > 0; i--) {
        occ[LPS[i - 1]] += occ[i];
    }
  
    // Add 1 to every occ[i] for the
    // original prefix itself
    for (int i = 0; i <= n; i++)
        occ[i]++;
  
    // Function call to print the
    // occurrences of all the prefixes
    print(occ, s);
}
  
// Driver Code
int main()
{
    // Given String
    string A = "ABACABA";
  
    // Function Call
    count_occurence(A);
    return 0;
}



Output:

A occurs 4 times.
AB occurs 2 times.
ABA occurs 2 times.
ABAC occurs 1 times.
ABACA occurs 1 times.
ABACAB occurs 1 times.
ABACABA occurs 1 times.

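Time Complexity: O(N^2). Computing the LPS array and the occ array takes O(N); printing every prefix writes O(N^2) characters in total.
Auxiliary Space: O(N)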
