
Real time optimized KMP Algorithm for Pattern Searching

In an earlier article, we have already discussed the KMP algorithm for pattern searching. In this article, a real-time optimized KMP algorithm is discussed. From that article, it is known that the KMP (Knuth-Morris-Pratt) algorithm preprocesses the pattern P and constructs a failure function F (also called lps[]) that stores, for l = 0 to M-1, the length of the longest suffix of the sub-pattern P[1..l] which is also a prefix of P. Note that the suffix is taken from P[1..l] rather than P[0..l], because otherwise the whole sub-pattern would trivially count as a suffix of itself. After a mismatch occurs at index j of P, we update j to F[j-1]. The original KMP algorithm has a runtime complexity of O(M + N) and auxiliary space O(M), where N is the size of the input text and M is the size of the pattern; the preprocessing step alone costs O(M) time. It is hard to achieve a runtime complexity better than this, but we can still eliminate some inefficient shifts.

Inefficiencies of the original KMP algorithm: Consider the following case, processed with the original KMP algorithm:

Input: T = “cabababcababaca”, P = “ababaca”
Output: Found at index 8
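For reference, a minimal sketch of the standard preprocessing step described above is given below (in C++). The function name computeLPS is illustrative rather than taken from the article, and 0-based indexing of P is assumed.

#include <string>
#include <vector>

// Builds lps[]/F: lps[l] is the length of the longest suffix of P[1..l]
// that is also a prefix of P (equivalently, the longest proper prefix of
// P[0..l] that is also a suffix of it).
std::vector<int> computeLPS(const std::string& P)
{
    int m = P.size();
    std::vector<int> lps(m, 0);
    int len = 0; // length of the current longest prefix-suffix
    int l = 1;
    while (l < m) {
        if (P[l] == P[len]) {
            // the previous border extends by one character
            lps[l++] = ++len;
        }
        else if (len != 0) {
            // fall back to the next shorter border, without advancing l
            len = lps[len - 1];
        }
        else {
            // no border ends at position l
            lps[l++] = 0;
        }
    }
    return lps;
}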



The longest proper prefix array, lps[], for the above test case is {0, 0, 1, 2, 3, 0, 1}. Tracing the search with the original KMP algorithm, one thing that can be noticed is that in the third, fourth, and fifth alignments of the pattern, the mismatch occurs at the same location, T[7]: the pattern is shifted each time, but the same text character is compared again. If we can skip the fourth and fifth alignments, then the original KMP algorithm can be optimized further to answer real-time queries.

Real-time Optimization: The term real-time in this case can be interpreted as checking each character in the text T at most once. Our goal is to shift the pattern properly (just as the KMP algorithm does), but without checking the mismatched character again. That is, for the same example, after T[7] mismatches P[4] the optimized algorithm should move straight on to comparing T[8] with P[0], instead of comparing T[7] two more times.
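To make the redundant work concrete, here is a sketch of the ordinary KMP search loop (using computeLPS from the sketch above). The branch marked in the comments is the one that re-examines T[i] after a mismatch and is exactly what happens three times at T[7] in this example; it is the behaviour the optimized version removes.

#include <iostream>
#include <string>
#include <vector>

// Ordinary KMP search, shown to highlight where the same text character
// can be compared more than once.
void kmpSearch(const std::string& T, const std::string& P)
{
    std::vector<int> lps = computeLPS(P);
    int n = T.size(), m = P.size();
    int i = 0, j = 0;
    while (i < n) {
        if (T[i] == P[j]) {
            ++i;
            ++j;
            if (j == m) {
                std::cout << "Found at index " << (i - m) << "\n";
                j = lps[j - 1];
            }
        }
        else if (j != 0) {
            // shift the pattern, but T[i] will be examined again on the
            // next pass through the loop (possibly several times)
            j = lps[j - 1];
        }
        else {
            // mismatch at j == 0: simply move to the next text character
            ++i;
        }
    }
}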

Approach: One way to achieve the goal is to modify the preprocessing step. Instead of the one-dimensional failure function F, build a two-dimensional failure table FT, where FT[t][l] stores the value the matched length should take when a mismatch against text character t occurs right after P[0..l] has been matched.



Constructing Failure Table:

For every character t of the alphabet and every index l from 0 to M-1,
  check if P[F[l]] is t,
    if yes:
      FT[t][l] <- F[l] + 1;
    if no:
      check if F[l] is 0,
        if yes:
          FT[t][l] <- 0;
        if no:
          FT[t][l] <- FT[t][F[l] - 1];
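One possible implementation of this construction is sketched below. The alphabet size of 256 (all byte values) and the name buildFailureTable are assumptions made here for illustration; F is the lps[] array produced by computeLPS above.

#include <string>
#include <vector>

const int ALPHABET = 256; // assumed alphabet: all byte values

// FT[t][l] is the value the matched length should take when a mismatch
// against text character t occurs right after P[0..l] has been matched.
std::vector<std::vector<int> > buildFailureTable(const std::string& P,
                                                 const std::vector<int>& F)
{
    int m = P.size();
    std::vector<std::vector<int> > FT(ALPHABET, std::vector<int>(m, 0));
    for (int t = 0; t < ALPHABET; ++t) {
        for (int l = 0; l < m; ++l) {
            if ((unsigned char)P[F[l]] == t) {
                // the longest border of P[0..l] extends by t
                FT[t][l] = F[l] + 1;
            }
            else if (F[l] == 0) {
                // no border, and t does not match P[0]
                FT[t][l] = 0;
            }
            else {
                // reuse the answer already computed for the shorter border
                // (F[l] - 1 < l, so FT[t][F[l] - 1] is known at this point)
                FT[t][l] = FT[t][F[l] - 1];
            }
        }
    }
    return FT;
}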

Preprocessing the failure table takes O(K*M) time, where K is the alphabet size, and the search phase examines each character of the text exactly once, so it runs in O(N) time. The space complexity is O(K*M) for the failure table plus O(M) for the failure function; treating the alphabet size as a constant, the auxiliary space remains O(M).
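Putting it together, the sketch below shows a search phase that uses the table (together with computeLPS and buildFailureTable from the sketches above). On a mismatch with j > 0, the table gives the new value of j directly and the text index advances, so each text character is compared exactly once; the function name realTimeKMPSearch is illustrative.

#include <iostream>
#include <string>
#include <vector>

void realTimeKMPSearch(const std::string& T, const std::string& P)
{
    std::vector<int> F = computeLPS(P);
    std::vector<std::vector<int> > FT = buildFailureTable(P, F);
    int n = T.size(), m = P.size();
    int i = 0, j = 0;
    while (i < n) {
        if (T[i] == P[j]) {
            ++i;
            ++j;
            if (j == m) {
                std::cout << "Found at index " << (i - m) << "\n";
                j = F[m - 1];
            }
        }
        else if (j != 0) {
            // jump straight to the correct pattern position and move on:
            // T[i] is never compared again
            j = FT[(unsigned char)T[i]][j - 1];
            ++i;
        }
        else {
            // mismatch at j == 0: simply move to the next text character
            ++i;
        }
    }
}

int main()
{
    // the example from above; expected output: Found at index 8
    realTimeKMPSearch("cabababcababaca", "ababaca");
    return 0;
}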

