Given two strings txt and pat of size N and M, where N > M. String txt and pat represent the text and pattern respectively. The task is to print all indexes of occurrences of pattern string in the text string. Use one-based indexing while returning the indices.
Examples:
Input: txt = “THIS IS A TEST TEXT”, pat = “TEST”
Output: Pattern found at index 10
Input: txt= “AABAACAADAABAABA”
pat = “AABA”
Output: Pattern found at index 0, Pattern found at index 9, Pattern found at index 12

Arrivals of pattern in the text
We have discussed the Naive pattern-searching algorithm in the previous post. The worst case complexity of the Naive algorithm is O(m(n-m+1)). The time complexity of the KMP algorithm is O(n+m) in the worst case.
KMP (Knuth Morris Pratt) Pattern Searching:
The Naive pattern-searching algorithm doesn’t work well in cases where we see many matching characters followed by a mismatching character.
Examples:
1) txt[] = “AAAAAAAAAAAAAAAAAB”, pat[] = “AAAAB”
2) txt[] = “ABABABCABABABCABABABC”, pat[] = “ABABAC” (not a worst case, but a bad case for Naive)
The KMP matching algorithm uses degenerating property (pattern having the same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst-case complexity to O(n+m).
The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of the next window. We take advantage of this information to avoid matching the characters that we know will anyway match.
Matching Overview
txt = “AAAAABAAABA”
pat = “AAAA”
We compare first window of txt with pat
txt = “AAAAABAAABA”
pat = “AAAA” [Initial position]
We find a match. This is same as Naive String Matching.
In the next step, we compare next window of txt with pat.
txt = “AAAAABAAABA”
pat = “AAAA” [Pattern shifted one position]
This is where KMP does optimization over Naive. In this second window, we only compare fourth A of pattern
with fourth character of current window of text to decide whether current window matches or not. Since we know
first three characters will anyway match, we skipped matching first three characters.
Need of Preprocessing?
An important question arises from the above explanation, how to know how many characters to be skipped. To know this,
we pre-process pattern and prepare an integer array lps[] that tells us the count of characters to be skipped
Preprocessing Overview:
- KMP algorithm preprocesses pat[] and constructs an auxiliary lps[] of size m (same as the size of the pattern) which is used to skip characters while matching.
- Name lps indicates the longest proper prefix which is also a suffix. A proper prefix is a prefix with a whole string not allowed. For example, prefixes of “ABC” are “”, “A”, “AB” and “ABC”. Proper prefixes are “”, “A” and “AB”. Suffixes of the string are “”, “C”, “BC”, and “ABC”.
- We search for lps in subpatterns. More clearly we focus on sub-strings of patterns that are both prefix and suffix.
- For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores the length of the maximum matching proper prefix which is also a suffix of the sub-pattern pat[0..i].
lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i].
Note: lps[i] could also be defined as the longest prefix which is also a proper suffix. We need to use it properly in one place to make sure that the whole substring is not considered.
Examples of lps[] construction:
For the pattern “AAAA”, lps[] is [0, 1, 2, 3]
For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0]
For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]
For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3]
Preprocessing Algorithm:
In the preprocessing part,
- We calculate values in lps[]. To do that, we keep track of the length of the longest prefix suffix value (we use len variable for this purpose) for the previous index
- We initialize lps[0] and len as 0.
- If pat[len] and pat[i] match, we increment len by 1 and assign the incremented value to lps[i].
- If pat[i] and pat[len] do not match and len is not 0, we update len to lps[len-1]
- See computeLPSArray() in the below code for details
Illustration of preprocessing (or construction of lps[]):
pat[] = “AAACAAAA”
=> len = 0, i = 0:
- lps[0] is always 0, we move to i = 1
=> len = 0, i = 1:
- Since pat[len] and pat[i] match, do len++,
- store it in lps[i] and do i++.
- Set len = 1, lps[1] = 1, i = 2
=> len = 1, i = 2:
- Since pat[len] and pat[i] match, do len++,
- store it in lps[i] and do i++.
- Set len = 2, lps[2] = 2, i = 3
=> len = 2, i = 3:
- Since pat[len] and pat[i] do not match, and len > 0,
- Set len = lps[len-1] = lps[1] = 1
=> len = 1, i = 3:
- Since pat[len] and pat[i] do not match and len > 0,
- len = lps[len-1] = lps[0] = 0
=> len = 0, i = 3:
- Since pat[len] and pat[i] do not match and len = 0,
- Set lps[3] = 0 and i = 4
=> len = 0, i = 4:
- Since pat[len] and pat[i] match, do len++,
- Store it in lps[i] and do i++.
- Set len = 1, lps[4] = 1, i = 5
=> len = 1, i = 5:
- Since pat[len] and pat[i] match, do len++,
- Store it in lps[i] and do i++.
- Set len = 2, lps[5] = 2, i = 6
=> len = 2, i = 6:
- Since pat[len] and pat[i] match, do len++,
- Store it in lps[i] and do i++.
- len = 3, lps[6] = 3, i = 7
=> len = 3, i = 7:
- Since pat[len] and pat[i] do not match and len > 0,
- Set len = lps[len-1] = lps[2] = 2
=> len = 2, i = 7:
- Since pat[len] and pat[i] match, do len++,
- Store it in lps[i] and do i++.
- len = 3, lps[7] = 3, i = 8
We stop here as we have constructed the whole lps[].
Implementation of KMP algorithm:
Unlike the Naive algorithm, where we slide the pattern by one and compare all characters at each shift, we use a value from lps[] to decide the next characters to be matched. The idea is to not match a character that we know will anyway match.
How to use lps[] to decide the next positions (or to know the number of characters to be skipped)?
- We start the comparison of pat[j] with j = 0 with characters of the current window of text.
- We keep matching characters txt[i] and pat[j] and keep incrementing i and j while pat[j] and txt[i] keep matching.
- When we see a mismatch
- We know that characters pat[0..j-1] match with txt[i-j…i-1] (Note that j starts with 0 and increments it only when there is a match).
- We also know (from the above definition) that lps[j-1] is the count of characters of pat[0…j-1] that are both proper prefix and suffix.
- From the above two points, we can conclude that we do not need to match these lps[j-1] characters with txt[i-j…i-1] because we know that these characters will anyway match. Let us consider the above example to understand this.
Below is the illustration of the above algorithm:
Consider txt[] = “AAAAABAAABA“, pat[] = “AAAA“
If we follow the above LPS building process then lps[] = {0, 1, 2, 3}
-> i = 0, j = 0: txt[i] and pat[j] match, do i++, j++
-> i = 1, j = 1: txt[i] and pat[j] match, do i++, j++
-> i = 2, j = 2: txt[i] and pat[j] match, do i++, j++
-> i = 3, j = 3: txt[i] and pat[j] match, do i++, j++
-> i = 4, j = 4: Since j = M, print pattern found and reset j, j = lps[j-1] = lps[3] = 3
Here unlike Naive algorithm, we do not match first three
characters of this window. Value of lps[j-1] (in above step) gave us index of next character to match.
-> i = 4, j = 3: txt[i] and pat[j] match, do i++, j++
-> i = 5, j = 4: Since j == M, print pattern found and reset j, j = lps[j-1] = lps[3] = 3
Again unlike Naive algorithm, we do not match first three characters of this window. Value of lps[j-1] (in above step) gave us index of next character to match.
-> i = 5, j = 3: txt[i] and pat[j] do NOT match and j > 0, change only j. j = lps[j-1] = lps[2] = 2
-> i = 5, j = 2: txt[i] and pat[j] do NOT match and j > 0, change only j. j = lps[j-1] = lps[1] = 1
-> i = 5, j = 1: txt[i] and pat[j] do NOT match and j > 0, change only j. j = lps[j-1] = lps[0] = 0
-> i = 5, j = 0: txt[i] and pat[j] do NOT match and j is 0, we do i++.
-> i = 6, j = 0: txt[i] and pat[j] match, do i++ and j++
-> i = 7, j = 1: txt[i] and pat[j] match, do i++ and j++
We continue this way till there are sufficient characters in the text to be compared with the characters in the pattern…
Below is the implementation of the above approach:
C++
#include <bits/stdc++.h>
using namespace std;
// Fills lps[] for given pattern pat
void computeLPSArray(string& pat, int M, vector<int>& lps)
{
// Length of the previous longest prefix suffix
int len = 0;
// lps[0] is always 0
lps[0] = 0;
// loop calculates lps[i] for i = 1 to M-1
int i = 1;
while (i < M) {
if (pat[i] == pat[len]) {
len++;
lps[i] = len;
i++;
}
else // (pat[i] != pat[len])
{
if (len != 0) {
len = lps[len - 1];
}
else // if (len == 0)
{
lps[i] = 0;
i++;
}
}
}
}
// Prints occurrences of pat in txt
vector<int> KMPSearch(string& pat, string& txt)
{
int M = pat.length();
int N = txt.length();
// Create lps[] that will hold the longest prefix suffix
// values for pattern
vector<int> lps(M);
vector<int> result;
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt
int j = 0; // index for pat
while ((N - i) >= (M - j)) {
if (pat[j] == txt[i]) {
j++;
i++;
}
if (j == M) {
result.push_back(i - j + 1);
j = lps[j - 1];
}
// Mismatch after j matches
else if (i < N && pat[j] != txt[i]) {
// Do not match lps[0..lps[j-1]] characters,
// they will match anyway
if (j != 0)
j = lps[j - 1];
else
i = i + 1;
}
}
return result;
}
int main()
{
string txt = "geeksforgeeks";
string pat = "geeks";
vector<int> result = KMPSearch(pat, txt);
// Print all the occurance (1-based indices)
for (int i = 0; i < result.size(); i++) {
cout << result[i] << " ";
}
return 0;
}
C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Fills lps[] for given pattern pat
void computeLPSArray(const char* pat, int M, int* lps)
{
// Length of the previous longest prefix suffix
int len = 0;
// lps[0] is always 0
lps[0] = 0;
// Loop calculates lps[i] for i = 1 to M-1
int i = 1;
while (i < M) {
if (pat[i] == pat[len]) {
len++;
lps[i] = len;
i++;
}
else {
if (len != 0) {
len = lps[len - 1];
}
else {
lps[i] = 0;
i++;
}
}
}
}
// Prints occurrences of pat in txt and returns an array of
// occurrences
int* KMPSearch(const char* pat, const char* txt, int* count)
{
int M = strlen(pat);
int N = strlen(txt);
// Create lps[] that will hold the longest prefix suffix
// values for pattern
int* lps = (int*)malloc(M * sizeof(int));
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int* result = (int*)malloc(N * sizeof(int));
// Number of occurrences found
*count = 0;
int i = 0; // index for txt
int j = 0; // index for pat
while ((N - i) >= (M - j)) {
if (pat[j] == txt[i]) {
j++;
i++;
}
if (j == M) {
// Record the occurrence (1-based index)
result[*count] = i - j + 1;
(*count)++;
j = lps[j - 1];
}
else if (i < N && pat[j] != txt[i]) {
if (j != 0) {
j = lps[j - 1];
}
else {
i = i + 1;
}
}
}
free(lps);
return result;
}
// Driver code
int main()
{
const char txt[] = "geeksforgeeks";
const char pat[] = "geeks";
int count;
// Call KMPSearch and get the array of occurrences
int* result = KMPSearch(pat, txt, &count);
// Print all the occurrences (1-based indices)
for (int i = 0; i < count; i++) {
printf("%d ", result[i]);
}
printf("\n");
// Free the allocated memory
free(result);
return 0;
}
Java
import java.util.ArrayList;
import java.util.List;
public class GFG {
// Fills lps[] for given pattern pat
static void computeLPSArray(String pat, int M, int[] lps)
{
// Length of the previous longest prefix suffix
int len = 0;
// lps[0] is always 0
lps[0] = 0;
// Loop calculates lps[i] for i = 1 to M-1
int i = 1;
while (i < M) {
if (pat.charAt(i) == pat.charAt(len)) {
len++;
lps[i] = len;
i++;
}
else {
if (len != 0) {
len = lps[len - 1];
}
else {
lps[i] = 0;
i++;
}
}
}
}
// Prints occurrences of pat in txt
static List<Integer> KMPSearch(String pat, String txt)
{
int M = pat.length();
int N = txt.length();
// Create lps[] that will hold the longest prefix
// suffix values for pattern
int[] lps = new int[M];
List<Integer> result = new ArrayList<>();
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt
int j = 0; // index for pat
while ((N - i) >= (M - j)) {
if (pat.charAt(j) == txt.charAt(i)) {
j++;
i++;
}
if (j == M) {
result.add(i - j + 1);
j = lps[j - 1];
}
else if (i < N
&& pat.charAt(j) != txt.charAt(i)) {
if (j != 0) {
j = lps[j - 1];
}
else {
i = i + 1;
}
}
}
return result;
}
// Driver code
public static void main(String[] args)
{
String txt = "geeksforgeeks";
String pat = "geeks";
List<Integer> result = KMPSearch(pat, txt);
// Print all the occurrences (1-based indices)
for (int index : result) {
System.out.print(index + " ");
}
}
}
Python
def computeLPSArray(pat):
M = len(pat)
lps = [0] * M
# Length of the previous longest prefix suffix
length = 0
i = 1
# Loop calculates lps[i] for i = 1 to M-1
while i < M:
if pat[i] == pat[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
def KMPSearch(pat, txt):
M = len(pat)
N = len(txt)
# Create lps[] that will hold the longest prefix
# suffix values for pattern
lps = computeLPSArray(pat)
result = []
i = 0 # index for txt
j = 0 # index for pat
while (N - i) >= (M - j):
if pat[j] == txt[i]:
j += 1
i += 1
if j == M:
result.append(i - j + 1)
j = lps[j - 1]
elif i < N and pat[j] != txt[i]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return result
# Driver code
txt = "geeksforgeeks"
pat = "geeks"
result = KMPSearch(pat, txt)
# Print all the occurrences (1-based indices)
for index in result:
print(index, end=' ')
C#
// C# program for implementation of KMP pattern
// searching algorithm
using System;
using System.Collections.Generic;
class KMP {
// Fills lps[] for given pattern pat
void computeLPSArray(string pat, int M, int[] lps) {
// Length of the previous longest prefix suffix
int len = 0;
// lps[0] is always 0
lps[0] = 0;
// Loop calculates lps[i] for i = 1 to M-1
int i = 1;
while (i < M) {
if (pat[i] == pat[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
}
// Prints occurrences of pat in txt
List<int> KMPSearch(string pat, string txt) {
int M = pat.Length;
int N = txt.Length;
// Create lps[] that will hold the longest
// prefix suffix values for pattern
int[] lps = new int[M];
List<int> result = new List<int>();
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt
int j = 0; // index for pat
while ((N - i) >= (M - j)) {
if (pat[j] == txt[i]) {
j++;
i++;
}
if (j == M) {
result.Add(i - j + 1);
j = lps[j - 1];
} else if (i < N && pat[j] != txt[i]) {
if (j != 0) {
j = lps[j - 1];
} else {
i = i + 1;
}
}
}
return result;
}
// Driver code
static void Main() {
string txt = "geeksforgeeks";
string pat = "geeks";
KMP kmp = new KMP();
List<int> result = kmp.KMPSearch(pat, txt);
// Print all the occurrences (1-based indices)
foreach (int index in result) {
Console.Write(index + " ");
}
}
}
JavaScript
// Fills lps[] for given pattern pat
function computeLPSArray(pat, M, lps) {
// Length of the previous longest prefix suffix
let len = 0;
// lps[0] is always 0
lps[0] = 0;
// Loop calculates lps[i] for i = 1 to M-1
let i = 1;
while (i < M) {
if (pat[i] === pat[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len !== 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
}
// Prints occurrences of pat in txt
function KMPSearch(pat, txt) {
const M = pat.length;
const N = txt.length;
// Create lps[] that will hold the longest prefix
// suffix values for pattern
const lps = new Array(M).fill(0);
const result = [];
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
let i = 0; // index for txt
let j = 0; // index for pat
while ((N - i) >= (M - j)) {
if (pat[j] === txt[i]) {
j++;
i++;
}
if (j === M) {
result.push(i - j + 1);
j = lps[j - 1];
} else if (i < N && pat[j] !== txt[i]) {
if (j !== 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
return result;
}
// Driver code
const txt = "geeksforgeeks";
const pat = "geeks";
const result = KMPSearch(pat, txt);
// Print all the occurrences (1-based indices)
for (const index of result) {
console.log(index);
}
Time Complexity: O(N+M) where N is the length of the text and M is the length of the pattern to be found.
Auxiliary Space: O(M)