Rabin-Karp Algorithm for Pattern Searching
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat in txt. You may assume that n > m.
Input:  txt = "THIS IS A TEST TEXT"
        pat = "TEST"
Output: Pattern found at index 10

Input:  txt = "AABAACAADAABAABA"
        pat = "AABA"
Output: Pattern found at index 0
        Pattern found at index 9
        Pattern found at index 12
The Naive String Matching algorithm slides the pattern over the text one position at a time. After each slide, it compares the characters at the current shift one by one and, if all of them match, prints the match.
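To make the naive approach concrete, here is a minimal Python sketch. The name naive_search is chosen for illustration only, and the function returns the match indices as a list so the caller can print them.

```python
def naive_search(pat, txt):
    """Return every index where pat occurs in txt, checking each shift character by character."""
    m, n = len(pat), len(txt)
    matches = []
    for s in range(n - m + 1):              # try every possible shift s
        j = 0
        while j < m and txt[s + j] == pat[j]:
            j += 1                          # characters agree so far
        if j == m:                          # all m characters matched
            matches.append(s)
    return matches


for index in naive_search("AABA", "AABAACAADAABAABA"):
    print("Pattern found at index", index)
```

Each shift may compare up to m characters, so the worst case costs on the order of m * (n - m + 1) comparisons.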
Like the Naive algorithm, the Rabin-Karp algorithm also slides the pattern one position at a time. Unlike the Naive algorithm, however, it first compares the hash value of the pattern with the hash value of the current substring of the text, and starts matching individual characters only if the hash values match. So the Rabin-Karp algorithm needs to calculate hash values for the following strings.
1) Pattern itself.
2) All the substrings of the text of length m.
Since we need to efficiently calculate hash values for all the substrings of length m of the text, we need a hash function with the following property.
The hash at the next shift must be efficiently computable from the current hash value and the next character in the text. In other words, hash(txt[s+1 .. s+m]) must be efficiently computable from hash(txt[s .. s+m-1]) and txt[s+m], i.e., hash(txt[s+1 .. s+m]) = rehash(txt[s+m], hash(txt[s .. s+m-1])), and rehash must be an O(1) operation.
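The rolling property above can be sketched in Python as follows. This is one possible instantiation, not necessarily the exact function the authors had in mind. The radix D = 256 and the prime modulus Q = 101 are assumed values picked for illustration, and the helper names full_hash and rehash_search are hypothetical. Note that this rehash also takes the outgoing character txt[s] and a precomputed power of D, in addition to the two arguments named in the text.

```python
D = 256   # assumed radix: number of possible character values
Q = 101   # assumed small prime modulus that keeps hash values bounded


def full_hash(s):
    """Hash a whole string from scratch in O(len(s)) using Horner's rule."""
    h = 0
    for ch in s:
        h = (h * D + ord(ch)) % Q
    return h


def rehash(next_char, old_hash, old_char, high):
    """O(1) update: remove old_char from the front, append next_char at the end.

    high must equal D**(m-1) % Q, where m is the window length.
    """
    return ((old_hash - ord(old_char) * high) * D + ord(next_char)) % Q


def rehash_search(pat, txt):
    """Return all indices where pat occurs in txt, comparing hashes before characters."""
    m, n = len(pat), len(txt)
    if m == 0 or m > n:
        return []
    high = pow(D, m - 1, Q)
    p = full_hash(pat)          # hash of the pattern
    t = full_hash(txt[:m])      # hash of the first window of the text
    matches = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; confirm character by character.
        if p == t and txt[s:s + m] == pat:
            matches.append(s)
        if s < n - m:
            t = rehash(txt[s + m], t, txt[s], high)
    return matches


for index in rehash_search("AABA", "AABAACAADAABAABA"):
    print("Pattern found at index", index)
```

Because different substrings can share a hash value, the character comparison after a hash match is still required for correctness.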
The hash function suggested by Rabin and Karp calculates an integer value. The integer value for a string is its numeric value, obtained by treating each character as a digit in a fixed base.
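As a small illustration of what "numeric value of a string" can mean, the Python sketch below reads a string as a number in base d, one character per digit. The function name numeric_value and its digit parameter are hypothetical, introduced only for this example.

```python
def numeric_value(s, d, digit=ord):
    """Interpret s as a base-d number, mapping each character to a digit with digit()."""
    value = 0
    for ch in s:
        value = value * d + digit(ch)   # shift left by one digit, then add the new one
    return value


print(numeric_value("122", 10, digit=int))  # decimal digits in base 10: 122
print(numeric_value("AB", 256))             # 65 * 256 + 66 = 16706
```

For long strings this value grows very large, which is why practical versions typically keep it reduced modulo a prime.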