# Boyer Moore Algorithm for Pattern Searching

Pattern searching is an important problem in computer science. When we search for a string in a Notepad/Word file, a browser, or a database, pattern searching algorithms are used to show the search results. A typical problem statement would be:
Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m.

Examples:

```
Input:  txt[] = "THIS IS A TEST TEXT"
        pat[] = "TEST"
Output: Pattern found at index 10

Input:  txt[] = "AABAACAADAABAABA"
        pat[] = "AABA"
Output: Pattern found at index 0
        Pattern found at index 9
        Pattern found at index 12
```

In this post, we will discuss the Boyer Moore pattern searching algorithm. Like the KMP and Finite Automata algorithms, Boyer Moore also preprocesses the pattern.
Boyer Moore is a combination of the following two approaches:
1) Bad Character Heuristic
2) Good Suffix Heuristic

Both of the above heuristics can also be used independently to search for a pattern in a text. Let us first understand how the two independent approaches work together in the Boyer Moore algorithm. The Naive algorithm slides the pattern over the text one position at a time. The KMP algorithm preprocesses the pattern so that the pattern can be shifted by more than one position. The Boyer Moore algorithm preprocesses the pattern for the same reason: it builds a separate array for each heuristic, and at every step it slides the pattern by the maximum of the shifts suggested by the two heuristics. In this way it uses the better of the two heuristics at every step.
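For comparison, here is a minimal Python sketch of the Naive approach mentioned above. The name `naive_search` is hypothetical and not part of the original code. It tries every alignment and always advances the pattern by exactly one position. That fixed one-position step is what Boyer Moore's preprocessing is designed to improve on.

```python
from typing import List


def naive_search(txt: str, pat: str) -> List[int]:
    """Return every shift s at which pat occurs in txt (naive method)."""
    n, m = len(txt), len(pat)
    shifts = []
    # Try every alignment, advancing the pattern one position at a time.
    for s in range(n - m + 1):
        if txt[s:s + m] == pat:
            shifts.append(s)
    return shifts


print(naive_search("THIS IS A TEST TEXT", "TEST"))  # [10]
```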
Unlike the previous pattern searching algorithms, the Boyer Moore algorithm starts matching from the last character of the pattern.
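The sketch below shows the right-to-left comparison at a single alignment. It assumes a hypothetical helper, `compare_from_right`, which returns the pattern index of the rightmost mismatch, or -1 on a full match. This is the same inner loop used by the implementations later in this post.

```python
def compare_from_right(txt, pat, s):
    """Compare pat with txt[s:s+len(pat)], starting from the last character.

    Returns -1 on a full match, otherwise the pattern index j of the
    rightmost mismatching character.
    """
    j = len(pat) - 1
    # Walk leftwards while the characters agree.
    while j >= 0 and pat[j] == txt[s + j]:
        j -= 1
    return j


print(compare_from_right("ABAAABCD", "ABC", 0))  # 2 ('C' vs 'A')
print(compare_from_right("ABAAABCD", "ABC", 4))  # -1 (full match)
```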

In this post, we will discuss the bad character heuristic; the Good Suffix heuristic is discussed in the next post.

The idea of the bad character heuristic is simple. The character of the text that doesn't match the current character of the pattern is called the Bad Character. Upon a mismatch, we shift the pattern until either:
1) the mismatch becomes a match, or
2) pattern P moves past the mismatched character.
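The two outcomes can be folded into a single shift rule. The sketch below uses a hypothetical helper, `bad_char_shift`, which calls Python's `str.rfind` to locate the last occurrence of the bad character; `rfind` returns -1 when the character is absent.

```python
def bad_char_shift(pat, j, bad_char):
    """Shift suggested by the bad character rule after a mismatch at pattern index j."""
    # Index of the last occurrence of bad_char in pat, or -1 if it is absent.
    last = pat.rfind(bad_char)
    # Outcome 1 (last >= 0): the shift j - last lines that occurrence up
    # with the bad character in the text.
    # Outcome 2 (last == -1): the shift j + 1 moves the pattern just past it.
    # max(1, ...) prevents a zero or negative shift when the last occurrence
    # lies to the right of j.
    return max(1, j - last)


print(bad_char_shift("TATGTG", 3, "A"))  # 2
print(bad_char_shift("TATGTG", 5, "C"))  # 6
print(bad_char_shift("ABAB", 1, "A"))    # 1 (last 'A' is right of j)
```

Calling `rfind` costs O(m) per mismatch. The implementations below instead precompute every last occurrence once, in an array indexed by character code.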

Case 1 – Mismatch becomes match
We look up the position of the last occurrence of the mismatching character in the pattern. If the mismatching character exists in the pattern, we shift the pattern so that this occurrence is aligned with the mismatching character in the text T.

*Figure: case 1*

Explanation: In the above example, we got a mismatch at position 3. Here the mismatching character is “A”. Now we search for the last occurrence of “A” in the pattern. We find “A” at position 1 in the pattern (displayed in blue), and this is its last occurrence. Now we shift the pattern by 2 positions so that the “A” in the pattern is aligned with the “A” in the text.
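The sketch below makes Case 1 concrete without the figure. It uses a hypothetical text and pattern, `"GCAATGCCTATGTGACC"` and `"TATGTG"`, chosen for illustration; the figure's own strings are not reproduced here. The first alignment of these strings reproduces the step described above:

- a mismatch on “A” at position 3,
- whose last occurrence in the pattern is position 1,
- which gives a shift of 2.

```python
# Hypothetical strings chosen for illustration only.
txt = "GCAATGCCTATGTGACC"
pat = "TATGTG"
s = 0  # current shift of pat against txt

# Compare right to left until the first mismatch.
j = len(pat) - 1
while j >= 0 and pat[j] == txt[s + j]:
    j -= 1

bad = txt[s + j]          # the bad character in the text
last = pat.rfind(bad)     # its last occurrence in pat (-1 if absent)
shift = max(1, j - last)  # aligns that occurrence with the bad character
print(s + j, bad, last, shift)  # 3 A 1 2
```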

Case 2 – Pattern moves past the mismatched character
We look up the position of the last occurrence of the mismatching character in the pattern, and if the character does not exist in the pattern, we shift the pattern past the mismatching character.

*Figure: case 2*

Explanation: Here we have a mismatch at position 7. The mismatching character “C” does not exist in the pattern, so we shift the pattern past position 7. In the above example, this eventually gives a perfect match of the pattern (displayed in green). We do this because “C” does not exist in the pattern. Every shift that still leaves position 7 covered by the pattern would therefore produce a mismatch there, and checking those shifts would be fruitless.
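The sketch below uses a hypothetical text and pattern chosen for illustration (`"GCAATGCCTATGTGACC"` and `"TATGTG"`). It assumes the pattern currently sits at shift s = 2, where the next mismatch reproduces Case 2. “C” at position 7 does not occur in the pattern, so the pattern jumps past it, and the following alignment is a full match.

```python
# Hypothetical strings chosen for illustration only.
txt = "GCAATGCCTATGTGACC"
pat = "TATGTG"
s = 2  # assumed current shift of pat against txt

# Compare right to left until the first mismatch.
j = len(pat) - 1
while j >= 0 and pat[j] == txt[s + j]:
    j -= 1

mismatch_pos = s + j      # text position of the bad character
bad = txt[mismatch_pos]   # "C"
last = pat.rfind(bad)     # -1, since "C" does not occur in pat
s += max(1, j - last)     # jump the pattern past the bad character
print(mismatch_pos, s, txt[s:s + len(pat)] == pat)  # 7 8 True
```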

In the following implementation, we preprocess the pattern and store the last occurrence of every possible character in an array whose size equals the alphabet size. If a character is not present in the pattern at all, a mismatch on it can result in a shift by m (the length of the pattern). Therefore, the bad character heuristic takes O(n/m) time in the best case.
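Below is a minimal Python sketch of that preprocessing step. It assumes 8-bit character codes (NO_OF_CHARS = 256, as in the implementations below). The last lines show where the best case comes from: a mismatch on the last pattern character, against a character absent from the pattern, yields a shift of m.

```python
NO_OF_CHARS = 256  # assumption: 8-bit character codes


def bad_char_table(pat):
    """Index of the last occurrence of each character code in pat (-1 if absent)."""
    table = [-1] * NO_OF_CHARS
    for i, c in enumerate(pat):
        table[ord(c)] = i  # later positions overwrite earlier ones
    return table


table = bad_char_table("ABCA")
print(table[ord("A")], table[ord("B")], table[ord("Z")])  # 3 1 -1

# A mismatch at j = m - 1 against a character absent from the pattern
# shifts the pattern by its full length m.
m = 4
print(max(1, (m - 1) - table[ord("Z")]))  # 4
```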

## C++

```cpp
/* C++ Program for Bad Character Heuristic of Boyer
Moore String Matching Algorithm */
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;

#define NO_OF_CHARS 256

// The preprocessing function for Boyer Moore's
// bad character heuristic
void badCharHeuristic(const string &str, int size,
                      int badchar[NO_OF_CHARS])
{
    int i;

    // Initialize all occurrences as -1
    for (i = 0; i < NO_OF_CHARS; i++)
        badchar[i] = -1;

    // Fill the actual value of last occurrence of a character.
    // The cast to unsigned char keeps non-ASCII bytes from
    // producing a negative index.
    for (i = 0; i < size; i++)
        badchar[(unsigned char) str[i]] = i;
}

/* A pattern searching function that uses Bad
Character Heuristic of Boyer Moore Algorithm */
void search(const string &txt, const string &pat)
{
    int m = static_cast<int>(pat.size());
    int n = static_cast<int>(txt.size());

    int badchar[NO_OF_CHARS];

    /* Fill the bad character array by calling
    the preprocessing function badCharHeuristic()
    for given pattern */
    badCharHeuristic(pat, m, badchar);

    int s = 0; // s is shift of the pattern with
               // respect to text
    while (s <= (n - m))
    {
        int j = m - 1;

        /* Keep reducing index j of pattern while
        characters of pattern and text are
        matching at this shift s */
        while (j >= 0 && pat[j] == txt[s + j])
            j--;

        /* If the pattern is present at current
        shift, then index j will become -1 after
        the above loop */
        if (j < 0)
        {
            cout << "pattern occurs at shift = " << s << endl;

            /* Shift the pattern so that the next
            character in text aligns with the last
            occurrence of it in pattern.
            The condition s+m < n is necessary for
            the case when pattern occurs at the end
            of text */
            s += (s + m < n) ? m - badchar[(unsigned char) txt[s + m]] : 1;
        }
        else
            /* Shift the pattern so that the bad character
            in text aligns with the last occurrence of
            it in pattern. The max function is used to
            make sure that we get a positive shift.
            We may get a negative shift if the last
            occurrence of bad character in pattern
            is on the right side of the current
            character. */
            s += max(1, j - badchar[(unsigned char) txt[s + j]]);
    }
}

/* Driver code */
int main()
{
    string txt = "ABAAABCD";
    string pat = "ABC";
    search(txt, pat);
    return 0;
}

// This code is contributed by rathbhupendra
```

## C

```c
/* C Program for Bad Character Heuristic of Boyer
   Moore String Matching Algorithm */
#include <stdio.h>
#include <string.h>

#define NO_OF_CHARS 256

// A utility function to get maximum of two integers
int max(int a, int b) { return (a > b) ? a : b; }

// The preprocessing function for Boyer Moore's
// bad character heuristic
void badCharHeuristic(const char *str, int size,
                      int badchar[NO_OF_CHARS])
{
    int i;

    // Initialize all occurrences as -1
    for (i = 0; i < NO_OF_CHARS; i++)
        badchar[i] = -1;

    // Fill the actual value of last occurrence of a character.
    // The cast to unsigned char keeps non-ASCII bytes from
    // producing a negative index.
    for (i = 0; i < size; i++)
        badchar[(unsigned char) str[i]] = i;
}

/* A pattern searching function that uses Bad
   Character Heuristic of Boyer Moore Algorithm */
void search(const char *txt, const char *pat)
{
    int m = strlen(pat);
    int n = strlen(txt);

    int badchar[NO_OF_CHARS];

    /* Fill the bad character array by calling
       the preprocessing function badCharHeuristic()
       for given pattern */
    badCharHeuristic(pat, m, badchar);

    int s = 0;  // s is shift of the pattern with
                // respect to text
    while (s <= (n - m))
    {
        int j = m - 1;

        /* Keep reducing index j of pattern while
           characters of pattern and text are
           matching at this shift s */
        while (j >= 0 && pat[j] == txt[s + j])
            j--;

        /* If the pattern is present at current
           shift, then index j will become -1 after
           the above loop */
        if (j < 0)
        {
            printf("pattern occurs at shift = %d\n", s);

            /* Shift the pattern so that the next
               character in text aligns with the last
               occurrence of it in pattern.
               The condition s+m < n is necessary for
               the case when pattern occurs at the end
               of text */
            s += (s + m < n) ? m - badchar[(unsigned char) txt[s + m]] : 1;
        }
        else
            /* Shift the pattern so that the bad character
               in text aligns with the last occurrence of
               it in pattern. The max function is used to
               make sure that we get a positive shift.
               We may get a negative shift if the last
               occurrence of bad character in pattern
               is on the right side of the current
               character. */
            s += max(1, j - badchar[(unsigned char) txt[s + j]]);
    }
}

/* Driver program to test above function */
int main()
{
    char txt[] = "ABAAABCD";
    char pat[] = "ABC";
    search(txt, pat);
    return 0;
}
```

## Java

```java
/* Java Program for Bad Character Heuristic of Boyer
Moore String Matching Algorithm */

class AWQ {

    static int NO_OF_CHARS = 256;

    // A utility function to get maximum of two integers
    static int max(int a, int b) { return (a > b) ? a : b; }

    // The preprocessing function for Boyer Moore's
    // bad character heuristic. Assumes every character
    // code is below NO_OF_CHARS (e.g. ASCII text).
    static void badCharHeuristic(char[] str, int size, int badchar[])
    {
        int i;

        // Initialize all occurrences as -1
        for (i = 0; i < NO_OF_CHARS; i++)
            badchar[i] = -1;

        // Fill the actual value of last occurrence
        // of a character
        for (i = 0; i < size; i++)
            badchar[(int) str[i]] = i;
    }

    /* A pattern searching function that uses Bad
    Character Heuristic of Boyer Moore Algorithm */
    static void search(char txt[], char pat[])
    {
        int m = pat.length;
        int n = txt.length;

        int badchar[] = new int[NO_OF_CHARS];

        /* Fill the bad character array by calling
           the preprocessing function badCharHeuristic()
           for given pattern */
        badCharHeuristic(pat, m, badchar);

        int s = 0;  // s is shift of the pattern with
                    // respect to text
        while (s <= (n - m))
        {
            int j = m - 1;

            /* Keep reducing index j of pattern while
               characters of pattern and text are
               matching at this shift s */
            while (j >= 0 && pat[j] == txt[s + j])
                j--;

            /* If the pattern is present at current
               shift, then index j will become -1 after
               the above loop */
            if (j < 0)
            {
                System.out.println("pattern occurs at shift = " + s);

                /* Shift the pattern so that the next
                   character in text aligns with the last
                   occurrence of it in pattern.
                   The condition s+m < n is necessary for
                   the case when pattern occurs at the end
                   of text */
                s += (s + m < n) ? m - badchar[txt[s + m]] : 1;
            }
            else
                /* Shift the pattern so that the bad character
                   in text aligns with the last occurrence of
                   it in pattern. The max function is used to
                   make sure that we get a positive shift.
                   We may get a negative shift if the last
                   occurrence of bad character in pattern
                   is on the right side of the current
                   character. */
                s += max(1, j - badchar[txt[s + j]]);
        }
    }

    /* Driver program to test above function */
    public static void main(String[] args) {
        char txt[] = "ABAAABCD".toCharArray();
        char pat[] = "ABC".toCharArray();
        search(txt, pat);
    }
}
```

## Python

```python
# Python3 Program for Bad Character Heuristic
# of Boyer Moore String Matching Algorithm

NO_OF_CHARS = 256

def badCharHeuristic(string, size):
    '''
    The preprocessing function for
    Boyer Moore's bad character heuristic
    '''

    # Initialize all occurrences as -1
    badChar = [-1] * NO_OF_CHARS

    # Fill the actual value of last occurrence
    for i in range(size):
        badChar[ord(string[i])] = i

    # Return the initialized list
    return badChar

def search(txt, pat):
    '''
    A pattern searching function that uses Bad Character
    Heuristic of Boyer Moore Algorithm
    '''
    m = len(pat)
    n = len(txt)

    # Create the bad character list by calling
    # the preprocessing function badCharHeuristic()
    # for the given pattern
    badChar = badCharHeuristic(pat, m)

    # s is shift of the pattern with respect to text
    s = 0
    while s <= n - m:
        j = m - 1

        # Keep reducing index j of pattern while
        # characters of pattern and text are matching
        # at this shift s
        while j >= 0 and pat[j] == txt[s + j]:
            j -= 1

        # If the pattern is present at current shift,
        # then index j will become -1 after the above loop
        if j < 0:
            print("pattern occurs at shift = {}".format(s))

            # Shift the pattern so that the next character in text
            # aligns with the last occurrence of it in pattern.
            # The condition s+m < n is necessary for the case when
            # pattern occurs at the end of text
            s += (m - badChar[ord(txt[s + m])] if s + m < n else 1)
        else:
            # Shift the pattern so that the bad character in text
            # aligns with the last occurrence of it in pattern. The
            # max function is used to make sure that we get a positive
            # shift. We may get a negative shift if the last occurrence
            # of bad character in pattern is on the right side of the
            # current character.
            s += max(1, j - badChar[ord(txt[s + j])])

# Driver program to test above function
def main():
    txt = "ABAAABCD"
    pat = "ABC"
    search(txt, pat)

if __name__ == '__main__':
    main()
```

## C#

```csharp
/* C# Program for Bad Character Heuristic of Boyer
Moore String Matching Algorithm */

using System;

public class AWQ {

    static int NO_OF_CHARS = 256;

    // A utility function to get maximum of two integers
    static int max(int a, int b) { return (a > b) ? a : b; }

    // The preprocessing function for Boyer Moore's
    // bad character heuristic. Assumes every character
    // code is below NO_OF_CHARS (e.g. ASCII text).
    static void badCharHeuristic(char[] str, int size, int[] badchar)
    {
        int i;

        // Initialize all occurrences as -1
        for (i = 0; i < NO_OF_CHARS; i++)
            badchar[i] = -1;

        // Fill the actual value of last occurrence
        // of a character
        for (i = 0; i < size; i++)
            badchar[(int) str[i]] = i;
    }

    /* A pattern searching function that uses Bad
    Character Heuristic of Boyer Moore Algorithm */
    static void search(char[] txt, char[] pat)
    {
        int m = pat.Length;
        int n = txt.Length;

        int[] badchar = new int[NO_OF_CHARS];

        /* Fill the bad character array by calling
           the preprocessing function badCharHeuristic()
           for given pattern */
        badCharHeuristic(pat, m, badchar);

        int s = 0; // s is shift of the pattern with
                   // respect to text
        while (s <= (n - m))
        {
            int j = m - 1;

            /* Keep reducing index j of pattern while
               characters of pattern and text are
               matching at this shift s */
            while (j >= 0 && pat[j] == txt[s + j])
                j--;

            /* If the pattern is present at current
               shift, then index j will become -1 after
               the above loop */
            if (j < 0)
            {
                Console.WriteLine("pattern occurs at shift = " + s);

                /* Shift the pattern so that the next
                   character in text aligns with the last
                   occurrence of it in pattern.
                   The condition s+m < n is necessary for
                   the case when pattern occurs at the end
                   of text */
                s += (s + m < n) ? m - badchar[txt[s + m]] : 1;
            }
            else
                /* Shift the pattern so that the bad character
                   in text aligns with the last occurrence of
                   it in pattern. The max function is used to
                   make sure that we get a positive shift.
                   We may get a negative shift if the last
                   occurrence of bad character in pattern
                   is on the right side of the current
                   character. */
                s += max(1, j - badchar[txt[s + j]]);
        }
    }

    /* Driver program to test above function */
    public static void Main() {
        char[] txt = "ABAAABCD".ToCharArray();
        char[] pat = "ABC".ToCharArray();
        search(txt, pat);
    }
}

// This code is contributed by PrinciRaj19992
```

Output:

```
pattern occurs at shift = 4
```

The Bad Character Heuristic may take O(mn) time in the worst case. The worst case occurs when all characters of the text and pattern are the same, for example txt[] = “AAAAAAAAAAAAAAAAAA” and pat[] = “AAAAA”.
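The gap between the best-case and worst-case bounds can be observed by counting character comparisons. The sketch below restates the search loop from the implementations above in Python, with a counter added. `count_comparisons` is a hypothetical helper, and 8-bit character codes are assumed.

```python
NO_OF_CHARS = 256  # assumption: 8-bit character codes


def count_comparisons(txt, pat):
    """Bad-character-only Boyer Moore search that returns the number of
    character comparisons it performs (hypothetical helper)."""
    bad = [-1] * NO_OF_CHARS
    for i, c in enumerate(pat):
        bad[ord(c)] = i

    n, m = len(txt), len(pat)
    s = comparisons = 0
    while s <= n - m:
        j = m - 1
        # Right-to-left comparison, counting every character test.
        while j >= 0:
            comparisons += 1
            if pat[j] != txt[s + j]:
                break
            j -= 1
        if j < 0:
            # Full match: same shift rule as the implementations above.
            s += m - bad[ord(txt[s + m])] if s + m < n else 1
        else:
            s += max(1, j - bad[ord(txt[s + j])])
    return comparisons


# Worst case: every alignment matches fully and the pattern advances by 1.
print(count_comparisons("A" * 18, "AAAAA"))  # 70
# Best case for contrast: one comparison per alignment, jumps of m.
print(count_comparisons("A" * 12, "BBBB"))   # 3
```

For the all-“A” input, each of the n - m + 1 = 14 alignments compares all m = 5 characters. In general, this count grows proportionally to m·n.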

Next post: Boyer Moore Algorithm | Good Suffix heuristic