Suffix Tree Application 1 – Substring Check
Given a text string and a pattern string, check whether the pattern occurs in the text as a substring.
A few pattern searching algorithms (KMP, Rabin-Karp, Naive Algorithm, Finite Automata) have already been discussed, and any of them can be used for this check.
Here we will discuss a suffix tree based algorithm.
As a prerequisite, we must know how to build a suffix tree, one way or another.
Once a suffix tree has been built for the given text, we traverse it from the root, matching the characters of the pattern one by one against the edge labels. If we do not fall off the tree during the traversal (i.e. the whole pattern is matched along some path from the root, whether that path ends at a leaf, at an internal node, or in the middle of an edge), then the pattern exists in the text as a substring.
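The traversal described above can be sketched in Python as follows. This is a minimal illustration, not the implementation from this series: to stay short it builds the tree with a naive O(N^2) insertion of every suffix rather than Ukkonen's algorithm, and the names Node, build_suffix_tree and is_substring are hypothetical. Only is_substring is the part that corresponds to the substring check; it should work on any suffix tree whose edges are stored as (start, end) index pairs into the text.

```python
class Node:
    """Suffix tree node; the edge leading into it is labelled text[start:end]."""
    def __init__(self, start=0, end=0):
        self.start = start
        self.end = end
        self.children = {}  # first character of an edge label -> child Node


def build_suffix_tree(text):
    """Naive O(N^2) stand-in for Ukkonen's algorithm: insert every suffix."""
    root = Node()
    n = len(text)
    for i in range(n):
        node, pos = root, i
        while pos < n:
            child = node.children.get(text[pos])
            if child is None:
                node.children[text[pos]] = Node(pos, n)  # new leaf edge
                break
            # Walk along the edge label while it agrees with the suffix.
            k = child.start
            while k < child.end and pos < n and text[k] == text[pos]:
                k += 1
                pos += 1
            if k == child.end:
                node = child          # whole edge matched, keep descending
            elif pos == n:
                break                 # suffix already present as a prefix of this path
            else:
                # Mismatch inside the edge: split it at position k.
                mid = Node(child.start, k)
                node.children[text[child.start]] = mid
                child.start = k
                mid.children[text[k]] = child
                mid.children[text[pos]] = Node(pos, n)
                break
    return root


def is_substring(root, text, pattern):
    """Return True if pattern can be matched from the root without falling off."""
    node, i, m = root, 0, len(pattern)
    while i < m:
        child = node.children.get(pattern[i])
        if child is None:
            return False              # no edge starts with this character
        k = child.start
        while k < child.end and i < m:
            if text[k] != pattern[i]:
                return False          # mismatch in the middle of an edge
            k += 1
            i += 1
        node = child
    return True                       # pattern exhausted while still on the tree


text = "banana$"
root = build_suffix_tree(text)
print(is_substring(root, text, "nan"))  # True
print(is_substring(root, text, "nab"))  # False
```

Note that the check never needs to reach a leaf: it stops as soon as the pattern is exhausted, which is why the cost of a query depends on the pattern length only.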
Here we will build the suffix tree using Ukkonen’s Algorithm, which has already been discussed in the following articles:
Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4
Ukkonen’s Suffix Tree Construction – Part 5
Ukkonen’s Suffix Tree Construction – Part 6
The core traversal implementation for the substring check can be adapted to suffix trees built by other algorithms.
Pattern <TEST> is a Substring
Pattern <A> is a Substring
Pattern < > is a Substring
Pattern <IS A> is a Substring
Pattern < IS A > is a Substring
Pattern <TEST1> is NOT a Substring
Pattern <THIS IS GOOD> is NOT a Substring
Pattern <TES> is a Substring
Pattern <TESA> is NOT a Substring
Pattern <ISB> is NOT a Substring
Ukkonen’s Suffix Tree Construction takes O(N) time and space to build the suffix tree for a string of length N. After that, the traversal for a substring check takes O(M) time for a pattern of length M.
With a slight modification to the traversal algorithm discussed here, we can also answer the following:
- How to find all occurrences of a given pattern P in a text T?
- How to check if a pattern is a prefix of a text?
- How to check if a pattern is a suffix of a text?
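One possible way to approach these three questions is sketched below. It is an illustration under stated assumptions, not the method of the follow-up articles: for brevity it uses an uncompressed suffix trie (one character per edge, O(N^2) space) instead of a suffix tree built by Ukkonen's algorithm, it assumes the terminator character "$" does not occur in the text, and the names TrieNode, build_suffix_trie, walk, find_all, is_prefix and is_suffix are all hypothetical. The idea carries over to a real suffix tree: match the pattern first, then look at what lies below the point where the match ended.

```python
TERMINATOR = "$"  # assumption: this character never occurs in the text


class TrieNode:
    def __init__(self):
        self.children = {}        # character -> TrieNode
        self.suffix_index = None  # set only on leaves: where that suffix starts


def build_suffix_trie(text):
    """Naive suffix trie of text + TERMINATOR; a stand-in for a real suffix tree."""
    text += TERMINATOR
    root = TrieNode()
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.children.setdefault(ch, TrieNode())
        node.suffix_index = i     # the leaf remembers which suffix ends here
    return root


def walk(root, pattern):
    """Follow pattern from the root; return the node reached, or None."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def find_all(root, pattern):
    """Every leaf below the match point is one occurrence of the pattern."""
    node = walk(root, pattern)
    if node is None:
        return []
    found, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur.suffix_index is not None:
            found.append(cur.suffix_index)
        stack.extend(cur.children.values())
    return sorted(found)


def is_prefix(root, pattern):
    """The pattern is a prefix if one of its occurrences starts at index 0."""
    return 0 in find_all(root, pattern)


def is_suffix(root, pattern):
    """The pattern is a suffix if the terminator can follow it directly."""
    node = walk(root, pattern)
    return node is not None and TERMINATOR in node.children


root = build_suffix_trie("banana")
print(find_all(root, "ana"))   # [1, 3]
print(is_prefix(root, "ban"))  # True
print(is_suffix(root, "ana"))  # True
```

Collecting the leaves costs time proportional to the size of the subtree below the match point. In a compressed suffix tree that is proportional to the number of occurrences, whereas in this trie it can be larger.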
We have published the following additional articles on suffix tree applications:
- Suffix Tree Application 2 – Searching All Patterns
- Suffix Tree Application 3 – Longest Repeated Substring
- Suffix Tree Application 4 – Build Linear Time Suffix Array
- Generalized Suffix Tree 1
- Suffix Tree Application 5 – Longest Common Substring
- Suffix Tree Application 6 – Longest Palindromic Substring
This article is contributed by Anurag Singh.