Open In App

Ukkonen’s Suffix Tree Construction – Part 5

This article is continuation of following four articles:
Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4

Please go through Part 1, Part 2, Part 3 and Part 4, before looking at current article, where we have seen few basics on suffix tree, high level ukkonen’s algorithm, suffix link and three implementation tricks and some details on activePoint along with an example string “abcabxabcd” where we went through six phases of building suffix tree.
Here, we will go through rest of the phases (7 to 11) and build the tree completely.

*********************Phase 7*********************************
In phase 7, we read 7th character (a) from string S

At the end of phase 7, remainingSuffixCount is 1 (One suffix ‘a’, the last one, is not added explicitly in tree, but it is there in tree implicitly).
Above Figure 33 is the resulting tree after phase 7.

*********************Phase 8*********************************
In phase 8, we read 8th character (b) from string S

At the end of phase 8, remainingSuffixCount is 2 (Two suffixes, ‘ab’ and ‘b’, the last two, are not added explicitly in tree explicitly, but they are in tree implicitly).

*********************Phase 9*********************************
In phase 9, we read 9th character (c) from string S

At the end of phase 9, remainingSuffixCount is 3 (Three suffixes, ‘abc’, ‘bc’ and ‘c’, the last three, are not added explicitly in tree explicitly, but they are in tree implicitly).

*********************Phase 10*********************************
In phase 10, we read 10th character (d) from string S

We see following facts in Phase 10:

*********************Phase 11*********************************
In phase 11, we read 11th character ($) from string S

Now we have added all suffixes of string ‘abcabxabcd$’ in suffix tree. There are 11 leaf ends in this tree and labels on the path from root to leaf end represents one suffix. Now the only one thing left is to assign a number (suffix index) to each leaf end and that number would be the suffix starting position in the string S. This can be done by a DFS traversal on tree. While DFS traversal, keep track of label length and when a leaf end is found, set the suffix index as “stringSize – labelSize + 1”. Indexed suffix tree will look like below:


In above Figure, suffix indices are shown as character position starting with 1 (It’s not zero indexed). In code implementation, suffix index will be set as zero indexed, i.e. where we see suffix index j (1 to m for string of length m) in above figure, in code implementation, it will be j-1 (0 to m-1)
And we are done !!!!

Data Structure to represent suffix tree

How to represent the suffix tree?? There are nodes, edges, labels and suffix links and indices.
Below are some of the operations/query we will be doing while building suffix tree and later on while using the suffix tree in different applications/usages:

We may think of different data structures which can fulfil these requirements.
In the next Part 6, we will discuss the data structure we will use in our code implementation and the code as well.


Article Tags :