Given a string, build its suffix array.
Two ways of building a suffix array have already been discussed in earlier articles.
Please go through them for a basic understanding.
Here we will see how to build a suffix array in linear time using a suffix tree.
As a prerequisite, we must know how to build a suffix tree by some method.
Here we will build the suffix tree using Ukkonen’s Algorithm, which has already been discussed in the articles below:
Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2
Ukkonen’s Suffix Tree Construction – Part 3
Ukkonen’s Suffix Tree Construction – Part 4
Ukkonen’s Suffix Tree Construction – Part 5
Ukkonen’s Suffix Tree Construction – Part 6
Let’s consider the string abcabxabcd.
Its suffix array would be:
0 6 3 1 7 4 2 8 9 5
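The array above can be cross-checked with a brute-force sketch that simply sorts the suffix start positions by the suffixes they begin. This is not the linear-time method of this article; the function name `naive_suffix_array` is only illustrative, and the approach is meant for checking small examples.

```python
def naive_suffix_array(text):
    """Return the suffix array of text by sorting its suffixes directly.

    Brute force: each comparison may look at O(n) characters, so this is
    far from linear time. It is only a reference for small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


print(naive_suffix_array("abcabxabcd"))  # [0, 6, 3, 1, 7, 4, 2, 8, 9, 5]
```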
Let’s look at the following figure, which shows the suffix tree of this string:
If we do a DFS traversal, visiting edges in lexicographic order (the same traversal used in the other Suffix Tree Application articles), and print the suffix indices stored on the leaves, we get the following:
10 0 6 3 1 7 4 2 8 9 5
“$” is lexicographically smaller than every character in [a-zA-Z].
The suffix index 10 corresponds to the edge labelled “$”.
Apart from this first suffix index, the sequence of the remaining numbers is the suffix array of the string.
So if we have a suffix tree of the string, then to get its suffix array we just need to do a lexicographic-order DFS traversal and store all the suffix indices in the resulting suffix array, except the very first one.
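The traversal described above can be sketched as follows. To keep the example short, the tree is built by naive suffix insertion, which takes O(N^2) time and is swapped in here in place of Ukkonen’s Algorithm; only the traversal step mirrors the method of this article. All names (`Node`, `build_suffix_tree_naive`, `suffix_array_from_tree`, `suffix_array`) are illustrative, and the sketch assumes the input does not contain “$” and that all of its characters sort after “$”.

```python
class Node:
    """Suffix tree node; the edge leading into it is text[start:end]."""

    def __init__(self, start=0, end=0, suffix_index=-1):
        self.start = start
        self.end = end
        self.children = {}                # first edge character -> child Node
        self.suffix_index = suffix_index  # >= 0 only on leaves


def build_suffix_tree_naive(text):
    """Build a compressed suffix tree by inserting each suffix in turn.

    Assumption: text ends with a terminator that occurs nowhere else, so no
    suffix is a prefix of another. This is O(n^2), not Ukkonen's algorithm.
    """
    n = len(text)
    root = Node()
    for i in range(n):
        node, pos = root, i  # pos: next character of suffix i to match
        while True:
            child = node.children.get(text[pos])
            if child is None:
                # No edge starts with this character: add a new leaf.
                node.children[text[pos]] = Node(pos, n, i)
                break
            # Walk along the edge while the characters agree.
            k = child.start
            while k < child.end and text[k] == text[pos]:
                k += 1
                pos += 1
            if k == child.end:
                node = child  # whole edge matched, continue below it
                continue
            # Mismatch inside the edge: split it and hang a new leaf.
            mid = Node(child.start, k)
            node.children[text[child.start]] = mid
            child.start = k
            mid.children[text[k]] = child
            mid.children[text[pos]] = Node(pos, n, i)
            break
    return root


def suffix_array_from_tree(root):
    """Lexicographic-order DFS that collects the suffix indices on leaves."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.suffix_index >= 0:
            order.append(node.suffix_index)
        # Push children in reverse order so the smallest edge is popped first.
        for ch in sorted(node.children, reverse=True):
            stack.append(node.children[ch])
    return order[1:]  # drop the first index, which belongs to the "$" leaf


def suffix_array(s):
    if "$" in s:
        raise ValueError("input must not contain the terminator '$'")
    return suffix_array_from_tree(build_suffix_tree_naive(s + "$"))


print(suffix_array("abcabxabcd"))  # [0, 6, 3, 1, 7, 4, 2, 8, 9, 5]
```

An explicit stack is used instead of recursion so that deep trees (for example, for a string like AAAAAAAAAA repeated many times) do not hit Python’s recursion limit.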
Suffix Array for String banana is: 5 3 1 0 4 2
Suffix Array for String GEEKSFORGEEKS is: 9 1 10 2 5 8 0 11 3 6 7 12 4
Suffix Array for String AAAAAAAAAA is: 9 8 7 6 5 4 3 2 1 0
Suffix Array for String ABCDEFG is: 0 1 2 3 4 5 6
Suffix Array for String ABABABA is: 6 4 2 0 5 3 1
Suffix Array for String abcabxabcd is: 0 6 3 1 7 4 2 8 9 5
Suffix Array for String CCAAACCCGATTA is: 12 2 3 4 9 1 0 5 6 7 8 11 10
Ukkonen’s Suffix Tree Construction takes O(N) time and space to build the suffix tree for a string of length N, and after that the traversal of the tree takes O(N) time to build the suffix array.
So overall, the method is linear in time and space.
Can you see why the traversal is O(N)? A suffix tree for a string of length N has N leaves and at most N-1 internal nodes, because every internal node has at least two children. For a constant-size alphabet, all of these nodes can be visited in O(N) time.
We have published the following articles on suffix tree applications:
- Suffix Tree Application 1 – Substring Check
- Suffix Tree Application 2 – Searching All Patterns
- Suffix Tree Application 3 – Longest Repeated Substring
- Generalized Suffix Tree 1
- Suffix Tree Application 5 – Longest Common Substring
- Suffix Tree Application 6 – Longest Palindromic Substring
This article is contributed by Anurag Singh.
Improved By : nidhi_biet