Given a string, the task is to construct a suffix array for the given string.
A suffix array is a sorted array of all suffixes of a given string. The definition is similar to that of a suffix tree, which is a compressed trie of all suffixes of the given text.
Examples:
Input: str = “banana”
Output: {5, 3, 1, 0, 4, 2}
Explanation:
Suffix per index        Suffixes sorted alphabetically
0 banana                5 a
1 anana                 3 ana
2 nana                  1 anana
3 ana                   0 banana
4 na                    4 na
5 a                     2 nana
So the suffix array for “banana” is {5, 3, 1, 0, 4, 2}.
Input: str = “geeksforgeeks”
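Output: {9, 1, 10, 2, 5, 8, 0, 11, 3, 6, 7, 12, 4}
Explanation:
Suffix per index        Suffixes sorted alphabetically
0 geeksforgeeks         9 eeks
1 eeksforgeeks          1 eeksforgeeks
2 eksforgeeks           10 eks
3 ksforgeeks            2 eksforgeeks
4 sforgeeks             5 forgeeks
5 forgeeks              8 geeks
6 orgeeks               0 geeksforgeeks
7 rgeeks                11 ks
8 geeks                 3 ksforgeeks
9 eeks                  6 orgeeks
10 eks                  7 rgeeks
11 ks                   12 s
12 s                    4 sforgeeks
Suffix array for “geeksforgeeks” is {9, 1, 10, 2, 5, 8, 0, 11, 3, 6, 7, 12, 4}.
Note that "eeks" comes before "eks" because 'e' < 'k' at the second character, and a suffix that is a prefix of another (such as "eeks" of "eeksforgeeks") comes first.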
Naive Approach: A naive algorithm for constructing a suffix array has been discussed previously. It considers all suffixes, sorts them with an O(n log n) sorting algorithm, and keeps track of each suffix's original index while sorting.
Time complexity: O(n² log n), where n is the number of characters in the input string, because each comparison between two suffixes can take O(n) time.
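The naive approach fits in a few lines of Python. This is a minimal illustration rather than the previously discussed implementation, and the function name naive_suffix_array is hypothetical. It sorts the start positions using the suffix itself as the sort key, so Python materializes all n suffixes up front (O(n²) memory) and each comparison may scan O(n) characters.

```python
def naive_suffix_array(s):
    """Return the start indices of the suffixes of s in sorted order.

    Hypothetical helper: each suffix s[i:] is used as the sort key, so a
    comparison costs O(n) and the total time is O(n^2 log n).
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


print(naive_suffix_array("banana"))         # [5, 3, 1, 0, 4, 2]
print(naive_suffix_array("geeksforgeeks"))  # [9, 1, 10, 2, 5, 8, 0, 11, 3, 6, 7, 12, 4]
```

This doubles as a handy reference when testing the faster construction below on small inputs.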
Optimized Approach: This post discusses an O(n log n) algorithm for suffix array construction. For simplicity, we first discuss an O(n log² n) algorithm.
The idea is to use the fact that strings that are to be sorted are suffixes of a single string.
- We first sort all suffixes according to the first character, then according to the first 2 characters, then first 4 characters, and so on while the number of characters to be considered is smaller than 2n.
- The important point is that if the suffixes have been sorted according to their first 2^i characters, they can be sorted according to their first 2^(i+1) characters in O(n log n) time using an O(n log n) sorting algorithm such as Merge Sort.
- This works because two suffixes can then be compared in O(1) time: only two rank values need to be compared (see the example and code below).
The sort function is called O(log n) times, because the number of characters considered doubles in each round. Therefore, the overall time complexity is O(n log² n).
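The doubling idea can be sketched in Python as follows. This is an illustrative version, not the code later in this post: it ranks characters by their code points (ord) instead of str[i] - 'a', and it stops as soon as all ranks are distinct; the function name suffix_array_doubling is hypothetical. Each round sorts by the pair (rank of the first k characters, rank of the next k characters), which orders the suffixes by their first 2k characters.

```python
def suffix_array_doubling(s):
    """Sort suffixes by their first 2, 4, 8, ... characters (O(n log^2 n))."""
    n = len(s)
    if n == 0:
        return []
    rank = [ord(c) for c in s]   # rank by the first character
    sa = list(range(n))
    k = 1                        # each rank currently describes k characters
    while True:
        def key(i):
            # (rank of first k chars, rank of next k chars, or -1 if none)
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)         # now sorted by the first 2k characters
        # Re-rank: equal keys share a rank, otherwise previous rank + 1
        new_rank = [0] * n
        for j in range(1, n):
            changed = key(sa[j]) != key(sa[j - 1])
            new_rank[sa[j]] = new_rank[sa[j - 1]] + changed
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            break
        k *= 2
    return sa


print(suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

Stopping early is safe because once all ranks differ, further rounds cannot change the order.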
Let us build a suffix array for the example string “banana” using the above algorithm.
Sort according to the first two characters
Assign a rank to every suffix using the ASCII value of its first character. A simple way to do this is to use str[i] - 'a' as the rank of the i-th suffix of str[].
Index Suffix Rank
0 banana 1
1 anana 0
2 nana 13
3 ana 0
4 na 13
5 a 0
For every suffix, we also store the rank of the next adjacent character, i.e., the rank of the character at str[i + 1]. This is needed to sort the suffixes according to their first 2 characters. If a character is the last character, we store the next rank as -1.
Index Suffix Rank Next Rank
0 banana 1 0
1 anana 0 13
2 nana 13 0
3 ana 0 13
4 na 13 0
5 a 0 -1
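The first-round ranks and next ranks in the table above can be computed directly. This Python fragment is a small illustration that assumes lowercase input, as the str[i] - 'a' rule does.

```python
s = "banana"
n = len(s)

# Rank of the first character of each suffix, as in str[i] - 'a'
rank = [ord(c) - ord("a") for c in s]

# Next rank: rank of the character at i + 1, or -1 for the last character
next_rank = [rank[i + 1] if i + 1 < n else -1 for i in range(n)]

for i in range(n):
    print(i, s[i:], rank[i], next_rank[i])
```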
Sort all suffixes according to rank and next rank. The rank acts as the first (most significant) digit and the next rank as the second digit.
Index Suffix Rank Next Rank
5 a 0 -1
1 anana 0 13
3 ana 0 13
0 banana 1 0
2 nana 13 0
4 na 13 0
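Sorting by (rank, next rank) is an ordinary lexicographic sort on pairs. In this illustrative Python fragment the pairs are copied from the table above (the variable names are hypothetical). Python's sorted is stable, so suffixes with equal pairs (1 and 3, 2 and 4) keep their index order, matching the table.

```python
# (rank, next rank) for each suffix of "banana", keyed by start index
pairs = {0: (1, 0), 1: (0, 13), 2: (13, 0), 3: (0, 13), 4: (13, 0), 5: (0, -1)}

# Tuples compare lexicographically: rank first (MSD), next rank second
order = sorted(pairs, key=lambda i: pairs[i])
print(order)  # [5, 1, 3, 0, 2, 4]
```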
Sort according to the first four characters
Assign new ranks to all suffixes. To do this, consider the sorted suffixes one by one and assign 0 as the new rank of the first suffix. For each remaining suffix, compare its (rank, next rank) pair with the pair of the suffix just before it. If the two pairs are the same, assign it the same new rank as the previous suffix; otherwise, assign the previous suffix's new rank plus one.
Index Suffix Rank
5 a 0 [Assign 0 to first]
1 anana 1 (0, 13) is different from previous
3 ana 1 (0, 13) is same as previous
0 banana 2 (1, 0) is different from previous
2 nana 3 (13, 0) is different from previous
4 na 3 (13, 0) is same as previous
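The rank-assignment rule is a single pass over the sorted order. In this Python illustration the sorted order and the (rank, next rank) pairs are taken from the tables above; new_rank is a hypothetical name.

```python
# Sorted order and (rank, next rank) pairs from the tables above
order = [5, 1, 3, 0, 2, 4]
pairs = {0: (1, 0), 1: (0, 13), 2: (13, 0), 3: (0, 13), 4: (13, 0), 5: (0, -1)}

new_rank = {order[0]: 0}  # the first suffix gets rank 0
for prev, cur in zip(order, order[1:]):
    # Same pair as the previous suffix -> same rank, otherwise previous rank + 1
    new_rank[cur] = new_rank[prev] + (pairs[cur] != pairs[prev])

print([new_rank[i] for i in order])  # [0, 1, 1, 2, 3, 3]
```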
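For every suffix starting at index i, also store the new rank of the suffix starting at index i + 2 as its next rank. If there is no suffix starting at i + 2, store the next rank as -1. Because each new rank already describes the first 2 characters of its suffix, the pair (rank, next rank) describes the first 4 characters.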
Index Suffix Rank Next Rank
5 a 0 -1
1 anana 1 1
3 ana 1 0
0 banana 2 3
2 nana 3 3
4 na 3 -1
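The next rank is looked up at an offset of k / 2 = 2, where k = 4 is the number of characters this round sorts on. The new ranks below are copied from the previous table; this Python fragment only illustrates the lookup, and its variable names are hypothetical.

```python
n = 6  # len("banana")
# New ranks after the first round, keyed by suffix start index
new_rank = {5: 0, 1: 1, 3: 1, 0: 2, 2: 3, 4: 3}
k = 4  # this round sorts by the first k characters

# Next rank of suffix i = new rank of the suffix starting k // 2 positions later
next_rank = {i: new_rank[i + k // 2] if i + k // 2 < n else -1 for i in new_rank}
print([next_rank[i] for i in [5, 1, 3, 0, 2, 4]])  # [-1, 1, 0, 3, 3, -1]
```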
Sort all suffixes according to rank and next rank.
Index Suffix Rank Next Rank
5 a 0 -1
3 ana 1 0
1 anana 1 1
0 banana 2 3
4 na 3 -1
2 nana 3 3
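The final sort can be checked the same way. This Python fragment takes its pairs from the table above (hypothetical variable names). Because every pair is now distinct, the order is final and equals the suffix array {5, 3, 1, 0, 4, 2}; the implementations below still run one more round (k = 8 < 2n = 12), which leaves this order unchanged.

```python
# (rank, next rank) after the second round, keyed by suffix start index
pairs = {5: (0, -1), 1: (1, 1), 3: (1, 0), 0: (2, 3), 2: (3, 3), 4: (3, -1)}

order = sorted(pairs, key=lambda i: pairs[i])
print(order)  # [5, 3, 1, 0, 4, 2]

# Every pair is distinct, so no further round can change the order
print(len(set(pairs.values())) == len(pairs))  # True
```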
C++
#include <iostream>
#include <cstring>
#include <algorithm>
#include <vector>
using namespace std;
// A suffix: its start index and its (rank, next rank) pair
struct suffix
{
    int index;   // starting position of the suffix in txt
    int rank[2]; // rank[0]: rank of the first half, rank[1]: rank of the second half
};
// Orders suffixes by rank[0], then by rank[1]
bool cmp(const suffix &a, const suffix &b)
{
    return (a.rank[0] == b.rank[0]) ? (a.rank[1] < b.rank[1])
                                    : (a.rank[0] < b.rank[0]);
}
// Returns the suffix array of txt (length n); assumes lowercase letters
vector<int> buildSuffixArray(const char *txt, int n)
{
    vector<suffix> suffixes(n);
    // Rank each suffix by its first two characters
    for (int i = 0; i < n; i++)
    {
        suffixes[i].index = i;
        suffixes[i].rank[0] = txt[i] - 'a';
        suffixes[i].rank[1] = ((i + 1) < n) ? (txt[i + 1] - 'a') : -1;
    }
    sort(suffixes.begin(), suffixes.end(), cmp);
    // ind[j] = position of the suffix starting at j in suffixes
    vector<int> ind(n);
    // Sort by the first 4, 8, 16, ... characters
    for (int k = 4; k < 2 * n; k = k * 2)
    {
        // Re-rank: equal (rank, next rank) pairs share a rank
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        ind[suffixes[0].index] = 0;
        for (int i = 1; i < n; i++)
        {
            if (suffixes[i].rank[0] == prev_rank &&
                suffixes[i].rank[1] == suffixes[i - 1].rank[1])
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = rank;
            }
            else
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            ind[suffixes[i].index] = i;
        }
        // Next rank = new rank of the suffix starting k/2 positions later
        for (int i = 0; i < n; i++)
        {
            int nextindex = suffixes[i].index + k / 2;
            suffixes[i].rank[1] = (nextindex < n) ?
                                  suffixes[ind[nextindex]].rank[0] : -1;
        }
        // Sort by the updated pairs, i.e. by the first k characters
        sort(suffixes.begin(), suffixes.end(), cmp);
    }
    vector<int> suffixArr(n);
    for (int i = 0; i < n; i++)
        suffixArr[i] = suffixes[i].index;
    return suffixArr;
}
void printArr(const vector<int> &arr)
{
    for (int x : arr)
        cout << x << " ";
    cout << endl;
}
int main()
{
    char txt[] = "banana";
    int n = strlen(txt);
    vector<int> suffixArr = buildSuffixArray(txt, n);
    cout << "Following is suffix array for " << txt << endl;
    printArr(suffixArr);
    return 0;
}
Java
import java.util.*;
class GFG
{
    // A suffix: start index, rank of its first half, rank of its second half
    public static class Suffix implements Comparable<Suffix>
    {
        int index;
        int rank;
        int next;
        public Suffix(int ind, int r, int nr)
        {
            index = ind;
            rank = r;
            next = nr;
        }
        // Order by (rank, next)
        public int compareTo(Suffix s)
        {
            if (rank != s.rank) return Integer.compare(rank, s.rank);
            return Integer.compare(next, s.next);
        }
    }
    public static int[] suffixArray(String s)
    {
        int n = s.length();
        Suffix[] su = new Suffix[n];
        // Initial rank: character code relative to '$'
        for (int i = 0; i < n; i++)
        {
            su[i] = new Suffix(i, s.charAt(i) - '$', 0);
        }
        // Next rank: rank of the following character, or -1 at the end
        for (int i = 0; i < n; i++)
            su[i].next = (i + 1 < n ? su[i + 1].rank : -1);
        Arrays.sort(su);
        // ind[j] = position of the suffix starting at j in su
        int[] ind = new int[n];
        for (int length = 4; length < 2 * n; length <<= 1)
        {
            // Re-rank: equal (rank, next) pairs share a rank
            int rank = 0, prev = su[0].rank;
            su[0].rank = rank;
            ind[su[0].index] = 0;
            for (int i = 1; i < n; i++)
            {
                if (su[i].rank == prev &&
                    su[i].next == su[i - 1].next)
                {
                    prev = su[i].rank;
                    su[i].rank = rank;
                }
                else
                {
                    prev = su[i].rank;
                    su[i].rank = ++rank;
                }
                ind[su[i].index] = i;
            }
            // Next rank = new rank of the suffix starting length / 2 positions later
            for (int i = 0; i < n; i++)
            {
                int nextP = su[i].index + length / 2;
                su[i].next = nextP < n ?
                    su[ind[nextP]].rank : -1;
            }
            Arrays.sort(su);
        }
        int[] suf = new int[n];
        for (int i = 0; i < n; i++)
            suf[i] = su[i].index;
        return suf;
    }
    static void printArr(int arr[], int n)
    {
        for (int i = 0; i < n; i++)
            System.out.print(arr[i] + " ");
        System.out.println();
    }
    public static void main(String[] args)
    {
        String txt = "banana";
        int n = txt.length();
        int[] suff_arr = suffixArray(txt);
        System.out.println("Following is suffix array for banana:");
        printArr(suff_arr, n);
    }
}
Python3
class suffix:
    def __init__(self):
        self.index = 0       # starting position of the suffix
        self.rank = [0, 0]   # (rank of first half, rank of second half)

def buildSuffixArray(txt, n):
    suffixes = [suffix() for _ in range(n)]
    # Rank each suffix by its first two characters
    for i in range(n):
        suffixes[i].index = i
        suffixes[i].rank[0] = ord(txt[i]) - ord("a")
        suffixes[i].rank[1] = (ord(txt[i + 1]) - ord("a")) if (i + 1) < n else -1
    suffixes = sorted(suffixes, key=lambda x: (x.rank[0], x.rank[1]))
    # ind[j] = position of the suffix starting at j in the sorted list
    ind = [0] * n
    k = 4
    while k < 2 * n:
        # Re-rank: equal (rank, next rank) pairs share a rank
        rank = 0
        prev_rank = suffixes[0].rank[0]
        suffixes[0].rank[0] = rank
        ind[suffixes[0].index] = 0
        for i in range(1, n):
            if (suffixes[i].rank[0] == prev_rank and
                    suffixes[i].rank[1] == suffixes[i - 1].rank[1]):
                prev_rank = suffixes[i].rank[0]
                suffixes[i].rank[0] = rank
            else:
                prev_rank = suffixes[i].rank[0]
                rank += 1
                suffixes[i].rank[0] = rank
            ind[suffixes[i].index] = i
        # Next rank = new rank of the suffix starting k // 2 positions later
        for i in range(n):
            nextindex = suffixes[i].index + k // 2
            suffixes[i].rank[1] = suffixes[ind[nextindex]].rank[0] \
                if nextindex < n else -1
        suffixes = sorted(suffixes, key=lambda x: (x.rank[0], x.rank[1]))
        k *= 2
    suffixArr = [0] * n
    for i in range(n):
        suffixArr[i] = suffixes[i].index
    return suffixArr

def printArr(arr, n):
    for i in range(n):
        print(arr[i], end=" ")
    print()

if __name__ == "__main__":
    txt = "banana"
    n = len(txt)
    suffixArr = buildSuffixArray(txt, n)
    print("Following is suffix array for", txt)
    printArr(suffixArr, n)
C#
using System;
using System.Collections;
// A suffix: its start index and its (rank, next rank) pair
class suffix
{
    public int index;
    public int[] rank = new int[2];
    public suffix(int i, int rank0, int rank1)
    {
        index = i;
        rank[0] = rank0;
        rank[1] = rank1;
    }
}
// Orders suffixes by rank[0], then by rank[1]
class compare : IComparer
{
    public int Compare(object x, object y)
    {
        suffix a = (suffix)x;
        suffix b = (suffix)y;
        if (a.rank[0] != b.rank[0])
        {
            return a.rank[0] - b.rank[0];
        }
        return a.rank[1] - b.rank[1];
    }
}
class HelloWorld
{
    public static int[] buildSuffixArray(char[] txt, int n)
    {
        suffix[] suffixes = new suffix[n];
        // Rank each suffix by its first two characters
        for (int i = 0; i < n; i++)
        {
            int rank0 = (int)txt[i] - (int)'a';
            int rank1 = ((i + 1) < n) ? (int)txt[i + 1] - (int)'a' : -1;
            suffixes[i] = new suffix(i, rank0, rank1);
        }
        IComparer cmp = new compare();
        Array.Sort(suffixes, cmp);
        // ind[j] = position of the suffix starting at j in suffixes
        int[] ind = new int[n];
        for (int k = 4; k < 2 * n; k = k * 2)
        {
            // Re-rank: equal (rank, next rank) pairs share a rank
            int rank = 0;
            int prev_rank = suffixes[0].rank[0];
            suffixes[0].rank[0] = rank;
            ind[suffixes[0].index] = 0;
            for (int i = 1; i < n; i++)
            {
                if (suffixes[i].rank[0] == prev_rank &&
                    suffixes[i].rank[1] == suffixes[i - 1].rank[1])
                {
                    prev_rank = suffixes[i].rank[0];
                    suffixes[i].rank[0] = rank;
                }
                else
                {
                    prev_rank = suffixes[i].rank[0];
                    suffixes[i].rank[0] = ++rank;
                }
                ind[suffixes[i].index] = i;
            }
            // Next rank = new rank of the suffix starting k/2 positions later
            for (int i = 0; i < n; i++)
            {
                int nextindex = suffixes[i].index + k / 2;
                suffixes[i].rank[1] = (nextindex < n) ? suffixes[ind[nextindex]].rank[0] : -1;
            }
            // Sort by the updated pairs, i.e. by the first k characters
            Array.Sort(suffixes, cmp);
        }
        int[] suffixArr = new int[n];
        for (int i = 0; i < n; i++)
        {
            suffixArr[i] = suffixes[i].index;
        }
        return suffixArr;
    }
    public static void printArr(int[] arr, int n)
    {
        for (int i = 0; i < n; i++)
        {
            Console.Write(arr[i] + " ");
        }
        Console.WriteLine();
    }
    static void Main()
    {
        char[] txt = { 'b', 'a', 'n', 'a', 'n', 'a' };
        int n = txt.Length;
        int[] suffixArr = buildSuffixArray(txt, n);
        Console.WriteLine("Following is suffix array for " + new string(txt));
        printArr(suffixArr, n);
    }
}
Javascript
<script>
// A suffix: start index, rank of its first half, rank of its second half
class Suffix
{
    constructor(ind, r, nr)
    {
        this.index = ind;
        this.rank = r;
        this.next = nr;
    }
}
// Orders suffixes by (rank, next)
function compareSuffix(a, b)
{
    if (a.rank != b.rank)
        return a.rank - b.rank;
    return a.next - b.next;
}
function suffixArray(s)
{
    let n = s.length;
    let su = new Array(n);
    // Initial rank: character code relative to '$'
    for (let i = 0; i < n; i++)
    {
        su[i] = new Suffix(i, s.charCodeAt(i) - '$'.charCodeAt(0), 0);
    }
    // Next rank: rank of the following character, or -1 at the end
    for (let i = 0; i < n; i++)
        su[i].next = (i + 1 < n ? su[i + 1].rank : -1);
    su.sort(compareSuffix);
    // ind[j] = position of the suffix starting at j in su
    let ind = new Array(n);
    for (let length = 4; length < 2 * n; length <<= 1)
    {
        // Re-rank: equal (rank, next) pairs share a rank
        let rank = 0, prev = su[0].rank;
        su[0].rank = rank;
        ind[su[0].index] = 0;
        for (let i = 1; i < n; i++)
        {
            if (su[i].rank == prev && su[i].next == su[i - 1].next)
            {
                prev = su[i].rank;
                su[i].rank = rank;
            }
            else
            {
                prev = su[i].rank;
                su[i].rank = ++rank;
            }
            ind[su[i].index] = i;
        }
        // Next rank = new rank of the suffix starting length / 2 positions later
        for (let i = 0; i < n; i++)
        {
            let nextP = su[i].index + length / 2;
            su[i].next = nextP < n ? su[ind[nextP]].rank : -1;
        }
        su.sort(compareSuffix);
    }
    let suf = new Array(n);
    for (let i = 0; i < n; i++)
        suf[i] = su[i].index;
    return suf;
}
function printArr(arr, n)
{
    for (let i = 0; i < n; i++)
        document.write(arr[i] + " ");
    document.write("<br>");
}
let txt = "banana";
let n = txt.length;
let suff_arr = suffixArray(txt);
document.write("Following is suffix array for banana:<br>");
printArr(suff_arr, n);
</script>
Output:
Following is suffix array for banana
5 3 1 0 4 2
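Note that the above algorithm uses a standard comparison sort, so its time complexity is O(n log² n): there are O(log n) rounds and each sort takes O(n log n) time. Because the ranks are small bounded integers (after the first round they are always less than n, plus -1 for a missing next rank), each round's sort can instead be done with Radix Sort in O(n) time, which reduces the overall time complexity to O(n log n).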
Auxiliary Space: O(n)
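One way to apply Radix Sort is sketched below in Python, assuming characters are first compressed to their positions in the sorted alphabet so that every rank is below n; counting_sort and suffix_array_radix are hypothetical names. Each round does two stable counting-sort passes (next rank first, then rank), which is an LSD radix sort on the pair, so a round costs O(n) and the O(log n) rounds total O(n log n).

```python
def counting_sort(items, key, max_key):
    """Stable counting sort of items by an integer key in 0..max_key."""
    count = [0] * (max_key + 1)
    for x in items:
        count[key(x)] += 1
    # Turn counts into starting positions (exclusive prefix sums)
    total = 0
    for v in range(max_key + 1):
        count[v], total = total, total + count[v]
    out = [0] * len(items)
    for x in items:            # scanning in input order keeps the sort stable
        out[count[key(x)]] = x
        count[key(x)] += 1
    return out


def suffix_array_radix(s):
    """Prefix doubling with an LSD radix sort on (rank, next rank) each round."""
    n = len(s)
    if n == 0:
        return []
    # Compress characters to ranks 0..sigma-1 (sigma <= n)
    code = {c: r for r, c in enumerate(sorted(set(s)))}
    rank = [code[c] for c in s]
    k = 1
    while True:
        # Next rank shifted by +1 so a missing suffix (-1) becomes key 0
        second = [rank[i + k] + 1 if i + k < n else 0 for i in range(n)]
        # Radix sort: by the less significant key first, then stably by rank
        sa = counting_sort(range(n), lambda i: second[i], n)
        sa = counting_sort(sa, lambda i: rank[i], n - 1)
        # Re-rank: equal pairs share a rank
        new_rank = [0] * n
        for j in range(1, n):
            a, b = sa[j - 1], sa[j]
            same = rank[a] == rank[b] and second[a] == second[b]
            new_rank[b] = new_rank[a] + (0 if same else 1)
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: done
            break
        k *= 2
    return sa


print(suffix_array_radix("banana"))  # [5, 3, 1, 0, 4, 2]
```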
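Method 2: The problem can also be solved using a map whose keys are kept in sorted order. (The Java and Python versions below reach the same result by sorting the list of suffixes directly.)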
Algorithm:
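- Create a map whose key is a string (a suffix) and whose value is an integer (its starting index).
- Iterate over the string in reverse order (from i = n - 1 down to 0), prepending s[i] to a running string so that it always holds the suffix starting at i.
- Map that suffix to its starting index i.
- Read the map's values in key order into an array; because the keys are sorted, this array is the suffix array.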
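The steps above can be sketched in Python with a dict. One assumption matters here: Python dicts preserve insertion order rather than key order, so the keys must be sorted explicitly (a C++ std::map does this automatically). The function name suffix_array_by_map is hypothetical.

```python
def suffix_array_by_map(s):
    """Map every suffix to its start index, then read indices in key order."""
    suffix_to_index = {}
    sub = ""
    for i in range(len(s) - 1, -1, -1):
        sub = s[i] + sub          # sub now holds the suffix starting at i
        suffix_to_index[sub] = i
    # dicts keep insertion order, not key order, so sort the keys explicitly
    return [suffix_to_index[key] for key in sorted(suffix_to_index)]


print(suffix_array_by_map("banana"))  # [5, 3, 1, 0, 4, 2]
```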
C++14
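#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;
int main()
{
    string s = "banana";
    int n = s.length();
    // std::map keeps its keys (the suffixes) in sorted order
    map<string, int> Map;
    vector<int> suffix(n);
    string sub = "";
    // Build suffixes from the back: after this step, sub == s.substr(i)
    for (int i = n - 1; i >= 0; i--) {
        sub = s[i] + sub;
        Map[sub] = i;
    }
    // Reading the values in key order gives the suffix array
    int j = 0;
    for (const auto &x : Map) {
        suffix[j] = x.second;
        j++;
    }
    cout << "Suffix array for banana is" << endl;
    for (int i = 0; i < n; i++) {
        cout << suffix[i] << " ";
    }
    cout << endl;
    return 0;
}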
Java
import java.util.Arrays;
public class Main {
    public static void main(String[] args) {
        String s = "banana";
        int n = s.length();
        int[] suffix = new int[n];
        String[] sub = new String[n];
        // Collect all suffixes
        for (int i = 0; i < n; i++) {
            sub[i] = s.substring(i);
        }
        // Sort them lexicographically
        Arrays.sort(sub);
        // A suffix of length len starts at index n - len
        for (int i = 0; i < n; i++) {
            suffix[i] = n - sub[i].length();
        }
        System.out.print("Suffix array for banana is ");
        for (int i : suffix) {
            System.out.print(i + " ");
        }
        System.out.println();
    }
}
Python3
s = "banana"
n = len(s)
suffix = [0] * n
sub = [""] * n
# Collect all suffixes
for i in range(n):
    sub[i] = s[i:]
# Sort them lexicographically
sub.sort()
# A suffix of length len starts at index n - len
for i in range(n):
    suffix[i] = n - len(sub[i])
print("Suffix array for banana is")
for i in suffix:
    print(i, end=" ")
print()
C#
using System;
using System.Collections.Generic;
class Program {
    static void Main() {
        string s = "banana";
        int n = s.Length;
        // SortedDictionary keeps its keys (the suffixes) in sorted order;
        // StringComparer.Ordinal compares character codes like the other versions
        SortedDictionary<string, int> Map = new SortedDictionary<string, int>(StringComparer.Ordinal);
        int[] suffix = new int[n];
        string sub = "";
        for (int i = n - 1; i >= 0; i--) {
            sub = s[i] + sub;   // sub is now the suffix starting at i
            Map.Add(sub, i);
        }
        // Reading the values in key order gives the suffix array
        int j = 0;
        foreach (var x in Map) {
            suffix[j] = x.Value;
            j++;
        }
        Console.WriteLine("Suffix array for banana is");
        for (int i = 0; i < n; i++) {
            Console.Write(suffix[i] + " ");
        }
        Console.WriteLine();
    }
}
Javascript
let s = "banana";
let n = s.length;
let map = new Map();
let sub = "";
// Build each suffix from the back and map it to its start index
for (let i = n - 1; i >= 0; i--) {
    sub = s[i] + sub;
    map.set(sub, i);
}
// A JavaScript Map iterates in insertion order, so sort the keys explicitly
let keys = [...map.keys()].sort();
let suffix = keys.map(key => map.get(key));
console.log("Suffix array for banana is");
console.log(suffix.join(" "));
Output:
Suffix array for banana is
5 3 1 0 4 2
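Time Complexity: O(n² log n) in the worst case. Building the suffixes one character at a time costs O(n²), and each of the O(n log n) comparisons made while ordering them can take O(n) time.
Auxiliary Space: O(n²), since all n suffixes are stored explicitly.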
Note that suffix arrays can also be constructed in O(n) time. We will discuss O(n) algorithms soon.