Suffix Arrays for Competitive Programming

Last Updated : 12 Mar, 2024

A suffix array is a sorted array of all suffixes of a given string. More formally, given a string ‘S’ of length n, the suffix array of S contains the indices 0 to n-1, ordered so that the suffixes starting at these indices are sorted lexicographically.


Example:

Input: banana

Index  Suffix                        Index  Suffix
0      banana                        5      a
1      anana    Sort the suffixes    3      ana
2      nana     ---------------->    1      anana
3      ana      alphabetically       0      banana
4      na                            4      na
5      a                             2      nana

So the suffix array for “banana” is {5, 3, 1, 0, 4, 2}
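The sorting step in the example can be sketched directly in Python. This is a minimal illustration, not an efficient construction; the function name `build_suffix_array_naive` is a hypothetical name chosen for this sketch.

```python
def build_suffix_array_naive(s: str) -> list[int]:
    """Return the start indices of the suffixes of s in lexicographic order."""
    # Each comparison may scan a whole suffix, so the worst case is
    # O(n^2 log n) time. Fine for small inputs only.
    return sorted(range(len(s)), key=lambda i: s[i:])


print(build_suffix_array_naive("banana"))  # [5, 3, 1, 0, 4, 2]
```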

Construction of Suffix Arrays:

  1. Naive way to construct suffix array
  2. Using Radix Sort to construct suffix array in O(n * Log(n))
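As a rough sketch of the faster approach, the code below uses prefix doubling: suffixes are sorted by their first 1, 2, 4, ... characters, reusing the ranks from the previous round. Note the assumption: it uses Python's built-in comparison sort in each round instead of radix sort, so it runs in O(n * log^2(n)) rather than the O(n * Log(n)) listed above. The function name `build_suffix_array` is hypothetical.

```python
def build_suffix_array(s: str) -> list[int]:
    """Prefix-doubling suffix array; built-in sort stands in for radix sort."""
    n = len(s)
    if n == 0:
        return []
    sa = list(range(n))
    rank = [ord(c) for c in s]  # rank by the first character
    k = 1
    while True:
        # Key = (rank of first k chars, rank of the next k chars);
        # -1 marks "runs past the end", which sorts before any real rank.
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: equal keys share a rank, otherwise the rank grows by one.
        new_rank = [0] * n
        for idx in range(1, n):
            prev, cur = sa[idx - 1], sa[idx]
            new_rank[cur] = new_rank[prev] + (1 if key(prev) != key(cur) else 0)
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            break
        k *= 2
    return sa


print(build_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Replacing the per-round sort with a radix (counting) sort on the rank pairs is what brings each round down to O(n).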

Use Cases of Suffix Array:

1. Searching a Substring in a string:

Problem: Given a string ‘S‘ and a string ‘T‘, determine whether T is a substring of S. If it is, return an index at which T occurs in S.

Example:

Input: S = “bannana” , T = “nan”
Output: 3

Naive Solution: In O(|S| * |T|) we can iterate over each index of ‘S’ and check whether the substring of S starting at that index matches ‘T’.

Solution using Suffix Array: Any substring is a prefix of some suffix. If we keep only the first |T| characters of each suffix in the suffix array of ‘S‘, we get all substrings of length at most |T| in sorted order. To find T we can binary search over the suffix array, comparing the first |T| characters of the middle suffix with T.

  • If the truncated middle suffix is lexicographically smaller than ‘T’, binary search on the right half.
  • If the truncated middle suffix is lexicographically greater than ‘T’, binary search on the left half.
  • If the two strings match, return the starting index of that suffix as the result.

Time Complexity: O(|S| * log(|S|) + |T| * log(|S|) ), where O(|S| * log(|S|)) is to construct suffix array for string S and O(|T| * log(|S|)) is to search and compare string T.
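The binary search described above can be sketched as follows. The function name `find_substring` is hypothetical, and the suffix array is built naively here only to keep the example short.

```python
def find_substring(s: str, sa: list[int], t: str) -> int:
    """Return a start index where t occurs in s, or -1 if it does not occur."""
    lo, hi = 0, len(sa) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start = sa[mid]
        # Compare only the first |t| characters of the middle suffix.
        prefix = s[start:start + len(t)]
        if prefix == t:
            return start
        if prefix < t:
            lo = mid + 1   # match, if any, lies in the right half
        else:
            hi = mid - 1   # match, if any, lies in the left half
    return -1


s = "bannana"
sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive build, for brevity
print(find_substring(s, sa, "nan"))  # 3
```

If T occurs more than once, this returns whichever occurrence the search reaches first, not necessarily the leftmost one in S.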

2. Finding Longest Common Prefix (LCP):

Problem: Given a string ‘S‘ and Q queries of the form {i, j}. Find the LCP(i, j) i.e. length of the Longest Common Prefix(LCP) for the suffixes starting at index i and j.

Example:

Input: S = “banana” , Query = {{0, 5}, {4, 2}, {1, 3}}
Output: 0 2 3
Explanation: Query[0] = {0, 5} = LCP (banana, a) = ‘ ‘ = 0
Query[1] = {4, 2} = LCP (na, nana) = ‘na’ = 2
Query[2] = {1, 3} = LCP (anana, ana) = ‘ana’ = 3

Naive Solution: For each query we can compare the two suffixes starting at i and j character by character in O(|S|), giving a total time complexity of O(Q * |S|).

Solution using Suffix Array: Let our suffix array be Suffix[]. To solve the problem, construct an array lcp[] such that lcp[i] = LCP(Suffix[i], Suffix[i+1]). In other words, lcp[] stores the longest common prefix of each pair of adjacent suffixes in the suffix array, as shown in the below image for string S = “banana”. For “banana”, Suffix[] = {5, 3, 1, 0, 4, 2} and lcp[] = {1, 3, 0, 0, 2}.

[Figure: construction of the lcp[] array for “banana”]

Now, to calculate LCP(i, j), find the positions of suffixes i and j in the suffix array, say pi and pj with pi < pj (swap them otherwise), and take the minimum value in the range lcp[pi] to lcp[pj - 1].


Proof: Let pi < pj be the positions of suffixes i and j in the suffix array, and let LCP(i, j) = k. Since the suffixes are sorted in lexicographical order, every suffix from position pi to position pj starts with the same k characters. So every value from lcp[pi] to lcp[pj - 1] is at least k, and therefore the minimum on this segment is at least k. On the other hand, the minimum cannot be greater than k: that would mean each pair of adjacent suffixes in the range shares more than k leading characters, so suffixes i and j would also share more than k leading characters.

Note: A sparse table built over lcp[] answers each range-minimum query in O(1). The lcp[] array itself can be constructed in O(N).

Time Complexity: O((|S| * log|S|) + Q)
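The pieces above can be sketched together as follows. This is an illustrative sketch under assumptions: the lcp[] construction uses one standard linear-time method (commonly known as Kasai's algorithm), which may or may not be the construction the article has in mind, and all function names are hypothetical. The suffix array is built naively to keep the example short.

```python
def build_rank_and_lcp(s: str, sa: list[int]) -> tuple[list[int], list[int]]:
    """rank[i] = position of suffix i in sa; lcp[p] = LCP(sa[p], sa[p+1])."""
    n = len(s)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * max(n - 1, 0)
    h = 0  # common-prefix length carried over from the previous suffix
    for i in range(n):
        if rank[i] == n - 1:  # last in sorted order: no successor
            h = 0
            continue
        j = sa[rank[i] + 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # dropping the first character loses at most one match
    return rank, lcp


def build_sparse_table(values: list[int]) -> list[list[int]]:
    """table[k][i] = min(values[i : i + 2**k])."""
    table = [list(values)]
    length = 1
    while 2 * length <= len(values):
        prev = table[-1]
        table.append([min(prev[i], prev[i + length])
                      for i in range(len(values) - 2 * length + 1)])
        length *= 2
    return table


def range_min(table: list[list[int]], lo: int, hi: int) -> int:
    """Minimum of values[lo..hi], both ends inclusive, in O(1)."""
    level = (hi - lo + 1).bit_length() - 1
    return min(table[level][lo], table[level][hi - (1 << level) + 1])


def lcp_query(s: str, rank: list[int], table: list[list[int]], i: int, j: int) -> int:
    if i == j:
        return len(s) - i  # a suffix shares its full length with itself
    lo, hi = sorted((rank[i], rank[j]))
    return range_min(table, lo, hi - 1)


s = "banana"
sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive build, for brevity
rank, lcp = build_rank_and_lcp(s, sa)
table = build_sparse_table(lcp)
print(lcp)  # [1, 3, 0, 0, 2]
print([lcp_query(s, rank, table, i, j) for i, j in [(0, 5), (4, 2), (1, 3)]])  # [0, 2, 3]
```

Building the sparse table takes O(|S| * log|S|) time and memory, which matches the preprocessing term in the complexity above.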

3. Number of Different Substrings:

Problem: Given a string ‘S‘, the task is to find the total number of unique substrings of S.

Example:

Input: S=’abab’
Output: 7
Explanation: Unique substrings of “abab” = {“abab”,”aba”,”ab”,”a”,”bab”,”ba”,”b”}

Solution using Suffix array: As noted earlier, any substring is a prefix of some suffix. To count the distinct substrings, iterate over the suffix array (where the suffixes are sorted). Each suffix contributes as many prefixes as its length. To find how many of those prefixes have already occurred in earlier suffixes, subtract the LCP of this suffix with the previous one. For a string of length n, the answer is therefore n * (n + 1) / 2 minus the sum of all values in lcp[].

The below image shows how to calculate number of distinct substrings for the string “BANANA” using suffix and lcp array.

[Figure: counting the distinct substrings of “BANANA” from the suffix and lcp arrays]
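The counting rule can be sketched as below. The function name `count_distinct_substrings` is hypothetical, and for brevity the suffix array is built naively and each adjacent LCP is computed by direct comparison rather than with a linear-time lcp[] construction.

```python
def count_distinct_substrings(s: str) -> int:
    """Sum over sorted suffixes of (suffix length - LCP with previous suffix)."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])  # naive build, for brevity
    total = 0
    prev = None
    for start in sa:
        common = 0
        if prev is not None:
            # LCP of this suffix with the previous one in sorted order.
            while (prev + common < n and start + common < n
                   and s[prev + common] == s[start + common]):
                common += 1
        # (n - start) prefixes in total, of which `common` were already counted.
        total += (n - start) - common
        prev = start
    return total


print(count_distinct_substrings("abab"))    # 7
print(count_distinct_substrings("banana"))  # 15
```

For “banana” this is 21 prefixes in total (6 + 5 + 4 + 3 + 2 + 1) minus the lcp[] sum 1 + 3 + 0 + 0 + 2 = 6, giving 15.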

Practice problems on Suffix Array:


