Suffix Automaton
Last Updated: 18 Jan, 2024
In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string. It allows the storage, processing, and retrieval of compressed information about all of its substrings. In this article, we look at the suffix automaton: its components, construction process, implementation, and applications.
Suffix Tree and Suffix Links:
To appreciate the suffix automaton, it helps to first understand the suffix tree and the related concept of suffix links.
Suffix Tree:
A suffix tree is a tree-like data structure (a compressed trie) that represents all the suffixes, and therefore all the substrings, of a given string S. Each leaf node corresponds to one suffix of the string, and the path from the root to a leaf spells out that suffix; every substring of S is a prefix of some root-to-leaf path. Suffix trees are used for various string processing tasks, such as pattern matching, substring searching, and substring counting.
Suffix Links:
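Suffix links are a key concept when constructing a suffix automaton. In a suffix tree, they are pointers that link internal nodes to other internal nodes. Specifically, a suffix link connects the node corresponding to a non-empty substring S[i, j] to the node representing the shorter substring S[i+1, j], that is, the same string with its first character removed. In a suffix automaton, the suffix link of a state plays the analogous role: it points to the state that holds the longest suffix of the state's strings that belongs to a different state. Suffix links play a crucial role in constructing the suffix automaton efficiently.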
Constructing the Suffix Automaton:
The suffix automaton is a deterministic finite automaton that efficiently represents all substrings of a given string. It is closely related to the suffix tree and can be derived from one with the help of suffix links. In practice it is more often built directly from the string, one character at a time, which is what the code later in this article does. The key steps of the tree-based route are as follows:
- Suffix Tree Construction: Start by constructing a suffix tree for the given string S. This can be done efficiently using algorithms like Ukkonen’s algorithm or McCreight’s algorithm.
- Suffix Links: Determine suffix links in the suffix tree. Suffix links can be computed during or after the suffix tree construction. To compute them afterwards, you can perform a depth-first traversal of the suffix tree: for each internal node whose path from the root spells a string starting with some character x followed by a string w, connect it to the node whose path spells w.
- Compact Suffix Automaton: The compact suffix automaton can be extracted from the suffix tree and its suffix links. The compact suffix automaton is a minimal deterministic finite automaton that represents all the substrings of the original string S.
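To make "represents all the substrings" more concrete, here is a brute-force sketch. It assumes the standard end-position (endpos) characterization of suffix automaton states, which the text above does not spell out: two substrings share a state exactly when they end at the same set of positions in S. The function name `endpos_classes` is made up for illustration, and the quadratic enumeration is only meant for tiny inputs.

```python
from collections import defaultdict


def endpos_classes(s):
    """Group the distinct non-empty substrings of s by their set of end positions."""
    endpos = defaultdict(set)
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            # The substring s[i:j] ends just before index j.
            endpos[s[i:j]].add(j)

    classes = defaultdict(list)
    for sub, ends in endpos.items():
        classes[frozenset(ends)].append(sub)

    # Within a class, list the substrings from shortest to longest.
    return {ends: sorted(subs, key=len) for ends, subs in classes.items()}


if __name__ == "__main__":
    for ends, subs in endpos_classes("abab").items():
        print(sorted(ends), subs)
```

For "abab" this gives four classes of non-empty substrings: ["a"], ["b", "ab"], ["ba", "aba"] and ["bab", "abab"]. Adding the initial state for the empty string yields five states. Within each class every string is a suffix of the longest one, which is why a state only needs to store the length of its longest string.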
Suffix Automaton Implementation:
Implementing a suffix automaton requires care with data structures and algorithms. The following are some steps to consider when implementing one:
- Data Structure: Choose an appropriate data structure to represent the automaton efficiently. Typically, a graph-based representation is used: an array of states, each holding its transitions, the length of its longest substring, and its suffix link.
- Transition Functions: Define the transition function of the automaton. Given a state and a character, it should determine the next state (or report that there is none).
- Suffix Links: Implement suffix links in the automaton to traverse it efficiently. This step is crucial both for construction and for applications requiring substring matching.
- Construction: Construct the automaton, either from a previously constructed suffix tree and its suffix links or directly from the input string. Ensure that it represents all substrings of the input string.
Here's a simplified example to get you started. The code builds the suffix automaton directly from the input string, one character at a time, and maintains the suffix links as it goes, so no suffix tree is needed beforehand.
C++
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

struct SuffixAutomatonNode {
    unordered_map<char, int> next; // outgoing transitions
    int length;                    // length of the longest substring in this state
    int link;                      // suffix link (-1 for the initial state)
};

vector<SuffixAutomatonNode> suffixAutomaton;
int last; // state that represents the whole string read so far

void initialize() {
    suffixAutomaton.clear();
    SuffixAutomatonNode initialNode;
    initialNode.length = 0;
    initialNode.link = -1;
    suffixAutomaton.push_back(initialNode);
    last = 0;
}

void extendAutomaton(char c) {
    // Add the new state first so its index stays fixed even if a clone is added later.
    int cur = static_cast<int>(suffixAutomaton.size());
    SuffixAutomatonNode newNode;
    newNode.length = suffixAutomaton[last].length + 1;
    newNode.link = 0;
    suffixAutomaton.push_back(newNode);

    // Walk the suffix links, adding the missing transitions on c.
    int current = last;
    while (current != -1 && suffixAutomaton[current].next.count(c) == 0) {
        suffixAutomaton[current].next[c] = cur;
        current = suffixAutomaton[current].link;
    }
    if (current != -1) {
        int next = suffixAutomaton[current].next[c];
        if (suffixAutomaton[current].length + 1 == suffixAutomaton[next].length) {
            suffixAutomaton[cur].link = next;
        } else {
            // Split: the clone copies the transitions and link of `next` but is shorter.
            SuffixAutomatonNode cloneNode = suffixAutomaton[next];
            cloneNode.length = suffixAutomaton[current].length + 1;
            int clone = static_cast<int>(suffixAutomaton.size());
            suffixAutomaton.push_back(cloneNode);
            // Redirect the transitions that pointed to `next` to the clone.
            while (current != -1 && suffixAutomaton[current].next.at(c) == next) {
                suffixAutomaton[current].next[c] = clone;
                current = suffixAutomaton[current].link;
            }
            suffixAutomaton[next].link = clone;
            suffixAutomaton[cur].link = clone;
        }
    }
    last = cur;
}

void traverseAutomaton() {
    cout << "Traversing Suffix Automaton:\n";
    for (size_t i = 0; i < suffixAutomaton.size(); ++i) {
        cout << "State " << i << ", Length: " << suffixAutomaton[i].length
             << ", Suffix Link: " << suffixAutomaton[i].link << "\n";
        for (const auto& transition : suffixAutomaton[i].next) {
            cout << " Transition on '" << transition.first << "' to State "
                 << transition.second << "\n";
        }
    }
}

int main() {
    string input = "abab";
    initialize();
    for (char c : input) {
        extendAutomaton(c);
    }
    traverseAutomaton();
    return 0;
}
Java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SuffixAutomatonNode {
    Map<Character, Integer> next = new HashMap<>(); // outgoing transitions
    int length = 0; // length of the longest substring in this state
    int link = -1;  // suffix link (-1 for the initial state)
}

public class SuffixAutomaton {
    static List<SuffixAutomatonNode> suffixAutomaton;
    static int last; // state that represents the whole string read so far

    static void initialize() {
        suffixAutomaton = new ArrayList<>();
        suffixAutomaton.add(new SuffixAutomatonNode());
        last = 0;
    }

    static void extendAutomaton(char c) {
        // Add the new state first so its index stays fixed even if a clone is added later.
        int cur = suffixAutomaton.size();
        SuffixAutomatonNode newNode = new SuffixAutomatonNode();
        newNode.length = suffixAutomaton.get(last).length + 1;
        newNode.link = 0;
        suffixAutomaton.add(newNode);

        // Walk the suffix links, adding the missing transitions on c.
        int current = last;
        while (current != -1 && !suffixAutomaton.get(current).next.containsKey(c)) {
            suffixAutomaton.get(current).next.put(c, cur);
            current = suffixAutomaton.get(current).link;
        }
        if (current != -1) {
            int next = suffixAutomaton.get(current).next.get(c);
            if (suffixAutomaton.get(current).length + 1 == suffixAutomaton.get(next).length) {
                newNode.link = next;
            } else {
                // Split: the clone gets its own copy of the transitions of `next`.
                SuffixAutomatonNode cloneNode = new SuffixAutomatonNode();
                cloneNode.next = new HashMap<>(suffixAutomaton.get(next).next);
                cloneNode.link = suffixAutomaton.get(next).link;
                cloneNode.length = suffixAutomaton.get(current).length + 1;
                int clone = suffixAutomaton.size();
                suffixAutomaton.add(cloneNode);
                // Redirect the transitions that pointed to `next` to the clone.
                while (current != -1 && suffixAutomaton.get(current).next.get(c) == next) {
                    suffixAutomaton.get(current).next.put(c, clone);
                    current = suffixAutomaton.get(current).link;
                }
                suffixAutomaton.get(next).link = clone;
                newNode.link = clone;
            }
        }
        last = cur;
    }

    static void traverseAutomaton() {
        System.out.println("Traversing Suffix Automaton:");
        for (int i = 0; i < suffixAutomaton.size(); ++i) {
            System.out.println("State " + i + ", Length: " + suffixAutomaton.get(i).length
                    + ", Suffix Link: " + suffixAutomaton.get(i).link);
            for (Map.Entry<Character, Integer> transition
                    : suffixAutomaton.get(i).next.entrySet()) {
                System.out.println(" Transition on '" + transition.getKey()
                        + "' to State " + transition.getValue());
            }
        }
    }

    public static void main(String[] args) {
        String input = "abab";
        initialize();
        for (char c : input.toCharArray()) {
            extendAutomaton(c);
        }
        traverseAutomaton();
    }
}
Python3
class SuffixAutomatonNode:
    def __init__(self):
        self.next = {}    # outgoing transitions: character -> state index
        self.length = 0   # length of the longest substring in this state
        self.link = -1    # suffix link (-1 for the initial state)


class SuffixAutomaton:
    def __init__(self):
        self.suffix_automaton = []
        self.last = 0

    def initialize(self):
        self.suffix_automaton = [SuffixAutomatonNode()]
        self.last = 0

    def extend_automaton(self, c):
        sa = self.suffix_automaton
        # Add the new state first so its index stays fixed even if a clone is added later.
        cur = len(sa)
        new_node = SuffixAutomatonNode()
        new_node.length = sa[self.last].length + 1
        new_node.link = 0
        sa.append(new_node)

        # Walk the suffix links, adding the missing transitions on c.
        current = self.last
        while current != -1 and c not in sa[current].next:
            sa[current].next[c] = cur
            current = sa[current].link

        if current != -1:
            next_state = sa[current].next[c]
            if sa[current].length + 1 == sa[next_state].length:
                new_node.link = next_state
            else:
                # Split: the clone gets its own copy of the transitions of next_state.
                clone_node = SuffixAutomatonNode()
                clone_node.next = dict(sa[next_state].next)
                clone_node.link = sa[next_state].link
                clone_node.length = sa[current].length + 1
                clone = len(sa)
                sa.append(clone_node)
                # Redirect the transitions that pointed to next_state to the clone.
                while current != -1 and sa[current].next.get(c) == next_state:
                    sa[current].next[c] = clone
                    current = sa[current].link
                sa[next_state].link = clone
                new_node.link = clone

        self.last = cur

    def traverse_automaton(self):
        print("Traversing Suffix Automaton:")
        for i, state in enumerate(self.suffix_automaton):
            print(f"State {i}, Length: {state.length}, Suffix Link: {state.link}")
            for char, next_state in state.next.items():
                print(f" Transition on '{char}' to State {next_state}")


def main():
    input_str = "abab"
    suffix_automaton_instance = SuffixAutomaton()
    suffix_automaton_instance.initialize()
    for char in input_str:
        suffix_automaton_instance.extend_automaton(char)
    suffix_automaton_instance.traverse_automaton()


if __name__ == "__main__":
    main()
C#
using System;
using System.Collections.Generic;

class SuffixAutomatonNode
{
    public Dictionary<char, int> Next = new Dictionary<char, int>(); // outgoing transitions
    public int Length = 0; // length of the longest substring in this state
    public int Link = -1;  // suffix link (-1 for the initial state)
}

class GFG
{
    static List<SuffixAutomatonNode> SuffixAutomaton;
    static int Last; // state that represents the whole string read so far

    static void Initialize()
    {
        SuffixAutomaton = new List<SuffixAutomatonNode>();
        SuffixAutomaton.Add(new SuffixAutomatonNode());
        Last = 0;
    }

    static void ExtendAutomaton(char c)
    {
        // Add the new state first so its index stays fixed even if a clone is added later.
        int cur = SuffixAutomaton.Count;
        SuffixAutomatonNode newNode = new SuffixAutomatonNode();
        newNode.Length = SuffixAutomaton[Last].Length + 1;
        newNode.Link = 0;
        SuffixAutomaton.Add(newNode);

        // Walk the suffix links, adding the missing transitions on c.
        int current = Last;
        while (current != -1 && !SuffixAutomaton[current].Next.ContainsKey(c))
        {
            SuffixAutomaton[current].Next[c] = cur;
            current = SuffixAutomaton[current].Link;
        }
        if (current != -1)
        {
            int next = SuffixAutomaton[current].Next[c];
            if (SuffixAutomaton[current].Length + 1 == SuffixAutomaton[next].Length)
            {
                newNode.Link = next;
            }
            else
            {
                // Split: the clone gets its own copy of the transitions of `next`.
                SuffixAutomatonNode cloneNode = new SuffixAutomatonNode();
                cloneNode.Next = new Dictionary<char, int>(SuffixAutomaton[next].Next);
                cloneNode.Link = SuffixAutomaton[next].Link;
                cloneNode.Length = SuffixAutomaton[current].Length + 1;
                int clone = SuffixAutomaton.Count;
                SuffixAutomaton.Add(cloneNode);
                // Redirect the transitions that pointed to `next` to the clone.
                while (current != -1 && SuffixAutomaton[current].Next[c] == next)
                {
                    SuffixAutomaton[current].Next[c] = clone;
                    current = SuffixAutomaton[current].Link;
                }
                SuffixAutomaton[next].Link = clone;
                newNode.Link = clone;
            }
        }
        Last = cur;
    }

    static void TraverseAutomaton()
    {
        Console.WriteLine("Traversing Suffix Automaton:");
        for (int i = 0; i < SuffixAutomaton.Count; ++i)
        {
            Console.WriteLine($"State {i}, Length: {SuffixAutomaton[i].Length}, Suffix Link: {SuffixAutomaton[i].Link}");
            foreach (var transition in SuffixAutomaton[i].Next)
            {
                Console.WriteLine($" Transition on '{transition.Key}' to State {transition.Value}");
            }
        }
    }

    public static void Main()
    {
        string input = "abab";
        Initialize();
        foreach (char c in input)
        {
            ExtendAutomaton(c);
        }
        TraverseAutomaton();
    }
}
JavaScript
class SuffixAutomatonNode {
    constructor() {
        this.next = new Map(); // outgoing transitions
        this.length = 0;       // length of the longest substring in this state
        this.link = -1;        // suffix link (-1 for the initial state)
    }
}

let suffixAutomaton = [];
let last; // state that represents the whole string read so far

function initialize() {
    suffixAutomaton = [new SuffixAutomatonNode()];
    last = 0;
}

function extendAutomaton(c) {
    // Add the new state first so its index stays fixed even if a clone is added later.
    const cur = suffixAutomaton.length;
    const newNode = new SuffixAutomatonNode();
    newNode.length = suffixAutomaton[last].length + 1;
    newNode.link = 0;
    suffixAutomaton.push(newNode);

    // Walk the suffix links, adding the missing transitions on c.
    let current = last;
    while (current !== -1 && !suffixAutomaton[current].next.has(c)) {
        suffixAutomaton[current].next.set(c, cur);
        current = suffixAutomaton[current].link;
    }
    if (current !== -1) {
        const next = suffixAutomaton[current].next.get(c);
        if (suffixAutomaton[current].length + 1 === suffixAutomaton[next].length) {
            newNode.link = next;
        } else {
            // Split: the clone gets its own copy of the transitions of `next`.
            const cloneNode = new SuffixAutomatonNode();
            cloneNode.next = new Map(suffixAutomaton[next].next);
            cloneNode.link = suffixAutomaton[next].link;
            cloneNode.length = suffixAutomaton[current].length + 1;
            const clone = suffixAutomaton.length;
            suffixAutomaton.push(cloneNode);
            // Redirect the transitions that pointed to `next` to the clone.
            while (current !== -1 && suffixAutomaton[current].next.get(c) === next) {
                suffixAutomaton[current].next.set(c, clone);
                current = suffixAutomaton[current].link;
            }
            suffixAutomaton[next].link = clone;
            newNode.link = clone;
        }
    }
    last = cur;
}

function traverseAutomaton() {
    console.log("Traversing Suffix Automaton:");
    for (let i = 0; i < suffixAutomaton.length; ++i) {
        console.log(`State ${i}, Length: ${suffixAutomaton[i].length}, Suffix Link: ${suffixAutomaton[i].link}`);
        for (const [char, nextState] of suffixAutomaton[i].next) {
            console.log(` Transition on '${char}' to State ${nextState}`);
        }
    }
}

function main() {
    const input = "abab";
    initialize();
    for (const c of input) {
        extendAutomaton(c);
    }
    traverseAutomaton();
}

main();
Output
Traversing Suffix Automaton:
State 0, Length: 0, Suffix Link: -1
 Transition on 'a' to State 1
 Transition on 'b' to State 2
State 1, Length: 1, Suffix Link: 0
 Transition on 'b' to State 2
State 2, Length: 2, Suffix Link: 0
 Transition on 'a' to State 3
State 3, Length: 3, Suffix Link: 1
 Transition on 'b' to State 4
State 4, Length: 4, Suffix Link: 2
For the input string "abab" the automaton has five states, and no state has to be cloned. The order in which the transitions of a state are printed depends on the map implementation; a hash map such as std::unordered_map may list 'b' before 'a'.
Time Complexity: The time complexity of the provided code is O(n) in expectation, where n is the length of the input string, assuming a fixed alphabet and hash-map lookups that take expected constant time. Each character of the input string is processed once. A single extension may follow several suffix links or copy a state's transitions when cloning, so its cost is not constant in the worst case, but the total work over the whole string is linear; the cost per character is therefore amortized constant.
Auxiliary Space Complexity: The space complexity of the code is also O(n). Each character in the input string introduces one new state and at most one clone, and the automaton has at most 2n - 1 states (for n ≥ 2) and at most 3n - 4 transitions (for n ≥ 3). Therefore, both time and space complexities are linear with respect to the length of the input string.
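The linear size can be checked empirically. The sketch below uses a compact, list-based builder (an assumption of this example, not the article's class layout) and compares the number of states and transitions with the known bounds of 2n - 1 states and 3n - 4 transitions for strings of length n ≥ 3. The names `build_suffix_automaton` and `automaton_size` are illustrative.

```python
def build_suffix_automaton(s):
    """Build the suffix automaton of s; returns (transitions, links, lengths)."""
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for ch in s:
        cur = len(nxt)
        nxt.append({})
        link.append(0)
        length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p != -1:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Clone q with a shorter length and redirect transitions to it.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = link[cur] = clone
        last = cur
    return nxt, link, length


def automaton_size(s):
    """Return (number of states, number of transitions) for the automaton of s."""
    nxt, _, _ = build_suffix_automaton(s)
    return len(nxt), sum(len(t) for t in nxt)


if __name__ == "__main__":
    for text in ["abab", "abbb", "banana"]:
        n = len(text)
        states, transitions = automaton_size(text)
        print(text, states, transitions,
              states <= 2 * n - 1, transitions <= 3 * n - 4)
```

Strings of the form "abbb...b" are a useful test case because they reach the 2n - 1 state bound, while a string such as "abab" needs no clones at all.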
Applications of Suffix Automaton:
The suffix automaton finds applications in various string processing tasks, often with linear time and space requirements:
Substring Matching: A suffix automaton can be used to efficiently search for substrings within a text. Once the automaton of the text has been built in linear time, checking whether a pattern P occurs takes O(|P|) time: follow the transitions for the characters of P from the initial state and report a match if none is missing. This makes it suitable for search engines and text editors.
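A minimal sketch of substring matching, assuming a compact list-based builder rather than the class layout used earlier; `build_suffix_automaton` and `contains` are illustrative names. The check simply walks the transitions from the initial state.

```python
def build_suffix_automaton(s):
    """Build the suffix automaton of s; returns (transitions, links, lengths)."""
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for ch in s:
        cur = len(nxt)
        nxt.append({})
        link.append(0)
        length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p != -1:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = link[cur] = clone
        last = cur
    return nxt, link, length


def contains(automaton, pattern):
    """Return True if pattern is a substring of the text the automaton was built from."""
    nxt = automaton[0]
    state = 0
    for ch in pattern:
        if ch not in nxt[state]:
            return False  # no path spells the pattern, so it does not occur
        state = nxt[state][ch]
    return True


if __name__ == "__main__":
    automaton = build_suffix_automaton("abab")
    for pattern in ["bab", "bb", "aba", "ababa"]:
        print(pattern, contains(automaton, pattern))
```

The automaton is built once and can then answer any number of queries, each in time proportional to the pattern length and independent of the text length.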
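Longest Common Substring: Finding the longest common substring of two strings can be solved with a suffix automaton: build the automaton of one string, then feed the other string through it, following suffix links whenever a transition is missing. This enables applications like plagiarism detection and bioinformatics.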
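One common way to do this is sketched below, under the assumption of a compact list-based builder; `build_suffix_automaton` and `longest_common_substring` are illustrative names. The scan keeps the current state and the length of the current match, and falls back along suffix links on a mismatch.

```python
def build_suffix_automaton(s):
    """Build the suffix automaton of s; returns (transitions, links, lengths)."""
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for ch in s:
        cur = len(nxt)
        nxt.append({})
        link.append(0)
        length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p != -1:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = link[cur] = clone
        last = cur
    return nxt, link, length


def longest_common_substring(s, t):
    """Return one longest string that occurs in both s and t."""
    nxt, link, length = build_suffix_automaton(s)
    state = 0       # current state in the automaton of s
    matched = 0     # length of the longest suffix of t[:i + 1] that occurs in s
    best_len = 0
    best_end = 0    # end index (exclusive) of the best match in t
    for i, ch in enumerate(t):
        # Shorten the current match until it can be extended by ch.
        while state != 0 and ch not in nxt[state]:
            state = link[state]
            matched = length[state]
        if ch in nxt[state]:
            state = nxt[state][ch]
            matched += 1
        if matched > best_len:
            best_len, best_end = matched, i + 1
    return t[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("abcde", "xbcdy"))
```

The running time is linear in the combined length of the two strings for a fixed alphabet. When several common substrings share the maximum length, this sketch returns the one that ends first in the second string.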
Palindromes: A suffix automaton can be employed to find the longest palindromic substring in a string, which is useful in text analysis and data compression.
Shortest Non-Overlapping Repeats: Identifying the shortest non-overlapping repeating substrings in a string can be done effectively using a suffix automaton. This is useful in DNA sequence analysis and compression algorithms.