In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string which allows the storage, processing, and retrieval of compressed information about all its substrings. In this article, we will explore the suffix automaton: its components, construction process, implementation, and real-world applications.
Suffix Tree and Suffix Links:
To appreciate the suffix automaton, it helps to first understand the suffix tree and the related concept of suffix links.
Suffix Tree:
A suffix tree is a tree-like data structure that represents all the substrings of a given string S. When S ends with a unique terminator character (commonly written $), each leaf corresponds to exactly one suffix of S: the path from the root to that leaf spells out the suffix, and the path from the root to any point in the tree spells out a substring of S. Suffix trees are used for various string processing tasks, such as pattern matching, substring searching, and substring counting.
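A quick way to see the root-to-leaf structure is an uncompressed suffix trie. The Python sketch below is a simplification assumed only for illustration: unlike a real suffix tree, it does not merge chains of single-child nodes into edges labelled with substrings, so it can use quadratic space. The end marker "$" and the function names build_suffix_trie, count_leaves, and has_substring are hypothetical.

```python
END = "$"  # Assumed end marker; it must not occur in the input string


def build_suffix_trie(s):
    """Insert every suffix of s (followed by END) into a trie of nested dicts."""
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:] + END:
            node = node.setdefault(ch, {})
    return root


def count_leaves(node):
    """Count leaves; thanks to END there is exactly one leaf per suffix."""
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


def has_substring(root, pattern):
    """A pattern is a substring of s iff it spells a path starting at the root."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    trie = build_suffix_trie("banana")
    print(count_leaves(trie), has_substring(trie, "nan"), has_substring(trie, "nab"))
```

For "banana" the trie has six leaves, one per suffix; "nan" is found as a path from the root, while "nab" is not.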
Suffix Links:
Suffix links are a key concept when constructing a suffix automaton. In a suffix tree, they are pointers from internal nodes to other internal nodes. Specifically, a suffix link connects the node for a non-empty substring S[i, j] to the node for the shorter substring S[i+1, j], that is, the same string with its first character removed. Suffix links let construction algorithms move from one suffix to the next without restarting at the root, which is why they play a crucial role in efficient construction.
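In an uncompressed suffix trie every node spells a substring, so the suffix link of the node for S[i, j] is simply the node for S[i+1, j]. The sketch below is a naive quadratic construction assumed only for illustration (Ukkonen's and McCreight's algorithms obtain the links in linear time on the compressed tree); the function names build_trie_with_suffix_links and suffix_chain are hypothetical.

```python
def build_trie_with_suffix_links(s):
    """Naive suffix trie of s in which every node also gets a suffix link.

    Node v spells label[v]; its suffix link points to the node spelling
    label[v][1:], the same string without its first character.
    """
    children = [{}]  # children[v]: character -> child node id
    label = [""]     # label[v]: substring spelled on the path from the root to v
    for i in range(len(s)):
        v = 0
        for ch in s[i:]:
            if ch not in children[v]:
                children[v][ch] = len(children)
                children.append({})
                label.append(label[v] + ch)
            v = children[v][ch]

    def find(text):
        # Every suffix of a substring is itself a substring, so this walk never fails
        v = 0
        for ch in text:
            v = children[v][ch]
        return v

    link = [-1] + [find(label[v][1:]) for v in range(1, len(children))]
    return label, link


def suffix_chain(label, link, text):
    """Follow suffix links from the node spelling text back to the root."""
    v = label.index(text)  # text must be a substring of s
    chain = []
    while v != -1:
        chain.append(label[v])
        v = link[v]
    return chain


if __name__ == "__main__":
    print(suffix_chain(*build_trie_with_suffix_links("abab"), "abab"))
```

Starting from the node for "abab", the links visit "bab", "ab", "b", and finally the root (the empty string), i.e. every suffix in order of decreasing length.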
Constructing the Suffix Automaton:
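The suffix automaton of S is the minimal deterministic finite automaton that accepts exactly the suffixes of S. Because every substring of S is a prefix of some suffix, every substring is spelled by a path starting at the initial state. The automaton can be derived from a suffix tree with the help of suffix links, although in practice it is usually built directly from the string in a single left-to-right pass. The key steps of the suffix-tree route are as follows: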
- Suffix Tree Construction: Start by constructing a suffix tree for the given string S. This can be done efficiently using algorithms like Ukkonen’s algorithm or McCreight’s algorithm.
- Suffix Links: Determine the suffix links in the suffix tree. Both Ukkonen’s and McCreight’s algorithms maintain suffix links during construction; they can also be computed afterwards, for example with a traversal of the tree. For an internal node spelling xα, where x is a single character, the suffix link points to the node spelling α.
- Compact Suffix Automaton: Merging the nodes of the suffix tree whose subtrees are isomorphic yields the compact suffix automaton (also known as the CDAWG). Its uncompacted counterpart, the suffix automaton, is the minimal deterministic finite automaton that accepts all suffixes of S; every substring of S corresponds to a path from its initial state.
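The merging idea can also be seen without building a suffix tree. Two substrings end up in the same state of the suffix automaton exactly when they occur at the same set of end positions in S (often called endpos), because they then have identical continuations to the end of S. The brute-force Python sketch below is for illustration only and far slower than a real construction; the name endpos_classes is hypothetical. It groups the substrings of S by that set, and the number of groups equals the number of automaton states.

```python
from collections import defaultdict


def endpos_classes(s):
    """Group the distinct substrings of s (including "") by their end positions.

    endpos(t) is the set of indices j with s[j - len(t):j] == t.
    Each group corresponds to one state of the suffix automaton of s.
    """
    n = len(s)
    substrings = {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    classes = defaultdict(set)
    for t in substrings:
        ends = frozenset(j for j in range(len(t), n + 1) if s[j - len(t):j] == t)
        classes[ends].add(t)
    return dict(classes)


if __name__ == "__main__":
    for ends, group in sorted(endpos_classes("abab").items(), key=lambda kv: sorted(kv[0])):
        print(sorted(ends), sorted(group, key=len))
```

For "abab" this gives five groups: {""}, {"a"}, {"b", "ab"}, {"ba", "aba"}, and {"bab", "abab"}, matching the five states of the automaton built later in this article.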
Suffix Automaton Implementation:
Implementing a suffix automaton takes some care with data structures and indices. The following are some steps to consider:
- Data Structure: Choose an appropriate data structure to represent the automaton efficiently. Typically, the states are stored in an array, and each state holds its transitions (a map from characters to state indices), the length of the longest substring it represents, and its suffix link.
- Transition Function: Define the transition function of the automaton: given a state and a character, it returns the next state, or reports that no transition exists.
- Suffix Links: Store a suffix link for every state. The construction itself relies on them, and so do applications that match a text against the automaton, which follow suffix links when the current match can no longer be extended.
- Construction: Build the automaton, either from a previously constructed suffix tree and its suffix links or, more commonly, online by appending one character at a time. Ensure that it represents all substrings of the input string.
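Here’s a simplified example to get you started. It builds the automaton directly from the input string, one character at a time, using the standard online construction, so no suffix tree is needed. Each call to extendAutomaton adds a state for the longer prefix, adds transitions to it from earlier states along the suffix-link path, and clones an existing state when needed.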
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

struct SuffixAutomatonNode {
    unordered_map<char, int> next; // Transitions to other states, keyed by character
    int length = 0;                // Length of the longest substring in this state
    int link = -1;                 // Suffix link to another state
};

vector<SuffixAutomatonNode> suffixAutomaton;
int last; // Index of the state for the whole string processed so far

// Initialize the suffix automaton with the initial state (the empty string)
void initialize() {
    suffixAutomaton.assign(1, SuffixAutomatonNode());
    last = 0;
}

// Extend the automaton with a new character
void extendAutomaton(char c) {
    // Add the new state first so that its index is known before transitions point to it
    int cur = static_cast<int>(suffixAutomaton.size());
    SuffixAutomatonNode newNode;
    newNode.length = suffixAutomaton[last].length + 1;
    suffixAutomaton.push_back(newNode);
    int current = last;
    // Every state on the suffix-link path without a transition on c gets one to cur
    while (current != -1 && suffixAutomaton[current].next.count(c) == 0) {
        suffixAutomaton[current].next[c] = cur;
        current = suffixAutomaton[current].link;
    }
    if (current == -1) {
        suffixAutomaton[cur].link = 0; // The root state
    } else {
        int next = suffixAutomaton[current].next[c];
        if (suffixAutomaton[current].length + 1 == suffixAutomaton[next].length) {
            suffixAutomaton[cur].link = next;
        } else {
            // Clone state next (copying its transitions and suffix link) with a shorter length
            SuffixAutomatonNode cloneNode = suffixAutomaton[next];
            cloneNode.length = suffixAutomaton[current].length + 1;
            int clone = static_cast<int>(suffixAutomaton.size());
            suffixAutomaton.push_back(cloneNode);
            // Redirect transitions on c that pointed to next so they point to the clone
            while (current != -1 && suffixAutomaton[current].next.count(c) &&
                   suffixAutomaton[current].next[c] == next) {
                suffixAutomaton[current].next[c] = clone;
                current = suffixAutomaton[current].link;
            }
            suffixAutomaton[next].link = clone;
            suffixAutomaton[cur].link = clone;
        }
    }
    last = cur;
}

// Traverse the suffix automaton
void traverseAutomaton() {
    cout << "Traversing Suffix Automaton:\n";
    for (size_t i = 0; i < suffixAutomaton.size(); ++i) {
        cout << "State " << i << ", Length: " << suffixAutomaton[i].length
             << ", Suffix Link: " << suffixAutomaton[i].link << "\n";
        for (const auto& transition : suffixAutomaton[i].next) {
            cout << " Transition on '" << transition.first << "' to State " << transition.second << "\n";
        }
    }
}

int main() {
    string input = "abab";
    initialize();
    for (char c : input) {
        extendAutomaton(c);
    }
    // Traverse the constructed suffix automaton
    traverseAutomaton();
    return 0;
}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SuffixAutomatonNode {
    Map<Character, Integer> next; // Transitions to other states, keyed by character
    int length; // Length of the longest substring in this state
    int link;   // Suffix link to another state

    SuffixAutomatonNode() {
        next = new HashMap<>();
        length = 0;
        link = -1;
    }

    // Copy constructor used when cloning a state: the transition map is copied, not shared
    SuffixAutomatonNode(SuffixAutomatonNode other) {
        next = new HashMap<>(other.next);
        length = other.length;
        link = other.link;
    }
}

public class SuffixAutomaton {
    static List<SuffixAutomatonNode> suffixAutomaton;
    static int last; // Index of the state for the whole string processed so far

    // Initialize the suffix automaton with the initial state (the empty string)
    static void initialize() {
        suffixAutomaton = new ArrayList<>();
        suffixAutomaton.add(new SuffixAutomatonNode());
        last = 0;
    }

    // Extend the automaton with a new character
    static void extendAutomaton(char c) {
        // Add the new state first so that its index is known before transitions point to it
        SuffixAutomatonNode newNode = new SuffixAutomatonNode();
        newNode.length = suffixAutomaton.get(last).length + 1;
        int cur = suffixAutomaton.size();
        suffixAutomaton.add(newNode);
        int current = last;
        while (current != -1 && !suffixAutomaton.get(current).next.containsKey(c)) {
            suffixAutomaton.get(current).next.put(c, cur);
            current = suffixAutomaton.get(current).link;
        }
        if (current == -1) {
            newNode.link = 0; // The root state
        } else {
            int next = suffixAutomaton.get(current).next.get(c);
            if (suffixAutomaton.get(current).length + 1 == suffixAutomaton.get(next).length) {
                newNode.link = next;
            } else {
                // Clone state next (copying its transitions and suffix link) with a shorter length
                SuffixAutomatonNode cloneNode = new SuffixAutomatonNode(suffixAutomaton.get(next));
                cloneNode.length = suffixAutomaton.get(current).length + 1;
                int clone = suffixAutomaton.size();
                suffixAutomaton.add(cloneNode);
                // Redirect transitions on c that pointed to next so they point to the clone
                while (current != -1 && suffixAutomaton.get(current).next.get(c) == next) {
                    suffixAutomaton.get(current).next.put(c, clone);
                    current = suffixAutomaton.get(current).link;
                }
                suffixAutomaton.get(next).link = clone;
                newNode.link = clone;
            }
        }
        last = cur;
    }

    // Traverse the suffix automaton
    static void traverseAutomaton() {
        System.out.println("Traversing Suffix Automaton:");
        for (int i = 0; i < suffixAutomaton.size(); ++i) {
            SuffixAutomatonNode state = suffixAutomaton.get(i);
            System.out.println("State " + i + ", Length: " + state.length
                    + ", Suffix Link: " + state.link);
            for (Map.Entry<Character, Integer> transition : state.next.entrySet()) {
                System.out.println(" Transition on '" + transition.getKey()
                        + "' to State " + transition.getValue());
            }
        }
    }

    public static void main(String[] args) {
        String input = "abab";
        initialize();
        for (char c : input.toCharArray()) {
            extendAutomaton(c);
        }
        // Traverse the constructed suffix automaton
        traverseAutomaton();
    }
}
class SuffixAutomatonNode:
    def __init__(self):
        self.next = {}   # Transitions to other states, keyed by character
        self.length = 0  # Length of the longest substring in this state
        self.link = -1   # Suffix link to another state


class SuffixAutomaton:
    def __init__(self):
        self.suffix_automaton = []
        self.last = 0  # Index of the state for the whole string processed so far

    # Initialize the suffix automaton with the initial state (the empty string)
    def initialize(self):
        self.suffix_automaton = [SuffixAutomatonNode()]
        self.last = 0

    # Extend the automaton with a new character
    def extend_automaton(self, c):
        states = self.suffix_automaton
        # Add the new state first so that its index is known before transitions point to it
        cur = len(states)
        new_node = SuffixAutomatonNode()
        new_node.length = states[self.last].length + 1
        states.append(new_node)
        current = self.last
        while current != -1 and c not in states[current].next:
            states[current].next[c] = cur
            current = states[current].link
        if current == -1:
            new_node.link = 0  # The root state
        else:
            next_state = states[current].next[c]
            if states[current].length + 1 == states[next_state].length:
                new_node.link = next_state
            else:
                # Clone next_state: copy (not share) its transitions, keep its suffix link
                clone_node = SuffixAutomatonNode()
                clone_node.next = dict(states[next_state].next)
                clone_node.link = states[next_state].link
                clone_node.length = states[current].length + 1
                clone = len(states)
                states.append(clone_node)
                # Redirect transitions on c that pointed to next_state so they point to the clone
                while current != -1 and states[current].next.get(c) == next_state:
                    states[current].next[c] = clone
                    current = states[current].link
                states[next_state].link = clone
                new_node.link = clone
        self.last = cur

    # Traverse the suffix automaton
    def traverse_automaton(self):
        print("Traversing Suffix Automaton:")
        for i, state in enumerate(self.suffix_automaton):
            print(f"State {i}, Length: {state.length}, Suffix Link: {state.link}")
            for char, next_state in state.next.items():
                print(f" Transition on '{char}' to State {next_state}")


# Main function
def main():
    input_str = "abab"
    suffix_automaton_instance = SuffixAutomaton()
    suffix_automaton_instance.initialize()
    for char in input_str:
        suffix_automaton_instance.extend_automaton(char)
    # Traverse the constructed suffix automaton
    suffix_automaton_instance.traverse_automaton()


if __name__ == "__main__":
    main()
using System;
using System.Collections.Generic;

class SuffixAutomatonNode
{
    public Dictionary<char, int> Next; // Transitions to other states, keyed by character
    public int Length; // Length of the longest substring in this state
    public int Link;   // Suffix link to another state

    public SuffixAutomatonNode()
    {
        Next = new Dictionary<char, int>();
        Length = 0;
        Link = -1;
    }
}

class GFG
{
    static List<SuffixAutomatonNode> SuffixAutomaton;
    static int Last; // Index of the state for the whole string processed so far

    // Initialize the suffix automaton with the initial state (the empty string)
    static void Initialize()
    {
        SuffixAutomaton = new List<SuffixAutomatonNode> { new SuffixAutomatonNode() };
        Last = 0;
    }

    // Extend the automaton with a new character
    static void ExtendAutomaton(char c)
    {
        // Add the new state first so that its index is known before transitions point to it
        SuffixAutomatonNode newNode = new SuffixAutomatonNode();
        newNode.Length = SuffixAutomaton[Last].Length + 1;
        int cur = SuffixAutomaton.Count;
        SuffixAutomaton.Add(newNode);
        int current = Last;
        while (current != -1 && !SuffixAutomaton[current].Next.ContainsKey(c))
        {
            SuffixAutomaton[current].Next[c] = cur;
            current = SuffixAutomaton[current].Link;
        }
        if (current == -1)
        {
            newNode.Link = 0; // The root state
        }
        else
        {
            int next = SuffixAutomaton[current].Next[c];
            if (SuffixAutomaton[current].Length + 1 == SuffixAutomaton[next].Length)
            {
                newNode.Link = next;
            }
            else
            {
                // Clone state next: copy (not share) its transitions, keep its suffix link
                SuffixAutomatonNode cloneNode = new SuffixAutomatonNode();
                cloneNode.Next = new Dictionary<char, int>(SuffixAutomaton[next].Next);
                cloneNode.Link = SuffixAutomaton[next].Link;
                cloneNode.Length = SuffixAutomaton[current].Length + 1;
                int clone = SuffixAutomaton.Count;
                SuffixAutomaton.Add(cloneNode);
                // Redirect transitions on c that pointed to next so they point to the clone
                while (current != -1 && SuffixAutomaton[current].Next.ContainsKey(c)
                       && SuffixAutomaton[current].Next[c] == next)
                {
                    SuffixAutomaton[current].Next[c] = clone;
                    current = SuffixAutomaton[current].Link;
                }
                SuffixAutomaton[next].Link = clone;
                newNode.Link = clone;
            }
        }
        Last = cur;
    }

    // Traverse the suffix automaton
    static void TraverseAutomaton()
    {
        Console.WriteLine("Traversing Suffix Automaton:");
        for (int i = 0; i < SuffixAutomaton.Count; ++i)
        {
            Console.WriteLine($"State {i}, Length: {SuffixAutomaton[i].Length}, Suffix Link: {SuffixAutomaton[i].Link}");
            foreach (var transition in SuffixAutomaton[i].Next)
            {
                Console.WriteLine($" Transition on '{transition.Key}' to State {transition.Value}");
            }
        }
    }

    public static void Main()
    {
        string input = "abab";
        Initialize();
        foreach (char c in input)
        {
            ExtendAutomaton(c);
        }
        // Traverse the constructed suffix automaton
        TraverseAutomaton();
    }
}
class SuffixAutomatonNode {
    constructor() {
        this.next = new Map(); // Transitions to other states, keyed by character
        this.length = 0;       // Length of the longest substring in this state
        this.link = -1;        // Suffix link to another state
    }
}

let suffixAutomaton = [];
let last; // Index of the state for the whole string processed so far

// Initialize the suffix automaton with the initial state (the empty string)
function initialize() {
    suffixAutomaton = [new SuffixAutomatonNode()];
    last = 0;
}

// Extend the automaton with a new character
function extendAutomaton(c) {
    // Add the new state first so that its index is known before transitions point to it
    const newNode = new SuffixAutomatonNode();
    newNode.length = suffixAutomaton[last].length + 1;
    const cur = suffixAutomaton.length;
    suffixAutomaton.push(newNode);
    let current = last;
    while (current !== -1 && !suffixAutomaton[current].next.has(c)) {
        suffixAutomaton[current].next.set(c, cur);
        current = suffixAutomaton[current].link;
    }
    if (current === -1) {
        newNode.link = 0; // The root state
    } else {
        const next = suffixAutomaton[current].next.get(c);
        if (suffixAutomaton[current].length + 1 === suffixAutomaton[next].length) {
            newNode.link = next;
        } else {
            // Clone state next: copy its transitions into a new Map (not a shared one)
            const cloneNode = new SuffixAutomatonNode();
            cloneNode.next = new Map(suffixAutomaton[next].next);
            cloneNode.link = suffixAutomaton[next].link;
            cloneNode.length = suffixAutomaton[current].length + 1;
            const clone = suffixAutomaton.length;
            suffixAutomaton.push(cloneNode);
            // Redirect transitions on c that pointed to next so they point to the clone
            while (current !== -1 && suffixAutomaton[current].next.get(c) === next) {
                suffixAutomaton[current].next.set(c, clone);
                current = suffixAutomaton[current].link;
            }
            suffixAutomaton[next].link = clone;
            newNode.link = clone;
        }
    }
    last = cur;
}

// Traverse the suffix automaton
function traverseAutomaton() {
    console.log("Traversing Suffix Automaton:");
    for (let i = 0; i < suffixAutomaton.length; ++i) {
        console.log(`State ${i}, Length: ${suffixAutomaton[i].length}, Suffix Link: ${suffixAutomaton[i].link}`);
        for (const [char, nextState] of suffixAutomaton[i].next) {
            console.log(` Transition on '${char}' to State ${nextState}`);
        }
    }
}

function main() {
    const input = "abab";
    initialize();
    for (const c of input) {
        extendAutomaton(c);
    }
    // Traverse the constructed suffix automaton
    traverseAutomaton();
}

main();
The output of the program for the input “abab” is shown below. The automaton has five states, and no state needs to be cloned for this input. The order in which a state's transitions are listed may differ between implementations, because C++ unordered_map and Java HashMap do not guarantee an iteration order.
Traversing Suffix Automaton:
State 0, Length: 0, Suffix Link: -1
 Transition on 'a' to State 1
 Transition on 'b' to State 2
State 1, Length: 1, Suffix Link: 0
 Transition on 'b' to State 2
State 2, Length: 2, Suffix Link: 0
 Transition on 'a' to State 3
State 3, Length: 3, Suffix Link: 1
 Transition on 'b' to State 4
State 4, Length: 4, Suffix Link: 2
Time Complexity: The construction runs in O(n) time, where n is the length of the input string, assuming a constant-size alphabet. Each character is processed by one call to extendAutomaton; a single call may follow several suffix links, but the total work over all calls is linear, so the cost is amortized constant time per character.
Auxiliary Space Complexity: The space complexity is also O(n). Each character adds one new state and at most one cloned state, so for n ≥ 2 the automaton has at most 2n - 1 states, and for n ≥ 3 it has at most 3n - 4 transitions. Therefore, both time and space complexities are linear in the length of the input string.
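The size bounds can be checked empirically with a compact version of the same online construction. This Python sketch is an assumption for illustration: it stores states as parallel lists (length, link, trans) instead of node objects, and the names build_suffix_automaton and automaton_size are hypothetical.

```python
def build_suffix_automaton(s):
    """Online suffix automaton construction; returns parallel lists (length, link, trans)."""
    length, link, trans = [0], [-1], [{}]  # State 0 is the initial state
    last = 0
    for ch in s:
        cur = len(length)  # New state for the whole prefix read so far
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(length)  # Split q: the clone copies q's transitions and link
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return length, link, trans


def automaton_size(s):
    """Return (number of states, number of transitions) of the suffix automaton of s."""
    length, _, trans = build_suffix_automaton(s)
    return len(length), sum(len(t) for t in trans)


if __name__ == "__main__":
    for text in ["abab", "abbbbb", "abbbbc", "mississippi"]:
        n = len(text)
        print(text, automaton_size(text), "bounds:", 2 * n - 1, 3 * n - 4)
```

For "abab" this reports 5 states and 5 transitions, matching the traversal output above; strings of the form "abb...b" are known to reach the 2n - 1 state bound.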
Applications of the Suffix Automaton:
Suffix automata are used in a range of string processing tasks, often achieving linear time and space where straightforward methods need quadratic time or more:
Substring Matching: Once the suffix automaton of a text has been built, checking whether a pattern occurs in the text takes time linear in the length of the pattern: start at the initial state and follow one transition per character. This makes it useful in search tools and text editors.
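A minimal Python sketch of the membership test, assuming the automaton is stored as parallel lists as in the construction shown earlier in this article; the names build_suffix_automaton and contains are hypothetical.

```python
def build_suffix_automaton(s):
    """Online suffix automaton construction; returns parallel lists (length, link, trans)."""
    length, link, trans = [0], [-1], [{}]
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(length)  # Split q: the clone copies q's transitions and link
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return length, link, trans


def contains(trans, pattern):
    """Return True iff pattern is a substring of the indexed text: one transition per character."""
    state = 0
    for ch in pattern:
        if ch not in trans[state]:
            return False
        state = trans[state][ch]
    return True


if __name__ == "__main__":
    _, _, trans = build_suffix_automaton("mississippi")
    print(contains(trans, "issip"), contains(trans, "pis"))
```

The automaton is built once for the text; every query afterwards costs time proportional only to the pattern length, independent of the text length (for a constant-size alphabet).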
Longest Common Substring: The longest common substring of two strings can be found in linear time by building the suffix automaton of one string and scanning the other through it, which supports applications such as plagiarism detection and bioinformatics.
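A minimal Python sketch of that scan, assuming the same parallel-list construction; the names build_suffix_automaton and longest_common_substring are hypothetical. While reading the second string, it keeps the longest suffix of the text read so far that is still a substring of the first string, following suffix links whenever the match cannot be extended.

```python
def build_suffix_automaton(s):
    """Online suffix automaton construction; returns parallel lists (length, link, trans)."""
    length, link, trans = [0], [-1], [{}]
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                clone = len(length)  # Split q: the clone copies q's transitions and link
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return length, link, trans


def longest_common_substring(a, b):
    """Return a longest string occurring in both a and b ("" if none)."""
    length, link, trans = build_suffix_automaton(a)
    state, cur_len, best_len, best_end = 0, 0, 0, 0
    for i, ch in enumerate(b):
        # Shorten the current match via suffix links until it can be extended by ch
        while state != 0 and ch not in trans[state]:
            state = link[state]
            cur_len = length[state]
        if ch in trans[state]:
            state = trans[state][ch]
            cur_len += 1
        if cur_len > best_len:
            best_len, best_end = cur_len, i
    return b[best_end - best_len + 1:best_end + 1]


if __name__ == "__main__":
    print(longest_common_substring("banana", "ananas"))
```

For "banana" and "ananas" this returns "anana". The total work is O(|a| + |b|) for a constant-size alphabet, since each suffix-link step shortens a match that was built one character at a time.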
Palindromes: Suffix automata can also serve as a building block in some palindrome-related problems that arise in text analysis; for finding the longest palindromic substring itself, Manacher’s algorithm is the usual linear-time method.
Shortest Non-Overlapping Repeats: Identifying the shortest non-overlapping repeating substrings in a string can also be done using a suffix automaton. This is useful in DNA sequence analysis and in compression algorithms.