
Construct a String from another String using Suffix Trie

Last Updated : 23 Nov, 2023

A suffix trie is a trie that stores all the suffixes of a given string. Its compressed form, in which every chain of single-child nodes is merged into one edge, is the suffix tree. Apart from detecting patterns in strings, these structures have a range of applications in fields like bioinformatics, string algorithms, and data compression.

Features of Suffix Trie:

  • A suffix trie is a tree-shaped data structure used to store a string's suffixes and to look for patterns in the string. It is closely related to the suffix tree, which is the same structure with each chain of single-child nodes compressed into one edge.
  • Each path from the root spells out a prefix of some suffix, so all the suffixes of the string are kept as paths in the tree.
  • To build a suffix trie for a text, we start with an empty tree and insert each suffix of the text in turn.
  • The root node represents the empty string, and every leaf marks the end of a suffix of the input string. A suffix that is also a prefix of a longer suffix ends at an internal node unless a unique terminator is appended to the text.
  • Each internal node with two or more children represents a substring that is a prefix of at least two suffixes, that is, a substring that occurs at least twice.
  • The ability to quickly find substrings inside a text is one of the key benefits of a suffix trie.
  • To search for a pattern, we start at the root and move down the tree along the edges labeled with the pattern's characters.
  • If the whole pattern can be followed, it occurs in the string. The node we reach need not be a leaf; every suffix stored below it begins with the pattern.
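
The build-and-search procedure described in the list above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used later in the article: it assumes nested dictionaries as trie nodes, and the names build_suffix_trie and is_substring are made up for this example.

```python
def build_suffix_trie(text):
    """Return a nested-dict trie that contains every suffix of `text`."""
    root = {}
    for start in range(len(text)):
        node = root
        # Insert the suffix text[start:] one character at a time.
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(trie, pattern):
    """Follow `pattern` from the root; success means it occurs in the text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(is_substring(trie, "nan"))  # True
print(is_substring(trie, "nab"))  # False
```

Note that the search stops at whatever node the pattern leads to; it does not need to reach a leaf.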

Examples:

Input: str1 = “programming” str2 = “gaming”
Output: [(3, 3), (5, 6), (8, 10)]

 

Explanation: In this solution, we construct the suffix trie for the string “str1”. Then we split “str2” into consecutive pieces, each of which exists in the suffix trie of “str1”. For every piece, we record the starting and ending indices (0-based, inclusive) of the substring in “str1” that forms it.

In the example given, the first substring of “str2” is “g”. We search for this substring in the suffix trie of “str1” and find it at index 3. Therefore, we record the starting and ending indices of this substring in “str1” as (3, 3). Similarly, we find that the substring “am” in “str2” can be constructed from the suffix trie of “str1” using indices (5, 6) in “str1”. Finally, we find that the substring “ing” in “str2” can be constructed from the suffix trie of “str1” using indices (8, 10) in “str1”.

Therefore, the output of the program is [(3, 3), (5, 6), (8, 10)], which represents the starting and ending indices of each substring of “str1” that can be used to construct the corresponding substring in “str2”.

Input: str1 = “banana” str2 = “ana”
Output: [(1, 3)]

 

Explanation: To construct str2 from str1 using a suffix trie, we first build a suffix trie for str1. Then, we search for str2 in the suffix trie by traversing down the tree, following the edges labeled with the characters of str2.

To construct “ana” from “banana”, we start at the root of the suffix trie and follow the edges labeled “a”, “n”, and “a”, respectively, until we reach the end of str2. The whole of str2 is matched as a single piece, and its first occurrence in str1 spans the indices (1, 3), which correspond to the substring “ana”.
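
A quick way to check outputs like the two above is to cut the listed index ranges out of str1 and join them. The helper name rebuild is hypothetical, and the sketch assumes the pairs are 0-based and inclusive, as in the examples.

```python
def rebuild(source, pieces):
    """Join source[start..end] (inclusive) for every (start, end) pair."""
    return "".join(source[start:end + 1] for start, end in pieces)


print(rebuild("programming", [(3, 3), (5, 6), (8, 10)]))  # gaming
print(rebuild("banana", [(1, 3)]))                        # ana
```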

Approach: This can be solved with the following idea:

Step 1: Create a Suffix Trie for the Original String

  • The first step is to construct a trie data structure that represents all the suffixes of the original string. This data structure is called a suffix trie and can be constructed using any standard algorithm.

Step 2: Identify Suffixes Beginning with the Initial Substring

  • After constructing the suffix trie, the next step is to locate the suffixes that start with the initial substring of interest. This is done by traversing the trie from the root along the edges that match the characters of the initial substring. Every suffix stored below the node we reach begins with that substring, and the node does not have to be a leaf. If the walk fails on the very first character, the target string cannot be constructed.
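
As a rough sketch of this step, each trie node can remember the starting positions of the suffixes that pass through it, so that a single walk returns every suffix beginning with a given prefix. The article's own code does not store these positions; the "starts" lists and both function names are assumptions made for illustration, and the lists cost O(n²) extra memory.

```python
def build_indexed_suffix_trie(text):
    """Suffix trie whose nodes record which suffixes pass through them."""
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(
                ch, {"children": {}, "starts": []}
            )
            # The suffix beginning at `start` passes through this node.
            node["starts"].append(start)
    return root


def suffixes_starting_with(trie, prefix):
    """Return the start index of every suffix that begins with `prefix`."""
    node = trie
    for ch in prefix:
        node = node["children"].get(ch)
        if node is None:
            return []
    return node["starts"]


trie = build_indexed_suffix_trie("banana")
print(suffixes_starting_with(trie, "ana"))  # [1, 3]
print(suffixes_starting_with(trie, "x"))    # []
```

In this sketch the root keeps an empty "starts" list, so an empty prefix returns [] rather than every position.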

Step 3: Determine the Longest Common Prefix (LCP) of the Suffixes

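  • Once we have identified the suffixes that begin with the initial substring, we extend the substring one character at a time for as long as the walk down the trie succeeds. The deepest node reached is a common ancestor of all the leaves whose suffixes begin with the match, and the path to it is the longest prefix of the remaining target that those suffixes share. The implementations below find it by calling search on successively longer prefixes rather than by computing a lowest common ancestor explicitly.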

Step 4: Add the LCP to the Output String

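  • After determining the LCP of the suffixes, we add it to the output. The implementations below record it as the pair (start, end) of its first occurrence in the original string rather than as text.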

Step 5: Repeat for Additional Substrings

  • To find the LCP for every additional substring, we repeat steps 2-4, beginning at the character just after the end of the previous substring, and add each result to the output. The process ends when the target string is exhausted, or returns an empty result as soon as a single character cannot be matched.
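
The five steps can be put together as a single greedy pass. The sketch below is one possible arrangement, not the article's implementation: it assumes each node stores the end index of the first occurrence of its path (the hypothetical "first_end" field), which lets the walk continue from the current node instead of restarting a search for every prefix. The function names are illustrative.

```python
def build_suffix_trie(text):
    """Suffix trie whose nodes store where their path first ends in `text`."""
    root = {}
    for start in range(len(text)):
        children = root
        for offset, ch in enumerate(text[start:]):
            # Suffixes are inserted left to right, so a node is created
            # by the leftmost occurrence of the path leading to it.
            node = children.setdefault(
                ch, {"first_end": start + offset, "next": {}}
            )
            children = node["next"]
    return root


def greedy_pieces(source, target):
    """Split `target` into longest-possible pieces that occur in `source`.

    Returns (start, end) pairs, 0-based and inclusive, or [] when some
    character of `target` does not occur in `source`.
    """
    trie = build_suffix_trie(source)
    pieces = []
    pos = 0
    while pos < len(target):
        children = trie
        length = 0
        end = None
        # Extend the current piece while the trie walk succeeds.
        while pos + length < len(target) and target[pos + length] in children:
            node = children[target[pos + length]]
            end = node["first_end"]
            children = node["next"]
            length += 1
        if length == 0:
            return []
        pieces.append((end - length + 1, end))
        pos += length
    return pieces


print(greedy_pieces("ssrtssr", "rsstsr"))  # [(2, 2), (0, 1), (3, 4), (2, 2)]
```

An empty target also yields [] here, so a caller that needs to tell "nothing to build" apart from "cannot be built" would have to check for that case separately.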

Below is the code for the above approach:

C++




// C++ implementation of the above approach
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
 
// Implementing Trie using TrieNode class
class TrieNode {
public:
    vector<TrieNode*> children;
    bool isEndOfWord;
 
    TrieNode() {
        children = vector<TrieNode*>(26, nullptr);
        isEndOfWord = false;
    }
};
 
// Trie data structure class
class Trie {
private:
    TrieNode* root;
 
    TrieNode* getNode() {
        return new TrieNode();
    }
 
    int charToIndex(char ch) {
        return ch - 'a';
    }
 
public:
    Trie() {
        root = getNode();
    }
 
    void insert(string key) {
        TrieNode* word = root;
        int length = key.length();
        for (int level = 0; level < length; level++) {
            int index = charToIndex(key[level]);
            if (!word->children[index]) {
                word->children[index] = getNode();
            }
            word = word->children[index];
        }
        word->isEndOfWord = true;
    }
 
    bool search(string key) {
        TrieNode* word = root;
        int length = key.length();
        int level = 0;
        while (level < length) {
            int index = charToIndex(key[level]);
            if (!word->children[index]) {
                return false;
            }
            word = word->children[index];
            level++;
        }
        // Every character was matched, so key is a prefix of some
        // inserted suffix, i.e. a substring of the original string.
        return true;
    }
 
    static vector<pair<int, int>> buildViaSubstrings(string P, string Q) {
        if (P.length() == 1) {
            for (int i = 0; i < Q.length(); i++) {
                if (Q[i] != P[0]) {
                    return {};
                }
            }
            vector<pair<int, int>> substrings(Q.length(), make_pair(0, 0));
            return substrings;
        } else {
            Trie x;
            for (int i = 0; i < P.length(); i++) {
                x.insert(P.substr(i));
            }
            int startPos = 0;
            vector<pair<int, int>> substrings;
            bool y = true;
            int k = 1;
            while (k <= Q.length()) {
                y = x.search(Q.substr(startPos, k - startPos));
                if (!y) {
                    if (k == startPos + 1) {
                        return {};
                    } else {
                        string sub = Q.substr(startPos, k - 1 - startPos);
                        int lt = sub.length();
                        int m = P.find(sub);
                        substrings.push_back(make_pair(m, m + lt - 1));
                        startPos = k - 1;
                        k = k - 1;
                        y = true;
                    }
                } else if (y && k == Q.length()) {
                    string sub = Q.substr(startPos);
                    int lt = sub.length();
                    int m = P.find(sub);
                    substrings.push_back(make_pair(m, m + lt - 1));
                }
                k++;
            }
            if (y && substrings.empty()) {
                return {make_pair(P.find(Q), Q.length() - 1)};
            } else {
                return substrings;
            }
        }
    }
};
 
int main() {
    string str1 = "ssrtssr";
    string str2 = "rsstsr";
 
    vector<pair<int, int>> ans = Trie::buildViaSubstrings(str1, str2);
    for (auto p : ans) {
        cout << "(" << p.first << ", " << p.second << ") ";
    }
    cout << endl;
 
    return 0;
}
 
// This code is contributed by Tapesh(tapeshdua420)


Java




import java.util.ArrayList;
import java.util.List;
 
class TrieNode {
    TrieNode[] children;
    boolean isEndOfWord;
 
    TrieNode() {
        children = new TrieNode[26];
        isEndOfWord = false;
    }
}
 
class Trie {
    private TrieNode root;
 
    Trie() {
        root = new TrieNode();
    }
 
    void insert(String key) {
        TrieNode node = root;
        int length = key.length();
        for (int level = 0; level < length; level++) {
            int index = key.charAt(level) - 'a';
            if (node.children[index] == null) {
                node.children[index] = new TrieNode();
            }
            node = node.children[index];
        }
        node.isEndOfWord = true;
    }
 
    boolean search(String key) {
        TrieNode node = root;
        int length = key.length();
        int level = 0;
        while (level < length) {
            int index = key.charAt(level) - 'a';
            if (node.children[index] == null) {
                return false;
            }
            node = node.children[index];
            level++;
        }
        return (level == length);
    }
 
    static List<Pair<Integer, Integer>> buildViaSubstrings(String P, String Q) {
        if (P.length() == 1) {
            for (int i = 0; i < Q.length(); i++) {
                if (Q.charAt(i) != P.charAt(0)) {
                    return new ArrayList<>();
                }
            }
            List<Pair<Integer, Integer>> substrings = new ArrayList<>();
            // P has a single character, so every piece is P[0..0].
            for (int i = 0; i < Q.length(); i++) {
                substrings.add(new Pair<>(0, 0));
            }
            return substrings;
        } else {
            Trie x = new Trie();
            for (int i = 0; i < P.length(); i++) {
                x.insert(P.substring(i));
            }
            int startPos = 0;
            List<Pair<Integer, Integer>> substrings = new ArrayList<>();
            boolean y = true;
            int k = 1;
            while (k <= Q.length()) {
                y = x.search(Q.substring(startPos, k));
                if (!y) {
                    if (k == startPos + 1) {
                        return new ArrayList<>();
                    } else {
                        String sub = Q.substring(startPos, k - 1);
                        int lt = sub.length();
                        int m = P.indexOf(sub);
                        substrings.add(new Pair<>(m, m + lt - 1));
                        startPos = k - 1;
                        k = k - 1;
                        y = true;
                    }
                } else if (y && k == Q.length()) {
                    String sub = Q.substring(startPos);
                    int lt = sub.length();
                    int m = P.indexOf(sub);
                    substrings.add(new Pair<>(m, m + lt - 1));
                }
                k++;
            }
            if (y && substrings.isEmpty()) {
                int m = P.indexOf(Q);
                substrings.add(new Pair<>(m, m + Q.length() - 1));
                return substrings;
            } else {
                return substrings;
            }
        }
    }
}
 
class Pair<A, B> {
    A first;
    B second;
 
    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }
}
 
public class Main {
    public static void main(String[] args) {
        String str1 = "ssrtssr";
        String str2 = "rsstsr";
 
        List<Pair<Integer, Integer>> ans = Trie.buildViaSubstrings(str1, str2);
        for (Pair<Integer, Integer> p : ans) {
            System.out.print("(" + p.first + ", " + p.second + ") ");
        }
        System.out.println();
    }
}


Python3




# Implementing Trie using Trie and
# TrieNode classes
""" The Trie_Node class is defined with a
__init__ method that creates a list of None
 values with length 26 to represent children nodes
 and a Boolean variable isEndOfWord to mark the
 end of a word in the trie."""
 
 
class Trie_Node:
 
    # Trie node class
    def __init__(self):
        self.children = [None]*26
 
        # Property to represent end
        # of a word in trie
        self.isEndOfWord = False
 
 
"""The Trie class is defined with a __init__ method
that creates a root node using the getNode method
and a _charToIndex private helper method to convert
a character to its index in the children list."""
 
 
class Trie:
 
    # Trie data structure class
    def __init__(self):
        self.root = self.getNode()
 
    def getNode(self):
 
      # Returns new trie node with
      # Null values
        return Trie_Node()
 
    def _charToIndex(self, ch):
 
        # Private helper function
        return ord(ch)-ord('a')
 
    """The insert method is defined to insert a
    new key (a string) into the trie. It
    iterates over each character of
    the key and checks if the character is
    already present in the trie. If it's
    not present, it creates a new
    node and adds it to the children list
    of the current node. The method marks
    the last node as the end of the word."""
 
    def insert(self, key):
 
        # Start at the root and create any
        # missing node along the key's path
        word = self.root
        length = len(key)
        for level in range(length):
            index = self._charToIndex(key[level])
 
            # If character is not present
            # in trie
            if not word.children[index]:
                word.children[index] = self.getNode()
            word = word.children[index]
        word.isEndOfWord = True
 
    """The search method is defined to search for a
    key (a string) in the trie. It iterates over
    each character of the key and follows the
    matching child node. If a child is missing,
    the method returns False. If every character
    is matched, the method returns True whether
    or not the last node is marked as the end of
    a word, so any substring of P is found."""
 
    def search(self, key):
 
        # Search substring in the trie
        word = self.root
        length = len(key)
        level = 0
        while level < length:
            index = self._charToIndex(key[level])
            if not word.children[index]:
                return False
            word = word.children[index]
            level += 1
 
        # Every character matched, so key is a
        # prefix of some inserted suffix
        return True
 
    """The build_via_substrings method is
    defined to build a suffix trie for a given
    input string P and search for all
    substrings of another input string Q
    in the trie."""
    @staticmethod
    def build_via_substrings(P, Q):

        # handling when length of P is 1
        if len(P) == 1:
            for i in range(len(Q)):
                if Q[i] != P:
                    return []
            return [(0, 0)]*len(Q)
        else:
 
            # creating suffix trie
            x = Trie()
            for i in range(len(P)):
                x.insert(P[i:])
            start_pos = 0
            substrings = []
            y = True
            k = 1
 
            # Search substrings in trie
            while k <= len(Q):
                y = x.search(Q[start_pos:k])
                if y == False:
 
                    # Unsuccessful search
                    # for a single lettered
                    # substring.
                    if k == start_pos + 1:
                        return []

                    else:
 
                        # When search fails
                        # for a substring
                        # greater than
                        # length = 1
                        sub = Q[start_pos:k-1]
                        lt = len(sub)
                        m = P.find(sub)
                        substrings.append((m, m + lt-1))
                        start_pos = k-1
                        k = k-1
                        y = True
                elif y == True and k == len(Q):
 
                    # We check whether we
                    # have reached the
                    # last letter
                    sub = Q[start_pos:]
                    lt = len(sub)
                    m = P.find(sub)
                    substrings.append((m, m + lt-1))
                k = k + 1
            if y == True and substrings == []:
                return [(P.find(Q), len(Q)-1)]
            else:
                return substrings
 
 
# Driver code
if __name__ == "__main__":
    str1 = "ssrtssr"
    str2 = "rsstsr"
 
    # Function call
    ans = Trie.build_via_substrings(str1, str2)
    print(ans)


C#




// C# implementation for the above approach
using System;
using System.Collections.Generic;
 
// Implementing Trie using TrieNode class
class TrieNode {
    public TrieNode[] Children;
    public bool IsEndOfWord;
 
    public TrieNode()
    {
        Children = new TrieNode[26];
        IsEndOfWord = false;
    }
}
 
// Trie data structure class
class Trie {
    private TrieNode root;
 
    private TrieNode GetNode() { return new TrieNode(); }
 
    private int CharToIndex(char ch) { return ch - 'a'; }
 
    public Trie() { root = GetNode(); }
 
    public void Insert(string key)
    {
        TrieNode word = root;
        int length = key.Length;
        for (int level = 0; level < length; level++) {
            int index = CharToIndex(key[level]);
            if (word.Children[index] == null) {
                word.Children[index] = GetNode();
            }
            word = word.Children[index];
        }
        word.IsEndOfWord = true;
    }
 
    public bool Search(string key)
    {
        TrieNode word = root;
        int length = key.Length;
        int level = 0;
        while (level < length) {
            int index = CharToIndex(key[level]);
            if (word.Children[index] == null) {
                return false;
            }
            word = word.Children[index];
            level++;
        }
        return (level == length);
    }
 
    public static List<Tuple<int, int> >
    BuildViaSubstrings(string P, string Q)
    {
        if (P.Length == 1) {
            for (int i = 0; i < Q.Length; i++) {
                if (Q[i] != P[0]) {
                    return new List<Tuple<int, int> >();
                }
            }
            List<Tuple<int, int> > substrings
                = new List<Tuple<int, int> >();
            // P has a single character, so every piece is P[0..0].
            for (int i = 0; i < Q.Length; i++) {
                substrings.Add(new Tuple<int, int>(0, 0));
            }
            return substrings;
        }
        else {
            Trie x = new Trie();
            for (int i = 0; i < P.Length; i++) {
                x.Insert(P.Substring(i));
            }
            int startPos = 0;
            List<Tuple<int, int> > substrings
                = new List<Tuple<int, int> >();
            bool y = true;
            int k = 1;
            while (k <= Q.Length) {
                y = x.Search(
                    Q.Substring(startPos, k - startPos));
                if (!y) {
                    if (k == startPos + 1) {
                        return new List<Tuple<int, int> >();
                    }
                    else {
                        string sub = Q.Substring(
                            startPos, k - 1 - startPos);
                        int lt = sub.Length;
                        int m = P.IndexOf(sub);
                        substrings.Add(new Tuple<int, int>(
                            m, m + lt - 1));
                        startPos = k - 1;
                        k = k - 1;
                        y = true;
                    }
                }
                else if (y && k == Q.Length) {
                    string sub = Q.Substring(startPos);
                    int lt = sub.Length;
                    int m = P.IndexOf(sub);
                    substrings.Add(
                        new Tuple<int, int>(m, m + lt - 1));
                }
                k++;
            }
            if (y && substrings.Count == 0) {
                return new List<Tuple<int, int> >{
                    new Tuple<int, int>(P.IndexOf(Q),
                                        Q.Length - 1)
                };
            }
            else {
                return substrings;
            }
        }
    }
}
 
class GFG {
    static void Main(string[] args)
    {
        string str1 = "ssrtssr";
        string str2 = "rsstsr";
 
        List<Tuple<int, int> > ans
            = Trie.BuildViaSubstrings(str1, str2);
        foreach(var p in ans)
        {
            Console.Write("(" + p.Item1 + ", " + p.Item2
                          + ") ");
        }
        Console.WriteLine();
    }
}


Javascript




// Implementing Trie using TrieNode class
class TrieNode {
    constructor() {
        // Initialize an array to store child TrieNodes for each character
        this.children = new Array(26).fill(null);
        this.isEndOfWord = false;
    }
}
 
// Trie data structure class
class Trie {
    constructor() {
        this.root = new TrieNode();
    }
 
    // Helper function to get a new TrieNode
    getNode() {
        return new TrieNode();
    }
 
    // Helper function to get the index of a character
    charToIndex(ch) {
        return ch.charCodeAt(0) - 'a'.charCodeAt(0);
    }
 
    // Insert a word into the Trie
    insert(key) {
        let word = this.root;
        const length = key.length;
        for (let level = 0; level < length; level++) {
            const index = this.charToIndex(key[level]);
            if (!word.children[index]) {
                word.children[index] = this.getNode();
            }
            word = word.children[index];
        }
        word.isEndOfWord = true;
    }
 
    // Search for a word in the Trie
    search(key) {
        let word = this.root;
        const length = key.length;
        let level = 0;
        while (level < length) {
            const index = this.charToIndex(key[level]);
            if (!word.children[index]) {
                return false;
            }
            word = word.children[index];
            level++;
        }
        return level === length;
    }
 
    // Split Q into consecutive pieces, each of which occurs as a substring of P
    static buildViaSubstrings(P, Q) {
        if (P.length === 1) {
            for (let i = 0; i < Q.length; i++) {
                if (Q[i] !== P[0]) {
                    return [];
                }
            }
            // P has a single character, so every piece is P[0..0].
            const substrings = Array(Q.length).fill().map(() => [0, 0]);
            return substrings;
        } else {
            const x = new Trie();
            for (let i = 0; i < P.length; i++) {
                x.insert(P.slice(i));
            }
            let startPos = 0;
            const substrings = [];
            let y = true;
            let k = 1;
            while (k <= Q.length) {
                y = x.search(Q.slice(startPos, k));
                if (!y) {
                    if (k === startPos + 1) {
                        return [];
                    } else {
                        const sub = Q.slice(startPos, k - 1);
                        const lt = sub.length;
                        const m = P.indexOf(sub);
                        substrings.push([m, m + lt - 1]);
                        startPos = k - 1;
                        k = k - 1;
                        y = true;
                    }
                } else if (y && k === Q.length) {
                    const sub = Q.slice(startPos);
                    const lt = sub.length;
                    const m = P.indexOf(sub);
                    substrings.push([m, m + lt - 1]);
                }
                k++;
            }
            if (y && substrings.length === 0) {
                return [[P.indexOf(Q), Q.length - 1]];
            } else {
                return substrings;
            }
        }
    }
}
 
// Main function
function main() {
    const str1 = "ssrtssr";
    const str2 = "rsstsr";
 
    const ans = Trie.buildViaSubstrings(str1, str2);
    for (const p of ans) {
        console.log(`(${p[0]}, ${p[1]})`);
    }
}
 
// Run the main function
main();


Output

[(2, 2), (0, 1), (3, 4), (2, 2)]






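Time Complexity: O(n² + n·m), where n is the length of str1 and m is the length of str2. Building the suffix trie takes O(n²). The code above restarts every search from the root and calls find once per piece, which adds O(n·m); walking the trie incrementally and storing positions in the nodes would bring the matching phase down to O(m).
Auxiliary Space: O(26·n²) in the worst case, because the suffix trie can hold up to n(n+1)/2 nodes, each with 26 child slots.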
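
The size of a suffix trie depends heavily on the input, which a small node counter makes visible. The function name count_trie_nodes is made up for this sketch, and nodes are plain dictionaries rather than the 26-slot arrays used in the code above.

```python
def count_trie_nodes(text):
    """Count the nodes (root included) of the suffix trie of `text`."""
    root = {}
    count = 1  # the root
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count


print(count_trie_nodes("aaaaaa"))  # 7: all suffixes share one chain
print(count_trie_nodes("abcdef"))  # 22: no suffixes share a prefix
```

For a string of n distinct characters the count is n(n+1)/2 + 1, so the trie grows quadratically in the worst case.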

Applications of Suffix Trie:

  • In pattern matching, a suffix trie built from a text finds a pattern by following the pattern's characters from the root; every suffix below the node reached marks one occurrence of the pattern.
  • In bioinformatics, suffix-based indexes are used to help assemble genome sequences by matching and aligning short DNA reads to a reference genome.
  • In spell-checking software, it can be used to look up substrings of an input word when checking whether the word is spelled correctly.
  • In compilers and code optimization tools, it can be used to identify frequently repeated code patterns.
  • In natural language processing, it can be used to match and categorize words and phrases based on their morphological and syntactical properties.

