Persistent Trie | Set 1 (Introduction)

Last Updated : 04 Apr, 2023

Prerequisite:

Trie is one handy data structure that often comes into play when performing multiple string lookups. In this post, we will introduce the concept of Persistency in this data structure. Persistency simply means to retain the changes. But obviously, retaining the changes cause extra memory consumption and hence affect the Time Complexity.

Our aim is to apply persistency in Trie and also to ensure that it does not take more than the standard trie searching i.e. O(length_of_key). We will also analyze the extra space complexity that persistency causes over the standard Space Complexity of a Trie.

Let’s think in terms of versions i.e. for each change/insertion in our Trie we create a new version of it.
We will consider our initial version to be Version-0. Now, as we do any insertion in the trie we will create a new version for it and in similar fashion track the record for all versions.

But creating the whole trie every time for every version keeps doubling up memory and affects the Space Complexity very badly. So, this idea will easily run out of memory for a large number of versions.

Let’s exploit the fact that for each new insertion in the trie, exactly X (length_of_key) nodes will be visited/modified. So, our new version will only contain these X new nodes and rest trie nodes will be the same as the previous version. Therefore, it is quite clear that for each new version we only need to create these X new nodes whereas the rest of the trie nodes can be shared from the previous version.

Consider the below figure for better visualization:

Now, the Question arises: How to keep track of all the versions?
We only need to keep track the first root node for all the versions and this will serve the purpose to track all the newly created nodes in the different versions as the root node gives us the entry point for that particular version. For this purpose, we can maintain an array of pointers to the root node of the trie for all versions.

Let’s consider the below scenario and see how we can use Persistent Trie to solve it !
Given an array of strings and we need to determine if a string exists in some
range [l, r] in the array. To have an analogy, consider the array to be a
list of words in a dictionary at ith page(i is the index of the array) and
we need to determine whether a given word X exists in the page range [l, r]?

Below is the implementation for the above problem:-

C++

// C++ implementation of the approach
#include <bits/stdc++.h>
using namespace std;
 
// Distinct numbers of chars in key
const int sz = 26;
 
// Persistent Trie node structure
struct PersistentTrie {
 
    // Stores all children nodes, where ith children denotes
    // ith alphabetical character
    vector<PersistentTrie*> children;
 
    // Marks the ending of the key
    bool keyEnd = false;
 
    // Constructor 1
    PersistentTrie(bool keyEnd = false)
    {
        this->keyEnd = keyEnd;
    }
 
    // Constructor 2
    PersistentTrie(vector<PersistentTrie*>& children,
                   bool keyEnd = false)
    {
        this->children = children;
        this->keyEnd = keyEnd;
    }
 
    // detects existence of key in trie
    bool findKey(string& key, int len);
 
    // Inserts key into trie
    // returns new node after insertion
    PersistentTrie* insert(string& key, int len);
};
 
// Dummy PersistentTrie node
PersistentTrie* dummy;
 
// Initialize dummy for easy implementation
void init()
{
    dummy = new PersistentTrie(true);
 
    // All children of dummy as dummy
    vector<PersistentTrie*> children(sz, dummy);
    dummy->children = children;
}
 
// Inserts key into current trie
// returns newly created trie node after insertion
PersistentTrie* PersistentTrie::insert(string& key, int len)
{
 
    // If reached the end of key string
    if (len == key.length()) {
 
        // Create new trie node with current trie node
        // marked as keyEnd
        return new PersistentTrie((*this).children, true);
    }
 
    // Fetch current child nodes
    vector<PersistentTrie*> new_version_PersistentTrie
        = (*this).children;
 
    // Insert at key[len] child and
    // update the new child node
    PersistentTrie* tmpNode
        = new_version_PersistentTrie[key[len] - 'a'];
    new_version_PersistentTrie[key[len] - 'a']
        = tmpNode->insert(key, len + 1);
 
    // Return a new node with modified key[len] child node
    return new PersistentTrie(new_version_PersistentTrie);
}
 
// Returns the presence of key in current trie
bool PersistentTrie::findKey(string& key, int len)
{
    // If reached end of key
    if (key.length() == len)
 
        // Return if this is a keyEnd in trie
        return this->keyEnd;
 
    // If we cannot find key[len] child in trie
    // we say key doesn't exist in the trie
    if (this->children[key[len] - 'a'] == dummy)
        return false;
 
    // Recursively search the rest of
    // key length in children[key] trie
    return this->children[key[len] - 'a']->findKey(key,
                                                   len + 1);
}
 
// dfs traversal over the current trie
// prints all the keys present in the current trie
void printAllKeysInTrie(PersistentTrie* root, string& s)
{
    int flag = 0;
    for (int i = 0; i < sz; i++) {
        if (root->children[i] != dummy) {
            flag = 1;
            s.push_back('a' + i);
            printAllKeysInTrie(root->children[i], s);
            s.pop_back();
        }
    }
    if (flag == 0 and s.length() > 0)
        cout << s << endl;
}
 
// Driver code
int main(int argc, char const* argv[])
{
 
    // Initialize the PersistentTrie
    init();
 
    // Input keys
    vector<string> keys(
        { "goku", "gohan", "goten", "gogeta" });
 
    // Cache to store trie entry roots after each insertion
    PersistentTrie* root[keys.size()];
 
    // Marking first root as dummy
    root[0] = dummy;
 
    // Inserting all keys
    for (int i = 1; i <= keys.size(); i++) {
 
        // Caching new root for ith version of trie
        root[i] = root[i - 1]->insert(keys[i - 1], 0);
    }
 
    int idx = 3;
    cout << "All keys in trie after version - " << idx
         << endl;
    string key = "";
    printAllKeysInTrie(root[idx], key);
 
    string queryString = "goku";
    int l = 2, r = 3;
    cout << "range : "
         << "[" << l << ", " << r << "]" << endl;
    if (root[r]->findKey(queryString, 0)
        and !root[l - 1]->findKey(queryString, 0))
        cout << queryString << " - exists in above range"
             << endl;
    else
        cout << queryString
             << " - does not exist in above range" << endl;
 
    queryString = "goten";
    l = 2, r = 4;
    cout << "range : "
         << "[" << l << ", " << r << "]" << endl;
    if (root[r]->findKey(queryString, 0)
        and !root[l - 1]->findKey(queryString, 0))
        cout << queryString << " - exists in above range"
             << endl;
    else
        cout << queryString
             << " - does not exist in above range" << endl;
 
    return 0;
}

Java

// Java program for the above approach
import java.io.*;
import java.util.*;
 
// Persistent Trie node structure
class PersistentTrie {
 
    // Stores all children nodes, where
    // ith children denotes ith
    // alphabetical character
    PersistentTrie[] children;
 
    // Marks the ending of the key
    boolean keyEnd = false;
 
    // Constructor 1
    PersistentTrie(boolean keyEnd) { this.keyEnd = keyEnd; }
 
    // Constructor 2
    PersistentTrie(PersistentTrie[] children,
                   boolean keyEnd)
    {
        this.children = children;
        this.keyEnd = keyEnd;
    }
 
    // Detects existence of key in trie
    boolean findKey(String key, int len,
                    PersistentTrie dummy)
    {
 
        // If reached end of key
        if (key.length() == len)
 
            // Return if this is a keyEnd in trie
            return this.keyEnd;
 
        // If we cannot find key[len] child in trie
        // we say key doesn't exist in the trie
        if (this.children[key.charAt(len) - 'a'] == dummy)
            return false;
 
        // Recursively search the rest of
        // key length in children[key] trie
        return this.children[key.charAt(len) - 'a'].findKey(
            key, len + 1, dummy);
    }
 
    // Inserts key into trie
    // returns new node after insertion
    PersistentTrie insert(String key, int len)
    {
 
        // If reached the end of key string
        if (len == key.length()) {
 
            // Create new trie node with current trie node
            // marked as keyEnd
            return new PersistentTrie(this.children.clone(),
                                      true);
        }
 
        // Fetch current child nodes
        PersistentTrie[] new_version_PersistentTrie
            = this.children.clone();
 
        // Insert at key[len] child and
        // update the new child node
        PersistentTrie tmpNode
            = new_version_PersistentTrie[key.charAt(len)
                                         - 'a'];
        new_version_PersistentTrie[key.charAt(len) - 'a']
            = tmpNode.insert(key, len + 1);
 
        // Return a new node with modified key[len] child
        // node
        return new PersistentTrie(
            new_version_PersistentTrie, false);
    }
}
 
class GFG {
 
    static final int sz = 26;
 
    // Dummy PersistentTrie node
    static PersistentTrie dummy;
 
    // Initialize dummy for easy implementation
    static void init()
    {
        dummy = new PersistentTrie(false);
 
        // All children of dummy as dummy
        PersistentTrie[] children = new PersistentTrie[sz];
        for (int i = 0; i < sz; i++)
            children[i] = dummy;
 
        dummy.children = children;
    }
 
    // dfs traversal over the current trie
    // prints all the keys present in the current trie
    static void printAllKeysInTrie(PersistentTrie root,
                                   String s)
    {
        int flag = 0;
        for (int i = 0; i < sz; i++) {
            if (root.children[i] != dummy) {
                flag = 1;
                printAllKeysInTrie(root.children[i],
                                   s + ((char)('a' + i)));
            }
            if (root.children[i].keyEnd)
                System.out.println(s + (char)('a' + i));
        }
    }
 
    // Driver code
    public static void main(String[] args)
    {
 
        // Initialize the PersistentTrie
        init();
 
        // Input keys
        List<String> keys = Arrays.asList(new String[] {
            "goku", "gohan", "goten", "gogeta" });
 
        // Cache to store trie entry roots after each
        // insertion
        PersistentTrie[] root
            = new PersistentTrie[keys.size() + 1];
 
        // Marking first root as dummy
        root[0] = dummy;
 
        // Inserting all keys
        for (int i = 1; i <= keys.size(); i++) {
 
            // Caching new root for ith version of trie
            root[i]
                = root[i - 1].insert(keys.get(i - 1), 0);
        }
 
        int idx = 3;
        System.out.println("All keys in trie "
                           + "after version - " + idx);
        String key = "";
 
        printAllKeysInTrie(root[3], key);
 
        String queryString = "goku";
 
        int l = 2, r = 3;
        System.out.println("range : "
                           + "[" + l + ", " + r + "]");
 
        if (root[r].findKey(queryString, 0, dummy)
            && !root[l - 1].findKey(queryString, 0, dummy))
            System.out.println(
                queryString + " - exists in above range");
        else
            System.out.println(queryString
                               + " - does not exist in "
                               + "above range");
 
        queryString = "goten";
        l = 2;
        r = 4;
        System.out.println("range : "
                           + "[" + l + ", " + r + "]");
 
        if (root[r].findKey(queryString, 0, dummy)
            && !root[l - 1].findKey(queryString, 0, dummy))
            System.out.println(
                queryString + " - exists in above range");
        else
            System.out.println(
                queryString
                + " - does not exist in above range");
    }
}
 
// This code is contributed by jithin

C#

// C# program for the above approach
using System;
using System.Collections.Generic;
 
// Persistent Trie node structure
public class PersistentTrie {
 
    // Stores all children nodes, where
    // ith children denotes ith
    // alphabetical character
    public PersistentTrie[] Children;
 
    // Marks the ending of the key
    public bool KeyEnd;
 
    // Constructor 1
    public PersistentTrie(bool keyEnd) { KeyEnd = keyEnd; }
 
    // Constructor 2
    public PersistentTrie(PersistentTrie[] children,
                          bool keyEnd)
    {
        Children = children;
        KeyEnd = keyEnd;
    }
 
    // Detects existence of key in trie
    public bool FindKey(string key, int len,
                        PersistentTrie dummy)
    {
        // If reached end of key
        if (key.Length == len)
            return KeyEnd;
 
        // If we cannot find key[len] child in trie
        // we say key doesn't exist in the trie
        if (Children[key[len] - 'a'] == dummy)
            return false;
 
        // Recursively search the rest of
        // key length in children[key] trie
        return Children[key[len] - 'a'].FindKey(
            key, len + 1, dummy);
    }
 
    // Recursively search the rest of
    // key length in children[key] trie
    public PersistentTrie Insert(string key, int len)
    {
 
        // If reached the end of key string
        if (len == key.Length) {
            // Create new trie node with current trie node
            // marked as keyEnd
            return new PersistentTrie(
                (PersistentTrie[])Children.Clone(), true);
        }
 
        // Fetch current child nodes
        PersistentTrie[] new_version_PersistentTrie
            = (PersistentTrie[])Children.Clone();
 
        // Insert at key[len] child and
        // update the new child node
        PersistentTrie tmpNode
            = new_version_PersistentTrie[key[len] - 'a'];
        new_version_PersistentTrie[key[len] - 'a']
            = tmpNode.Insert(key, len + 1);
 
        // Return a new node with modified key[len] child
        // node
        return new PersistentTrie(
            new_version_PersistentTrie, false);
    }
}
 
public class GFG {
    static readonly int sz = 26;
 
    // Dummy PersistentTrie node
    static PersistentTrie dummy;
 
    // Initialize dummy for easy implementation
    static void Init()
    {
        dummy = new PersistentTrie(false);
 
        // All children of dummy as dummy
        PersistentTrie[] children = new PersistentTrie[sz];
        for (int i = 0; i < sz; i++)
            children[i] = dummy;
 
        dummy.Children = children;
    }
 
    // dfs traversal over the current trie
    // prints all the keys present in the current trie
    static void PrintAllKeysInTrie(PersistentTrie root,
                                   string s)
    {
        int flag = 0;
        for (int i = 0; i < sz; i++) {
            if (root.Children[i] != dummy) {
                flag = 1;
                PrintAllKeysInTrie(root.Children[i],
                                   s + (char)('a' + i));
            }
            if (root.Children[i].KeyEnd)
                Console.WriteLine(s + (char)('a' + i));
        }
    }
 
    // Driver code
    public static void Main(string[] args)
    {
 
        // Initialize the PersistentTrie
        Init();
 
        // Input keys
        List<string> keys
            = new List<string>{ "goku", "gohan", "goten",
                                "gogeta" };
 
        // Cache to store trie entry roots after each
        // insertion
        PersistentTrie[] root
            = new PersistentTrie[keys.Count + 1];
 
        // Marking first root as dummy
        root[0] = dummy;
 
        // Inserting all keys
        for (int i = 1; i <= keys.Count; i++) {
            // Caching new root for ith version of trie
            root[i] = root[i - 1].Insert(keys[i - 1], 0);
        }
 
        int idx = 3;
        Console.WriteLine(
            "All keys in trie after version - " + idx);
        string key = "";
 
        PrintAllKeysInTrie(root[3], key);
 
        string queryString = "goku";
 
        int l = 2, r = 3;
        Console.WriteLine("range : [" + l + ", " + r + "]");
 
        if (root[r].FindKey(queryString, 0, dummy)
            && !root[l - 1].FindKey(queryString, 0, dummy))
            Console.WriteLine(queryString
                              + " - exists in above range");
        else
            Console.WriteLine(
                queryString
                + " - does not exist in above range");
    }
}
 
// this article is contributed by bhardwajji

Python3

# Persistent Trie node structure
class PersistentTrie:
 
    def __init__(self, keyEnd=False, children=None):
        self.children = children if children is not None else [None]*26
        self.keyEnd = keyEnd
    # Returns the presence of key in current trie
 
    def findKey(self, key, length, dummy):
        # If reached end of key
        if len(key) == length:
           # Return if this is a keyEnd in trie
            return self.keyEnd
            # If we cannot find key[len] child in trie
    # we say key doesn't exist in the trie
        if self.children[ord(key[length]) - ord('a')] == dummy:
            return False
        # Recursively search the rest of
    # key length in children[key] trie
        return self.children[ord(key[length]) - ord('a')].findKey(key, length+1, dummy)
    # Inserts key into current trie
# returns newly created trie node after insertion
 
    def insert(self, key, length):
     # If reached the end of key string
        if length == len(key):
                # Create new trie node with current trie node
            # marked as keyEnd
            return PersistentTrie(children=self.children.copy(), keyEnd=True)
          # Fetch current child nodes
        new_children = self.children.copy()
        # Insert at key[len] child and
    # update the new child node
        new_children[ord(key[length]) - ord('a')
                     ] = self.children[ord(key[length]) - ord('a')].insert(key, length+1)
    # Return a new node with modified key[len] child node
        return PersistentTrie(children=new_children, keyEnd=False)
 
 
class GFG:
    # Distinct numbers of chars in key
    sz = 26
    dummy = None
 
    @staticmethod
    def init():
        GFG.dummy = PersistentTrie()
        GFG.dummy.children = [GFG.dummy]*GFG.sz
# dfs traversal over the current trie
# prints all the keys present in the current trie
 
    @staticmethod
    def printAllKeysInTrie(root, s):
        flag = 0
        for i in range(GFG.sz):
            if root.children[i] != GFG.dummy:
                flag = 1
                GFG.printAllKeysInTrie(root.children[i], s + chr(ord('a') + i))
            if root.children[i].keyEnd:
                print(s + chr(ord('a') + i))
# Driver code
 
    @staticmethod
    def main():
       # Initialize the PersistentTrie
        GFG.init()
        keys = ["goku", "gohan", "goten", "gogeta"]
        # Cache to store trie entry roots after each insertion
        root = [None]*(len(keys)+1)
        # Marking first root as dummy
        root[0] = GFG.dummy
  # Inserting all keys
        for i in range(1, len(keys)+1):
            root[i] = root[i-1].insert(keys[i-1], 0)
 
        idx = 3
        print("All keys in trie after version -", idx)
        key = ""
        GFG.printAllKeysInTrie(root[3], key)
 
        queryString = "goku"
        l, r = 2, 3
        print("range : [", l, ",", r, "]")
 
        if root[r].findKey(queryString, 0, GFG.dummy) and not root[l-1].findKey(queryString, 0, GFG.dummy):
            print(queryString, "- exists in above range")
        else:
            print(queryString, "- does not exist in above range")
 
        queryString = "goten"
        l, r = 2, 4
        print("range : [", l, ",", r, "]")
 
        if root[r].findKey(queryString, 0, GFG.dummy) and not root[l-1].findKey(queryString, 0, GFG.dummy):
            print(queryString, "- exists in above range")
        else:
            print(queryString, "- does not exist in above range")
 
 
if __name__ == '__main__':
    GFG.main()

Javascript

class PersistentTrie {
    constructor(keyEnd = false, children = Array(26).fill(null)) {
        this.children = children;
        this.keyEnd = keyEnd;
    }
     
    findKey(key, length, dummy) {
        if (key.length === length) {
            return this.keyEnd;
        }
         
        const child = this.children[key.charCodeAt(length) - 97];
        if (child === dummy) {
            return false;
        }
         
        return child.findKey(key, length + 1, dummy);
    }
     
    insert(key, length) {
        if (length === key.length) {
            return new PersistentTrie(true, [...this.children]);
        }
         
        const newChildren = [...this.children];
        const child = this.children[key.charCodeAt(length) - 97].insert(key, length + 1);
        newChildren[key.charCodeAt(length) - 97] = child;
         
        return new PersistentTrie(false, newChildren);
    }
}
 
class GFG {
    static sz = 26;
    static dummy = null;
     
    static init() {
        GFG.dummy = new PersistentTrie();
        GFG.dummy.children = Array(GFG.sz).fill(GFG.dummy);
    }
     
    static printAllKeysInTrie(root, s) {
        let flag = 0;
        for (let i = 0; i < GFG.sz; i++) {
            if (root.children[i] !== GFG.dummy) {
                flag = 1;
                GFG.printAllKeysInTrie(root.children[i], s + String.fromCharCode(97 + i));
            }
             
            if (root.children[i].keyEnd) {
                console.log(s + String.fromCharCode(97 + i));
            }
        }
    }
     
    static main() {
        GFG.init();
        const keys = ["goku", "gohan", "goten", "gogeta"];
        const root = Array(keys.length + 1).fill(null);
        root[0] = GFG.dummy;
         
        for (let i = 1; i <= keys.length; i++) {
            root[i] = root[i - 1].insert(keys[i - 1], 0);
        }
         
        const idx = 3;
        console.log(`All keys in trie after version - ${idx}`);
        const key = "";
        GFG.printAllKeysInTrie(root[3], key);
         
        let queryString = "goku";
        let l = 2, r = 3;
        console.log(`range : [${l}, ${r}]`);
        if (root[r].findKey(queryString, 0, GFG.dummy) && !root[l - 1].findKey(queryString, 0, GFG.dummy)) {
            console.log(`${queryString} - exists in above range`);
        } else {
            console.log(`${queryString} - does not exist in above range`);
        }
         
        queryString = "goten";
        l = 2, r = 4;
        console.log(`range : [${l}, ${r}]`);
        if (root[r].findKey(queryString, 0, GFG.dummy) && !root[l - 1].findKey(queryString, 0, GFG.dummy)) {
            console.log(`${queryString} - exists in above range`);
        } else {
            console.log(`${queryString} - does not exist in above range`);
        }
    }
}
 
GFG.main();

Output:

All keys in trie after version - 3
gohan
goku
goten
range : [2, 3]
goku - does not exist in above range
range : [2, 4]
goten - exists in above range

Time Complexity: As discussed above we will be visiting all the X(length of key) number of nodes in the trie while inserting; So, we will be visiting the X number of states and at each state we will be doing O(sz) amount of work by liking the sz children of the previous version with the current version for the newly created trie nodes. Hence, Time Complexity of insertion becomes O(length_of_key * sz). But the searching the is still linear over the length of the key to be searched and hence, the time complexity of searching a key is still O(length_of_key) just like a standard trie.
Space Complexity: Obviously, persistency in data structures comes with a trade of space and we will be consuming more memory in maintaining the different versions of the trie. Now, let us visualize the worst case – for insertion, we are creating O(length_of_key) nodes and each newly created node will take a space of O(sz) to store its children. Hence, the space complexity for insertion of the above implementation is O(length_of_key * sz).