Persistent Trie | Set 1 (Introduction)

Prerequisite:

  1. Trie
  2. Persistency in Data Structure

Trie is one handy data structure that often comes into play when performing multiple string lookups. In this post, we will introduce the concept of persistency in this data structure. Persistency simply means retaining the changes, i.e. every earlier version stays accessible after an update. Retaining those versions, however, consumes extra memory and therefore affects the Space Complexity.

Our aim is to apply persistency to a Trie while ensuring that searching still takes no more time than in a standard trie, i.e. O(length_of_key). We will also analyze the extra space that persistency costs on top of the standard Space Complexity of a Trie.



Let’s think in terms of versions, i.e. for each change/insertion in our Trie we create a new version of it.
We will consider our initial version to be Version-0. Every insertion into the trie then creates a new version, and in this fashion we keep a record of all versions.

But copying the whole trie for every version is very wasteful: each version costs as much memory as the entire trie built so far, so the total memory grows roughly quadratically with the number of versions. This idea will easily run out of memory for a large number of versions.

Let’s exploit the fact that each new insertion into the trie only visits/modifies the nodes on the path from the root to the end of the key: X + 1 nodes in total, where X is length_of_key (one node per character plus the root). So our new version only needs fresh copies of these nodes, while the rest of the trie nodes are exactly the same as in the previous version. Therefore, for each new version we create only these O(X) new nodes, and all other trie nodes can be shared with the previous version.

Consider the below figure for better visualization:

Now the question arises: how do we keep track of all the versions?
We only need to keep track of the root node of each version. The root is the entry point for that particular version, and every node of the version (both the newly created ones and those shared with older versions) is reachable from it. For this purpose, we can maintain an array of pointers to the root node of the trie for all versions.

Let’s consider the following scenario and see how we can use a Persistent Trie to solve it.
Given an array of strings, we need to determine whether a given string exists in some range [l, r] of the array (positions are 1-indexed in the implementation below). As an analogy, consider the array to be a dictionary in which the word at index i sits on page i; we need to determine whether a given word X appears anywhere in the page range [l, r]. Inserting the words in array order makes version i hold exactly the first i words, so the answer can be read off from versions r and l - 1.

Below is the C++ implementation for the above problem:

// C++ implementation of the approach
#include <iostream>
#include <string>
#include <vector>
using namespace std;
  
// Distinct numbers of chars in key
const int sz = 26;
  
// Persistent Trie node structure
struct PersistentTrie {
  
    // Stores all children nodes, where ith children denotes
    // ith alphabetical character
    vector<PersistentTrie*> children;
  
    // Marks the ending of the key
    bool keyEnd = false;
  
    // Constructor 1
    PersistentTrie(bool keyEnd = false)
    {
        this->keyEnd = keyEnd;
    }
  
    // Constructor 2: copies the given child pointers
    PersistentTrie(const vector<PersistentTrie*>& children, bool keyEnd = false)
    {
        this->children = children;
        this->keyEnd = keyEnd;
    }
  
    // detects existence of key in trie
    bool findKey(string& key, int len);
  
    // Inserts key into trie
    // returns new node after insertion
    PersistentTrie* insert(string& key, int len);
};
  
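// Dummy PersistentTrie node: a shared sentinel that stands for
// "no child here" and also serves as the empty trie (version 0)
PersistentTrie* dummy;

// Initialize dummy for easy implementation
void init()
{
    // The sentinel is not the end of any key
    dummy = new PersistentTrie(false);

    // All children of dummy point back to dummy, so insertion can
    // descend into a missing child without special-casing it
    vector<PersistentTrie*> children(sz, dummy);
    dummy->children = children;
}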
  
// Inserts key into current trie
// returns newly created trie node after insertion
PersistentTrie* PersistentTrie::insert(string& key, int len)
{
  
    // If reached the end of key string
    if (len == key.length()) {
  
        // Create new trie node with current trie node
        // marked as keyEnd
        return new PersistentTrie((*this).children, true);
    }
  
    // Fetch current child nodes
    vector<PersistentTrie*> new_version_PersistentTrie = (*this).children;
  
    // Insert at key[len] child and
    // update the new child node
    PersistentTrie* tmpNode = new_version_PersistentTrie[key[len] - 'a'];
    new_version_PersistentTrie[key[len] - 'a'] = tmpNode->insert(key, len + 1);
  
    // Return a new node with modified key[len] child node
    return new PersistentTrie(new_version_PersistentTrie);
}
  
// Returns the presence of key in current trie
bool PersistentTrie::findKey(string& key, int len)
{
    // If reached end of key
    if (key.length() == len)
  
        // Return if this is a keyEnd in trie
        return this->keyEnd;
  
    // If we cannot find key[len] child in trie
    // we say key doesn't exist in the trie
    if (this->children[key[len] - 'a'] == dummy)
        return false;
  
    // Recursively search the rest of
    // key length in children[key] trie
    return this->children[key[len] - 'a']->findKey(key, len + 1);
}
  
// DFS traversal over the current trie:
// prints all the keys present in it in lexicographic order
void printAllKeysInTrie(PersistentTrie* root, string& s)
{
    // A key ends at this node (this also covers keys that are
    // prefixes of longer keys, e.g. "go" and "goku")
    if (root->keyEnd)
        cout << s << endl;

    for (int i = 0; i < sz; i++) {
        if (root->children[i] != dummy) {
            s.push_back('a' + i);
            printAllKeysInTrie(root->children[i], s);
            s.pop_back();
        }
    }
}
  
// Driver code
int main(int argc, char const* argv[])
{
  
    // Initialize the PersistentTrie
    init();
  
    // Input keys
    vector<string> keys({ "goku", "gohan", "goten", "gogeta" });
  
    // Cache to store trie entry roots after each insertion:
    // root[i] is version i, so keys.size() + 1 slots are needed
    vector<PersistentTrie*> root(keys.size() + 1);

    // Version 0 is the empty trie, represented by dummy
    root[0] = dummy;
  
    // Inserting all keys
    for (int i = 1; i <= keys.size(); i++) {
  
        // Caching new root for ith version of trie
        root[i] = root[i - 1]->insert(keys[i - 1], 0);
    }
  
    int idx = 3;
    cout << "All keys in trie after version - " << idx << endl;
    string key = "";
    printAllKeysInTrie(root[idx], key);
  
    string queryString = "goku";
    int l = 2, r = 3;
    cout << "range : "
         << "[" << l << ", " << r << "]" << endl;
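    // The key lies in [l, r] if version r contains it but version l - 1 does not
    // (this check assumes each word occurs at most once in the array)
    if (root[r]->findKey(queryString, 0) and !root[l - 1]->findKey(queryString, 0))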
        cout << queryString << " - exists in above range" << endl;
    else
        cout << queryString << " - does not exist in above range" << endl;
  
    queryString = "goten";
    l = 2, r = 4;
    cout << "range : "
         << "[" << l << ", " << r << "]" << endl;
    if (root[r]->findKey(queryString, 0) and !root[l - 1]->findKey(queryString, 0))
        cout << queryString << " - exists in above range" << endl;
    else
        cout << queryString << " - does not exist in above range" << endl;
  
    return 0;
}



Output:

All keys in trie after version - 3
gohan
goku
goten
range : [2, 3]
goku - does not exist in above range
range : [2, 4]
goten - exists in above range

Time Complexity: As discussed above, inserting a key visits the X (length of key) nodes on its path, creating one new node per character plus one for the root. At each of these states we do O(sz) work, copying the sz child pointers of the previous version's node into the newly created node. Hence, the Time Complexity of insertion becomes O(length_of_key * sz). Searching, however, is still linear in the length of the key being searched, so the time complexity of searching a key is still O(length_of_key), just like in a standard trie.

Space Complexity: Persistency in data structures comes with a trade-off in space: we consume extra memory to maintain the different versions of the trie. In the worst case, each insertion creates O(length_of_key) new nodes, and each newly created node takes O(sz) space to store its children. Hence, the extra space per insertion in the above implementation is O(length_of_key * sz).
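
Summing this over all insertions gives the total space of the structure. The worked count below assumes the implementation above, with keys k_1, ..., k_n inserted in order; it counts the dummy node once and measures space in child pointers (the keyEnd flag only adds a constant per node):

```latex
S(n) = \mathit{sz} \cdot \Bigl(1 + \sum_{i=1}^{n} \bigl(|k_i| + 1\bigr)\Bigr)

% keys: goku, gohan, goten, gogeta  =>  |k_i| = 4, 5, 5, 6 and sz = 26
S(4) = 26 \cdot \bigl(1 + 5 + 6 + 6 + 7\bigr) = 26 \cdot 25 = 650
```

For comparison, a standard (non-persistent) trie holding the same four keys needs 15 nodes, i.e. 390 child pointers.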


