Persistent Trie | Set 1 (Introduction)
Trie is one handy data structure that often comes into play when performing multiple string lookups. In this post, we will introduce the concept of Persistency in this data structure. Persistency simply means to retain the changes. But obviously, retaining the changes cause extra memory consumption and hence affect the Time Complexity.
Our aim is to apply persistency in Trie and also to ensure that it does not take more than the standard trie searching i.e. O(length_of_key). We will also analyze the extra space complexity that persistency causes over the standard Space Complexity of a Trie.
Let’s think in terms of versions i.e. for each change/insertion in our Trie we create a new version of it.
We will consider our initial version to be Version-0. Now, as we do any insertion in the trie we will create a new version for it and in similar fashion track the record for all versions.
But creating the whole trie every time for every version keeps doubling up memory and affects the Space Complexity very badly. So, this idea will easily run out of memory for a large number of versions.
Let’s exploit the fact that for each new insertion in the trie, exactly X (length_of_key) nodes will be visited/modified. So, our new version will only contain these X new nodes and rest trie nodes will be the same as the previous version. Therefore, it is quite clear that for each new version we only need to create these X new nodes whereas the rest of the trie nodes can be shared from the previous version.
Consider the below figure for better visualization:
Now, the Question arises: How to keep track of all the versions?
We only need to keep track the first root node for all the versions and this will serve the purpose to track all the newly created nodes in the different versions as the root node gives us the entry point for that particular version. For this purpose, we can maintain an array of pointers to the root node of the trie for all versions.
Let’s consider the below scenario and see how we can use Persistent Trie to solve it !
Given an array of strings and we need to determine if a string exists in some
range [l, r] in the array. To have an analogy, consider the array to be a
list of words in a dictionary at ith page(i is the index of the array) and
we need to determine whether a given word X exists in the page range [l, r]?
Below is the implementation for the above problem:-
All keys in trie after version - 3 gohan goku goten range : [2, 3] goku - does not exist in above range range : [2, 4] goten - exists in above range
Time Complexity: As discussed above we will be visiting all the X(length of key) number of nodes in the trie while inserting; So, we will be visiting the X number of states and at each state we will be doing O(sz) amount of work by liking the sz children of the previous version with the current version for the newly created trie nodes. Hence, Time Complexity of insertion becomes O(length_of_key * sz). But the searching the is still linear over the length of the key to be searched and hence, the time complexity of searching a key is still O(length_of_key) just like a standard trie.
Space Complexity: Obviously, persistency in data structures comes with a trade of space and we will be consuming more memory in maintaining the different versions of the trie. Now, let us visualize the worst case – for insertion, we are creating O(length_of_key) nodes and each newly created node will take a space of O(sz) to store its children. Hence, the space complexity for insertion of the above implementation is O(length_of_key * sz).