Wavelet Trees | Introduction

A wavelet tree is a data structure that recursively partitions a stream into two parts until we’re left with homogeneous data. The name derives from an analogy with the wavelet transform for signals, which recursively decomposes a signal into low-frequency and high-frequency components. Wavelet trees can be used to answer range queries efficiently.

Consider the problem of finding the number of elements in a range [L, R] of a given array A that are less than or equal to x. One way to solve this problem efficiently is to use a persistent segment tree. But we can also solve it easily using wavelet trees. Let us see how!

Constructing Wavelet Trees



Every node in a wavelet tree is represented by an array, which is a subsequence of the original array, and a range [L, R]. Here [L, R] is the range of values in which the elements of the array fall. That is, ‘R’ denotes the maximum element in the array and ‘L’ denotes the smallest element. So, the root node contains the original array, whose elements are in the range [L, R]. We then calculate the middle of the range [L, R] and stable-partition the array into two halves for the left and right children. Therefore, the left child contains the elements that lie in the range [L, mid] and the right child contains the elements that lie in the range [mid+1, R].

Suppose we are given an array of integers. We compute Mid = (Max + Min) / 2 and form two children.
Left child: integers less than or equal to Mid
Right child: integers greater than Mid
We recursively perform this operation until every node contains only equal elements.
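The recursive splitting described above can be sketched as follows. This is only an illustration, not the article's implementation: the helper name collectNodes is hypothetical, and it assumes each node splits its value range [low, high] at mid into [low, mid] and [mid+1, high]. It records the subsequence held by every non-empty node in pre-order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the article): records the subsequence held
// by every non-empty node, visiting the tree in pre-order.
void collectNodes(const std::vector<int>& seq, int low, int high,
                  std::vector<std::vector<int>>& out)
{
    assert(low <= high);
    if (seq.empty())
        return;                 // nothing to record for an empty node
    out.push_back(seq);
    if (low == high)
        return;                 // homogeneous node: stop splitting

    int mid = low + (high - low) / 2;
    std::vector<int> left, right;
    for (int v : seq)           // relative order is preserved on both sides
        (v <= mid ? left : right).push_back(v);

    collectNodes(left, low, mid, out);
    collectNodes(right, mid + 1, high, out);
}
```

For a small array such as 1 5 2 6 4 4 with value range [1, 6], the root splits into 1 2 (values in [1, 3]) and 5 6 4 4 (values in [4, 6]), and the recursion continues until each node holds a single distinct value.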

Given array : 0 0 9 1 2 1 7 6 4 8 9 4 3 7 5 9 2 7 0 5 1 0
Forming a wavelet tree

To construct a wavelet tree, let us see what we need to store at each node. At each node of the tree, we store two arrays, say S[] and freq[]. The array S[] is a subsequence of the original array A[], and the array freq[] stores prefix counts of the elements that go to the left child of the node. That is, freq[i] denotes how many of the first i elements of S[] go to the left child. Therefore, the count of elements among the first i that go to the right child can be easily calculated as (i – freq[i]).

The example below shows how to maintain the freq[] array:

Array : 1 5 2 6 4 4
Mid = (1 + 6) / 2 = 3
Left Child : 1 2
Right Child : 5 6 4 4

To maintain the frequency array, we check whether the element is less than or equal to Mid. If it is, we append the last element of the frequency array plus 1; otherwise we append the last element unchanged.

For the above array:
Freq array : {1, 1, 2, 2, 2, 2}

This means that 1 element goes to the left child of this node among the first 1 and the first 2 elements, and 2 elements go to the left child among the first 3 to 6 elements. This can be easily verified from the array given above.
To compute the number of elements moving to the right subtree, we subtract freq[i] from i.

Up to index 1, 0 elements go to the right subtree.
Up to index 2, 1 element goes to the right subtree.
Up to index 3, 1 element goes to the right subtree.
Up to index 4, 2 elements go to the right subtree.
Up to index 5, 3 elements go to the right subtree.
Up to index 6, 4 elements go to the right subtree.
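The bookkeeping above can be reproduced with a short sketch. The helper names buildFreq and rightCount are hypothetical, and the sketch assumes a leading 0 entry (freq[0] = 0) so that freq[i] refers to the first i elements.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: freq[i] = how many of the first i elements of s
// are <= mid, i.e. go to the left child. freq[0] is 0 by convention.
std::vector<int> buildFreq(const std::vector<int>& s, int mid)
{
    std::vector<int> freq;
    freq.reserve(s.size() + 1);
    freq.push_back(0);
    for (int v : s)
        freq.push_back(freq.back() + (v <= mid ? 1 : 0));
    return freq;
}

// Hypothetical helper: how many of the first i elements go to the right child.
int rightCount(const std::vector<int>& freq, int i)
{
    assert(i >= 0 && i < static_cast<int>(freq.size()));
    return i - freq[i];
}
```

For the array 1 5 2 6 4 4 with Mid = 3, buildFreq yields 0 1 1 2 2 2 2, which is the Freq array above with an extra leading 0.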

We can use the std::stable_partition function together with a lambda expression from the C++ STL to partition the array around a pivot while preserving the relative order of the elements on each side. It is recommended to be familiar with stable_partition and lambda expressions before moving on to the implementation.
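As a quick illustration of that call, here is a minimal sketch. The wrapper name partitionAroundMid is hypothetical; only std::stable_partition and std::all_of are standard library functions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper: moves all values <= mid to the front of a while
// keeping the relative order inside both groups, and returns how many
// values ended up in the front (left) group.
int partitionAroundMid(std::vector<int>& a, int mid)
{
    auto goesLeft = [mid](int v) { return v <= mid; };

    // stable_partition returns an iterator to the first element of the
    // second group (the elements for which the predicate is false).
    auto pivot = std::stable_partition(a.begin(), a.end(), goesLeft);

    assert(std::all_of(a.begin(), pivot, goesLeft));
    return static_cast<int>(pivot - a.begin());
}
```

Applied to 1 5 2 6 4 4 with mid = 3, the array becomes 1 2 5 6 4 4 and the returned count is 2. Note that the input is reordered in place.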
Below is the implementation of construction of Wavelet Trees:


// CPP code to implement wavelet trees
#include <bits/stdc++.h>
using namespace std;
  
// wavelet tree class
class wavelet_tree {
public:
  
    // Range of values stored in this node
    int low, high;
  
    // Left and Right children (null until created)
    wavelet_tree *l = nullptr, *r = nullptr;
  
    vector<int> freq;
  
    // Constructor
    // Values are in range [x, y]
    // Elements are in the half-open pointer range [from, to)
    // Note: the input array is reordered in place
    wavelet_tree(int* from, int* to, int x, int y)
    {
        // Initialising low and high
        low = x, high = y;
  
        // Array is of 0 length
        if (from >= to)
            return;
  
        // Array is homogenous
        // Example : 1 1 1 1 1
        if (high == low) {
  
            // Assigning storage to freq array
            freq.reserve(to - from + 1);
  
            // Initialising the Freq array
            freq.push_back(0);
  
            // Assigning values
            for (auto it = from; it != to; it++) 
  
                // freq will be increasing as there'll 
                // be no further sub-tree
                freq.push_back(freq.back() + 1);
              
            return;
        }
  
        // Computing mid
        int mid = (low + high) / 2;
  
        // Lambda function to check if a number is 
        // less than or equal to mid
        auto lessThanMid = [mid](int x) {
            return x <= mid;
        };
  
        // Assigning storage to freq array
        freq.reserve(to - from + 1);
  
        // Initialising the freq array
        freq.push_back(0);
  
        // Assigning value to freq array
        for (auto it = from; it != to; it++) 
  
            // If lessThanMid returns 1(true), we add
            // 1 to previous entry. Otherwise, we add
            // 0 (element goes to right sub-tree)
            freq.push_back(freq.back() + lessThanMid(*it));
  
        // std::stable_partition partitions the array w.r.t Mid
        auto pivot = stable_partition(from, to, lessThanMid);
  
        // Left sub-tree's object
        l = new wavelet_tree(from, pivot, low, mid);
  
        // Right sub-tree's object
        r = new wavelet_tree(pivot, to, mid + 1, high);
    }
};
  
// Driver code
int main()
{
    int size = 5, high = INT_MIN;    
    int arr[] = {1 , 2, 3, 4, 5}; 
    for (int i = 0; i < size; i++) 
        high = max(high, arr[i]);    
  
    // Object of class wavelet tree
    wavelet_tree obj(arr, arr + size, 1, high);
  
    return 0;
}



Height of the tree: O(log(max(A))), where max(A) is the maximum element in the array A[].
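To make the bound concrete, the height can be computed directly from the value range. This is a sketch under the assumption that every node halves its range [low, high] at mid = (low + high) / 2, as in the construction code. The helper name treeHeight is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: number of edges on the longest root-to-leaf path
// when the value range [low, high] is halved at every node.
int treeHeight(int low, int high)
{
    assert(low <= high);
    int height = 0;
    while (low < high) {
        // The left half [low, mid] is never smaller than the right half,
        // so following it gives the longest path.
        high = low + (high - low) / 2;
        ++height;
    }
    return height;
}
```

For example, the range [1, 5] gives a height of 3, and the range [1, 100000] gives 17, which is the base-2 logarithm of the range size rounded up.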


Querying in Wavelet Trees

We have already constructed our wavelet tree for the given array. Now we move on to our problem: calculating the number of elements less than or equal to x in the range [L, R] of the given array. Here L and R are 1-based positions.

So, for each node we have a subsequence of the original array, the lowest and highest values that may be present in it, and the count of elements going to the left and right children.

Now,

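If L > R or x < low,
   we return 0,
i.e. the range is empty or no element in this node is less than or equal to x.

If high <= x, 
   we return R - L + 1, 
i.e. all the elements in the current range are less than or equal to x.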

Otherwise, we use the variables LtCount = freq[ L-1 ] (i.e. the number of the first L-1 elements that go to the left sub-tree) and RtCount = freq[ R ] (i.e. the number of the first R elements that go to the left sub-tree).
Now, we recursively call and add the return values of :

left sub-tree with range[ LtCount + 1, RtCount ] and, 
right sub-tree with range[ L - LtCount, R - RtCount ]
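The range mapping above can be checked on a small case. The following sketch uses hypothetical names (ChildRanges, mapToChildren) and assumes freq has a leading 0 entry, with freq[i] counting how many of the first i elements go to the left child.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical container for the two 1-based ranges passed to the children.
struct ChildRanges {
    std::pair<int, int> left;   // range inside the left child
    std::pair<int, int> right;  // range inside the right child
};

// Maps the 1-based range [l, r] of a node to the ranges of its children.
// An empty range shows up as first > second.
ChildRanges mapToChildren(const std::vector<int>& freq, int l, int r)
{
    assert(l >= 1 && r < static_cast<int>(freq.size()));
    int LtCount = freq[l - 1];  // left-going elements before position l
    int RtCount = freq[r];      // left-going elements up to position r
    return { {LtCount + 1, RtCount}, {l - LtCount, r - RtCount} };
}
```

For the array 1 5 2 6 4 4 with Mid = 3, freq is 0 1 1 2 2 2 2. The range [2, 5] covers 5 2 6 4, so it maps to [2, 2] in the left child (1 2) and to [1, 3] in the right child (5 6 4 4).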

Below is the implementation in C++:


// CPP program for querying in
// wavelet tree Data Structure
#include <bits/stdc++.h>
using namespace std;
  
// wavelet tree class
class wavelet_tree {
public:
    // Range of values stored in this node
    int low, high;
  
    // Left and Right child (null until created)
    wavelet_tree *l = nullptr, *r = nullptr;
  
    vector<int> freq;
  
    // Constructor
    // Values are in range [x, y]
    // Elements are in the half-open pointer range [from, to)
    // Note: the input array is reordered in place
    wavelet_tree(int* from, int* to, int x, int y)
    {
        // Initialising low and high
        low = x, high = y;
  
        // Array is of 0 length
        if (from >= to)
            return;
  
        // Array is homogenous
        // Example : 1 1 1 1 1
        if (high == low) {
            // Assigning storage to freq array
            freq.reserve(to - from + 1);
  
            // Initialising the Freq array
            freq.push_back(0);
  
            // Assigning values
            for (auto it = from; it != to; it++) 
              
                // freq will be increasing as there'll
                // be no further sub-tree
                freq.push_back(freq.back() + 1);
              
            return;
        }
  
        // Computing mid
        int mid = (low + high) / 2;
  
        // Lambda function to check if a number
        // is less than or equal to mid
        auto lessThanMid = [mid](int x) {
            return x <= mid;
        };
  
        // Assigning storage to freq array
        freq.reserve(to - from + 1);
  
        // Initialising the freq array
        freq.push_back(0);
  
        // Assigning value to freq array
        for (auto it = from; it != to; it++) 
  
            // If lessThanMid returns 1(true), we add
            // 1 to previous entry. Otherwise, we add 0
            // (element goes to right sub-tree)
            freq.push_back(freq.back() + lessThanMid(*it));        
  
        // std::stable_partition partitions the array w.r.t Mid
        auto pivot = stable_partition(from, to, lessThanMid);
  
        // Left sub-tree's object
        l = new wavelet_tree(from, pivot, low, mid);
  
        // Right sub-tree's object
        r = new wavelet_tree(pivot, to, mid + 1, high);
    }
  
    // Count of numbers at 1-based positions [l..r]
    // that are less than or equal to k
    int kOrLess(int l, int r, int k)
    {
        // Empty range, or every value in this node is greater than k
        if (l > r or k < low)
            return 0;
  
        // All elements in the range are less than or equal to k
        if (high <= k)
            return r - l + 1;
  
        // Computing LtCount and RtCount
        int LtCount = freq[l - 1];
        int RtCount = freq[r];
  
        // Answer is (no. of element <= k) in
        // left + (those <= k) in right
        return (this->l->kOrLess(LtCount + 1, RtCount, k) + 
             this->r->kOrLess(l - LtCount, r - RtCount, k));
    }
  
};
  
// Driver code
int main()
{
    int size = 5, high = INT_MIN;        
    int arr[] = {1, 2, 3, 4, 5};    
      
    // Array : 1 2 3 4 5
    for (int i = 0; i < size; i++)     
        high = max(high, arr[i]);
  
    // Object of class wavelet tree
    wavelet_tree obj(arr, arr + size, 1, high);
  
    // count of elements less than or equal to 2 at
    // 1-based positions [1, 3]; l must be at least 1
    // because the query reads freq[l - 1]
    cout << obj.kOrLess(1, 3, 2) << '\n';
  
    return 0;
}



Output :

2
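As a cross-check, the same count can be obtained with a brute-force scan. This sketch is not part of the wavelet tree itself. The helper name kOrLessNaive is hypothetical, and it assumes 1-based positions l and r, matching the query description above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference implementation: counts the elements at 1-based
// positions [l, r] of a that are less than or equal to k, in O(r - l + 1).
int kOrLessNaive(const std::vector<int>& a, int l, int r, int k)
{
    assert(l >= 1 && r <= static_cast<int>(a.size()));
    int count = 0;
    for (int i = l; i <= r; ++i)
        if (a[i - 1] <= k)
            ++count;
    return count;
}
```

For the array 1 2 3 4 5, positions [1, 3] and k = 2, the scan finds the elements 1 and 2, so the count is 2. Comparing such a scan against the tree on random inputs is a simple way to test an implementation.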

Time Complexity: O(log(max(A))) per query, where max(A) is the maximum element in the array A[].
In this post we have discussed a single problem on range queries without updates. Range updates will be discussed in a later post.


This article is contributed by Rohit Thapliyal.
