Count of distinct numbers in an Array in a range for Online Queries using Merge Sort Tree

Given an array arr[] of size N and Q queries of the form [L, R], the task is to find, for each query, the number of distinct values in the subarray arr[L..R]. In the examples, L and R are 1-based and inclusive.

Examples:

Input: arr[] = {4, 1, 9, 1, 3, 3}, Q = {{1, 3}, {1, 5}}
Output:
3
4
Explanation:
For query {1, 3}, elements are {4, 1, 9}. Therefore, count of distinct elements = 3
For query {1, 5}, elements are {4, 1, 9, 1, 3}. Therefore, count of distinct elements = 4

Input: arr[] = {4, 2, 1, 1, 4}, Q = {{2, 4}, {3, 5}}
Output:
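2
2
Explanation:
For query {2, 4}, elements are {2, 1, 1}. Therefore, count of distinct elements = 2
For query {3, 5}, elements are {1, 1, 4}. Therefore, count of distinct elements = 2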

Naive Approach: For every query, iterate over the array from L to R and insert the elements into a set. The size of the set then gives the number of distinct elements from L to R.



Time Complexity: O(Q * N) with a hash set, or O(Q * N log N) with an ordered set such as std::set
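The naive approach for a single query can be sketched as follows. This is a minimal illustration, not the article's own code: the function name countDistinctNaive is hypothetical, and it assumes 1-based, inclusive L and R as in the examples. It uses std::unordered_set, so each query costs expected O(R - L + 1) time.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Naive approach for a single query: insert arr[L..R] into a hash set
// and return its size. L and R are 1-based and inclusive (assumption,
// matching the examples); countDistinctNaive is a hypothetical name.
int countDistinctNaive(const std::vector<int>& arr, int L, int R) {
    assert(1 <= L && L <= R && R <= static_cast<int>(arr.size()));
    std::unordered_set<int> seen;
    for (int i = L - 1; i <= R - 1; ++i) {
        seen.insert(arr[i]);  // duplicates are ignored by the set
    }
    return static_cast<int>(seen.size());
}
```

For example, countDistinctNaive({4, 1, 9, 1, 3, 3}, 1, 5) returns 4, matching the first example.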

Efficient Approach: The idea is to use a Merge Sort Tree to solve this problem.

  1. For every index, we store the index of the next occurrence of the same value in a temporary array.
  2. Then, for every query [L, R], we count the positions from L to R whose value in the temporary array is greater than R.

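Step 1: Take an array next_right, where next_right[i] holds the index of the next occurrence of arr[i] to the right of i. If arr[i] does not occur again, next_right[i] = N (the length of the array).
Step 2: Build a Merge Sort Tree over the next_right array and answer the queries with it. Counting the distinct elements from L to R is equivalent to counting the positions i from L to R with next_right[i] > R: every distinct value in the range is counted exactly once, at its last occurrence inside the range, because each of its earlier occurrences has its next occurrence at or before R.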
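The reduction in Step 2 can be checked directly on small inputs. The sketch below is illustrative only: the helper names nextRightByDefinition, distinctViaNextRight and distinctBySet are hypothetical, indices are 0-based, and next_right is computed straight from its definition in O(N^2) rather than efficiently.

```cpp
#include <cassert>
#include <set>
#include <vector>

// next_right[i] is the index of the next occurrence of arr[i] after i,
// or n if there is none. Computed straight from the definition in
// O(N^2) time, for illustration only.
std::vector<int> nextRightByDefinition(const std::vector<int>& arr) {
    const int n = static_cast<int>(arr.size());
    std::vector<int> next_right(n, n);
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            if (arr[j] == arr[i]) {
                next_right[i] = j;
                break;
            }
        }
    }
    return next_right;
}

// Distinct count of arr[L..R] (0-based, inclusive) via the reduction:
// count positions i in [L, R] whose next occurrence lies beyond R.
// Each distinct value is counted once, at its last occurrence in range.
int distinctViaNextRight(const std::vector<int>& arr, int L, int R) {
    assert(0 <= L && L <= R && R < static_cast<int>(arr.size()));
    const std::vector<int> next_right = nextRightByDefinition(arr);
    int count = 0;
    for (int i = L; i <= R; ++i) {
        if (next_right[i] > R) ++count;
    }
    return count;
}

// Reference answer: number of different values in arr[L..R].
int distinctBySet(const std::vector<int>& arr, int L, int R) {
    return static_cast<int>(
        std::set<int>(arr.begin() + L, arr.begin() + R + 1).size());
}
```

For arr = {4, 1, 9, 1, 3, 3}, next_right is {6, 3, 6, 6, 5, 6}; for the 0-based range [0, 4], the positions with next_right[i] > 4 are 0, 2, 3 and 4, giving 4 distinct values, as in the first example.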

Construction of Merge Sort Tree from given array

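  • We start with the segment arr[0 . . . N-1].
  • Every time, if the current segment has not yet become a segment of length 1, we divide it into two halves and call the same procedure on both halves. For each such segment, we store its elements in sorted order, obtained by merging the sorted halves as in merge sort.
  • The tree is a Full Binary Tree because every segment of length greater than 1 is split into exactly two halves.
  • Since the constructed tree is a full binary tree with N leaves, it has N-1 internal nodes, so the total number of nodes is 2*N - 1. Each level stores at most N values in total and there are O(log N) levels, so the tree holds O(N log N) values. With heap-style numbering (the children of node t are 2t and 2t+1), node numbers can exceed 2*N - 1, which is why the implementation allocates 4 * n slots.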
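To see the node count concretely, the sketch below walks the same recursion as build() (node 1 is the root, and the children of node t are 2t and 2t+1) without storing any values. TreeShape, walk and shapeFor are hypothetical helper names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Shape of the tree produced by the build() recursion: node 1 is the root
// and the children of node t are 2t and 2t + 1. TreeShape, walk and
// shapeFor are hypothetical helpers that only count; they store no values.
struct TreeShape {
    int nodes = 0;     // number of nodes created
    int maxIndex = 0;  // largest node index used
};

void walk(int start, int end, int treeNode, TreeShape& shape) {
    shape.nodes++;
    shape.maxIndex = std::max(shape.maxIndex, treeNode);
    if (start == end) return;  // leaf: segment of length 1
    int mid = (start + end) / 2;
    walk(start, mid, 2 * treeNode, shape);
    walk(mid + 1, end, 2 * treeNode + 1, shape);
}

TreeShape shapeFor(int n) {
    assert(n >= 1);
    TreeShape shape;
    walk(0, n - 1, 1, shape);
    return shape;
}
```

For N = 6 it reports 11 nodes (2*6 - 1), but the largest node index is 13, which is larger than 2*N - 1. This is why the array of nodes is sized with a safety margin such as 4 * N rather than exactly 2 * N - 1.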

Here is an example for the array 1 5 2 6 9 4 7 1 (the root is at the top, the leaves at the bottom):

|1 1 2 4 5 6 7 9|
|1 2 5 6|1 4 7 9|
|1 5|2 6|4 9|1 7|
|1|5|2|6|9|4|7|1|
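The following sketch rebuilds the example above level by level and prints it in the same |...| layout. It is an illustration under assumptions: the names Level, buildLevels, mergeSortTreeLevels and printLevels are hypothetical, and it uses std::merge instead of a hand-written merge step.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <iterator>
#include <vector>

using Level = std::vector<std::vector<int>>;  // all node lists on one depth

// Builds the sorted list for arr[start..end] by merging the children's
// lists, and appends it to levels[depth] (levels[0] is the root level).
std::vector<int> buildLevels(const std::vector<int>& arr, int start, int end,
                             int depth, std::vector<Level>& levels) {
    std::vector<int> node;
    if (start == end) {
        node.push_back(arr[start]);
    } else {
        int mid = (start + end) / 2;
        std::vector<int> left = buildLevels(arr, start, mid, depth + 1, levels);
        std::vector<int> right = buildLevels(arr, mid + 1, end, depth + 1, levels);
        std::merge(left.begin(), left.end(), right.begin(), right.end(),
                   std::back_inserter(node));
    }
    if (static_cast<int>(levels.size()) <= depth) levels.resize(depth + 1);
    // Left subtrees finish first, so each level fills from left to right.
    levels[depth].push_back(node);
    return node;
}

std::vector<Level> mergeSortTreeLevels(const std::vector<int>& arr) {
    assert(!arr.empty());
    std::vector<Level> levels;
    buildLevels(arr, 0, static_cast<int>(arr.size()) - 1, 0, levels);
    return levels;
}

// Prints each level as |a b c|d e|..., root level first.
void printLevels(const std::vector<Level>& levels) {
    for (const Level& level : levels) {
        std::cout << '|';
        for (const std::vector<int>& node : level) {
            bool first = true;
            for (int value : node) {
                if (!first) std::cout << ' ';
                std::cout << value;
                first = false;
            }
            std::cout << '|';
        }
        std::cout << '\n';
    }
}
```

Calling printLevels(mergeSortTreeLevels({1, 5, 2, 6, 9, 4, 7, 1})) prints the four lines of the diagram above, root first.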

Construction of next_right array

  • For every index i, we store the index of the next occurrence of arr[i] to its right.
  • If arr[i] is the last occurrence of its value, we store N (the length of the array).
    Example:

    arr = [2, 3, 2, 3, 5, 6]
    next_right = [2, 3, 6, 6, 6, 6]
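A minimal sketch of this construction, assuming 0-based indices; buildNextRight is a hypothetical name. It scans from right to left and keeps, for each value, the nearest index to the right where that value was seen. Looking values up with find() keeps "not seen yet" distinct from "seen at index 0".

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// next_right[i] = index of the next occurrence of arr[i] to the right of i,
// or n when arr[i] is the last occurrence of its value (0-based indices).
// buildNextRight is a hypothetical name for this sketch.
std::vector<int> buildNextRight(const std::vector<int>& arr) {
    const int n = static_cast<int>(arr.size());
    std::vector<int> next_right(n, n);     // default: no later occurrence
    std::unordered_map<int, int> nearest;  // value -> nearest index seen to the right
    for (int i = n - 1; i >= 0; --i) {
        auto it = nearest.find(arr[i]);
        if (it != nearest.end()) {
            next_right[i] = it->second;
        }
        assert(next_right[i] > i);         // the stored index always lies to the right of i
        nearest[arr[i]] = i;               // i is now the nearest occurrence for indices < i
    }
    return next_right;
}
```

For example, buildNextRight({2, 3, 2, 3, 5, 6}) returns {2, 3, 6, 6, 6, 6}, as in the example above.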
    

Below is the implementation of the above approach:

C++


// C++ implementation to find
// count of distinct elements 
// in a range L to R for Q queries
  
#include <algorithm>
#include <iostream>
#include <unordered_map>
#include <vector>
using namespace std;
  
// Function to merge the right
// and the left tree
void merge(vector<int> tree[], 
                 int treeNode)
{
    int len1 = 
      tree[2 * treeNode].size();
    int len2 = 
      tree[2 * treeNode + 1].size();
    int index1 = 0, index2 = 0;
  
    // Fill tree[treeNode] so that its values
    // stay sorted, as in the merge step of mergesort
    while (index1 < len1 && index2 < len2) {

        // If the current element of the left part
        // is greater than that of the right part,
        // take the element from the right part
        if (tree[2 * treeNode][index1] > 
              tree[2 * treeNode + 1][index2]) {
  
            tree[treeNode].push_back(
                tree[2 * treeNode + 1][index2]
                );
            index2++;
        }
        else {
            tree[treeNode].push_back(
                tree[2 * treeNode][index1]
                );
            index1++;
        }
    }
  
    // Insert the leftover elements
    // from the left part
    while (index1 < len1) {
        tree[treeNode].push_back(
            tree[2 * treeNode][index1]
            );
        index1++;
    }
  
    // Insert the leftover elements
    // from the right part
    while (index2 < len2) {
        tree[treeNode].push_back(
            tree[2 * treeNode + 1][index2]
            );
        index2++;
    }
    return;
}
  
// Recursive function to build the merge sort
// tree: every node stores the sorted values of
// its segment, obtained by merging its children
void build(vector<int> tree[], 
    int* arr, int start, int end, 
                  int treeNode)
{
    // Base case
    if (start == end) {
        tree[treeNode].push_back(
            arr[start]);
        return;
    }
    int mid = (start + end) / 2;
  
    // Building the left tree
    build(tree, arr, start, 
          mid, 2 * treeNode);
  
    // Building the right tree
    build(tree, arr, mid + 1, end, 
                 2 * treeNode + 1);
  
    // Merges the right tree
    // and left tree
    merge(tree, treeNode);
    return;
}
  
// Counts the elements greater than right stored at
// positions left..right, splitting the range over the
// tree nodes as in a segment tree query
int query(vector<int> tree[], 
     int treeNode, int start, int end, 
            int left, int right)
{
  
    // Current segment is out of the range
    if (start > right || end < left) {
        return 0;
    }
    // Current segment completely 
    // lies inside the range
    if (start >= left && end <= right) {
  
        // The elements are in sorted order, so the
        // number of elements greater than right can be
        // found with a binary search (upper_bound)
        return tree[treeNode].end() - 
          upper_bound(tree[treeNode].begin(), 
            tree[treeNode].end(), right);
    }
  
    int mid = (start + end) / 2;
  
    // Query on the left tree
    int op1 = query(tree, 2 * treeNode, 
              start, mid, left, right);
    // Query on the Right tree
    int op2 = query(tree, 2 * treeNode + 1, 
            mid + 1, end, left, right);
    return op1 + op2;
}
  
// Driver Code
int main()
{
    int n = 5;
    int arr[] = { 1, 2, 1, 4, 2 };

    // next_right[i] = index of the next occurrence
    // of arr[i], or n if arr[i] does not occur again
    vector<int> next_right(n, n);

    // Initialising the tree: 4 * n slots are enough
    // for the heap-style node numbering used below
    vector<vector<int>> tree(4 * n);

    // Maps each value to the nearest index to the
    // right of i at which it has been seen so far
    unordered_map<int, int> ump;

    // Construction of next_right array,
    // scanning from right to left
    for (int i = n - 1; i >= 0; i--) {
        auto it = ump.find(arr[i]);
        if (it != ump.end()) {
            next_right[i] = it->second;
        }
        ump[arr[i]] = i;
    }

    // Building the merge sort tree
    // from the next_right array
    build(tree.data(), next_right.data(), 0, n - 1, 1);

    int ans;
    // Queries use zero-based, inclusive indexing.
    // Each query visits O(log N) nodes and does a
    // binary search in each, so it costs O(log^2 N)

    // First query: elements {1, 2, 1}
    int left1 = 0;
    int right1 = 2;
    ans = query(tree.data(), 1, 0, n - 1,
                  left1, right1);
    cout << ans << endl;

    // Second query: elements {2, 1, 4, 2}
    int left2 = 1;
    int right2 = 4;
    ans = query(tree.data(), 1, 0, n - 1,
                  left2, right2);
    cout << ans << endl;
    return 0;
}



Output:

2
3

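Time Complexity: O(N log N) to build the tree and O(log^2 N) per query, so O(N log N + Q log^2 N) in total. The tree stores O(N log N) values.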
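For reference, the same technique can be packaged so that it answers the 1-based queries from the examples at the top directly. This is a sketch, not the article's code: the class DistinctCounter and its count() method are hypothetical names, it uses std::merge in place of the hand-written merge(), and it stores the tree in a std::vector<std::vector<int>> instead of a raw array.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Merge sort tree over next_right that answers 1-based, inclusive queries.
// DistinctCounter and count() are hypothetical names for this sketch.
class DistinctCounter {
public:
    explicit DistinctCounter(const std::vector<int>& arr)
        : n_(static_cast<int>(arr.size())), tree_(4 * std::max(n_, 1)) {
        // next_right[i]: next index holding arr[i], or n_ if none.
        std::vector<int> next_right(n_, n_);
        std::unordered_map<int, int> nearest;
        for (int i = n_ - 1; i >= 0; --i) {
            auto it = nearest.find(arr[i]);
            if (it != nearest.end()) next_right[i] = it->second;
            nearest[arr[i]] = i;
        }
        if (n_ > 0) build(next_right, 0, n_ - 1, 1);
    }

    // Number of distinct values in arr[L..R], with 1 <= L <= R <= n.
    int count(int L, int R) const {
        assert(1 <= L && L <= R && R <= n_);
        return query(1, 0, n_ - 1, L - 1, R - 1);
    }

private:
    int n_;
    std::vector<std::vector<int>> tree_;  // tree_[t] = sorted next_right values of node t

    void build(const std::vector<int>& a, int start, int end, int node) {
        if (start == end) {
            tree_[node] = {a[start]};
            return;
        }
        int mid = (start + end) / 2;
        build(a, start, mid, 2 * node);
        build(a, mid + 1, end, 2 * node + 1);
        const std::vector<int>& l = tree_[2 * node];
        const std::vector<int>& r = tree_[2 * node + 1];
        tree_[node].resize(l.size() + r.size());
        std::merge(l.begin(), l.end(), r.begin(), r.end(), tree_[node].begin());
    }

    // Counts next_right values greater than `right` at positions [left, right]
    // (0-based): `right` is both the range end and the threshold.
    int query(int node, int start, int end, int left, int right) const {
        if (start > right || end < left) return 0;  // no overlap
        if (left <= start && end <= right) {        // node fully inside
            const std::vector<int>& v = tree_[node];
            return static_cast<int>(v.end() - std::upper_bound(v.begin(), v.end(), right));
        }
        int mid = (start + end) / 2;
        return query(2 * node, start, mid, left, right) +
               query(2 * node + 1, mid + 1, end, left, right);
    }
};
```

DistinctCounter({4, 1, 9, 1, 3, 3}).count(1, 5) returns 4, and DistinctCounter({4, 2, 1, 1, 4}).count(2, 4) returns 2. Note that query() uses right both as the end of the index range and as the threshold on next_right values, which is exactly the reduction from Step 2.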
