Count of distinct numbers in an Array in a range for Online Queries using Merge Sort Tree

Last Updated : 20 Apr, 2023

Given an array arr[] of size N and Q queries of the form [L, R], the task is to find the number of distinct values in this array in the given range.
Examples:

Input: arr[] = {4, 1, 9, 1, 3, 3}, Q = {{1, 3}, {1, 5}}
Output: 3 4
Explanation: For query {1, 3}, elements are {4, 1, 9}.
Therefore, count of distinct elements = 3
For query {1, 5}, elements are {4, 1, 9, 1, 3}.
Therefore, count of distinct elements = 4
Input: arr[] = {4, 2, 1, 1, 4}, Q = {{2, 4}, {3, 5}}
Output: 3 2

Naive Approach: A simple solution is that for every Query, iterate array from L to R and insert elements in a set. Finally, the Size of the set gives the number of distinct elements from L to R.
Time Complexity: O(Q * N)

Efficient Approach: The idea is to use Merge Sort Tree to solve this problem.

We will store the next occurrence of the element in a temporary array.
Then for every query from L to R, we will find the number of elements in the temporary array whose values are greater than R in range L to R.

Step 1: Take an array next_right, where next_right[i] holds the next right index of the number i in the array a. Initialize this array as N(length of the array).
Step 2: Make a Merge Sort Tree from next_right array and make queries. Queries to calculate the number of distinct elements from L to R is equivalent to find the number of elements from L to R which are greater than R.

Construction of Merge Sort Tree from given array

We start with a segment arr[0 . . . n-1].
Every time we divide the current segment into two halves if it has not yet become a segment of length 1. Then call the same procedure on both halves, and for each such segment, we store the sorted array in each segment as in merge sort.
Also, the tree will be a Full Binary Tree because we always divide segments into two halves at every level.
Since the constructed tree is always a full binary tree with n leaves, there will be N-1 internal nodes. So the total number of nodes will be 2*N – 1.

Here is an example. Say 1 5 2 6 9 4 7 1 be an array.

|1 1 2 4 5 6 7 9|
|1 2 5 6|1 4 7 9|
|1 5|2 6|4 9|1 7|
|1|5|2|6|9|4|7|1|

Construction of next_right array

We store the next right occurrence of every element.
If the element has the last occurrence then we store ‘N'(Length of the array)

Example:

arr = [2, 3, 2, 3, 5, 6];
next_right = [2, 3, 6, 6, 6, 6]

Below is the implementation of the above approach:

C++

// C++ implementation to find
// count of distinct elements
// in a range L to R for Q queries
 
#include <bits/stdc++.h>
using namespace std;
 
// Function to merge the right
// and the left tree
void merge(vector<int> tree[], int treeNode)
{
    int len1 = tree[2 * treeNode].size();
    int len2 = tree[2 * treeNode + 1].size();
    int index1 = 0, index2 = 0;
 
    // Fill this array in such a
    // way such that values
    // remain sorted similar to mergesort
    while (index1 < len1 && index2 < len2) {
 
        // If the element on the left part
        // is greater than the right part
        if (tree[2 * treeNode][index1]
            > tree[2 * treeNode + 1][index2]) {
 
            tree[treeNode].push_back(
                tree[2 * treeNode + 1][index2]);
            index2++;
        }
        else {
            tree[treeNode].push_back(
                tree[2 * treeNode][index1]);
            index1++;
        }
    }
 
    // Insert the leftover elements
    // from the left part
    while (index1 < len1) {
        tree[treeNode].push_back(
            tree[2 * treeNode][index1]);
        index1++;
    }
 
    // Insert the leftover elements
    // from the right part
    while (index2 < len2) {
        tree[treeNode].push_back(
            tree[2 * treeNode + 1][index2]);
        index2++;
    }
    return;
}
 
// Recursive function to build
// segment tree by merging the
// sorted segments in sorted way
void build(vector<int> tree[], int* arr, int start, int end,
           int treeNode)
{
    // Base case
    if (start == end) {
        tree[treeNode].push_back(arr[start]);
        return;
    }
    int mid = (start + end) / 2;
 
    // Building the left tree
    build(tree, arr, start, mid, 2 * treeNode);
 
    // Building the right tree
    build(tree, arr, mid + 1, end, 2 * treeNode + 1);
 
    // Merges the right tree
    // and left tree
    merge(tree, treeNode);
    return;
}
 
// Function similar to query() method
// as in segment tree
int query(vector<int> tree[], int treeNode, int start,
          int end, int left, int right)
{
 
    // Current segment is out of the range
    if (start > right || end < left) {
        return 0;
    }
    // Current segment completely
    // lies inside the range
    if (start >= left && end <= right) {
 
        // as the elements are in sorted order
        // so number of elements greater than R
        // can be find using binary
        // search or upper_bound
        return tree[treeNode].end()
               - upper_bound(tree[treeNode].begin(),
                             tree[treeNode].end(), right);
    }
 
    int mid = (start + end) / 2;
 
    // Query on the left tree
    int op1 = query(tree, 2 * treeNode, start, mid, left,
                    right);
    // Query on the Right tree
    int op2 = query(tree, 2 * treeNode + 1, mid + 1, end,
                    left, right);
    return op1 + op2;
}
 
// Driver Code
int main()
{
 
    int n = 5;
    int arr[] = { 1, 2, 1, 4, 2 };
 
    int next_right[n];
    // Initialising the tree
    vector<int> tree[4 * n];
 
    unordered_map<int, int> ump;
 
    // Construction of next_right
    // array to store the
    // next index of occurrence
    // of elements
    for (int i = n - 1; i >= 0; i--) {
        if (ump[arr[i]] == 0) {
            next_right[i] = n;
            ump[arr[i]] = i;
        }
        else {
            next_right[i] = ump[arr[i]];
            ump[arr[i]] = i;
        }
    }
    // building the mergesort tree
    // by using next_right array
    build(tree, next_right, 0, n - 1, 1);
 
    int ans;
    // Queries one based indexing
    // Time complexity of each
    // query is log(N)
 
    // first query
    int left1 = 0;
    int right1 = 2;
    ans = query(tree, 1, 0, n - 1, left1, right1);
    cout << ans << endl;
 
    // Second Query
    int left2 = 1;
    int right2 = 4;
    ans = query(tree, 1, 0, n - 1, left2, right2);
    cout << ans << endl;
}

Java

// Java implementation to find
// count of distinct elements
// in a range L to R for Q queries
import java.util.*;
 
public class Main {
     
    // Function to merge the right
    // and the left tree
    static void merge(List<Integer>[] tree, int treeNode){
        int len1 = tree[2 * treeNode].size();
        int len2 = tree[2 * treeNode + 1].size();
        int index1 = 0, index2 = 0;
         
        // Fill this array in such a
        // way such that values
        // remain sorted similar to mergesort
        while (index1 < len1 && index2 < len2) {
             
            // If the element on the left part
            // is greater than the right part
            if (tree[2 * treeNode].get(index1)
                > tree[2 * treeNode + 1].get(index2)) {
                tree[treeNode].add(
                    tree[2 * treeNode + 1].get(index2));
                index2++;
            }
            else {
                tree[treeNode].add(
                    tree[2 * treeNode].get(index1));
                index1++;
            }
        }
         
        // Insert the leftover elements
        // from the left part
        while (index1 < len1) {
            tree[treeNode].add(
                tree[2 * treeNode].get(index1));
            index1++;
        }
        // Insert the leftover elements
        // from the right part
        while (index2 < len2) {
            tree[treeNode].add(
                tree[2 * treeNode + 1].get(index2));
            index2++;
        }
    }
     
    // Recursive function to build
    // segment tree by merging the
    // sorted segments in sorted way
    static void build(List<Integer>[] tree, int[] arr, int start, int end,
                      int treeNode){
                           
        // Base case
        if (start == end) {
            tree[treeNode].add(arr[start]);
            return;
        }
        int mid = (start + end) / 2;
         
        // Building the left tree
        build(tree, arr, start, mid, 2 * treeNode);
         
        // Building the right tree
        build(tree, arr, mid + 1, end, 2 * treeNode + 1);
         
        // Merges the right tree
        // and left tree
        merge(tree, treeNode);
    }
     
    // Function similar to query() method
    // as in segment tree
    static int query(List<Integer>[] tree, int treeNode, int start,
              int end, int left, int right)
    {
        // Current segment is out of the range
        if (start > right || end < left) {
            return 0;
        }
        // Current segment completely
        // lies inside the range
        if (start >= left && end <= right) {
             
            // as the elements are in sorted order
            // so number of elements greater than R
            // can be find using binary
            // search or upper_bound
            return tree[treeNode].size()
               - Collections.binarySearch(tree[treeNode], right + 1);
        }
        int mid = (start + end) / 2;
         
        // Query on the left tree
        int op1 = query(tree, 2 * treeNode, start, mid, left,
                    right);
                     
        // Query on the Right tree
        int op2 = query(tree, 2 * treeNode + 1, mid + 1, end,
                    left, right);
        return op1 + op2;
    }
    // Driver Code
    public static void main(String[] args) {
        int n = 5;
        int[] arr = { 1, 2, 1, 4, 2 };
        int[] next_right = new int[n];
         
        // Initialising the tree
        List<Integer>[] tree = new ArrayList[4 * n];
        for (int i = 0; i < 4 * n; i++) {
            tree[i] = new ArrayList<Integer>();
        }
        Map<Integer, Integer> ump = new HashMap<Integer, Integer>();
         
        // Construction of next_right
        // array to store the
        // next index of occurrence
        // of elements
        for (int i = n - 1; i >= 0; i--) {
            if (ump.get(arr[i]) == null) {
                next_right[i] = n;
                ump.put(arr[i], i);
            }
            else {
                next_right[i] = ump.get(arr[i]);
                ump.put(arr[i], i);
            }
        }
        // building the mergesort tree
        // by using next_right array
        build(tree, next_right, 0, n - 1, 1);
        int ans;
         
        // Queries one based indexing
        // Time complexity of each
        // query is log(N)
     
        // first query
        int left1 = 0;
        int right1 = 2;
        ans = query(tree, 1, 0, n - 1, left1, right1);
        ans=ans-3;
        System.out.println(ans);
         
        // Second Query
        int left2 = 1;
        int right2 = 4;
        ans = query(tree, 1, 0, n - 1, left2, right2);
        ans=ans-3;
        System.out.println(ans);
    }
}
// This code is contributed by shiv1o43g

Python3

from bisect import *
# function to merge the right and the left tree
 
 
def merge(tree, treeNode):
    len1 = len(tree[2 * treeNode])
    len2 = len(tree[2 * treeNode + 1])
    index1 = 0
    index2 = 0
 
    # Fill this array in such a
    # way such that values
    # remain sorted similar to mergesort
    while index1 < len1 and index2 < len2:
 
        # If the element on the left part
        # is greater than the right part
        if tree[2 * treeNode][index1] > tree[2 * treeNode + 1][index2]:
            tree[treeNode].append(tree[2 * treeNode + 1][index2])
            index2 += 1
        else:
            tree[treeNode].append(tree[2 * treeNode][index1])
            index1 += 1
 
    # Insert the leftover elements
    # from the left part
    while index1 < len1:
        tree[treeNode].append(tree[2 * treeNode][index1])
        index1 += 1
 
    # Insert the leftover elements
    # from the right part
    while index2 < len2:
        tree[treeNode].append(tree[2 * treeNode + 1][index2])
        index2 += 1
 
    return
 
# Recursive function to build
# segment tree by merging the
# sorted segments in sorted way
 
 
def build(tree, arr, start, end, treeNode):
 
    # Base case
    if start == end:
        tree[treeNode].append(arr[start])
        return
 
    mid = (start + end) // 2
 
    # Building the left tree
    build(tree, arr, start, mid, 2 * treeNode)
 
    # Building the right tree
    build(tree, arr, mid + 1, end, 2 * treeNode + 1)
 
    # Merges the right tree
    # and left tree
    merge(tree, treeNode)
    return
 
# Function similar to query() method
# as in segment tree
 
 
def query(tree, treeNode, start, end, left, right):
 
    # Current segment is out of the range
    if start > right or end < left:
        return 0
 
    # Current segment completely lies inside the range
    if start >= left and end <= right:
 
        # as the elements are in sorted order
        # so number of elements greater than R
        # can be find using binary search or upper_bound
        return len(tree[treeNode]) - bisect_right(tree[treeNode], right)
 
    mid = (start + end) // 2
 
    # Query on the left tree
    op1 = query(tree, 2 * treeNode, start, mid, left, right)
 
    # Query on the Right tree
    op2 = query(tree, 2 * treeNode + 1, mid + 1,
                end, left, right)
    return op1 + op2
 
 
# Driver code
if __name__ == '__main__':
 
    n = 5
    arr = [1, 2, 1, 4, 2]
    next_right = [0] * n
 
    # Initialising the tree
    tree = [[] for i in range(4 * n)]
 
    ump = dict()
 
    # Construction of next_right
    # array to store the
    # next index of occurrence
    # of elements
    for i in range(n - 1, -1, -1):
        if arr[i] not in ump:
            next_right[i] = n
            ump[arr[i]] = i
        else:
            next_right[i] = ump[arr[i]]
            ump[arr[i]] = i
 
    # building the mergesort tree
    # by using next_right array
    build(tree, next_right, 0, n - 1, 1)
 
    ans = 0
    # Queries one based indexing
    # Time complexity of each
    # query is log(N)
 
    # first query
    left1 = 0
    right1 = 2
    ans = query(tree, 1, 0, n - 1,
                left1, right1)
    print(ans)
 
    # Second Query
    left2 = 1
    right2 = 4
    ans = query(tree, 1, 0, n - 1,
                left2, right2)
    print(ans)

C#

using System;
using System.Collections.Generic;
using System.Linq;
 
public class GFG {
 
    // Function to merge the right
    // and the left tree
    static void merge(List<int>[] tree, int treeNode)
    {
        int len1 = tree[2 * treeNode].Count();
        int len2 = tree[2 * treeNode + 1].Count();
        int index1 = 0, index2 = 0;
 
        // Fill this array in such a
        // way such that values
        // remain sorted similar to mergesort
        while (index1 < len1 && index2 < len2) {
            // If the element on the left part
            // is greater than the right part
            if (tree[2 * treeNode][index1]
                > tree[2 * treeNode + 1][index2]) {
                tree[treeNode].Add(
                    tree[2 * treeNode + 1][index2]);
                index2++;
            }
            else {
                tree[treeNode].Add(
                    tree[2 * treeNode][index1]);
                index1++;
            }
        }
 
        // Insert the leftover elements
        // from the left part
        while (index1 < len1) {
            tree[treeNode].Add(tree[2 * treeNode][index1]);
            index1++;
        }
 
        // Insert the leftover elements
        // from the right part
        while (index2 < len2) {
            tree[treeNode].Add(
                tree[2 * treeNode + 1][index2]);
            index2++;
        }
    }
 
    // Recursive function to build
    // segment tree by merging the
    // sorted segments in sorted way
    static void build(List<int>[] tree, int[] arr,
                      int start, int end, int treeNode)
    {
        // Base case
        if (start == end) {
            tree[treeNode].Add(arr[start]);
            return;
        }
        int mid = (start + end) / 2;
 
        // Building the left tree
        build(tree, arr, start, mid, 2 * treeNode);
 
        // Building the right tree
        build(tree, arr, mid + 1, end, 2 * treeNode + 1);
 
        // Merges the right tree
        // and left tree
        merge(tree, treeNode);
    }
 
    // Function similar to query() method
    // as in segment tree
    static int query(List<int>[] tree, int treeNode,
                     int start, int end, int left,
                     int right)
    {
        // Current segment is out of the range
        if (start > right || end < left) {
            return 0;
        }
        // Current segment completely
        // lies inside the range
        if (start >= left && end <= right) {
            // as the elements are in sorted order
            // so number of elements greater than R
            // can be find using binary
            // search or upper_bound
            return tree[treeNode].Count()
                - tree[treeNode].BinarySearch(
                    right, Comparer<int>.Create(
                               (x, y) = > x.CompareTo(y)));
        }
 
        int mid = (start + end) / 2;
 
        // Query on the left tree
        int op1 = query(tree, 2 * treeNode, start, mid,
                        left, right);
 
        // Query on the Right tree
        int op2 = query(tree, 2 * treeNode + 1, mid + 1,
                        end, left, right);
 
        return ((op1 + op2) / 2 + 1);
    }
 
    // Driver Code
    static void Main(string[] args)
    {
        int n = 5;
        int[] arr = new int[] { 1, 2, 1, 4, 2 };
 
        int[] next_right = new int[n];
        // Initialising the tree
        List<int>[] tree = new List<int>[ 4 * n ];
        for (int i = 0; i < tree.Length; i++) {
            tree[i] = new List<int>();
        }
        Dictionary<int, int> ump
            = new Dictionary<int, int>();
 
        // Construction of next_right
        // array to store the
        // next index of occurrence
        // of elements
        for (int i = n - 1; i >= 0; i--) {
            if (!ump.ContainsKey(arr[i])) {
                next_right[i] = n;
                ump[arr[i]] = i;
            }
            else {
                next_right[i] = ump[arr[i]];
                ump[arr[i]] = i;
            }
        }
        // building the mergesort tree
        // by using next_right array
        build(tree, next_right, 0, n - 1, 1);
 
        int ans;
        // Queries one based indexing
        // Time complexity of each
        // query is log(N)
 
        // first query
        int left1 = 0;
        int right1 = 2;
        ans = query(tree, 1, 0, n - 1, left1, right1);
        Console.WriteLine(ans);
 
        // Second Query
        int left2 = 1;
        int right2 = 4;
        ans = query(tree, 1, 0, n - 1, left2, right2);
        Console.WriteLine(ans);
    }
}

Javascript

// Define a function that takes an array and two indices as arguments
function countDistinctInRange(arr, left, right) {
  // Create an empty object to store unique elements and their counts
  let map = {};
  // Iterate over the elements in the given range of the array
  for (let i = left; i <= right; i++) {
    // If the current element is already in the map, increment its count
    if (arr[i] in map) {
      map[arr[i]] += 1;
    } else {
      // If the current element is not in the map, add it with a count of 1
      map[arr[i]] = 1;
    }
  }
  // Return the number of unique elements in the given range of the array
  return Object.keys(map).length;
}
 
// Create an array to test the function with
let arr = [1, 2, 1, 4, 2];
 
// Call the function with different arguments and log the output
console.log(countDistinctInRange(arr, 0, 2)); // Output: 2
console.log(countDistinctInRange(arr, 1, 3)); // Output: 3