Distinct elements in subarray using Mo’s Algorithm

Given an array a[] of size n and q queries, where each query is a pair of integers l and r, print the number of distinct integers in the subarray a[l..r] (0-based indices, both ends inclusive).
Constraint: a[i] <= 10^6 (values are assumed to be non-negative, since they are used as indices into a frequency array).
Examples:

Input : a[] = {1, 1, 2, 1, 2, 3}
        q = 3
        0 4
        1 3
        2 5
Output : 2
         2
         3
In query 1, number of distinct integers
in a[0..4] is 2 (1, 2)
In query 2, number of distinct integers
in a[1..3] is 2 (1, 2)
In query 3, number of distinct integers
in a[2..5] is 3 (1, 2, 3)

Input : a[] = {7, 3, 5, 9, 7, 6, 4, 3, 2}
        q = 4
        1 5
        0 4
        0 7
        1 8
Output : 5
         4
         6
         7
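
The expected outputs above can be cross-checked with a naive reference that rebuilds a hash set for every query. This is only a sketch for validation, not the approach this article describes; the name countDistinctNaive is a hypothetical helper, and the range is assumed to be 0-based and inclusive as in the examples.

```cpp
#include <unordered_set>
#include <vector>

// Naive reference: count distinct values in a[l..r] (0-based, inclusive)
// by inserting every element of the range into a hash set.
// countDistinctNaive is a hypothetical helper name, not from the article.
int countDistinctNaive(const std::vector<int>& a, int l, int r)
{
    std::unordered_set<int> seen(a.begin() + l, a.begin() + r + 1);
    return static_cast<int>(seen.size());
}
```

Each call costs time proportional to the length of the range, so answering m queries this way can take O(n * m) time in the worst case. That cost is what the block-sorted processing order is meant to reduce.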

Let a[0..n-1] be the input array and q[0..m-1] be the array of queries (m is the number of queries, called q in the problem statement above).
Approach:

  1. Sort the queries so that all queries with L values from 0 to \sqrt(n) - 1 come first, then all queries with L from \sqrt(n) to 2*\sqrt(n) - 1, and so on. Within each block, sort the queries in increasing order of R.
  2. Initialize an array freq[] of size 10^6 + 1 with zeros (values can be as large as 10^6, so index 10^6 must be valid). freq[v] stores how many times the value v occurs in the current range.
  3. Process the queries one by one, so that every query reuses the count of different elements and the frequency array left by the previous query, and store each result in the query structure.
    • Let ‘curr_Diff_elements’ be the number of different elements in the range of the previous query.
    • Remove the extra elements of the previous query. For example, if the previous query is [0, 8] and the current query is [3, 9], remove a[0], a[1] and a[2].
    • Add the new elements of the current query. In the same example, add a[9].
  4. Sort the queries back into the order in which they were given and print their stored results.
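
The ordering in step 1 can be sketched on its own as follows. The helper moOrder and the use of (L, R) pairs are assumptions made for illustration; it returns the indices of the queries in processing order rather than sorting the queries themselves, so the original order is never lost.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Returns the indices of the queries in the order in which they would be
// processed: grouped by the block that L falls into, then by increasing R.
// moOrder is a hypothetical helper; queries are (L, R) pairs, 0-based.
std::vector<int> moOrder(const std::vector<std::pair<int, int>>& queries, int n)
{
    // Block size of about sqrt(n); at least 1 to avoid dividing by zero.
    int block = std::max(1, static_cast<int>(std::sqrt(static_cast<double>(n))));

    std::vector<int> order(queries.size());
    for (int i = 0; i < static_cast<int>(order.size()); i++)
        order[i] = i;

    std::sort(order.begin(), order.end(), [&](int x, int y) {
        int bx = queries[x].first / block;
        int by = queries[y].first / block;
        if (bx != by)
            return bx < by;                           // different blocks
        return queries[x].second < queries[y].second; // same block: by R
    });
    return order;
}
```

For the first example (n = 6, so the block size is 2), the queries (0, 4), (1, 3), (2, 5) have L in blocks 0, 0 and 1, so the processing order is (1, 3), (0, 4), (2, 5), that is, indices 1, 0, 2.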

Adding elements()

  • Increase the frequency of the element being added (freq[a[i]]) by 1.
  • If the frequency of a[i] is now 1, increase curr_Diff_elements by 1, since a new distinct element has entered the range.

Removing elements()

  • Decrease the frequency of the element being removed (freq[a[i]]) by 1.
  • If the frequency of a[i] is now 0, decrease curr_Diff_elements by 1, since that element has completely left the range.
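
The two operations above can be sketched as a small helper type. DistinctWindow and its member names are hypothetical and used only for illustration; the sketch assumes all values lie in the range 0 to maxValue.

```cpp
#include <vector>

// Hypothetical helper type (not from the article): tracks the frequency of
// each value inside the current range and how many values occur at least once.
struct DistinctWindow {
    std::vector<int> freq; // freq[v] = occurrences of value v in the range
    int distinct = 0;      // number of values v with freq[v] > 0

    // maxValue is the largest value that may appear in the array (assumed).
    explicit DistinctWindow(int maxValue) : freq(maxValue + 1, 0) {}

    // A value entering the range is new only if its count becomes 1.
    void add(int value)
    {
        if (++freq[value] == 1)
            distinct++;
    }

    // A value leaving the range disappears only if its count becomes 0.
    void remove(int value)
    {
        if (--freq[value] == 0)
            distinct--;
    }
};
```

Both operations take O(1) time, which is why the total running time is governed by how many times the range boundaries move.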

Note : In step 3, the index variable for R changes at most O(n * \sqrt(n)) times over the whole run, and the index variable for L changes at most O(m * \sqrt(n)) times. These bounds hold only because the queries are first sorted into blocks of size \sqrt(n).



Sorting the queries (the preprocessing step) takes O(m log m) time.

Processing all queries takes O(n * \sqrt(n)) + O(m * \sqrt(n)) = O((m+n) *\sqrt(n)) time.
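
As a rough sketch of where these bounds come from, assume a block size of B = \lfloor \sqrt{n} \rfloor (the symbol B is introduced here for illustration only). Within one block R only moves forward, and within one block L can move by at most B between consecutive queries:

```latex
\begin{align*}
B &= \lfloor \sqrt{n} \rfloor, \qquad
  \text{number of blocks} \le \lceil n / B \rceil = O(\sqrt{n}) \\
\text{moves of } R &\le
  \underbrace{O(\sqrt{n})}_{\text{blocks}} \cdot
  \underbrace{O(n)}_{\text{per block}} = O(n \sqrt{n}) \\
\text{moves of } L &\le
  \underbrace{m}_{\text{queries}} \cdot
  \underbrace{O(\sqrt{n})}_{\text{within a block}} +
  \underbrace{O(n)}_{\text{block changes}} = O(m \sqrt{n} + n) \\
\text{total} &= O\big((m + n) \sqrt{n}\big)
\end{align*}
```

Each move is a single add or remove operation costing O(1), so the number of moves is also the processing time.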
Below is the implementation of the above approach:

// Program to compute no. of different elements
// of ranges for different range queries
#include <bits/stdc++.h>
using namespace std;
  
// Used in frequency array (maximum value of an
// array element).
const int MAX = 1000000;
  
// Variable to represent block size. This is made
// global so compare() of sort can use it.
int block;
  
// Structure to represent a query range and to store
// index and result of a particular query range
struct Query {
    int L, R, index, result;
};
  
// Function used to sort all queries so that all queries
// of same block are arranged together and within a block,
// queries are sorted in increasing order of R values.
bool compare(const Query& x, const Query& y)
{
    // Different blocks, sort by block.
    if (x.L / block != y.L / block)
        return x.L / block < y.L / block;
  
    // Same block, sort by R value
    return x.R < y.R;
}
  
// Function used to sort all queries in order of their
// index value so that results of queries can be printed
// in same order as of input
bool compare1(const Query& x, const Query& y)
{
    return x.index < y.index;
}
  
// calculate distinct elements of all query ranges.
// m is number of queries n is size of array a[].
void queryResults(int a[], int n, Query q[], int m)
{
    // Find block size (at least 1, so that the
    // division in compare() is always valid)
    block = max(1, (int)sqrt(n));
  
    // Sort all queries so that queries of same
    // blocks are arranged together.
    sort(q, q + m, compare);
  
    // Initialize current L, current R and current
    // different elements
    int currL = 0, currR = 0;
    int curr_Diff_elements = 0;
  
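    // Initialize frequency array with 0. Values can be
    // as large as MAX, so MAX + 1 entries are needed.
    // A vector keeps this large array off the stack.
    vector<int> freq(MAX + 1, 0);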
  
    // Traverse through all queries
    for (int i = 0; i < m; i++) {
          
        // L and R values of current range
        int L = q[i].L, R = q[i].R;
  
        // Remove extra elements of previous range.
        // For example if previous range is [0, 3]
        // and current range is [2, 5], then a[0] 
        // and a[1] are subtracted
        while (currL < L) {
              
            // element a[currL] is removed
            freq[a[currL]]--;
            if (freq[a[currL]] == 0) 
                curr_Diff_elements--;
              
            currL++;
        }
  
        // Add Elements of current Range
        // Note:- during addition of the left
        // side elements we have to add currL-1
        // because currL is already in range
        while (currL > L) {
            freq[a[currL - 1]]++;
  
            // include an element if it occurs for the first time
            if (freq[a[currL - 1]] == 1) 
                curr_Diff_elements++;
              
            currL--;
        }
        while (currR <= R) {
            freq[a[currR]]++;
  
            // include an element if it occurs for the first time
            if (freq[a[currR]] == 1) 
                curr_Diff_elements++;
              
            currR++;
        }
  
        // Remove elements of previous range. For example
        // when previous range is [0, 10] and current range
        // is [3, 8], then a[9] and a[10] are removed.
        // Note: after processing a query [L, R], currL is L
        // and currR is R + 1, so currR itself was never
        // included and the element to remove is a[currR - 1].
        while (currR > R + 1) {
  
            // element a[currR - 1] is removed
            freq[a[currR - 1]]--;
  
            // if the occurrence count of a number drops
            // to zero, remove it from the count of
            // different elements
            if (freq[a[currR - 1]] == 0) 
                curr_Diff_elements--;
              
            currR--;
        }
        q[i].result = curr_Diff_elements;
    }
}
  
// print the result of all range queries in
// initial order of queries
void printResults(Query q[], int m)
{
    sort(q, q + m, compare1);
    for (int i = 0; i < m; i++) {
        cout << "Number of different elements" << 
               " in range " << q[i].L << " to " 
             << q[i].R << " are " << q[i].result << endl;
    }
}
  
// Driver program
int main()
{
    int a[] = { 1, 1, 2, 1, 3, 4, 5, 2, 8 };
    int n = sizeof(a) / sizeof(a[0]);
    Query q[] = { { 0, 4, 0, 0 }, { 1, 3, 1, 0 },
                  { 2, 4, 2, 0 } };
    int m = sizeof(q) / sizeof(q[0]);
    queryResults(a, n, q, m);
    printResults(q, m);
    return 0;
}

Output:

Number of different elements in range 0 to 4 are 3
Number of different elements in range 1 to 3 are 2
Number of different elements in range 2 to 4 are 3

