Smallest subarray with k distinct numbers

We are given an array arr consisting of n integers and an integer k. We need to find the smallest range [l, r] in the array (both l and r inclusive) that contains exactly k distinct numbers. Indices in the examples are 0-based.

Examples:

Input : arr[] = { 1, 1, 2, 2, 3, 3, 4, 5} 
            k = 3
Output : 5 7

Input : arr[] = { 1, 2, 2, 3} 
            k = 2
Output : 0 1
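
To check an answer against the examples, it can help to count the distinct values in a candidate range directly. The sketch below is a minimal helper for that. The name countDistinct and the use of std::vector are illustrative assumptions, not part of the original programs.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the original programs):
// counts the distinct values in arr[l..r], both ends
// inclusive, using 0-based indices.
int countDistinct(const std::vector<int>& arr, int l, int r)
{
    // The range must lie inside the array.
    assert(0 <= l && l <= r && r < (int)arr.size());

    // A hash set keeps one copy of each value in the range.
    std::unordered_set<int> seen(arr.begin() + l, arr.begin() + r + 1);
    return (int)seen.size();
}
```

For the first example, calling countDistinct on { 1, 1, 2, 2, 3, 3, 4, 5 } with l = 5 and r = 7 counts the values 3, 4 and 5, so the range 5 7 holds exactly 3 distinct numbers.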



A simple solution is to use two nested loops. The outer loop picks a starting point and the inner loop extends the ending point until the window holds exactly k distinct elements. We then update the result if this window is smaller than the best one found so far. We use hashing to count distinct elements in a range.

// CPP program to find minimum range that
// contains exactly k distinct numbers.
#include <bits/stdc++.h>
using namespace std;

// Prints the minimum range that contains exactly
// k distinct numbers.
void minRange(int arr[], int n, int k)
{
    int l = 0, r = n;

    // Consider every element as starting
    // point.
    for (int i = 0; i < n; i++) {

        // Find the smallest window starting
        // with arr[i] and containing exactly
        // k distinct elements.
        unordered_set<int> s;
        int j;
        for (j = i; j < n; j++) {
            s.insert(arr[j]);
            if (s.size() == k) {
                if ((j - i) < (r - l)) {
                    r = j;
                    l = i;
                }
                break;
            }
        }

        // There are fewer than k distinct elements
        // from index i onward, so no need to continue.
        if (j == n)
            break;
    }

    // If there was no window with k distinct
    // elements (k is greater than total distinct
    // elements)
    if (l == 0 && r == n)
        cout << "Invalid k";
    else
        cout << l << " " << r;
}

// Driver code for above function.
int main()
{
    int arr[] = { 1, 2, 3, 4, 5 };
    int n = sizeof(arr) / sizeof(arr[0]);
    int k = 3;
    minRange(arr, n, k);
    return 0;
}

Time Complexity : O(n^2), assuming hash set insertions take O(1) time on average.
 

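An optimized solution uses a sliding window over the array. We extend the right side until the window holds k distinct elements, then advance the left side, removing elements one at a time, for as long as the window still holds k distinct elements. The smallest such window seen is the answer.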

// CPP program to find minimum range that
// contains exactly k distinct numbers.
#include <bits/stdc++.h>
using namespace std;

// prints the minimum range that contains exactly
// k distinct numbers.
void minRange(int arr[], int n, int k)
{
    // Best range found so far. l = 0, r = n is a
    // sentinel meaning "no valid window yet".
    int l = 0, r = n;

    // Count of each value inside the current
    // window arr[i..j].
    unordered_map<int, int> hm;

    int i = 0; // Left side of the window
    for (int j = 0; j < n && k > 0; j++)
    {
        // Extend the window on the right side.
        hm[arr[j]]++;

        // While the window holds k distinct elements,
        // record it if it is strictly shorter than the
        // best so far (ties keep the leftmost range),
        // then shrink it from the left.
        while ((int)hm.size() == k)
        {
            if ((j - i) < (r - l))
            {
                l = i;
                r = j;
            }

            // Remove arr[i] from the window.
            if (hm[arr[i]] == 1)
                hm.erase(arr[i]);
            else
                hm[arr[i]]--;

            // Increment left side.
            i++;
        }
    }

    // r is still n only if no window had k
    // distinct elements.
    if (r == n)
        cout << "Invalid k" << endl;
    else
        cout << l << " " << r << endl;
}

// Driver code for above function.
int main()
{
    int arr[] = { 1, 1, 2, 2, 3, 3, 4, 5 };
    int n = sizeof(arr) / sizeof(arr[0]);
    int k = 3;
    minRange(arr, n, k);
    return 0;
}

Output:

5 7

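Time complexity of this solution is O(n) on average when the counts are kept in a hash map (unordered_map). With an ordered map each operation costs O(log k), giving O(n log k) overall. In every step we either add an element on the right side or remove an element on the left side, and every element is inserted and removed at most once.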
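
The same two-pointer idea can also be packaged so that it returns the range instead of printing it, which makes it easier to test. The following is a minimal sketch under that assumption. The name findMinRange, the std::optional return type (which needs C++17) and the std::vector parameter are hypothetical choices, not part of the original program.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical variant: returns the leftmost shortest range {l, r}
// (0-based, inclusive) with exactly k distinct values, or
// std::nullopt if no such range exists.
std::optional<std::pair<int, int>> findMinRange(const std::vector<int>& arr, int k)
{
    const int n = (int)arr.size();
    if (k <= 0)
        return std::nullopt;

    std::optional<std::pair<int, int>> best;
    std::unordered_map<int, int> count; // value -> occurrences in window
    int i = 0;                          // left side of the window

    for (int j = 0; j < n; j++) {
        count[arr[j]]++; // extend the window on the right

        // The window never holds more than k distinct values, because
        // it is shrunk below k before the next element is added.
        while ((int)count.size() == k) {
            assert(i <= j); // a window with k > 0 distinct values is non-empty

            if (!best || (j - i) < (best->second - best->first))
                best = std::make_pair(i, j);

            // Drop arr[i] from the window and advance the left side.
            if (--count[arr[i]] == 0)
                count.erase(arr[i]);
            i++;
        }
    }
    return best;
}
```

For the arrays used in this article, findMinRange on { 1, 1, 2, 2, 3, 3, 4, 5 } with k = 3 yields the pair (5, 7), and on { 1, 2, 2, 3 } with k = 2 it yields (0, 1). An empty result plays the role of the "Invalid k" message.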

This article is contributed by Pawan Asipu.