Median of Stream of Running Integers using STL

Integers are read one at a time from a data stream. After each integer is read, find the median of all the elements read so far, from the first integer up to the most recent one. This problem is also called Median of Running Integers. The data stream can be any source of data, for example a file, an array of integers, or an input stream.

What is Median?

The median is the element of a data set that separates the higher half of the data sample from the lower half. When the input size is odd, the median is the middle element of the sorted data. When the input size is even, the median is the average of the two middle elements of the sorted data.
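
The definition can be sketched as a small helper. The function name median_of is an assumption made for illustration and is not part of the program given later; it sorts a copy of the data and then applies the odd/even rule.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (name chosen for illustration): returns the median
// of a non-empty data set. The vector is taken by value so the caller's
// data is not reordered.
double median_of(std::vector<int> data)
{
    std::sort(data.begin(), data.end());
    const std::size_t n = data.size();

    // Odd size: the middle element of the sorted data.
    if (n % 2 == 1)
        return static_cast<double>(data[n / 2]);

    // Even size: the average of the two middle elements.
    return (static_cast<double>(data[n / 2 - 1]) + data[n / 2]) / 2.0;
}
```

For instance, median_of({5, 10}) averages the two middle elements, while median_of({15, 5, 10}) picks the middle element after sorting.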

Example:

Input: 5 10 15
Output: 5
        7.5
        10

Explanation: The input stream is the array of integers [5, 10, 15]. We read the integers one by one and print the median after each read. After reading the first element 5, the median is 5. After reading 10, the median is 7.5. After reading 15, the median is 10.

The idea is to use a max heap to store the elements of the lower half and a min heap to store the elements of the higher half. Both heaps can be implemented using priority_queue in the C++ STL. Below is the step-by-step algorithm to solve this problem.
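
As a minimal sketch of how priority_queue gives both kinds of heap: the default comparator yields a max heap, and std::greater yields a min heap. The alias names MaxHeap and MinHeap and the helpers largest_of and smallest_of are assumptions introduced only for this illustration.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Default std::priority_queue is a max heap: top() is the largest element.
using MaxHeap = std::priority_queue<int>;

// With std::greater as comparator it acts as a min heap: top() is the smallest.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Hypothetical helper: largest value of a non-empty vector, via a max heap.
int largest_of(const std::vector<int>& values)
{
    MaxHeap heap;
    for (int v : values)
        heap.push(v);      // each push costs O(log n)
    return heap.top();     // top() is O(1); calling it on an empty heap is undefined
}

// Hypothetical helper: smallest value of a non-empty vector, via a min heap.
int smallest_of(const std::vector<int>& values)
{
    MinHeap heap;
    for (int v : values)
        heap.push(v);
    return heap.top();
}
```

In the median problem, the top of the max heap is the largest element of the lower half and the top of the min heap is the smallest element of the higher half, so the middle of the data is always available from the two tops.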
Algorithm:

  1. Create two heaps: a max heap that holds the elements of the lower half and a min heap that holds the elements of the higher half at any point in time.
  2. Take the initial value of the median as 0.
  3. For every newly read element, insert it into either the max heap or the min heap and calculate the median based on the following conditions:
    • If the max heap is larger than the min heap: if the element is less than the previous median, pop the top element from the max heap, insert it into the min heap, and insert the new element into the max heap; otherwise insert the new element into the min heap. Both heaps now have the same size, so the new median is the average of the top elements of the max heap and the min heap.
    • If the max heap is smaller than the min heap: if the element is greater than the previous median, pop the top element from the min heap, insert it into the max heap, and insert the new element into the min heap; otherwise insert the new element into the max heap. Both heaps now have the same size, so the new median is the average of the top elements of the max heap and the min heap.
    • If both heaps have the same size: if the element is less than the previous median, insert it into the max heap, and the new median is the top element of the max heap. Otherwise insert it into the min heap, and the new median is the top element of the min heap.

Below is the C++ implementation of the above approach:

// C++ program to find median in 
// stream of running integers

#include <functional>
#include <iomanip>
#include <iostream>
#include <queue>
#include <vector>
using namespace std;

// max heap to store the lower half elements
priority_queue<int> max_heap_left;

// min heap to store the higher half elements
priority_queue<int,vector<int>,greater<int>> min_heap_right;

// function to update the running median after reading
// x; median holds the previous median on entry and the
// new median on return
void calculate_median(int x,double &median)
{
    /*  At any time we keep the heaps balanced so that
        their sizes differ by at most 1. If the heaps have
        equal sizes after the insertion, the median is the
        average of min_heap_right.top() and
        max_heap_left.top(). Otherwise the median is the
        top element of the larger heap  */
     
    // case1(left side heap has more elements)
    if (max_heap_left.size() > min_heap_right.size())
    {
        if (x < median)
        {
            min_heap_right.push(max_heap_left.top());
            max_heap_left.pop();
            max_heap_left.push(x);
        }
        else
            min_heap_right.push(x);

        median = ((double)max_heap_left.top()
                +(double)min_heap_right.top())/2.0;
    }

    // case2(both heaps are balanced)
    else if (max_heap_left.size()==min_heap_right.size())
    {
        if (x < median)
        {
            max_heap_left.push(x);
            median = (double)max_heap_left.top();
        }
        else
        {
            min_heap_right.push(x);
            median = (double)min_heap_right.top();
        }
    }

    // case3(right side heap has more elements)
    else
    {
        if (x > median)
        {
            max_heap_left.push(min_heap_right.top());
            min_heap_right.pop();
            min_heap_right.push(x);
        }
        else
            max_heap_left.push(x);

        median = ((double)max_heap_left.top()
                 + (double)min_heap_right.top())/2.0;
    }
}

// Driver program to test above functions
int main()
{   
    // stream of integers
    int arr[] = {5, 15, 10, 20, 3};
    double median = 0; // stores the median

    // size of stream
    int n = sizeof(arr) / sizeof(arr[0]);
    
    // reading elements of stream one by one
    for (int i=0; i < n; i++)
    {   
        // calculating new median for each 
        // new element added to the stream
        calculate_median(arr[i], median);
        cout << setprecision(1) << fixed << median << "\n";
    }
    return 0;
}

Output:

5.0
10.0
10.0
12.5
10.0

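Time Complexity: O(n log n), since each of the n elements needs a constant number of heap insertions and removals that cost O(log n) each
Auxiliary Space: O(n)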
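
For comparison, a simple baseline keeps every element read so far in a sorted vector. This is a sketch of a different technique (sorted insertion, not the two-heap method), and the function name running_medians_sorted is an assumption made for illustration. Each insertion may shift O(n) elements, so the whole stream costs O(n^2) in the worst case, but it is a handy way to cross-check the output of the heap-based program.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical baseline (not the two-heap method): returns the median
// after each element of the stream, using a vector kept in sorted order.
std::vector<double> running_medians_sorted(const std::vector<int>& stream)
{
    std::vector<int> sorted;        // elements read so far, ascending
    std::vector<double> medians;    // one median per element read

    for (int x : stream) {
        // Binary search finds the position in O(log n), but insert()
        // shifts the elements after it, which is O(n) in the worst case.
        sorted.insert(std::upper_bound(sorted.begin(), sorted.end(), x), x);

        const std::size_t n = sorted.size();
        if (n % 2 == 1)
            medians.push_back(static_cast<double>(sorted[n / 2]));
        else
            medians.push_back(
                (static_cast<double>(sorted[n / 2 - 1]) + sorted[n / 2]) / 2.0);
    }
    return medians;
}
```

Called on the stream {5, 15, 10, 20, 3}, this baseline yields 5, 10, 10, 12.5 and 10, which agrees with the output listed above.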

This article is contributed by Vibhu Garg.
