# Median in a stream of integers (running integers)

Given that integers are read from a data stream. Find median of elements read so for in efficient way. For simplicity assume there are no duplicates. For example, let us consider the stream 5, 15, 1, 3 …

```After reading 1st element of stream - 5 -> median - 5
After reading 2nd element of stream - 5, 15 -> median - 10
After reading 3rd element of stream - 5, 15, 1 -> median - 5
After reading 4th element of stream - 5, 15, 1, 3 -> median - 4, so on...```

Making it clear, when the input size is odd, we take the middle element of sorted data. If the input size is even, we pick average of middle two elements in sorted stream.

Note that output is effective median of integers read from the stream so far. Such an algorithm is called online algorithm. Any algorithm that can guarantee output of i-elements after processing i-th element, is said to be online algorithm. Let us discuss three solutions for the above problem.

Method 1: Insertion Sort

If we can sort the data as it appears, we can easily locate median element. Insertion Sort is one such online algorithm that sorts the data appeared so far. At any instance of sorting, say after sorting i-th element, the first i elements of array are sorted. The insertion sort doesn’t depend on future data to sort data input till that point. In other words, insertion sort considers data sorted so far while inserting next element. This is the key part of insertion sort that makes it an online algorithm.

However, insertion sort takes O(n2) time to sort n elements. Perhaps we can use binary search on insertion sort to find location of next element in O(log n) time. Yet, we can’t do data movement in O(log n) time. No matter how efficient the implementation is, it takes polynomial time in case of insertion sort.

Interested reader can try implementation of Method 1.

Method 2: Augmented self balanced binary search tree (AVL, RB, etc…)

At every node of BST, maintain number of elements in the subtree rooted at that node. We can use a node as root of simple binary tree, whose left child is self balancing BST with elements less than root and right child is self balancing BST with elements greater than root. The root element always holds effective median.

If left and right subtrees contain same number of elements, root node holds average of left and right subtree root data. Otherwise, root contains same data as the root of subtree which is having more elements. After processing an incoming element, the left and right subtrees (BST) are differed utmost by 1.

Self balancing BST is costly in managing balancing factor of BST. However, they provide sorted data which we don’t need. We need median only. The next method make use of Heaps to trace median.

Method 3: Heaps

Similar to balancing BST in Method 2 above, we can use a max heap on left side to represent elements that are less than effective median, and a min heap on right side to represent elements that are greater than effective median.

After processing an incoming element, the number of elements in heaps differ utmost by 1 element. When both heaps contain same number of elements, we pick average of heaps root data as effective median. When the heaps are not balanced, we select effective median from the root of heap containing more elements.

Given below is implementation of above method. For algorithm to build these heaps, please read the highlighted code.

```#include <iostream>
using namespace std;

// Heap capacity
#define MAX_HEAP_SIZE (128)
#define ARRAY_SIZE(a) sizeof(a)/sizeof(a[0])

//// Utility functions

// exchange a and b
inline
void Exch(int &a, int &b)
{
int aux = a;
a = b;
b = aux;
}

// Greater and Smaller are used as comparators
bool Greater(int a, int b)
{
return a > b;
}

bool Smaller(int a, int b)
{
return a < b;
}

int Average(int a, int b)
{
return (a + b) / 2;
}

// Signum function
// = 0  if a == b  - heaps are balanced
// = -1 if a < b   - left contains less elements than right
// = 1  if a > b   - left contains more elements than right
int Signum(int a, int b)
{
if( a == b )
return 0;

return a < b ? -1 : 1;
}

// Heap implementation
// The functionality is embedded into
// Heap abstract class to avoid code duplication
class Heap
{
public:
// Initializes heap array and comparator required
// in heapification
Heap(int *b, bool (*c)(int, int)) : A(b), comp(c)
{
heapSize = -1;
}

// Frees up dynamic memory
virtual ~Heap()
{
if( A )
{
delete[] A;
}
}

// We need only these four interfaces of Heap ADT
virtual bool Insert(int e) = 0;
virtual int  GetTop() = 0;
virtual int  ExtractTop() = 0;
virtual int  GetCount() = 0;

protected:

// We are also using location 0 of array
int left(int i)
{
return 2 * i + 1;
}

int right(int i)
{
return 2 * (i + 1);
}

int parent(int i)
{
if( i <= 0 )
{
return -1;
}

return (i - 1)/2;
}

// Heap array
int   *A;
// Comparator
bool  (*comp)(int, int);
// Heap size
int   heapSize;

// Returns top element of heap data structure
int top(void)
{
int max = -1;

if( heapSize >= 0 )
{
max = A[0];
}

return max;
}

// Returns number of elements in heap
int count()
{
return heapSize + 1;
}

// Heapification
// Note that, for the current median tracing problem
// we need to heapify only towards root, always
void heapify(int i)
{
int p = parent(i);

// comp - differentiate MaxHeap and MinHeap
// percolates up
if( p >= 0 && comp(A[i], A[p]) )
{
Exch(A[i], A[p]);
heapify(p);
}
}

// Deletes root of heap
int deleteTop()
{
int del = -1;

if( heapSize > -1)
{
del = A[0];

Exch(A[0], A[heapSize]);
heapSize--;
heapify(parent(heapSize+1));
}

return del;
}

// Helper to insert key into Heap
bool insertHelper(int key)
{
bool ret = false;

if( heapSize < MAX_HEAP_SIZE )
{
ret = true;
heapSize++;
A[heapSize] = key;
heapify(heapSize);
}

return ret;
}
};

// Specilization of Heap to define MaxHeap
class MaxHeap : public Heap
{
private:

public:
MaxHeap() : Heap(new int[MAX_HEAP_SIZE], &Greater)  {  }

~MaxHeap()  { }

// Wrapper to return root of Max Heap
int GetTop()
{
}

// Wrapper to delete and return root of Max Heap
int ExtractTop()
{
return deleteTop();
}

// Wrapper to return # elements of Max Heap
int  GetCount()
{
return count();
}

// Wrapper to insert into Max Heap
bool Insert(int key)
{
return insertHelper(key);
}
};

// Specilization of Heap to define MinHeap
class MinHeap : public Heap
{
private:

public:

MinHeap() : Heap(new int[MAX_HEAP_SIZE], &Smaller) { }

~MinHeap() { }

// Wrapper to return root of Min Heap
int GetTop()
{
}

// Wrapper to delete and return root of Min Heap
int ExtractTop()
{
return deleteTop();
}

// Wrapper to return # elements of Min Heap
int  GetCount()
{
return count();
}

// Wrapper to insert into Min Heap
bool Insert(int key)
{
return insertHelper(key);
}
};

// Function implementing algorithm to find median so far.
int getMedian(int e, int &m, Heap &l, Heap &r)
{
// Are heaps balanced? If yes, sig will be 0
int sig = Signum(l.GetCount(), r.GetCount());
switch(sig)
{
case 1: // There are more elements in left (max) heap

if( e < m ) // current element fits in left (max) heap
{
// Remore top element from left heap and
// insert into right heap
r.Insert(l.ExtractTop());

// current element fits in left (max) heap
l.Insert(e);
}
else
{
// current element fits in right (min) heap
r.Insert(e);
}

// Both heaps are balanced
m = Average(l.GetTop(), r.GetTop());

break;

case 0: // The left and right heaps contain same number of elements

if( e < m ) // current element fits in left (max) heap
{
l.Insert(e);
m = l.GetTop();
}
else
{
// current element fits in right (min) heap
r.Insert(e);
m = r.GetTop();
}

break;

case -1: // There are more elements in right (min) heap

if( e < m ) // current element fits in left (max) heap
{
l.Insert(e);
}
else
{
// Remove top element from right heap and
// insert into left heap
l.Insert(r.ExtractTop());

// current element fits in right (min) heap
r.Insert(e);
}

// Both heaps are balanced
m = Average(l.GetTop(), r.GetTop());

break;
}

// No need to return, m already updated
return m;
}

void printMedian(int A[], int size)
{
int m = 0; // effective median
Heap *left  = new MaxHeap();
Heap *right = new MinHeap();

for(int i = 0; i < size; i++)
{
m = getMedian(A[i], m, *left, *right);

cout << m << endl;
}

// C++ more flexible, ensure no leaks
delete left;
delete right;
}
// Driver code
int main()
{
int A[] = {5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4};
int size = ARRAY_SIZE(A);

// In lieu of A, we can also use data read from a stream
printMedian(A, size);

return 0;
}
```

Time Complexity: If we omit the way how stream was read, complexity of median finding is O(N log N), as we need to read the stream, and due to heap insertions/deletions.

At first glance the above code may look complex. If you read the code carefully, it is simple algorithm.

# Company Wise Coding Practice    Topic Wise Coding Practice

Software Engineer

• jugal

@venki : I have a soution with O(n) time complexity.
We can do it with doubly link list and need to maintain it sorted and maintain one boolean flag to keep track of even or odd number of elements in to the list.
flag = 0 means even entries and 1 for odd entries.

Algorithm
a) insert new element in to the list at appropriate location.
b) We know the previous median
flag = ~flag

if (flag == 0)
{
{
new_median = (address_previous_median + next_element) / 2;
strore these two addresses in variable
}
else
{
new_median = (address_previous_median + previous_element) / 2;
strore these two addresses in variable
}
}
else
{
//do same as above but take care if new element is inserted in between two median.
}

plz give your feedback whether i m right or wrong. According to me it's time complexity is O(n). In the below comment i have uploaded my code and it's working fine.

• Guest

#include
#include

using namespace std;

struct node
{ int data;
struct node* next;
struct node* prev;
};

struct node* m1 = NULL;
int count = 0;

void print_list() {
while(temp != NULL)
{ cout <data <“;
temp = temp->next; }
cout <data;
}else
{ if(count & 1)
{ if(m1->data data)
m1 = m1->next;
return m1->data;
}else
{ if(m1->data > node->data)
m1 = m1->prev;
return (m1->data + m1->next->data)/2;
}
}
}

void push(int data)
{
struct node* temp = (struct node*)malloc(sizeof(struct node));
temp->data = data;
temp->next = NULL;
temp->prev = NULL;
else
{
struct node* previous = NULL;
while(current != NULL && current->data next;
}
{
}
else if(current==NULL)
{
temp->prev = previous;
previous->next = temp;
}
else
{
temp->prev = previous;
temp->next = current;
current->prev = temp;
previous->next = temp;
}
}
print_list();
cout << median(temp) << endl;
}

int main()
{
int a[] = {5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4};
int n = sizeof(a)/sizeof(a[0]);

for(int i=0;i<n;i++)
push(a[i]);
}

• jugal

@venki : here is my code using doubly linked list with O(n) complexity. It is also working for duplicates.

#include
#include

using namespace std;

struct node
{ int data;
struct node* next;
struct node* prev;
};

struct node* m1 = NULL;
int count = 0;

void print_list() {
while(temp != NULL)
{ cout <data <“;
temp = temp->next; }
cout <data;
}else
{ if(count & 1)
{ if(m1->data data)
m1 = m1->next;
return m1->data;
}else
{ if(m1->data > node->data)
m1 = m1->prev;
return (m1->data + m1->next->data)/2;
}
}
}

void push(int data)
{
struct node* temp = (struct node*)malloc(sizeof(struct node));
temp->data = data;
temp->next = NULL;
temp->prev = NULL;
else
{
struct node* previous = NULL;
while(current != NULL && current->data next;
}
{
}
else if(current==NULL)
{
temp->prev = previous;
previous->next = temp;
}
else
{
temp->prev = previous;
temp->next = current;
current->prev = temp;
previous->next = temp;
}
}
print_list();
cout << median(temp) << endl;
}

int main()
{
int a[] = {5, 5, 3, 2, 8, 7, 9, 10, 6, 11, 4};
int n = sizeof(a)/sizeof(a[0]);

for(int i=0;i<n;i++)
push(a[i]);}

• alien

awesome algorithm.. can it be done inplace with some other algorithm?

• Muthukumar Suresh

hey correct me if I am wrong . in method 3, when you are inserting the method seems fine .
But in the case of extracting max or min , the last element is placed in A[0] . the problem with this algo is that this element in A[0] needs to be sifted down to its proper position . Since this algo of heapify propagates upward . that cant be done . We need to sift down like in a normal heapify procedure

• Rahul Singh

Yeah you are right. The heapify for deleteTop needs to sift down the new root.

``` ```
/* Paste your code here (You may delete these lines if not writing code) */
``` ```
• Paparao Veeragandham

Method2 && Method3 does not work :

Example:
Max Heap has = 94,90
Min Heap has = 100
Median = 97

If incoming Element is 96

case 1: if ( 96 < 97 )
//remove top element from Max Heap & insert into MinHeap

Results : Max Heap has : 90
Min Heap has : 94,100
// Insert 96 into Max heap

Results : Max Heap has : 96,90
Min Heap has : 94,100
It is wrong According to Solution description.
Because all elements of MaxHeap are lesser-than MinHeap

Same Problem present for Method2 also.

Correct me if i was wrong…………..

/* Paste your code here (You may delete these lines if not writing code) */

• Abhishek

Look carefully Initially when only 3 elements have been added to the heaps, the median is not 97 as u mentioned. It is 94. So when 96 comes it needs to be added to the min heap. After this operation the median will become 95.

• abhishek08aug

Intelligent 😀

``` ```
/* Paste your code here (You may delete these lines if not writing code) */
``` ```
• sachin

Venki,
How can it be done using Balanced BST?I dont think the root will always hold the median.However,the balance factor can be the size of the subtree if the root will hold the median.

``` ```
/* Paste your code here (You may delete these lines if not writing code) */
``` ```
• Rahul Singh

The deleteTop() function in Method 3 is wrong. I case of deletion we need to sift down from the root after exchanging the root with the last element and decrementing the heapSize by 1. To prove my point consider the test case where the heap is {70, 60, 42, 50, 51, 32, 23, 35, 20, 10, 40, 5, 4}. The given deleteTop() method will not run correctly.

``` ```
/* Paste your code here (You may delete these lines if not writing code) */
``` ```
• Neha

IMO that sol doesn’t handle the constraint that it is a dynamic array, am I missing something?

• Neha

IMO that sol doesn’t handle the constraint that it is a dynamic array, am I missing something?

• gaurav

there is one problem in 3rd solution , in case of delete , argument in heapify function seems wrong , argument while deleting in heap should always be 0 , as we have to place root (which was last number in heap ) at correct position……i am talking about max heap ……sol given here will again put root element at root

• Paparao Veeragandham
``` ```
/* Paste your code here (You may delete these lines if not writing code) */
``` ```

In Method2 : No need to maintaing the elements.

After inserting an element into balanced BST.
Find out the left subtree height & Right subtree height
Depends on it we can take decision about median elements

• Sreenivas Doosa

@Venki

The solution using Heap is really awesome and gr8 explanation.
Keep posting duuuude

• Bhavesh

ans appears to be wrong when inserting 11

• kartik

I tried running the code for your input and got the following output
5
10
5
4
3
4
5
6
7
8
The output looks correct. Could you post the complete code that you tried?

• BlackMath

There is some problem in method3.

``` ```
/* Paste your code here (You may delete these lines if not writing code) */
``` ```
• kartik

Please provide more details of your comment. Why do you think that the code will not work? Do you have some sample input for which it didn’t work?

• par

3rd solution won’t work in case of -ve values, you would need to change the Heapify() to check for +ve and -ve values.

• Karthick

A very clever method using heaps. Nice explanation.
But, I felt that it would have been more easily understandable, if the writer of the code had avoided syntaxes that are language specific(eg: using virtual) so that people without the knowledge of the implemented language can understand the code.

• Thanks @Karthick. The idea is to provide simple algorithm in OOP way. Basic C++ skills will be sufficient to follow the code.

• Akshay

• Can you please gimme the psudocode for method 2. I am fully confused about how it can be implememnted !!

• @Vinay, thanks for pointing this. I guess there seems to be some mistake in method 2. I will update the content. sorry for delay.

• @Vinay . We Can Think Like , insert the elements into bst as they arrives & calculate the tree size after processing ith element . now do inorder traversal till x=size/2 , return root as median if x is odd
else return left subtree root + root /2 isn’t it ?

its just thought , may not work for all cases , but cover basic idea ?

• kiak

what is x here ? The explaination is confusing too. Please explain clearly if you can.

• @Kiak X is just a varible thats holding the value calculated by size/2 .

• Manohar Singh

This can also be done in O(n). Maintain three variables lower,middle and upper. While scanning the inputs keep updating these variables. If n is even median is avg of upper and lower ,otherwise middle is the median.

• @Manohar Singh, The above algorithm is doing same. As per your terminology, lower and upper are maintained in Maxheap and Minheap respectively. Yet the complexity of tracing lower and upper is O(log N) due to heap insertions/deletions.

Could you please suggest any other better data structure to organize lower half and upper half?

• @Venki 3rd one , Thats NIce Explanation Keep it UP

``` ```