Problem

Given that integers are read from a data stream. Find median of elements read so for in efficient way.

OR

Numbers are randomly generated and passed to a method. Write a program to find and maintain the median value as new values are generated.

OR

You are given a stream of numbers which can be positive or negative. You are required to provide an operation FIND MEDIAN..which when invoked should be able return the median of the numbers in stream(in say O(1) time)

OR

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.

OR

Find the rolling median?

Median Definition

Median Definition

Examples

Example 1:

Input:
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[ [], [1], [2], [], [3], [] ]
Output:
[null, null, null, 1.5, null, 2.0]

Explanation:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

More Examples

For example, median of the stream, nums = [1, 5, 3, 2, 6, 2, 3] is = 3.

Note that we need to find the running median at any time of the stream. That is each time a new number shows up to the stream we need to update the median. For example,

A = [1], median = 1
A = [1,5], median = (5+1)/2 = 3
A = [1,5,3], median = 3
A = [1,5,3,2], median = (2+3)/2 = 2
A = [1,5,3,2,6], median = 3
A = [1,5,3,2,6,2], median = (2+3)/2 = 2
A = [1,5,3,2,6,2,3], median = 3

Observe that at a particular time if there are odd number of elements in then median is the middle element in the sorted order of the stream. That is half of the stream is less than the current median and half of them are greater than the current median. If there are even number of numbers then median is the average of middle two elements such that half of the numbers are less than the median and half of them are greater.

Solution

Method 1 - Using Sorted Array

The straightforward solution of keeping the list sorted takes constant time per median calculation but linear time per addition/deletion. It would be too slow for our purpose. Normal algorithm to sort is quicksort which takes O(n log n), but we can also try counting sort for integers which takes O(n) in terms of time. Then, this will give us A[n/2] as median.

Time Complexity: Get - O(1), Add and Delete - O(n)

Method 2 - Using Self-balancing BST

We can use 2 AVL trees to handle this. For each AVL tree node, track number of nodes within its respective subtree. Utilize an ordinary binary tree node as the root, where its left child constitutes a self-balancing BST holding values smaller than the root, and the right child a self-balancing BST with larger values.

The central root consistently signifies the median value.

If left and right subtrees contain same number of nodes, root node holds average of left and right subtree root data, otherwise, root holds same data as the root of subtree which is having more elements.

After processing an incoming element, the left and right subtrees (BST) are differed utmost by 1.

Self balancing BST is costly in managing balancing factor of BST. However, they provide sorted data which we don’t need. We need median only. Adding the n elements take O(n log n) time, and space complexity is also O(n). Also, rotations will also be costly affair.

Though getting the median takes O(1), as it is at the root of the tree.

Method 3 - Using 2 Heaps (Min and Max heaps) 🏆

We maintain two heaps:

  • Max-heap small has the smaller half of the numbers.
  • Min-heap large has the larger half of the numbers.

This gives me direct access to the one or two middle values (they’re the tops of the heaps), so getting the median takes O(1) time. And adding a number takes O(log n) time.

Code

Java
class MedianFinder {
	private PriorityQueue<Integer> maxHeap; // contains smaller values
	private PriorityQueue<Integer> minHeap; // contains larger value

	public MedianFinder() {
		maxHeap = new PriorityQueue<Integer> (Collections.reverseOrder());
		minHeap = new PriorityQueue<Integer> ();
	}

	// Adds a number into the data structure.
	public void addNum(int num) {
		maxHeap.offer(num);
        minHeap.offer(maxHeap.poll());
		if (minHeap.size() > maxHeap.size())
			maxHeap.offer(minHeap.poll());
		}
	}

	// Returns the median of current data stream
	public double findMedian() {
		if (maxHeap.size() == minHeap.size()) {
			return (double)(maxHeap.peek() + (minHeap.peek())) / 2;
		} else {
			return maxHeap.peek();
		}
	}
}

Method 4 - Using Skiplist and Queue

See the answer here.

Method 5 - P2 Algorithm

See the answer here.