Problem
Describe an algorithm to find the largest 1 million numbers in 1 billion numbers. Assume that the computer memory can hold all one billion numbers.
Input
- A stream of integers
nums[]
of length 1 Billion - An integer M, at max M = 1 Million, representing the number of top largest elements to retrieve.
Examples
Example 1
|
|
Solutions
Method 1 – Full Sorting
Intuition
If we sort all the numbers, the largest M elements will always be at the end of the sorted array. This is a direct and reliable way to get the top M largest numbers.
Approach
- Sort the entire array of N numbers in ascending order.
- After sorting, select the last M elements from the array as the result.
- Return these M elements as the largest numbers (order can be ascending or descending as needed).
Complexity
- ⏰ Time complexity:
O(N log N)
, because sorting the entire array of N elements dominates the runtime. - 🧺 Space complexity:
O(1)
(if sorting in-place), orO(N)
if a new array is created for the result.
Method 2 – Quickselect/Partial Partitioning
Intuition
We don’t need to fully sort the array to find the largest M numbers. By using a partitioning approach inspired by Quicksort, we can efficiently separate the largest M elements from the rest, without caring about their order.
Approach
- Use a partition function (like in Quicksort) to split the array around a pivot.
- After partitioning, count the number of elements greater than or equal to the pivot (right partition).
- If this count is exactly M, return these elements.
- If the count is more than M, recursively partition the right side.
- If the count is less than M, collect all from the right and recursively find the remaining (M - count) from the left.
- This process continues until exactly M largest elements are found.
Code
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Complexity
- ⏰ Time complexity:
O(N)
, since each partition step reduces the problem size and the expected number of steps is linear. - 🧺 Space complexity:
O(1)
(in-place partitioning, ignoring recursion stack and result array).
Java code
|
|
Method 3 – Min-Heap (Priority Queue)
Intuition
We can maintain a collection of the largest M numbers seen so far by using a min-heap of size M. The heap always contains the current M largest elements, with the smallest at the top. Any new number larger than the heap’s minimum replaces it, ensuring only the largest M remain.
Approach
- Initialize a min-heap (priority queue) of size at most M.
- Iterate through each number in the array:
- If the heap has fewer than M elements, add the number.
- If the heap is full and the current number is larger than the smallest in the heap, replace the smallest.
- After processing all numbers, the heap contains the largest M numbers (in any order).
Code
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Complexity
- ⏰ Time complexity:
O(N log M)
, since each insertion or replacement in the heap of size M takesO(log M)
and we process N elements. - 🧺 Space complexity:
O(M)
, for the heap storing the largest M elements.