Find Median Given Frequency of Numbers
HardUpdated: Aug 2, 2025
Practice on:
Problem
Table: Numbers
+-------------+------+
| Column Name | Type |
+-------------+------+
| num | int |
| frequency | int |
+-------------+------+
num is the primary key (column with unique values) for this table.
Each row of this table shows the frequency of a number in the database.
The median is the value separating the higher half from the lower half of a data sample.
Write a solution to report the median of all the numbers in the database after decompressing the Numbers table. Round the median to one decimal point.
The result format is in the following example.
Examples
Example 1:
Input:
Numbers table:
+-----+-----------+
| num | frequency |
+-----+-----------+
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+-----+-----------+
Output:
+--------+
| median |
+--------+
| 0.0 |
+--------+
Explanation:
If we decompress the Numbers table, we will get [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3], so the median is (0 + 0) / 2 = 0.
Solution
Method 1 – Cumulative Frequency and Median Position
Intuition
To find the median from a frequency table, we expand the numbers according to their frequencies and find the middle value(s). We can do this efficiently by computing the cumulative frequency and locating the median position(s) without materializing the full list.
Approach
- Compute the total count of numbers as the sum of all frequencies.
- The median position is (total+1)//2 if odd, or the average of total//2 and total//2+1 if even.
- Sort the numbers by num ascending.
- Iterate through the sorted numbers, maintaining a running sum of frequencies.
- When the running sum reaches or passes the median position(s), record the corresponding number(s).
- If the total is odd, return the median number. If even, return the average of the two median numbers.
Code
MySQL
WITH ordered AS (
SELECT num, frequency, SUM(frequency) OVER (ORDER BY num) AS cum_freq
FROM Numbers
),
params AS (
SELECT SUM(frequency) AS total FROM Numbers
),
meds AS (
SELECT num, cum_freq, LAG(cum_freq,1,0) OVER (ORDER BY num) AS prev_cum
FROM ordered
)
SELECT ROUND(AVG(num), 1) AS median
FROM (
SELECT num FROM meds, params
WHERE (params.total % 2 = 1 AND cum_freq >= (params.total+1)/2 AND prev_cum < (params.total+1)/2)
OR (params.total % 2 = 0 AND (
(cum_freq >= params.total/2 AND prev_cum < params.total/2) OR
(cum_freq >= params.total/2+1 AND prev_cum < params.total/2+1)
))
) t;
PostgreSQL
WITH ordered AS (
SELECT num, frequency, SUM(frequency) OVER (ORDER BY num) AS cum_freq
FROM Numbers
),
params AS (
SELECT SUM(frequency) AS total FROM Numbers
),
meds AS (
SELECT num, cum_freq, LAG(cum_freq,1,0) OVER (ORDER BY num) AS prev_cum
FROM ordered
)
SELECT ROUND(AVG(num)::numeric, 1) AS median
FROM (
SELECT num FROM meds, params
WHERE (params.total % 2 = 1 AND cum_freq >= (params.total+1)/2 AND prev_cum < (params.total+1)/2)
OR (params.total % 2 = 0 AND (
(cum_freq >= params.total/2 AND prev_cum < params.total/2) OR
(cum_freq >= params.total/2+1 AND prev_cum < params.total/2+1)
))
) t;
Python (pandas)
import pandas as pd
def find_median_given_frequency(numbers: pd.DataFrame) -> float:
df = numbers.sort_values('num')
df['cum_freq'] = df['frequency'].cumsum()
total = df['frequency'].sum()
if total % 2 == 1:
median_pos = (total + 1) // 2
median = df.loc[df['cum_freq'] >= median_pos, 'num'].iloc[0]
return float(median)
else:
pos1 = total // 2
pos2 = pos1 + 1
m1 = df.loc[df['cum_freq'] >= pos1, 'num'].iloc[0]
m2 = df.loc[df['cum_freq'] >= pos2, 'num'].iloc[0]
return (m1 + m2) / 2
Complexity
- ⏰ Time complexity:
O(n log n), where n is the number of rows in Numbers. Sorting and cumulative sum are O(n log n). - 🧺 Space complexity:
O(n), for storing intermediate cumulative frequencies.