Problem

Table: Numbers

+-------------+------+
| Column Name | Type |
+-------------+------+
| num         | int  |
| frequency   | int  |
+-------------+------+
num is the primary key (column with unique values) for this table.
Each row of this table shows the frequency of a number in the database.

The median is the value separating the higher half from the lower half of a data sample.

Write a solution to report the median of all the numbers in the database after decompressing the Numbers table. Round the median to one decimal point.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
Input: 
Numbers table:
+-----+-----------+
| num | frequency |
+-----+-----------+
| 0   | 7         |
| 1   | 1         |
| 2   | 3         |
| 3   | 1         |
+-----+-----------+
Output: 
+--------+
| median |
+--------+
| 0.0    |
+--------+
Explanation: 
If we decompress the Numbers table, we will get [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3], so the median is (0 + 0) / 2 = 0.

Solution

Method 1 – Cumulative Frequency and Median Position

Intuition

To find the median from a frequency table, we expand the numbers according to their frequencies and find the middle value(s). We can do this efficiently by computing the cumulative frequency and locating the median position(s) without materializing the full list.

Approach

  1. Compute the total count of numbers as the sum of all frequencies.
  2. The median position is (total+1)//2 if odd, or the average of total//2 and total//2+1 if even.
  3. Sort the numbers by num ascending.
  4. Iterate through the sorted numbers, maintaining a running sum of frequencies.
  5. When the running sum reaches or passes the median position(s), record the corresponding number(s).
  6. If the total is odd, return the median number. If even, return the average of the two median numbers.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
WITH ordered AS (
  SELECT num, frequency, SUM(frequency) OVER (ORDER BY num) AS cum_freq
  FROM Numbers
),
params AS (
  SELECT SUM(frequency) AS total FROM Numbers
),
meds AS (
  SELECT num, cum_freq, LAG(cum_freq,1,0) OVER (ORDER BY num) AS prev_cum
  FROM ordered
)
SELECT ROUND(AVG(num), 1) AS median
FROM (
  SELECT num FROM meds, params
  WHERE (params.total % 2 = 1 AND cum_freq >= (params.total+1)/2 AND prev_cum < (params.total+1)/2)
     OR (params.total % 2 = 0 AND (
            (cum_freq >= params.total/2 AND prev_cum < params.total/2) OR
            (cum_freq >= params.total/2+1 AND prev_cum < params.total/2+1)
        ))
) t;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
WITH ordered AS (
  SELECT num, frequency, SUM(frequency) OVER (ORDER BY num) AS cum_freq
  FROM Numbers
),
params AS (
  SELECT SUM(frequency) AS total FROM Numbers
),
meds AS (
  SELECT num, cum_freq, LAG(cum_freq,1,0) OVER (ORDER BY num) AS prev_cum
  FROM ordered
)
SELECT ROUND(AVG(num)::numeric, 1) AS median
FROM (
  SELECT num FROM meds, params
  WHERE (params.total % 2 = 1 AND cum_freq >= (params.total+1)/2 AND prev_cum < (params.total+1)/2)
     OR (params.total % 2 = 0 AND (
            (cum_freq >= params.total/2 AND prev_cum < params.total/2) OR
            (cum_freq >= params.total/2+1 AND prev_cum < params.total/2+1)
        ))
) t;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
import pandas as pd

def find_median_given_frequency(numbers: pd.DataFrame) -> float:
    df = numbers.sort_values('num')
    df['cum_freq'] = df['frequency'].cumsum()
    total = df['frequency'].sum()
    if total % 2 == 1:
        median_pos = (total + 1) // 2
        median = df.loc[df['cum_freq'] >= median_pos, 'num'].iloc[0]
        return float(median)
    else:
        pos1 = total // 2
        pos2 = pos1 + 1
        m1 = df.loc[df['cum_freq'] >= pos1, 'num'].iloc[0]
        m2 = df.loc[df['cum_freq'] >= pos2, 'num'].iloc[0]
        return (m1 + m2) / 2

Complexity

  • ⏰ Time complexity: O(n log n), where n is the number of rows in Numbers. Sorting and cumulative sum are O(n log n).
  • 🧺 Space complexity: O(n), for storing intermediate cumulative frequencies.