+-------------+------+
| Column Name | Type |
+-------------+------+
| num | int |
| frequency | int |
+-------------+------+
num is the primary key (column with unique values) for this table.
Each row of this table shows the frequency of a number in the database.
The median is the value separating the higher half from the lower half of a data sample.
Write a solution to report the median of all the numbers in the database after decompressing the Numbers table. Round the median to one decimal point.
Input:
Numbers table:+-----+-----------+| num | frequency |+-----+-----------+|0|7||1|1||2|3||3|1|+-----+-----------+Output:
+--------+| median |+--------+|0.0|+--------+Explanation:
If we decompress the Numbers table, we will get [0,0,0,0,0,0,0,1,2,2,2,3], so the median is(0+0)/2=0.
To find the median from a frequency table, we expand the numbers according to their frequencies and find the middle value(s). We can do this efficiently by computing the cumulative frequency and locating the median position(s) without materializing the full list.
WITH ordered AS (
SELECT num, frequency, SUM(frequency) OVER (ORDERBY num) AS cum_freq
FROM Numbers
),
params AS (
SELECTSUM(frequency) AS total FROM Numbers
),
meds AS (
SELECT num, cum_freq, LAG(cum_freq,1,0) OVER (ORDERBY num) AS prev_cum
FROM ordered
)
SELECT ROUND(AVG(num), 1) AS median
FROM (
SELECT num FROM meds, params
WHERE (params.total %2=1AND cum_freq >= (params.total+1)/2AND prev_cum < (params.total+1)/2)
OR (params.total %2=0AND (
(cum_freq >= params.total/2AND prev_cum < params.total/2) OR (cum_freq >= params.total/2+1AND prev_cum < params.total/2+1)
))
) t;
WITH ordered AS (
SELECT num, frequency, SUM(frequency) OVER (ORDERBY num) AS cum_freq
FROM Numbers
),
params AS (
SELECTSUM(frequency) AS total FROM Numbers
),
meds AS (
SELECT num, cum_freq, LAG(cum_freq,1,0) OVER (ORDERBY num) AS prev_cum
FROM ordered
)
SELECT ROUND(AVG(num)::numeric, 1) AS median
FROM (
SELECT num FROM meds, params
WHERE (params.total %2=1AND cum_freq >= (params.total+1)/2AND prev_cum < (params.total+1)/2)
OR (params.total %2=0AND (
(cum_freq >= params.total/2AND prev_cum < params.total/2) OR (cum_freq >= params.total/2+1AND prev_cum < params.total/2+1)
))
) t;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
import pandas as pd
deffind_median_given_frequency(numbers: pd.DataFrame) -> float:
df = numbers.sort_values('num')
df['cum_freq'] = df['frequency'].cumsum()
total = df['frequency'].sum()
if total %2==1:
median_pos = (total +1) //2 median = df.loc[df['cum_freq'] >= median_pos, 'num'].iloc[0]
return float(median)
else:
pos1 = total //2 pos2 = pos1 +1 m1 = df.loc[df['cum_freq'] >= pos1, 'num'].iloc[0]
m2 = df.loc[df['cum_freq'] >= pos2, 'num'].iloc[0]
return (m1 + m2) /2