Problem

Table: Employee

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| id           | int     |
| company      | varchar |
| salary       | int     |
+--------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the company and the salary of one employee.

Write a solution to find the rows that contain the median salary of each company. While calculating the median, when you sort the salaries of the company, break the ties by id.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
Input: 
Employee table:
+----+---------+--------+
| id | company | salary |
+----+---------+--------+
| 1  | A       | 2341   |
| 2  | A       | 341    |
| 3  | A       | 15     |
| 4  | A       | 15314  |
| 5  | A       | 451    |
| 6  | A       | 513    |
| 7  | B       | 15     |
| 8  | B       | 13     |
| 9  | B       | 1154   |
| 10 | B       | 1345   |
| 11 | B       | 1221   |
| 12 | B       | 234    |
| 13 | C       | 2345   |
| 14 | C       | 2645   |
| 15 | C       | 2645   |
| 16 | C       | 2652   |
| 17 | C       | 65     |
+----+---------+--------+
Output: 
+----+---------+--------+
| id | company | salary |
+----+---------+--------+
| 5  | A       | 451    |
| 6  | A       | 513    |
| 12 | B       | 234    |
| 9  | B       | 1154   |
| 14 | C       | 2645   |
+----+---------+--------+
Explanation: 
For company A, the rows sorted are as follows:
+----+---------+--------+
| id | company | salary |
+----+---------+--------+
| 3  | A       | 15     |
| 2  | A       | 341    |
| 5  | A       | 451    | <-- median
| 6  | A       | 513    | <-- median
| 1  | A       | 2341   |
| 4  | A       | 15314  |
+----+---------+--------+
For company B, the rows sorted are as follows:
+----+---------+--------+
| id | company | salary |
+----+---------+--------+
| 8  | B       | 13     |
| 7  | B       | 15     |
| 12 | B       | 234    | <-- median
| 11 | B       | 1221   | <-- median
| 9  | B       | 1154   |
| 10 | B       | 1345   |
+----+---------+--------+
For company C, the rows sorted are as follows:
+----+---------+--------+
| id | company | salary |
+----+---------+--------+
| 17 | C       | 65     |
| 13 | C       | 2345   |
| 14 | C       | 2645   | <-- median
| 15 | C       | 2645   | 
| 16 | C       | 2652   |
+----+---------+--------+
**Follow up:** Could you solve it without using any built-in or window
functions?

Solution

Method 1 – Window Functions (SQL), Pandas, and PostgreSQL

Intuition

To find the median salary for each company, we need to sort the salaries for each company and select the middle value(s). For an odd number of employees, the median is the middle salary; for an even number, it’s the average of the two middle salaries.

Approach

  1. For each company, order employees by salary.
  2. Use window functions to assign row numbers and count employees per company.
  3. Select the median salary based on the count (odd/even).
  4. For SQL, use window functions; for Pandas, group and compute median; for PostgreSQL, use similar window logic.

Code

1
2
3
4
5
6
7
8
9
SELECT company, AVG(salary) AS median_salary
FROM (
  SELECT company, salary,
    ROW_NUMBER() OVER (PARTITION BY company ORDER BY salary) AS rn,
    COUNT(*) OVER (PARTITION BY company) AS cnt
  FROM Employee
) t
WHERE rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2))
GROUP BY company;
1
2
3
4
5
6
7
8
9
SELECT company, AVG(salary) AS median_salary
FROM (
  SELECT company, salary,
    ROW_NUMBER() OVER (PARTITION BY company ORDER BY salary) AS rn,
    COUNT(*) OVER (PARTITION BY company) AS cnt
  FROM Employee
) t
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY company;
1
2
def median_employee_salary(df: 'pd.DataFrame') -> 'pd.DataFrame':
    return df.groupby('company')['salary'].median().reset_index(name='median_salary')

Complexity

  • ⏰ Time complexity: O(n log n) for sorting salaries per company.
  • 🧺 Space complexity: O(n) for window function or groupby storage.