Problem

Table: Salary

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| emp_id        | int     |
| firstname     | varchar |
| lastname      | varchar |
| salary        | varchar |
| department_id | varchar |
+---------------+---------+
(emp_id, salary) is the primary key (combination of columns with unique values) for this table.
Each row contains employees details and their yearly salaries, however, some of the records are old and contain outdated salary information.

Write a solution to find the current salary of each employee assuming that salaries increase each year. Output their emp_id, firstname, lastname, salary, and department_id.

Return the result table ordered by emp_id in ascending order .

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
Input:Salary table:
+--------+-----------+----------+--------+---------------+
| emp_id | firstname | lastname | salary | department_id |
+--------+-----------+----------+--------+---------------+ 
| 1      | Todd      | Wilson   | 110000 | D1006         |
| 1      | Todd      | Wilson   | 106119 | D1006         | 
| 2      | Justin    | Simon    | 128922 | D1005         | 
| 2      | Justin    | Simon    | 130000 | D1005         | 
| 3      | Kelly     | Rosario  | 42689  | D1002         | 
| 4      | Patricia  | Powell   | 162825 | D1004         |
| 4      | Patricia  | Powell   | 170000 | D1004         |
| 5      | Sherry    | Golden   | 44101  | D1002         | 
| 6      | Natasha   | Swanson  | 79632  | D1005         | 
| 6      | Natasha   | Swanson  | 90000  | D1005         |
+--------+-----------+----------+--------+---------------+
Output: +--------+-----------+----------+--------+---------------+
| emp_id | firstname | lastname | salary | department_id |
+--------+-----------+----------+--------+---------------+ 
| 1      | Todd      | Wilson   | 110000 | D1006         |
| 2      | Justin    | Simon    | 130000 | D1005         | 
| 3      | Kelly     | Rosario  | 42689  | D1002         | 
| 4      | Patricia  | Powell   | 170000 | D1004         |
| 5      | Sherry    | Golden   | 44101  | D1002         | 
| 6      | Natasha   | Swanson  | 90000  | D1005         |
+--------+-----------+----------+--------+---------------+****
Explanation:
- emp_id 1 has two records with a salary of 110000, 106119 out of these 110000 is an updated salary (Assuming salary is increasing each year)
- emp_id 2 has two records with a salary of 128922, 130000 out of these 130000 is an updated salary.
- emp_id 3 has only one salary record so that is already an updated salary.
- emp_id 4 has two records with a salary of 162825, 170000 out of these 170000 is an updated salary.
- emp_id 5 has only one salary record so that is already an updated salary.
- emp_id 6 has two records with a salary of 79632, 90000 out of these 90000 is an updated salary.

Solution

Method 1 – Window Function or Group By

Intuition

We want the latest (highest) salary for each employee. Since salaries increase each year, the maximum salary for each emp_id is the latest one.

Approach

  1. For each emp_id, select the row with the maximum salary (cast salary to integer if needed).
  2. Return emp_id, firstname, lastname, salary, and department_id for each employee’s latest salary.

Code

1
2
3
4
5
6
7
SELECT emp_id, firstname, lastname, salary, department_id
FROM Salary s1
WHERE CAST(salary AS UNSIGNED) = (
  SELECT MAX(CAST(salary AS UNSIGNED))
  FROM Salary s2
  WHERE s2.emp_id = s1.emp_id
);
1
2
3
4
5
6
7
SELECT emp_id, firstname, lastname, salary, department_id
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY salary::int DESC) AS rn
  FROM Salary
) t
WHERE rn = 1;
1
2
3
4
5
6
import pandas as pd

def find_latest_salaries(salary: pd.DataFrame) -> pd.DataFrame:
    salary['salary'] = salary['salary'].astype(int)
    idx = salary.groupby('emp_id')['salary'].idxmax()
    return salary.loc[idx][['emp_id', 'firstname', 'lastname', 'salary', 'department_id']]

Complexity

  • ⏰ Time complexity: O(n), where n is the number of rows in Salary. Each grouping and max operation is linear.
  • 🧺 Space complexity: O(n), for storing intermediate groupings and results.