Problem

Table: Employee

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| month       | int  |
| salary      | int  |
+-------------+------+
(id, month) is the primary key (combination of columns with unique values) for this table.
Each row in the table indicates the salary of an employee in one month during the year 2020.

Write a solution to calculate the cumulative salary summary for every employee in a single unified table.

The cumulative salary summary for an employee can be calculated as follows:

  • For each month that the employee worked, sum up the salaries in that month and the previous two months. This is their 3-month sum for that month. If an employee did not work for the company in previous months, their effective salary for those months is 0.
  • Do not include the 3-month sum for the most recent month that the employee worked for in the summary.
  • Do not include the 3-month sum for any month the employee did not work.

Return the result table ordered by id in ascending order. In case of a tie, order it by month in descending order.

The result format is in the following example.

Examples

Example 1:

Input: 
Employee table:
+----+-------+--------+
| id | month | salary |
+----+-------+--------+
| 1  | 1     | 20     |
| 2  | 1     | 20     |
| 1  | 2     | 30     |
| 2  | 2     | 30     |
| 3  | 2     | 40     |
| 1  | 3     | 40     |
| 3  | 3     | 60     |
| 1  | 4     | 60     |
| 3  | 4     | 70     |
| 1  | 7     | 90     |
| 1  | 8     | 90     |
+----+-------+--------+
Output: 
+----+-------+--------+
| id | month | Salary |
+----+-------+--------+
| 1  | 7     | 90     |
| 1  | 4     | 130    |
| 1  | 3     | 90     |
| 1  | 2     | 50     |
| 1  | 1     | 20     |
| 2  | 1     | 20     |
| 3  | 3     | 100    |
| 3  | 2     | 40     |
+----+-------+--------+
Explanation: 
Employee '1' has five salary records excluding their most recent month '8':
- 90 for month '7'.
- 60 for month '4'.
- 40 for month '3'.
- 30 for month '2'.
- 20 for month '1'.
So the cumulative salary summary for this employee is:
+----+-------+--------+
| id | month | salary |
+----+-------+--------+
| 1  | 7     | 90     |  (90 + 0 + 0)
| 1  | 4     | 130    |  (60 + 40 + 30)
| 1  | 3     | 90     |  (40 + 30 + 20)
| 1  | 2     | 50     |  (30 + 20 + 0)
| 1  | 1     | 20     |  (20 + 0 + 0)
+----+-------+--------+
Note that the 3-month sum for month '7' is 90 because they did not work during month '6' or month '5'.
Employee '2' only has one salary record (month '1') excluding their most recent month '2'.
+----+-------+--------+
| id | month | salary |
+----+-------+--------+
| 2  | 1     | 20     |  (20 + 0 + 0)
+----+-------+--------+
Employee '3' has two salary records excluding their most recent month '4':
- 60 for month '3'.
- 40 for month '2'.
So the cumulative salary summary for this employee is:
+----+-------+--------+
| id | month | salary |
+----+-------+--------+
| 3  | 3     | 100    |  (60 + 40 + 0)
| 3  | 2     | 40     |  (40 + 0 + 0)
+----+-------+--------+
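
The arithmetic for employee '1' above can be reproduced with a few lines of Python. This is only an illustrative sketch of the rule as stated in the problem; the function name `three_month_sums` and the month-to-salary dictionary layout are assumptions made for the example, not part of the original solution.

```python
def three_month_sums(salary_by_month):
    """Return {month: 3-month sum} for one employee, skipping the latest month.

    salary_by_month maps month -> salary; a month with no record counts as 0.
    """
    latest = max(salary_by_month)
    return {
        m: sum(salary_by_month.get(m - k, 0) for k in range(3))
        for m in sorted(salary_by_month, reverse=True)
        if m != latest
    }


# Employee 1 from Example 1 (hypothetical dict representation of the rows).
employee_1 = {1: 20, 2: 30, 3: 40, 4: 60, 7: 90, 8: 90}
summary = three_month_sums(employee_1)
print(summary)  # {7: 90, 4: 130, 3: 90, 2: 50, 1: 20}
```

Month '7' comes out as 90 because months '5' and '6' have no record and therefore contribute 0.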

Solution

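Method 1 – Range Window Function (SQL) and Shifted Lookup (Pandas)

Intuition

Each output row is a sliding 3-month sum, not a running total: for month m it adds the employee's salaries for months m, m-1, and m-2, counting a month with no record as 0. Because months can be missing (employee 1 has no records for months 5 and 6), the window must be defined over month values rather than over the previous two rows. Each employee's most recent month is left out of the result; it can be dropped before summing because it never falls inside an earlier month's window.

Approach

  • For MySQL/PostgreSQL:
    1. Keep only the rows whose month is earlier than that employee's MAX(month).
    2. Use SUM(salary) OVER (PARTITION BY id ORDER BY month RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) to add the salaries for the current month and the two calendar months before it. A RANGE frame with a numeric offset requires MySQL 8.0+ or PostgreSQL 11+.
    3. Select id, month, and the sum as Salary, ordered by id ascending and month descending.
  • For Python (Pandas):
    1. Drop each employee's most recent month by comparing month with groupby('id')['month'].transform('max').
    2. Index salary by (id, month), look up months m, m-1, and m-2 with reindex(..., fill_value=0), and add the three values.
    3. Return id, month, and Salary sorted by id ascending and month descending.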
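
One pitfall worth illustrating is the difference between a window counted in rows and a window counted in month values. The sketch below uses employee 1's data with two hypothetical helper functions (both names are made up for this example) to show that a row-based frame over-counts month '7', because the two preceding rows are months '3' and '4' rather than months '5' and '6'.

```python
def sum_last_three_rows(salaries, i):
    """Row-based frame: the current row plus the two rows before it."""
    return sum(salaries[max(0, i - 2): i + 1])


def sum_last_three_months(months, salaries, i):
    """Value-based frame: rows whose month lies in [months[i] - 2, months[i]]."""
    return sum(s for m, s in zip(months, salaries) if months[i] - 2 <= m <= months[i])


# Employee 1, sorted by month, with the most recent month (8) already removed.
months = [1, 2, 3, 4, 7]
salaries = [20, 30, 40, 60, 90]

i = months.index(7)
print(sum_last_three_rows(salaries, i))            # 190 (adds months 3 and 4)
print(sum_last_three_months(months, salaries, i))  # 90  (months 5 and 6 are missing)
```

When there are no gaps the two frames agree, which is why the mistake is easy to miss on small inputs.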

Code

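-- Requires MySQL 8.0+ or PostgreSQL 11+ (RANGE frame with a numeric offset).
SELECT id, month,
       SUM(salary) OVER (PARTITION BY id ORDER BY month
                         RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS Salary
FROM Employee e
-- Drop each employee's most recent month; it never feeds an earlier window.
WHERE month < (SELECT MAX(month) FROM Employee WHERE id = e.id)
ORDER BY id, month DESC;

import pandas as pd

class Solution:
    def find_cumulative_salary(self, df: pd.DataFrame) -> pd.DataFrame:
        # Drop each employee's most recent month; it never feeds an earlier window.
        df = df[df['month'] < df.groupby('id')['month'].transform('max')]
        pay = df.set_index(['id', 'month'])['salary']
        # Salary k calendar months back (0 when no record exists), for k = 0, 1, 2.
        keys = [pd.MultiIndex.from_arrays([df['id'], df['month'] - k]) for k in range(3)]
        df = df.assign(Salary=sum(pay.reindex(key, fill_value=0).to_numpy() for key in keys))
        return df[['id', 'month', 'Salary']].sort_values(['id', 'month'], ascending=[True, False]).reset_index(drop=True)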
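
One way to check a candidate query against Example 1 is to run it on an in-memory database. The sketch below uses Python's built-in sqlite3 module with a query that applies a RANGE frame over month values; it assumes the bundled SQLite is version 3.28 or newer (needed for RANGE with a numeric offset), and the names `ROWS`, `QUERY`, and `run_example` are made up for this illustration.

```python
import sqlite3

# Rows of the Employee table from Example 1: (id, month, salary).
ROWS = [(1, 1, 20), (2, 1, 20), (1, 2, 30), (2, 2, 30), (3, 2, 40), (1, 3, 40),
        (3, 3, 60), (1, 4, 60), (3, 4, 70), (1, 7, 90), (1, 8, 90)]

# 3-month sum over month values, with each employee's latest month filtered out.
QUERY = """
SELECT id, month,
       SUM(salary) OVER (PARTITION BY id ORDER BY month
                         RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS Salary
FROM Employee AS e
WHERE month < (SELECT MAX(month) FROM Employee WHERE id = e.id)
ORDER BY id, month DESC
"""


def run_example():
    """Load Example 1 into an in-memory SQLite database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee (id INT, month INT, salary INT, PRIMARY KEY (id, month))"
    )
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = run_example()
for row in result:
    print(row)
```

The printed rows should match the Output table of Example 1, starting with (1, 7, 90) and ending with (3, 2, 40).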

Complexity

  • ⏰ Time complexity: O(n log n), due to ordering the rows by id and month.
  • 🧺 Space complexity: O(n), for the filtered rows and their 3-month sums.