Problem

Table: Tasks

+---------------+----------+
| Column Name   | Type     |
+---------------+----------+
| task_id       | int      |
| employee_id   | int      |
| start_time    | datetime |
| end_time      | datetime |
+---------------+----------+
(task_id, employee_id) is the primary key for this table.
Each row in this table contains the task identifier, the employee identifier, and the start and end times of each task.

Write a solution to find the total duration of tasks for each employee and the maximum number of concurrent tasks an employee handled at any point in time. The total duration should be rounded down to the nearest number of full hours.

Return the result table ordered by employee_idascending order.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Input:
Tasks table:
+---------+-------------+---------------------+---------------------+
| task_id | employee_id | start_time          | end_time            |
+---------+-------------+---------------------+---------------------+
| 1       | 1001        | 2023-05-01 08:00:00 | 2023-05-01 09:00:00 |
| 2       | 1001        | 2023-05-01 08:30:00 | 2023-05-01 10:30:00 |
| 3       | 1001        | 2023-05-01 11:00:00 | 2023-05-01 12:00:00 |
| 7       | 1001        | 2023-05-01 13:00:00 | 2023-05-01 15:30:00 |
| 4       | 1002        | 2023-05-01 09:00:00 | 2023-05-01 10:00:00 |
| 5       | 1002        | 2023-05-01 09:30:00 | 2023-05-01 11:30:00 |
| 6       | 1003        | 2023-05-01 14:00:00 | 2023-05-01 16:00:00 |
+---------+-------------+---------------------+---------------------+
Output:
+-------------+------------------+----------------------+
| employee_id | total_task_hours | max_concurrent_tasks |
+-------------+------------------+----------------------+
| 1001        | 6                | 2                    |
| 1002        | 2                | 2                    |
| 1003        | 2                | 1                    |
+-------------+------------------+----------------------+
Explanation:
* For employee ID 1001: 
* Task 1 and Task 2 overlap from 08:30 to 09:00 (30 minutes).
* Task 7 has a duration of 150 minutes (2 hours and 30 minutes).
* Total task time: 60 (Task 1) + 120 (Task 2) + 60 (Task 3) + 150 (Task 7) - 30 (overlap) = 360 minutes = 6 hours.
* Maximum concurrent tasks: 2 (during the overlap period).
* For employee ID 1002: 
* Task 4 and Task 5 overlap from 09:30 to 10:00 (30 minutes).
* Total task time: 60 (Task 4) + 120 (Task 5) - 30 (overlap) = 150 minutes = 2 hours and 30 minutes.
* Total task hours (rounded down): 2 hours.
* Maximum concurrent tasks: 2 (during the overlap period).
* For employee ID 1003: 
* No overlapping tasks.
* Total task time: 120 minutes = 2 hours.
* Maximum concurrent tasks: 1.
**Note:** Output table is ordered by employee_id in ascending order.

Solution

Method 1 – SQL Window Functions and Event Sweeping (MySQL, PostgreSQL, Pandas)

Intuition

To find the total duration, sum the difference between end_time and start_time for each employee. For the maximum number of concurrent tasks, treat each start and end as an event and sweep through them, counting how many tasks overlap at any time.

Approach

  1. For total duration:
    • For each employee, sum the duration of all their tasks in hours (using TIMESTAMPDIFF or equivalent, and round down).
  2. For concurrent tasks:
    • For each employee, treat each start_time as +1 and each end_time as -1 event.
    • Sort all events by time, and sweep through, maintaining a running count of active tasks, tracking the maximum.
  3. Return both metrics per employee, ordered by employee_id.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
WITH events AS (
  SELECT employee_id, start_time AS t, 1 AS delta FROM Tasks
  UNION ALL
  SELECT employee_id, end_time AS t, -1 AS delta FROM Tasks
),
concurrent AS (
  SELECT employee_id, t, SUM(delta) OVER (PARTITION BY employee_id ORDER BY t, delta DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt
  FROM events
)
SELECT t1.employee_id,
       FLOOR(SUM(TIMESTAMPDIFF(SECOND, t1.start_time, t1.end_time))/3600) AS total_duration,
       MAX(t2.cnt) AS max_concurrent_tasks
FROM Tasks t1
JOIN concurrent t2 ON t1.employee_id = t2.employee_id
GROUP BY t1.employee_id
ORDER BY t1.employee_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
WITH events AS (
  SELECT employee_id, start_time AS t, 1 AS delta FROM Tasks
  UNION ALL
  SELECT employee_id, end_time AS t, -1 AS delta FROM Tasks
),
concurrent AS (
  SELECT employee_id, t, SUM(delta) OVER (PARTITION BY employee_id ORDER BY t, delta DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt
  FROM events
)
SELECT t1.employee_id,
       FLOOR(SUM(EXTRACT(EPOCH FROM (t1.end_time - t1.start_time))/3600)) AS total_duration,
       MAX(t2.cnt) AS max_concurrent_tasks
FROM Tasks t1
JOIN concurrent t2 ON t1.employee_id = t2.employee_id
GROUP BY t1.employee_id
ORDER BY t1.employee_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
import pandas as pd

def employee_task_stats(tasks: pd.DataFrame) -> pd.DataFrame:
    tasks['duration'] = (tasks['end_time'] - tasks['start_time']).dt.total_seconds() // 3600
    total = tasks.groupby('employee_id')['duration'].sum().astype(int)
    res = []
    for eid, group in tasks.groupby('employee_id'):
        events = []
        for _, row in group.iterrows():
            events.append((row['start_time'], 1))
            events.append((row['end_time'], -1))
        events.sort()
        cnt = mx = 0
        for _, d in events:
            cnt += d
            mx = max(mx, cnt)
        res.append({'employee_id': eid, 'total_duration': int(total.loc[eid]), 'max_concurrent_tasks': mx})
    return pd.DataFrame(res).sort_values('employee_id')

Complexity

  • ⏰ Time complexity: O(n log n) per employee for the sweep, where n is the number of tasks per employee.
  • 🧺 Space complexity: O(n) for storing events and results.