Problem

Table: Tasks

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| task_id        | int     |
| subtasks_count | int     |
+----------------+---------+
task_id is the column with unique values for this table.
Each row in this table indicates that task_id was divided into subtasks_count subtasks labeled from 1 to subtasks_count.
It is guaranteed that 2 <= subtasks_count <= 20.

Table: Executed

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| task_id       | int     |
| subtask_id    | int     |
+---------------+---------+
(task_id, subtask_id) is the combination of columns with unique values for this table.
Each row in this table indicates that for the task task_id, the subtask with ID subtask_id was executed successfully.
It is **guaranteed** that subtask_id <= subtasks_count for each task_id.

Write a solution to report the IDs of the missing subtasks for each task_id.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
Input: 
Tasks table:
+---------+----------------+
| task_id | subtasks_count |
+---------+----------------+
| 1       | 3              |
| 2       | 2              |
| 3       | 4              |
+---------+----------------+
Executed table:
+---------+------------+
| task_id | subtask_id |
+---------+------------+
| 1       | 2          |
| 3       | 1          |
| 3       | 2          |
| 3       | 3          |
| 3       | 4          |
+---------+------------+
Output: 
+---------+------------+
| task_id | subtask_id |
+---------+------------+
| 1       | 1          |
| 1       | 3          |
| 2       | 1          |
| 2       | 2          |
+---------+------------+
Explanation: 
Task 1 was divided into 3 subtasks (1, 2, 3). Only subtask 2 was executed successfully, so we include (1, 1) and (1, 3) in the answer.
Task 2 was divided into 2 subtasks (1, 2). No subtask was executed successfully, so we include (2, 1) and (2, 2) in the answer.
Task 3 was divided into 4 subtasks (1, 2, 3, 4). All of the subtasks were executed successfully.

Solution

Method 1 – SQL: Generate All Subtasks and Find Missing

Intuition

For each task, generate all possible (task_id, subtask_id) pairs. Then, find which of these pairs are not present in the Executed table.

Approach

  1. For each task in Tasks, generate all subtask_id from 1 to subtasks_count.
  2. Left join these pairs with Executed on (task_id, subtask_id).
  3. Select those where Executed.subtask_id is null (not executed).
  4. Return the result ordered by task_id and subtask_id.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT t.task_id, s.subtask_id
FROM Tasks t
JOIN (
    SELECT 1 AS subtask_id UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10
    UNION SELECT 11 UNION SELECT 12 UNION SELECT 13 UNION SELECT 14 UNION SELECT 15 UNION SELECT 16 UNION SELECT 17 UNION SELECT 18 UNION SELECT 19 UNION SELECT 20
) s
  ON s.subtask_id <= t.subtasks_count
LEFT JOIN Executed e
  ON t.task_id = e.task_id AND s.subtask_id = e.subtask_id
WHERE e.subtask_id IS NULL
ORDER BY t.task_id, s.subtask_id;
1
2
3
4
5
6
7
8
SELECT t.task_id, s.subtask_id
FROM Tasks t
JOIN generate_series(1, 20) AS s(subtask_id)
  ON s.subtask_id <= t.subtasks_count
LEFT JOIN Executed e
  ON t.task_id = e.task_id AND s.subtask_id = e.subtask_id
WHERE e.subtask_id IS NULL
ORDER BY t.task_id, s.subtask_id;
1
2
3
4
5
6
7
8
import pandas as pd

def find_unexecuted_subtasks(tasks: pd.DataFrame, executed: pd.DataFrame) -> pd.DataFrame:
    all_subtasks = tasks.loc[tasks.index.repeat(tasks['subtasks_count'])].copy()
    all_subtasks['subtask_id'] = all_subtasks.groupby('task_id').cumcount() + 1
    merged = all_subtasks.merge(executed, on=['task_id', 'subtask_id'], how='left', indicator=True)
    res = merged[merged['_merge'] == 'left_only'][['task_id', 'subtask_id']]
    return res.sort_values(['task_id', 'subtask_id']).reset_index(drop=True)

Complexity

  • ⏰ Time complexity: O(n * m) — n = number of tasks, m = max subtasks per task (≤ 20).
  • 🧺 Space complexity: O(n * m) — For storing all possible subtasks.