Problem

Table: Failed

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| fail_date    | date    |
+--------------+---------+
fail_date is the primary key (column with unique values) for this table.
This table contains the days of failed tasks.

Table: Succeeded

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| success_date | date    |
+--------------+---------+
success_date is the primary key (column with unique values) for this table.
This table contains the days of succeeded tasks.

A system is running one task every day. Every task is independent of the previous tasks. The tasks can fail or succeed.

Write a solution to report the period_state for each continuous interval of days in the period from 2019-01-01 to 2019-12-31.

period_state is failed'__ if tasks in this interval failed or 'succeeded' if tasks in this interval succeeded. Interval of days are retrieved as start_date and end_date.

Return the result table ordered by start_date.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
Input: 
Failed table:
+-------------------+
| fail_date         |
+-------------------+
| 2018-12-28        |
| 2018-12-29        |
| 2019-01-04        |
| 2019-01-05        |
+-------------------+
Succeeded table:
+-------------------+
| success_date      |
+-------------------+
| 2018-12-30        |
| 2018-12-31        |
| 2019-01-01        |
| 2019-01-02        |
| 2019-01-03        |
| 2019-01-06        |
+-------------------+
Output: 
+--------------+--------------+--------------+
| period_state | start_date   | end_date     |
+--------------+--------------+--------------+
| succeeded    | 2019-01-01   | 2019-01-03   |
| failed       | 2019-01-04   | 2019-01-05   |
| succeeded    | 2019-01-06   | 2019-01-06   |
+--------------+--------------+--------------+
Explanation: 
The report ignored the system state in 2018 as we care about the system in the period 2019-01-01 to 2019-12-31.
From 2019-01-01 to 2019-01-03 all tasks succeeded and the system state was "succeeded".
From 2019-01-04 to 2019-01-05 all tasks failed and the system state was "failed".
From 2019-01-06 to 2019-01-06 all tasks succeeded and the system state was "succeeded".

Solution

Method 1 - Window Functions and Grouping (SQL & Pandas)

Intuition

We need to report contiguous intervals of days where the system was in the same state (failed or succeeded) for each day in 2019. This is a classic “islands and gaps” problem, which can be solved by assigning a group number to each contiguous block of the same state.

Approach

  1. Generate all dates from 2019-01-01 to 2019-12-31.
  2. For each date, determine if it is in the Failed or Succeeded table (it must be in one or the other).
  3. Assign a state (‘failed’ or ‘succeeded’) to each date.
  4. Use a window function or difference of row_number and date to group contiguous blocks of the same state.
  5. For each group, report the state, start_date, and end_date.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
WITH RECURSIVE dates AS (
  SELECT DATE('2019-01-01') AS d
  UNION ALL
  SELECT d + INTERVAL 1 DAY FROM dates WHERE d < '2019-12-31'
),
status AS (
  SELECT d AS day,
         CASE WHEN f.fail_date IS NOT NULL THEN 'failed' ELSE 'succeeded' END AS period_state
  FROM dates d
  LEFT JOIN Failed f ON d.d = f.fail_date
)
SELECT period_state, MIN(day) AS start_date, MAX(day) AS end_date
FROM (
  SELECT *,
         ROW_NUMBER() OVER (ORDER BY day) - ROW_NUMBER() OVER (PARTITION BY period_state ORDER BY day) AS grp
  FROM status
  ) t
GROUP BY period_state, grp
ORDER BY start_date;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
WITH RECURSIVE dates AS (
  SELECT DATE '2019-01-01' AS d
  UNION ALL
  SELECT d + INTERVAL '1 day' FROM dates WHERE d < DATE '2019-12-31'
),
status AS (
  SELECT d AS day,
         CASE WHEN f.fail_date IS NOT NULL THEN 'failed' ELSE 'succeeded' END AS period_state
  FROM dates d
  LEFT JOIN Failed f ON d.d = f.fail_date
)
SELECT period_state, MIN(day) AS start_date, MAX(day) AS end_date
FROM (
  SELECT *,
         ROW_NUMBER() OVER (ORDER BY day) - ROW_NUMBER() OVER (PARTITION BY period_state ORDER BY day) AS grp
  FROM status
  ) t
GROUP BY period_state, grp
ORDER BY start_date;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import pandas as pd
from datetime import datetime
# failed and succeeded are DataFrames with 'fail_date' and 'success_date' columns
all_dates = pd.date_range('2019-01-01', '2019-12-31')
df = pd.DataFrame({'date': all_dates})
df['period_state'] = 'succeeded'
df.loc[df['date'].isin(failed['fail_date']), 'period_state'] = 'failed'
# Find contiguous blocks
df['grp'] = (df['period_state'] != df['period_state'].shift()).cumsum()
result = df.groupby(['period_state', 'grp']).agg(start_date=('date', 'first'), end_date=('date', 'last')).reset_index(drop=True)
result = result[['period_state', 'start_date', 'end_date']]

Complexity

  • ⏰ Time complexity: O(D), where D is the number of days in 2019 (365).
  • 🧺 Space complexity: O(D), for storing the date and state for each day.