+--------------+---------+
| Column Name | Type |
+--------------+---------+
| fail_date | date |
+--------------+---------+
fail_date is the primary key (column with unique values) for this table.
This table contains the days of failed tasks.
Table: Succeeded
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| success_date | date |
+--------------+---------+
success_date is the primary key (column with unique values) for this table.
This table contains the days of succeeded tasks.
A system is running one task every day. Every task is independent of the previous tasks. The tasks can fail or succeed.
Write a solution to report the period_state for each continuous interval of days in the period from 2019-01-01 to 2019-12-31.
period_state is ’failed'__ if tasks in this interval failed or
'succeeded' if tasks in this interval succeeded. Interval of days are retrieved as start_date and end_date.
Input:
Failed table:+-------------------+| fail_date |+-------------------+|2018-12-28||2018-12-29||2019-01-04||2019-01-05|+-------------------+Succeeded table:+-------------------+| success_date |+-------------------+|2018-12-30||2018-12-31||2019-01-01||2019-01-02||2019-01-03||2019-01-06|+-------------------+Output:
+--------------+--------------+--------------+| period_state | start_date | end_date |+--------------+--------------+--------------+| succeeded |2019-01-01|2019-01-03|| failed |2019-01-04|2019-01-05|| succeeded |2019-01-06|2019-01-06|+--------------+--------------+--------------+Explanation:
The report ignored the system state in2018 as we care about the system in the period 2019-01-01 to 2019-12-31.From 2019-01-01 to 2019-01-03 all tasks succeeded and the system state was "succeeded".From 2019-01-04 to 2019-01-05 all tasks failed and the system state was "failed".From 2019-01-06 to 2019-01-06 all tasks succeeded and the system state was "succeeded".
We need to report contiguous intervals of days where the system was in the same state (failed or succeeded) for each day in 2019. This is a classic “islands and gaps” problem, which can be solved by assigning a group number to each contiguous block of the same state.
WITHRECURSIVE dates AS (
SELECT DATE('2019-01-01') AS d
UNIONALLSELECT d + INTERVAL 1DAYFROM dates WHERE d <'2019-12-31'),
status AS (
SELECT d ASday,
CASEWHEN f.fail_date ISNOTNULLTHEN'failed'ELSE'succeeded'ENDAS period_state
FROM dates d
LEFTJOIN Failed f ON d.d = f.fail_date
)
SELECT period_state, MIN(day) AS start_date, MAX(day) AS end_date
FROM (
SELECT*,
ROW_NUMBER() OVER (ORDERBYday) - ROW_NUMBER() OVER (PARTITION BY period_state ORDERBYday) AS grp
FROM status
) t
GROUPBY period_state, grp
ORDERBY start_date;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
WITHRECURSIVE dates AS (
SELECT DATE '2019-01-01'AS d
UNIONALLSELECT d + INTERVAL '1 day'FROM dates WHERE d < DATE '2019-12-31'),
status AS (
SELECT d ASday,
CASEWHEN f.fail_date ISNOTNULLTHEN'failed'ELSE'succeeded'ENDAS period_state
FROM dates d
LEFTJOIN Failed f ON d.d = f.fail_date
)
SELECT period_state, MIN(day) AS start_date, MAX(day) AS end_date
FROM (
SELECT*,
ROW_NUMBER() OVER (ORDERBYday) - ROW_NUMBER() OVER (PARTITION BY period_state ORDERBYday) AS grp
FROM status
) t
GROUPBY period_state, grp
ORDERBY start_date;
1
2
3
4
5
6
7
8
9
10
11
import pandas as pd
from datetime import datetime
# failed and succeeded are DataFrames with 'fail_date' and 'success_date' columnsall_dates = pd.date_range('2019-01-01', '2019-12-31')
df = pd.DataFrame({'date': all_dates})
df['period_state'] ='succeeded'df.loc[df['date'].isin(failed['fail_date']), 'period_state'] ='failed'# Find contiguous blocksdf['grp'] = (df['period_state'] != df['period_state'].shift()).cumsum()
result = df.groupby(['period_state', 'grp']).agg(start_date=('date', 'first'), end_date=('date', 'last')).reset_index(drop=True)
result = result[['period_state', 'start_date', 'end_date']]