Problem

Table: HallEvents

+-------------+------+
| Column Name | Type |
+-------------+------+
| hall_id     | int  |
| start_day   | date |
| end_day     | date |
+-------------+------+
This table may contain duplicates rows.
Each row of this table indicates the start day and end day of an event and the hall in which the event is held.

Write a solution to merge all the overlapping events that are held in the same hall. Two events overlap if they have at least one day in common.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Input: 
HallEvents table:
+---------+------------+------------+
| hall_id | start_day  | end_day    |
+---------+------------+------------+
| 1       | 2023-01-13 | 2023-01-14 |
| 1       | 2023-01-14 | 2023-01-17 |
| 1       | 2023-01-18 | 2023-01-25 |
| 2       | 2022-12-09 | 2022-12-23 |
| 2       | 2022-12-13 | 2022-12-17 |
| 3       | 2022-12-01 | 2023-01-30 |
+---------+------------+------------+
Output: 
+---------+------------+------------+
| hall_id | start_day  | end_day    |
+---------+------------+------------+
| 1       | 2023-01-13 | 2023-01-17 |
| 1       | 2023-01-18 | 2023-01-25 |
| 2       | 2022-12-09 | 2022-12-23 |
| 3       | 2022-12-01 | 2023-01-30 |
+---------+------------+------------+
Explanation: There are three halls.
Hall 1:
- The two events ["2023-01-13", "2023-01-14"] and ["2023-01-14", "2023-01-17"] overlap. We merge them in one event ["2023-01-13", "2023-01-17"].
- The event ["2023-01-18", "2023-01-25"] does not overlap with any other event, so we leave it as it is.
Hall 2:
- The two events ["2022-12-09", "2022-12-23"] and ["2022-12-13", "2022-12-17"] overlap. We merge them in one event ["2022-12-09", "2022-12-23"].
Hall 3:
- The hall has only one event, so we return it. Note that we only consider the events of each hall separately.

Solution

Method 1 – Window Functions and Grouping

Intuition

To merge overlapping events in the same hall, sort events by hall and start day, then use window functions to group overlapping intervals. For each hall, assign a group to each event where the event overlaps with the previous one, and then aggregate the minimum start and maximum end for each group.

Approach

  1. Sort events by hall_id and start_day.
  2. For each hall, use a window function to assign a group number that increments when an event does not overlap with the previous event.
  3. For each group, aggregate the minimum start_day and maximum end_day.
  4. Return the merged intervals for each hall.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
WITH ordered AS (
  SELECT hall_id, start_day, end_day,
    ROW_NUMBER() OVER (PARTITION BY hall_id ORDER BY start_day) AS rn
  FROM HallEvents
),
groups AS (
  SELECT hall_id, start_day, end_day, rn,
    SUM(CASE WHEN start_day > LAG(end_day) OVER (PARTITION BY hall_id ORDER BY start_day) THEN 1 ELSE 0 END)
      OVER (PARTITION BY hall_id ORDER BY start_day) AS grp
  FROM ordered
)
SELECT hall_id, MIN(start_day) AS start_day, MAX(end_day) AS end_day
FROM groups
GROUP BY hall_id, grp;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
WITH ordered AS (
  SELECT hall_id, start_day, end_day,
    ROW_NUMBER() OVER (PARTITION BY hall_id ORDER BY start_day) AS rn
  FROM HallEvents
),
groups AS (
  SELECT hall_id, start_day, end_day, rn,
    SUM(CASE WHEN start_day > LAG(end_day) OVER (PARTITION BY hall_id ORDER BY start_day) THEN 1 ELSE 0 END)
      OVER (PARTITION BY hall_id ORDER BY start_day) AS grp
  FROM ordered
)
SELECT hall_id, MIN(start_day) AS start_day, MAX(end_day) AS end_day
FROM groups
GROUP BY hall_id, grp;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def merge_overlapping_events(df: 'pd.DataFrame') -> 'pd.DataFrame':
    df = df.sort_values(['hall_id', 'start_day'])
    def merge_group(events):
        merged = []
        for _, row in events.iterrows():
            if not merged or row['start_day'] > merged[-1]['end_day']:
                merged.append(row.to_dict())
            else:
                merged[-1]['end_day'] = max(merged[-1]['end_day'], row['end_day'])
        return pd.DataFrame(merged)
    return df.groupby('hall_id').apply(merge_group).reset_index(drop=True)

Complexity

  • ⏰ Time complexity: O(n log n) for sorting and grouping.
  • 🧺 Space complexity: O(n) for storing merged intervals.