Problem
Table: Stadium
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| visit_date | date |
| people | int |
+---------------+---------+
visit_date is the column with unique values for this table. Each row of this table contains the visit date and visit id to the stadium with the number of people during the visit. As the id increases, the date increases as well.
Write a solution to display the records with three or more rows with consecutive id
’s, and the number of people is greater than or equal to 100 for each.
Return the result table ordered by visit_date
in ascending order.
The result format is in the following example.
Examples
Example 1:
Input: Stadium table:
+------+------------+-----------+
| id | visit_date | people |
+------+------------+-----------+
| 1 | 2017-01-01 | 10 |
| 2 | 2017-01-02 | 109 |
| 3 | 2017-01-03 | 150 |
| 4 | 2017-01-04 | 99 |
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-09 | 188 |
+------+------------+-----------+
Output:
+------+------------+-----------+
| id | visit_date | people |
+------+------------+-----------+
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-09 | 188 |
+------+------------+-----------+
Explanation: The four rows with ids 5, 6, 7, and 8 have consecutive ids and each of them has >= 100 people attended. Note that row 8 was included even though the visit_date was not the next day after row 7. The rows with ids 2 and 3 are not included because we need at least three consecutive ids.
Solution
Method 1 - Using CTEs
Here is the approach:
- ConsecutiveVisits CTE:
- This CTE uses window functions to add columns for the number of people in the next two visits (
LEAD
) and the previous two visits (LAG
) for each row. This helps in checking the consecutive nature without complex self-joins.
- This CTE uses window functions to add columns for the number of people in the next two visits (
- FilteredVisits CTE:
- This CTE extracts rows where the current row and its next two rows, or its previous row and next row, or its previous two rows all have 100 or more people.
- FinalResult CTE:
- This final step ensures that we get the correct set of rows needed by combining the results of the
FilteredVisits
CTE and additional checks to include any relevant surrounding rows to cover consecutive scope.
- This final step ensures that we get the correct set of rows needed by combining the results of the
- Final Selection and Ordering:
- The output rows are selected from
FinalResult
and ordered byvisit_date
.
- The output rows are selected from
Code
Sql
WITH ConsecutiveVisits AS (
SELECT
id,
visit_date,
people,
LEAD(people, 1) OVER (ORDER BY id) AS next_people_1,
LEAD(people, 2) OVER (ORDER BY id) AS next_people_2,
LAG(people, 1) OVER (ORDER BY id) AS prev_people_1,
LAG(people, 2) OVER (ORDER BY id) AS prev_people_2
FROM
Stadium
),
FilteredVisits AS (
SELECT
id,
visit_date,
people
FROM
ConsecutiveVisits
WHERE
(people >= 100 AND next_people_1 >= 100 AND next_people_2 >= 100)
OR (people >= 100 AND prev_people_1 >= 100 AND next_people_1 >= 100)
OR (people >= 100 AND prev_people_1 >= 100 AND prev_people_2 >= 100)
),
FinalResult AS (
SELECT
id,
visit_date,
people
FROM
FilteredVisits
UNION
SELECT
id,
visit_date,
people
FROM
ConsecutiveVisits
WHERE
id IN (SELECT id FROM FilteredVisits)
OR id IN (SELECT id FROM FilteredVisits ORDER BY id LIMIT 1 OFFSET -1)
OR id IN (SELECT id FROM FilteredVisits ORDER BY id LIMIT 1 OFFSET 1)
)
SELECT
id,
visit_date,
people
FROM
FinalResult
ORDER BY
visit_date;
Complexity
- Time:
O(n log n)
- The primary operations are window functions and joins, which are
O(n)
per window function over the dataset in general. - Aggregating the results per visit using the window functions is
O(n log n)
.
- The primary operations are window functions and joins, which are
- Space:
O(n)
- Space complexity depends mainly on the intermediate data held in CTEs and window functions, which is proportional to the size of the dataset,
O(n)
.
- Space complexity depends mainly on the intermediate data held in CTEs and window functions, which is proportional to the size of the dataset,