Problem

Table: Stadium

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| visit_date    | date    |
| people        | int     |
+---------------+---------+

visit_date is the column with unique values for this table. Each row of this table contains the visit date and visit id to the stadium with the number of people during the visit. As the id increases, the date increases as well.

Write a solution to display the records with three or more rows with consecutive id’s, and the number of people is greater than or equal to 100 for each.

Return the result table ordered by visit_date in ascending order.

The result format is in the following example.

Examples

Example 1:

Input: Stadium table:

+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 1    | 2017-01-01 | 10        |
| 2    | 2017-01-02 | 109       |
| 3    | 2017-01-03 | 150       |
| 4    | 2017-01-04 | 99        |
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+

Output:

+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+

Explanation: The four rows with ids 5, 6, 7, and 8 have consecutive ids and each of them has >= 100 people attended. Note that row 8 was included even though the visit_date was not the next day after row 7. The rows with ids 2 and 3 are not included because we need at least three consecutive ids.

Solution

Method 1 - Using CTEs

Here is the approach:

  1. ConsecutiveVisits CTE:
    • This CTE uses window functions to add columns for the number of people in the next two visits (LEAD) and the previous two visits (LAG) for each row. This helps in checking the consecutive nature without complex self-joins.
  2. FilteredVisits CTE:
    • This CTE extracts rows where the current row and its next two rows, or its previous row and next row, or its previous two rows all have 100 or more people.
  3. FinalResult CTE:
    • This final step ensures that we get the correct set of rows needed by combining the results of the FilteredVisits CTE and additional checks to include any relevant surrounding rows to cover consecutive scope.
  4. Final Selection and Ordering:
    • The output rows are selected from FinalResult and ordered by visit_date.

Code

Sql
 WITH ConsecutiveVisits AS (
     SELECT 
         id,
         visit_date,
         people,
         LEAD(people, 1) OVER (ORDER BY id) AS next_people_1,
         LEAD(people, 2) OVER (ORDER BY id) AS next_people_2,
         LAG(people, 1) OVER (ORDER BY id) AS prev_people_1,
         LAG(people, 2) OVER (ORDER BY id) AS prev_people_2
     FROM
         Stadium
 ),
 FilteredVisits AS (
     SELECT 
         id,
         visit_date,
         people
     FROM 
         ConsecutiveVisits
     WHERE 
       (people >= 100 AND next_people_1 >= 100 AND next_people_2 >= 100)
       OR (people >= 100 AND prev_people_1 >= 100 AND next_people_1 >= 100)
       OR (people >= 100 AND prev_people_1 >= 100 AND prev_people_2 >= 100)
 ),
 FinalResult AS (
     SELECT 
         id, 
         visit_date,
         people
     FROM 
         FilteredVisits
     UNION
     SELECT 
         id,
         visit_date,
         people
     FROM 
         ConsecutiveVisits
     WHERE 
         id IN (SELECT id FROM FilteredVisits)
         OR id IN (SELECT id FROM FilteredVisits ORDER BY id LIMIT 1 OFFSET -1)
         OR id IN (SELECT id FROM FilteredVisits ORDER BY id LIMIT 1 OFFSET 1)
 )
 SELECT 
     id, 
     visit_date, 
     people
 FROM 
     FinalResult
 ORDER BY 
     visit_date;

Complexity

  • Time: O(n log n)
    • The primary operations are window functions and joins, which are O(n) per window function over the dataset in general.
    • Aggregating the results per visit using the window functions is O(n log n).
  • Space: O(n)
    • Space complexity depends mainly on the intermediate data held in CTEs and window functions, which is proportional to the size of the dataset, O(n).