Problem

Table: Drivers

1
2
3
4
5
6
7
8
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| driver_id   | int     |
| join_date   | date    |
+-------------+---------+
driver_id is the column with unique values for this table.
Each row of this table contains the driver's ID and the date they joined the Hopper company.

Table: Rides

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| ride_id      | int     |
| user_id      | int     |
| requested_at | date    |
+--------------+---------+
ride_id is the column with unique values for this table.
Each row of this table contains the ID of a ride, the user's ID that requested it, and the day they requested it.
There may be some ride requests in this table that were not accepted.

Table: AcceptedRides

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| ride_id       | int     |
| driver_id     | int     |
| ride_distance | int     |
| ride_duration | int     |
+---------------+---------+
ride_id is the column with unique values for this table.
Each row of this table contains some information about an accepted ride.
It is guaranteed that each accepted ride exists in the Rides table.

Write a solution to compute the average_ride_distance and average_ride_duration of every 3-month window starting from January -March 2020 to October - December 2020. Round average_ride_distance and average_ride_duration to the nearest two decimal places.

The average_ride_distance is calculated by summing up the total ride_distance values from the three months and dividing it by 3. The average_ride_duration is calculated in a similar way.

Return the result table ordered by month in ascending order, where month is the starting month’s number (January is 1, February is 2, etc.).

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
Input: 
Drivers table:
+-----------+------------+
| driver_id | join_date  |
+-----------+------------+
| 10        | 2019-12-10 |
| 8         | 2020-1-13  |
| 5         | 2020-2-16  |
| 7         | 2020-3-8   |
| 4         | 2020-5-17  |
| 1         | 2020-10-24 |
| 6         | 2021-1-5   |
+-----------+------------+
Rides table:
+---------+---------+--------------+
| ride_id | user_id | requested_at |
+---------+---------+--------------+
| 6       | 75      | 2019-12-9    |
| 1       | 54      | 2020-2-9     |
| 10      | 63      | 2020-3-4     |
| 19      | 39      | 2020-4-6     |
| 3       | 41      | 2020-6-3     |
| 13      | 52      | 2020-6-22    |
| 7       | 69      | 2020-7-16    |
| 17      | 70      | 2020-8-25    |
| 20      | 81      | 2020-11-2    |
| 5       | 57      | 2020-11-9    |
| 2       | 42      | 2020-12-9    |
| 11      | 68      | 2021-1-11    |
| 15      | 32      | 2021-1-17    |
| 12      | 11      | 2021-1-19    |
| 14      | 18      | 2021-1-27    |
+---------+---------+--------------+
AcceptedRides table:
+---------+-----------+---------------+---------------+
| ride_id | driver_id | ride_distance | ride_duration |
+---------+-----------+---------------+---------------+
| 10      | 10        | 63            | 38            |
| 13      | 10        | 73            | 96            |
| 7       | 8         | 100           | 28            |
| 17      | 7         | 119           | 68            |
| 20      | 1         | 121           | 92            |
| 5       | 7         | 42            | 101           |
| 2       | 4         | 6             | 38            |
| 11      | 8         | 37            | 43            |
| 15      | 8         | 108           | 82            |
| 12      | 8         | 38            | 34            |
| 14      | 1         | 90            | 74            |
+---------+-----------+---------------+---------------+
Output: 
+-------+-----------------------+-----------------------+
| month | average_ride_distance | average_ride_duration |
+-------+-----------------------+-----------------------+
| 1     | 21.00                 | 12.67                 |
| 2     | 21.00                 | 12.67                 |
| 3     | 21.00                 | 12.67                 |
| 4     | 24.33                 | 32.00                 |
| 5     | 57.67                 | 41.33                 |
| 6     | 97.33                 | 64.00                 |
| 7     | 73.00                 | 32.00                 |
| 8     | 39.67                 | 22.67                 |
| 9     | 54.33                 | 64.33                 |
| 10    | 56.33                 | 77.00                 |
+-------+-----------------------+-----------------------+
Explanation: 
By the end of January --> average_ride_distance = (0+0+63)/3=21, average_ride_duration = (0+0+38)/3=12.67
By the end of February --> average_ride_distance = (0+63+0)/3=21, average_ride_duration = (0+38+0)/3=12.67
By the end of March --> average_ride_distance = (63+0+0)/3=21, average_ride_duration = (38+0+0)/3=12.67
By the end of April --> average_ride_distance = (0+0+73)/3=24.33, average_ride_duration = (0+0+96)/3=32.00
By the end of May --> average_ride_distance = (0+73+100)/3=57.67, average_ride_duration = (0+96+28)/3=41.33
By the end of June --> average_ride_distance = (73+100+119)/3=97.33, average_ride_duration = (96+28+68)/3=64.00
By the end of July --> average_ride_distance = (100+119+0)/3=73.00, average_ride_duration = (28+68+0)/3=32.00
By the end of August --> average_ride_distance = (119+0+0)/3=39.67, average_ride_duration = (68+0+0)/3=22.67
By the end of Septemeber --> average_ride_distance = (0+0+163)/3=54.33, average_ride_duration = (0+0+193)/3=64.33
By the end of October --> average_ride_distance = (0+163+6)/3=56.33, average_ride_duration = (0+193+38)/3=77.00

Solution

Method 1 – Sliding 3-Month Window Aggregation

Intuition

For each 3-month window (Jan–Mar, Feb–Apr, …, Oct–Dec 2020), sum the ride distances and durations for accepted rides requested in those months, then divide by 3. This smooths out monthly fluctuations and gives a rolling average.

Approach

  1. For each window starting at month m (1 to 10):
    • Consider months m, m+1, m+2 of 2020.
    • For each month in the window, sum ride_distance and ride_duration for accepted rides whose requested_at is in that month.
    • Add up the three months’ totals, then divide by 3 for the average.
    • Round to two decimal places.
  2. Return the result ordered by month.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
WITH months AS (
  SELECT 1 AS m UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
  SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
  SELECT 9 UNION ALL SELECT 10
),
ride_stats AS (
  SELECT MONTH(r.requested_at) AS m, 
         SUM(ar.ride_distance) AS dist, 
         SUM(ar.ride_duration) AS dur
  FROM AcceptedRides ar
  JOIN Rides r ON ar.ride_id = r.ride_id
  WHERE YEAR(r.requested_at) = 2020
  GROUP BY MONTH(r.requested_at)
)
SELECT
  m.m AS month,
  ROUND((IFNULL(s1.dist,0) + IFNULL(s2.dist,0) + IFNULL(s3.dist,0))/3, 2) AS average_ride_distance,
  ROUND((IFNULL(s1.dur,0) + IFNULL(s2.dur,0) + IFNULL(s3.dur,0))/3, 2) AS average_ride_duration
FROM months m
LEFT JOIN ride_stats s1 ON s1.m = m.m
LEFT JOIN ride_stats s2 ON s2.m = m.m+1
LEFT JOIN ride_stats s3 ON s3.m = m.m+2
ORDER BY m.m;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
WITH months AS (
  SELECT generate_series(1,10) AS m
),
ride_stats AS (
  SELECT EXTRACT(MONTH FROM r.requested_at)::int AS m, 
         SUM(ar.ride_distance) AS dist, 
         SUM(ar.ride_duration) AS dur
  FROM AcceptedRides ar
  JOIN Rides r ON ar.ride_id = r.ride_id
  WHERE EXTRACT(YEAR FROM r.requested_at) = 2020
  GROUP BY EXTRACT(MONTH FROM r.requested_at)
)
SELECT
  m.m AS month,
  ROUND((COALESCE(s1.dist,0) + COALESCE(s2.dist,0) + COALESCE(s3.dist,0))/3.0, 2) AS average_ride_distance,
  ROUND((COALESCE(s1.dur,0) + COALESCE(s2.dur,0) + COALESCE(s3.dur,0))/3.0, 2) AS average_ride_duration
FROM months m
LEFT JOIN ride_stats s1 ON s1.m = m.m
LEFT JOIN ride_stats s2 ON s2.m = m.m+1
LEFT JOIN ride_stats s3 ON s3.m = m.m+2
ORDER BY m.m;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def hopper_company_queries_3(rides, accepted):
    import pandas as pd
    rides = rides.copy()
    rides['month'] = rides['requested_at'].dt.month
    rides['year'] = rides['requested_at'].dt.year
    merged = pd.merge(accepted, rides, on='ride_id')
    stats = merged[merged['year'] == 2020].groupby('month').agg({'ride_distance':'sum', 'ride_duration':'sum'}).reindex(range(1,13), fill_value=0)
    ans = []
    for m in range(1, 11):
        dist = stats.loc[m:m+2, 'ride_distance'].sum()/3
        dur = stats.loc[m:m+2, 'ride_duration'].sum()/3
        ans.append({'month': m, 'average_ride_distance': round(dist,2), 'average_ride_duration': round(dur,2)})
    return pd.DataFrame(ans)

Complexity

  • ⏰ Time complexity: O(N), where N is the number of accepted rides, as we aggregate by month and then slide the window.
  • 🧺 Space complexity: O(1) (excluding output), as we only store monthly aggregates and window results.