Problem

Table: Drivers

1
2
3
4
5
6
7
8
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| driver_id   | int     |
| join_date   | date    |
+-------------+---------+
driver_id is the primary key (column with unique values) for this table.
Each row of this table contains the driver's ID and the date they joined the Hopper company.

Table: Rides

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| ride_id      | int     |
| user_id      | int     |
| requested_at | date    |
+--------------+---------+
ride_id is the primary key (column with unique values) for this table.
Each row of this table contains the ID of a ride, the user's ID that requested it, and the day they requested it.
There may be some ride requests in this table that were not accepted.

Table: AcceptedRides

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| ride_id       | int     |
| driver_id     | int     |
| ride_distance | int     |
| ride_duration | int     |
+---------------+---------+
ride_id is the primary key (column with unique values) for this table.
Each row of this table contains some information about an accepted ride.
It is guaranteed that each accepted ride exists in the Rides table.

Write a solution to report the following statistics for each month of 2020 :

  • The number of drivers currently with the Hopper company by the end of the month (active_drivers).
  • The number of accepted rides in that month (accepted_rides).

Return the result table ordered by month in ascending order, where month is the month’s number (January is 1, February is 2, etc.).

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
Input: 
Drivers table:
+-----------+------------+
| driver_id | join_date  |
+-----------+------------+
| 10        | 2019-12-10 |
| 8         | 2020-1-13  |
| 5         | 2020-2-16  |
| 7         | 2020-3-8   |
| 4         | 2020-5-17  |
| 1         | 2020-10-24 |
| 6         | 2021-1-5   |
+-----------+------------+
Rides table:
+---------+---------+--------------+
| ride_id | user_id | requested_at |
+---------+---------+--------------+
| 6       | 75      | 2019-12-9    |
| 1       | 54      | 2020-2-9     |
| 10      | 63      | 2020-3-4     |
| 19      | 39      | 2020-4-6     |
| 3       | 41      | 2020-6-3     |
| 13      | 52      | 2020-6-22    |
| 7       | 69      | 2020-7-16    |
| 17      | 70      | 2020-8-25    |
| 20      | 81      | 2020-11-2    |
| 5       | 57      | 2020-11-9    |
| 2       | 42      | 2020-12-9    |
| 11      | 68      | 2021-1-11    |
| 15      | 32      | 2021-1-17    |
| 12      | 11      | 2021-1-19    |
| 14      | 18      | 2021-1-27    |
+---------+---------+--------------+
AcceptedRides table:
+---------+-----------+---------------+---------------+
| ride_id | driver_id | ride_distance | ride_duration |
+---------+-----------+---------------+---------------+
| 10      | 10        | 63            | 38            |
| 13      | 10        | 73            | 96            |
| 7       | 8         | 100           | 28            |
| 17      | 7         | 119           | 68            |
| 20      | 1         | 121           | 92            |
| 5       | 7         | 42            | 101           |
| 2       | 4         | 6             | 38            |
| 11      | 8         | 37            | 43            |
| 15      | 8         | 108           | 82            |
| 12      | 8         | 38            | 34            |
| 14      | 1         | 90            | 74            |
+---------+-----------+---------------+---------------+
Output: 
+-------+----------------+----------------+
| month | active_drivers | accepted_rides |
+-------+----------------+----------------+
| 1     | 2              | 0              |
| 2     | 3              | 0              |
| 3     | 4              | 1              |
| 4     | 4              | 0              |
| 5     | 5              | 0              |
| 6     | 5              | 1              |
| 7     | 5              | 1              |
| 8     | 5              | 1              |
| 9     | 5              | 0              |
| 10    | 6              | 0              |
| 11    | 6              | 2              |
| 12    | 6              | 1              |
+-------+----------------+----------------+
Explanation: 
By the end of January --> two active drivers (10, 8) and no accepted rides.
By the end of February --> three active drivers (10, 8, 5) and no accepted rides.
By the end of March --> four active drivers (10, 8, 5, 7) and one accepted ride (10).
By the end of April --> four active drivers (10, 8, 5, 7) and no accepted rides.
By the end of May --> five active drivers (10, 8, 5, 7, 4) and no accepted rides.
By the end of June --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (13).
By the end of July --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (7).
By the end of August --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (17).
By the end of September --> five active drivers (10, 8, 5, 7, 4) and no accepted rides.
By the end of October --> six active drivers (10, 8, 5, 7, 4, 1) and no accepted rides.
By the end of November --> six active drivers (10, 8, 5, 7, 4, 1) and two accepted rides (20, 5).
By the end of December --> six active drivers (10, 8, 5, 7, 4, 1) and one accepted ride (2).

Solution

Method 1 – Monthly Aggregation with Date Functions

Intuition

We need to count, for each month in 2020:

  • The number of drivers who joined on or before the end of that month (active_drivers).
  • The number of accepted rides requested in that month (accepted_rides).

We can generate all months, then aggregate using date functions and joins.

Approach

  1. Generate a table of months 1 to 12 for 2020.
  2. For each month:
    • Count drivers whose join_date is on or before the last day of the month.
    • Count accepted rides where the ride’s requested_at is in that month and year.
  3. Use LEFT JOINs to ensure months with zero rides or drivers are included.
  4. Return the result ordered by month.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
WITH months AS (
  SELECT 1 AS m UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
  SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
  SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12
)
SELECT
  m.m AS month,
  (SELECT COUNT(*) FROM Drivers WHERE join_date <= MAKEDATE(2020, 1) + INTERVAL m.m MONTH - INTERVAL 1 DAY) AS active_drivers,
  (
    SELECT COUNT(*)
    FROM AcceptedRides ar
    JOIN Rides r ON ar.ride_id = r.ride_id
    WHERE YEAR(r.requested_at) = 2020 AND MONTH(r.requested_at) = m.m
  ) AS accepted_rides
FROM months m
ORDER BY m.m;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
WITH months AS (
  SELECT generate_series(1,12) AS m
)
SELECT
  m.m AS month,
  (SELECT COUNT(*) FROM Drivers WHERE join_date <= (DATE '2020-01-01' + (INTERVAL '1 month' * m.m)) - INTERVAL '1 day') AS active_drivers,
  (
    SELECT COUNT(*)
    FROM AcceptedRides ar
    JOIN Rides r ON ar.ride_id = r.ride_id
    WHERE EXTRACT(YEAR FROM r.requested_at) = 2020 AND EXTRACT(MONTH FROM r.requested_at) = m.m
  ) AS accepted_rides
FROM months m
ORDER BY m.m;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def hopper_company_queries_1(drivers, rides, accepted):
    import pandas as pd
    months = range(1, 13)
    ans = []
    for m in months:
        last_day = pd.Timestamp(year=2020, month=m, day=1) + pd.offsets.MonthEnd(0)
        active = (drivers['join_date'] <= last_day).sum()
        rides_in_month = rides[(rides['requested_at'].dt.year == 2020) & (rides['requested_at'].dt.month == m)]
        accepted_rides = accepted[accepted['ride_id'].isin(rides_in_month['ride_id'])]
        ans.append({'month': m, 'active_drivers': active, 'accepted_rides': len(accepted_rides)})
    return pd.DataFrame(ans)

Complexity

  • ⏰ Time complexity: O(12N + 12M), where N is the number of drivers and M is the number of rides, since we scan all drivers and rides for each month.
  • 🧺 Space complexity: O(1) (excluding output), as we only store counts per month.