+-------------+---------+
|Column Name |Type|+-------------+---------+
| driver_id | int || join_date | date |+-------------+---------+
driver_id is the columnwithuniquevaluesfor this table.
Eachrowof this tablecontains the driver's ID and the date they joined the Hopper company.
Table: Rides
1
2
3
4
5
6
7
8
9
10
+--------------+---------+
|Column Name |Type|+--------------+---------+
| ride_id | int || user_id | int || requested_at | date |+--------------+---------+
ride_id is the columnwithuniquevaluesfor this table.
Eachrowof this tablecontains the ID of a ride, the user's ID that requested it, and the day they requested it.
There may be some ride requests in this table that were not accepted.
Table: AcceptedRides
1
2
3
4
5
6
7
8
9
10
11
+---------------+---------+
|Column Name |Type|+---------------+---------+
| ride_id | int || driver_id | int || ride_distance | int || ride_duration | int |+---------------+---------+
ride_id is the columnwithuniquevaluesfor this table.
Eachrowof this tablecontainssome information about an accepted ride.
It is guaranteed that each accepted ride existsin the Rides table.
Write a solution to compute the average_ride_distance and average_ride_duration of every 3-month window starting from January -March 2020 to October - December 2020. Round average_ride_distance and average_ride_duration to the nearest two decimal places.
The average_ride_distance is calculated by summing up the total ride_distance values from the three months and dividing it by 3. The average_ride_duration is calculated in a similar way.
Return the result table ordered by month in ascending order, where month is the starting month’s number (January is 1, February is 2, etc.).
Input:
Drivers table:+-----------+------------+| driver_id | join_date |+-----------+------------+|10|2019-12-10||8|2020-1-13||5|2020-2-16||7|2020-3-8||4|2020-5-17||1|2020-10-24||6|2021-1-5|+-----------+------------+Rides table:+---------+---------+--------------+| ride_id | user_id | requested_at |+---------+---------+--------------+|6|75|2019-12-9||1|54|2020-2-9||10|63|2020-3-4||19|39|2020-4-6||3|41|2020-6-3||13|52|2020-6-22||7|69|2020-7-16||17|70|2020-8-25||20|81|2020-11-2||5|57|2020-11-9||2|42|2020-12-9||11|68|2021-1-11||15|32|2021-1-17||12|11|2021-1-19||14|18|2021-1-27|+---------+---------+--------------+AcceptedRides table:+---------+-----------+---------------+---------------+| ride_id | driver_id | ride_distance | ride_duration |+---------+-----------+---------------+---------------+|10|10|63|38||13|10|73|96||7|8|100|28||17|7|119|68||20|1|121|92||5|7|42|101||2|4|6|38||11|8|37|43||15|8|108|82||12|8|38|34||14|1|90|74|+---------+-----------+---------------+---------------+Output:
+-------+-----------------------+-----------------------+| month | average_ride_distance | average_ride_duration |+-------+-----------------------+-----------------------+|1|21.00|12.67||2|21.00|12.67||3|21.00|12.67||4|24.33|32.00||5|57.67|41.33||6|97.33|64.00||7|73.00|32.00||8|39.67|22.67||9|54.33|64.33||10|56.33|77.00|+-------+-----------------------+-----------------------+Explanation:
By the end of January --> average_ride_distance =(0+0+63)/3=21, average_ride_duration =(0+0+38)/3=12.67By the end of February --> average_ride_distance =(0+63+0)/3=21, average_ride_duration =(0+38+0)/3=12.67By the end of March --> average_ride_distance =(63+0+0)/3=21, average_ride_duration =(38+0+0)/3=12.67By the end of April --> average_ride_distance =(0+0+73)/3=24.33, average_ride_duration =(0+0+96)/3=32.00By the end of May --> average_ride_distance =(0+73+100)/3=57.67, average_ride_duration =(0+96+28)/3=41.33By the end of June --> average_ride_distance =(73+100+119)/3=97.33, average_ride_duration =(96+28+68)/3=64.00By the end of July --> average_ride_distance =(100+119+0)/3=73.00, average_ride_duration =(28+68+0)/3=32.00By the end of August --> average_ride_distance =(119+0+0)/3=39.67, average_ride_duration =(68+0+0)/3=22.67By the end of Septemeber --> average_ride_distance =(0+0+163)/3=54.33, average_ride_duration =(0+0+193)/3=64.33By the end of October --> average_ride_distance =(0+163+6)/3=56.33, average_ride_duration =(0+193+38)/3=77.00
For each 3-month window (Jan–Mar, Feb–Apr, …, Oct–Dec 2020), sum the ride distances and durations for accepted rides requested in those months, then divide by 3. This smooths out monthly fluctuations and gives a rolling average.
WITH months AS (
SELECT1AS m UNIONALLSELECT2UNIONALLSELECT3UNIONALLSELECT4UNIONALLSELECT5UNIONALLSELECT6UNIONALLSELECT7UNIONALLSELECT8UNIONALLSELECT9UNIONALLSELECT10),
ride_stats AS (
SELECTMONTH(r.requested_at) AS m,
SUM(ar.ride_distance) AS dist,
SUM(ar.ride_duration) AS dur
FROM AcceptedRides ar
JOIN Rides r ON ar.ride_id = r.ride_id
WHEREYEAR(r.requested_at) =2020GROUPBYMONTH(r.requested_at)
)
SELECT m.m ASmonth,
ROUND((IFNULL(s1.dist,0) + IFNULL(s2.dist,0) + IFNULL(s3.dist,0))/3, 2) AS average_ride_distance,
ROUND((IFNULL(s1.dur,0) + IFNULL(s2.dur,0) + IFNULL(s3.dur,0))/3, 2) AS average_ride_duration
FROM months m
LEFTJOIN ride_stats s1 ON s1.m = m.m
LEFTJOIN ride_stats s2 ON s2.m = m.m+1LEFTJOIN ride_stats s3 ON s3.m = m.m+2ORDERBY m.m;
WITH months AS (
SELECT generate_series(1,10) AS m
),
ride_stats AS (
SELECTEXTRACT(MONTHFROM r.requested_at)::int AS m,
SUM(ar.ride_distance) AS dist,
SUM(ar.ride_duration) AS dur
FROM AcceptedRides ar
JOIN Rides r ON ar.ride_id = r.ride_id
WHEREEXTRACT(YEARFROM r.requested_at) =2020GROUPBYEXTRACT(MONTHFROM r.requested_at)
)
SELECT m.m ASmonth,
ROUND((COALESCE(s1.dist,0) + COALESCE(s2.dist,0) + COALESCE(s3.dist,0))/3.0, 2) AS average_ride_distance,
ROUND((COALESCE(s1.dur,0) + COALESCE(s2.dur,0) + COALESCE(s3.dur,0))/3.0, 2) AS average_ride_duration
FROM months m
LEFTJOIN ride_stats s1 ON s1.m = m.m
LEFTJOIN ride_stats s2 ON s2.m = m.m+1LEFTJOIN ride_stats s3 ON s3.m = m.m+2ORDERBY m.m;
1
2
3
4
5
6
7
8
9
10
11
12
13
defhopper_company_queries_3(rides, accepted):
import pandas as pd
rides = rides.copy()
rides['month'] = rides['requested_at'].dt.month
rides['year'] = rides['requested_at'].dt.year
merged = pd.merge(accepted, rides, on='ride_id')
stats = merged[merged['year'] ==2020].groupby('month').agg({'ride_distance':'sum', 'ride_duration':'sum'}).reindex(range(1,13), fill_value=0)
ans = []
for m in range(1, 11):
dist = stats.loc[m:m+2, 'ride_distance'].sum()/3 dur = stats.loc[m:m+2, 'ride_duration'].sum()/3 ans.append({'month': m, 'average_ride_distance': round(dist,2), 'average_ride_duration': round(dur,2)})
return pd.DataFrame(ans)