Problem

Table: Steps

+-------------+------+
| Column Name | Type |
+-------------+------+
| user_id     | int  |
| steps_count | int  |
| steps_date  | date |
+-------------+------+
(user_id, steps_date) is the primary key for this table.
Each row of this table contains user_id, steps_count, and steps_date.

Write a solution to calculate 3-day rolling averages of steps for each user.

We calculate the n-day rolling average this way:

  • For each day, we calculate the average of n consecutive days of step counts ending on that day if available, otherwise, n-day rolling average is not defined for it.

Output the user_id, steps_date, and rolling average. Round the rolling average to two decimal places.

Return the result table ordered byuser_id ,steps_date inascending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
Input: 
Steps table:
+---------+-------------+------------+
| user_id | steps_count | steps_date |
+---------+-------------+------------+
| 1       | 687         | 2021-09-02 |
| 1       | 395         | 2021-09-04 |
| 1       | 499         | 2021-09-05 |
| 1       | 712         | 2021-09-06 |
| 1       | 576         | 2021-09-07 |
| 2       | 153         | 2021-09-06 |
| 2       | 171         | 2021-09-07 |
| 2       | 530         | 2021-09-08 |
| 3       | 945         | 2021-09-04 |
| 3       | 120         | 2021-09-07 |
| 3       | 557         | 2021-09-08 |
| 3       | 840         | 2021-09-09 |
| 3       | 627         | 2021-09-10 |
| 5       | 382         | 2021-09-05 |
| 6       | 480         | 2021-09-01 |
| 6       | 191         | 2021-09-02 |
| 6       | 303         | 2021-09-05 |
+---------+-------------+------------+
Output: 
+---------+------------+-----------------+
| user_id | steps_date | rolling_average | 
+---------+------------+-----------------+
| 1       | 2021-09-06 | 535.33          | 
| 1       | 2021-09-07 | 595.67          | 
| 2       | 2021-09-08 | 284.67          |
| 3       | 2021-09-09 | 505.67          |
| 3       | 2021-09-10 | 674.67          |    
+---------+------------+-----------------+
Explanation: 
- For user id 1, the step counts for the three consecutive days up to 2021-09-06 are available. Consequently, the rolling average for this particular date is computed as (395 + 499 + 712) / 3 = 535.33.
- For user id 1, the step counts for the three consecutive days up to 2021-09-07 are available. Consequently, the rolling average for this particular date is computed as (499 + 712 + 576) / 3 = 595.67.
- For user id 2, the step counts for the three consecutive days up to 2021-09-08 are available. Consequently, the rolling average for this particular date is computed as (153 + 171 + 530) / 3 = 284.67.
- For user id 3, the step counts for the three consecutive days up to 2021-09-09 are available. Consequently, the rolling average for this particular date is computed as (120 + 557 + 840) / 3 = 505.67.
- For user id 3, the step counts for the three consecutive days up to 2021-09-10 are available. Consequently, the rolling average for this particular date is computed as (557 + 840 + 627) / 3 = 674.67.
- For user id 4 and 5, the calculation of the rolling average is not viable as there is insufficient data for the consecutive three days. Output table ordered by user_id and steps_date in ascending order.

Solution

Method 1 - Window Functions / Rolling Window

Intuition

To compute a 3-day rolling average for each user, we need to look at each user’s step counts ordered by date and, for each date, average the current and previous two days (if available). This is a classic use case for window functions in SQL and rolling windows in pandas.

Approach

SQL: Use the AVG() window function with a frame of 3 rows ending at the current row, partitioned by user and ordered by date. Only output rows where 3 days are available. Round the result to two decimal places.

Pandas: Sort by user and date, then use rolling(window=3) grouped by user to compute the mean, rounding to two decimals. Filter out rows where the rolling window is not full.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT
  user_id,
  steps_date,
  ROUND(AVG(steps_count) OVER (
    PARTITION BY user_id ORDER BY steps_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ), 2) AS rolling_average
FROM Steps
QUALIFY COUNT(*) OVER (
    PARTITION BY user_id ORDER BY steps_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) = 3
ORDER BY user_id, steps_date;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
SELECT
  user_id,
  steps_date,
  ROUND(AVG(steps_count) OVER (
    PARTITION BY user_id ORDER BY steps_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  )::numeric, 2) AS rolling_average
FROM (
  SELECT *,
    COUNT(*) OVER (
      PARTITION BY user_id ORDER BY steps_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS cnt
  FROM Steps
) t
WHERE cnt = 3
ORDER BY user_id, steps_date;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import pandas as pd

# Assume df is the Steps table as a pandas DataFrame
df = df.sort_values(['user_id', 'steps_date'])
df['rolling_average'] = (
    df.groupby('user_id')['steps_count']
      .rolling(window=3)
      .mean()
      .round(2)
      .reset_index(level=0, drop=True)
)
result = df[df.groupby('user_id').cumcount() >= 2][['user_id', 'steps_date', 'rolling_average']]
result = result.sort_values(['user_id', 'steps_date'])

Complexity

  • ⏰ Time complexity: O(n log n) (for sorting by user/date, then linear scan)
  • 🧺 Space complexity: O(n) (for storing intermediate results)