Problem

Table: Posts

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| post_id     | int     |
| user_id     | int     |
| post_date   | date    |
+-------------+---------+
post_id is the primary key (column with unique values) for this table.
Each row of this table contains post_id, user_id, and post_date.

Write a solution to find users who demonstrate bursty behavior in their posting patterns during February 2024. Bursty behavior is defined as any period of 7 consecutive days where a user’s posting frequency is at least twice to their average weekly posting frequency for February 2024.

Note: Only include the dates from February 1 to February 28 in your analysis, which means you should count February as having exactly 4 weeks.

Return the result table orderd byuser_id inascending order.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Input:
Posts table:
+---------+---------+------------+
| post_id | user_id | post_date  |
+---------+---------+------------+
| 1       | 1       | 2024-02-27 |
| 2       | 5       | 2024-02-06 |
| 3       | 3       | 2024-02-25 |
| 4       | 3       | 2024-02-14 |
| 5       | 3       | 2024-02-06 |
| 6       | 2       | 2024-02-25 |
+---------+---------+------------+
Output:
+---------+----------------+------------------+
| user_id | max_7day_posts | avg_weekly_posts |
+---------+----------------+------------------+
| 1       | 1              | 0.2500           |
| 2       | 1              | 0.2500           |
| 5       | 1              | 0.2500           |
+---------+----------------+------------------+
Explanation:
* **User 1:** Made only 1 post in February, resulting in an average of 0.25 posts per week and a max of 1 post in any 7-day period.
* **User 2:** Also made just 1 post, with the same average and max 7-day posting frequency as User 1.
* **User 5:** Like Users 1 and 2, User 5 made only 1 post throughout February, leading to the same average and max 7-day posting metrics.
* **User 3:** Although User 3 made more posts than the others (3 posts), they did not reach twice the average weekly posts in their consecutive 7-day window, so they are not listed in the output.
**Note:** Output table is ordered by user_id in ascending order.

Solution

Method 1 – SQL Window Functions and Sliding Window

Intuition

To detect bursty behavior, we need to compare a user’s posting frequency in any 7-day window to their average weekly posting frequency in February 2024. We can use window functions to count posts per user per day, then use a sliding window sum to get the number of posts in each 7-day period. If this sum is at least twice the average weekly frequency, the user is bursty.

Approach

  1. Filter posts to only those in February 2024 (2024-02-01 to 2024-02-28).
  2. For each user, count the total number of posts in February (for average weekly frequency).
  3. For each user and day, count the number of posts on that day.
  4. For each user, use a sliding window (window function) to sum posts over each 7-day window.
  5. If the 7-day sum is at least twice the average weekly frequency, mark the user as bursty.
  6. Return unique user_ids in ascending order.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
WITH feb_posts AS (
  SELECT user_id, post_date
  FROM Posts
  WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
),
user_total AS (
  SELECT user_id, COUNT(*) AS total_posts
  FROM feb_posts
  GROUP BY user_id
),
daily_posts AS (
  SELECT user_id, post_date, COUNT(*) AS cnt
  FROM feb_posts
  GROUP BY user_id, post_date
),
windowed AS (
  SELECT d1.user_id, d1.post_date,
    (
      SELECT SUM(d2.cnt)
      FROM daily_posts d2
      WHERE d2.user_id = d1.user_id
        AND d2.post_date BETWEEN DATE_SUB(d1.post_date, INTERVAL 6 DAY) AND d1.post_date
    ) AS week_sum
  FROM daily_posts d1
)
SELECT DISTINCT w.user_id
FROM windowed w
JOIN user_total u ON w.user_id = u.user_id
WHERE w.week_sum >= 2 * (u.total_posts / 4)
ORDER BY w.user_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
WITH feb_posts AS (
  SELECT user_id, post_date
  FROM Posts
  WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
),
user_total AS (
  SELECT user_id, COUNT(*) AS total_posts
  FROM feb_posts
  GROUP BY user_id
),
daily_posts AS (
  SELECT user_id, post_date, COUNT(*) AS cnt
  FROM feb_posts
  GROUP BY user_id, post_date
),
windowed AS (
  SELECT d1.user_id, d1.post_date,
    (
      SELECT SUM(d2.cnt)
      FROM daily_posts d2
      WHERE d2.user_id = d1.user_id
        AND d2.post_date BETWEEN d1.post_date - INTERVAL '6 days' AND d1.post_date
    ) AS week_sum
  FROM daily_posts d1
)
SELECT DISTINCT w.user_id
FROM windowed w
JOIN user_total u ON w.user_id = u.user_id
WHERE w.week_sum >= 2 * (u.total_posts / 4)
ORDER BY w.user_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
def find_bursty_behavior(posts: 'pd.DataFrame') -> 'pd.DataFrame':
    posts = posts.copy()
    posts['post_date'] = pd.to_datetime(posts['post_date'])
    mask = (posts['post_date'] >= '2024-02-01') & (posts['post_date'] <= '2024-02-28')
    feb = posts.loc[mask]
    if feb.empty:
        return pd.DataFrame(columns=['user_id'])
    user_total = feb.groupby('user_id').size().rename('total_posts')
    daily = feb.groupby(['user_id', 'post_date']).size().rename('cnt').reset_index()
    result = set()
    for user, group in daily.groupby('user_id'):
        group = group.set_index('post_date').asfreq('D', fill_value=0)
        week_sum = group['cnt'].rolling(7, min_periods=1).sum()
        avg_week = user_total[user] / 4
        if (week_sum >= 2 * avg_week).any():
            result.add(user)
    return pd.DataFrame({'user_id': sorted(result)})

Complexity

  • ⏰ Time complexity: O(n), where n is the number of posts in February 2024. Each post is processed a constant number of times.
  • 🧺 Space complexity: O(n), for storing daily counts and window sums.