Find Bursty Behavior
Problem
Table: Posts
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| post_id | int |
| user_id | int |
| post_date | date |
+-------------+---------+
post_id is the primary key (column with unique values) for this table.
Each row of this table contains post_id, user_id, and post_date.
Write a solution to find users who demonstrate bursty behavior in their posting patterns during February 2024. Bursty behavior is defined as
any period of 7 consecutive days where a user's posting frequency is at least twice to their average weekly posting frequency for February 2024.
Note: Only include the dates from February 1 to February 28 in your analysis, which means you should count February as having exactly 4 weeks.
Return the result table orderd byuser_id inascending order.
The result format is in the following example.
Examples
Example 1:
Input:
Posts table:
+---------+---------+------------+
| post_id | user_id | post_date |
+---------+---------+------------+
| 1 | 1 | 2024-02-27 |
| 2 | 5 | 2024-02-06 |
| 3 | 3 | 2024-02-25 |
| 4 | 3 | 2024-02-14 |
| 5 | 3 | 2024-02-06 |
| 6 | 2 | 2024-02-25 |
+---------+---------+------------+
Output:
+---------+----------------+------------------+
| user_id | max_7day_posts | avg_weekly_posts |
+---------+----------------+------------------+
| 1 | 1 | 0.2500 |
| 2 | 1 | 0.2500 |
| 5 | 1 | 0.2500 |
+---------+----------------+------------------+
Explanation:
* **User 1:** Made only 1 post in February, resulting in an average of 0.25 posts per week and a max of 1 post in any 7-day period.
* **User 2:** Also made just 1 post, with the same average and max 7-day posting frequency as User 1.
* **User 5:** Like Users 1 and 2, User 5 made only 1 post throughout February, leading to the same average and max 7-day posting metrics.
* **User 3:** Although User 3 made more posts than the others (3 posts), they did not reach twice the average weekly posts in their consecutive 7-day window, so they are not listed in the output.
**Note:** Output table is ordered by user_id in ascending order.
Solution
Method 1 – SQL Window Functions and Sliding Window
Intuition
To detect bursty behavior, we need to compare a user's posting frequency in any 7-day window to their average weekly posting frequency in February 2024. We can use window functions to count posts per user per day, then use a sliding window sum to get the number of posts in each 7-day period. If this sum is at least twice the average weekly frequency, the user is bursty.
Approach
- Filter posts to only those in February 2024 (2024-02-01 to 2024-02-28).
- For each user, count the total number of posts in February (for average weekly frequency).
- For each user and day, count the number of posts on that day.
- For each user, use a sliding window (window function) to sum posts over each 7-day window.
- If the 7-day sum is at least twice the average weekly frequency, mark the user as bursty.
- Return unique user_ids in ascending order.
Code
MySQL
WITH feb_posts AS (
SELECT user_id, post_date
FROM Posts
WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
),
user_total AS (
SELECT user_id, COUNT(*) AS total_posts
FROM feb_posts
GROUP BY user_id
),
daily_posts AS (
SELECT user_id, post_date, COUNT(*) AS cnt
FROM feb_posts
GROUP BY user_id, post_date
),
windowed AS (
SELECT d1.user_id, d1.post_date,
(
SELECT SUM(d2.cnt)
FROM daily_posts d2
WHERE d2.user_id = d1.user_id
AND d2.post_date BETWEEN DATE_SUB(d1.post_date, INTERVAL 6 DAY) AND d1.post_date
) AS week_sum
FROM daily_posts d1
)
SELECT DISTINCT w.user_id
FROM windowed w
JOIN user_total u ON w.user_id = u.user_id
WHERE w.week_sum >= 2 * (u.total_posts / 4)
ORDER BY w.user_id;
PostgreSQL
WITH feb_posts AS (
SELECT user_id, post_date
FROM Posts
WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
),
user_total AS (
SELECT user_id, COUNT(*) AS total_posts
FROM feb_posts
GROUP BY user_id
),
daily_posts AS (
SELECT user_id, post_date, COUNT(*) AS cnt
FROM feb_posts
GROUP BY user_id, post_date
),
windowed AS (
SELECT d1.user_id, d1.post_date,
(
SELECT SUM(d2.cnt)
FROM daily_posts d2
WHERE d2.user_id = d1.user_id
AND d2.post_date BETWEEN d1.post_date - INTERVAL '6 days' AND d1.post_date
) AS week_sum
FROM daily_posts d1
)
SELECT DISTINCT w.user_id
FROM windowed w
JOIN user_total u ON w.user_id = u.user_id
WHERE w.week_sum >= 2 * (u.total_posts / 4)
ORDER BY w.user_id;
Python (pandas)
def find_bursty_behavior(posts: 'pd.DataFrame') -> 'pd.DataFrame':
posts = posts.copy()
posts['post_date'] = pd.to_datetime(posts['post_date'])
mask = (posts['post_date'] >= '2024-02-01') & (posts['post_date'] <= '2024-02-28')
feb = posts.loc[mask]
if feb.empty:
return pd.DataFrame(columns=['user_id'])
user_total = feb.groupby('user_id').size().rename('total_posts')
daily = feb.groupby(['user_id', 'post_date']).size().rename('cnt').reset_index()
result = set()
for user, group in daily.groupby('user_id'):
group = group.set_index('post_date').asfreq('D', fill_value=0)
week_sum = group['cnt'].rolling(7, min_periods=1).sum()
avg_week = user_total[user] / 4
if (week_sum >= 2 * avg_week).any():
result.add(user)
return pd.DataFrame({'user_id': sorted(result)})
Complexity
- ⏰ Time complexity:
O(n), where n is the number of posts in February 2024. Each post is processed a constant number of times. - 🧺 Space complexity:
O(n), for storing daily counts and window sums.