+-------------+---------+
| Column Name | Type |
+-------------+---------+
| post_id | int |
| user_id | int |
| post_date | date |
+-------------+---------+
post_id is the primary key (column with unique values) for this table.
Each row of this table contains post_id, user_id, and post_date.
Write a solution to find users who demonstrate bursty behavior in their posting patterns during February 2024. Bursty behavior is defined as
any period of 7consecutive days where a user’s posting frequency is at least twice to their average weekly posting frequency for February 2024.
Note: Only include the dates from February 1 to February 28 in your analysis, which means you should count February as having exactly 4 weeks.
Return the result table orderd byuser_idinascendingorder.
Input:
Posts table:+---------+---------+------------+| post_id | user_id | post_date |+---------+---------+------------+|1|1|2024-02-27||2|5|2024-02-06||3|3|2024-02-25||4|3|2024-02-14||5|3|2024-02-06||6|2|2024-02-25|+---------+---------+------------+Output:
+---------+----------------+------------------+| user_id | max_7day_posts | avg_weekly_posts |+---------+----------------+------------------+|1|1|0.2500||2|1|0.2500||5|1|0.2500|+---------+----------------+------------------+Explanation:
***User 1:** Made only 1 post in February, resulting in an average of 0.25 posts per week and a max of 1 post in any 7-day period.***User 2:** Also made just 1 post,with the same average and max 7-day posting frequency as User 1.***User 5:** Like Users 1 and 2, User 5 made only 1 post throughout February, leading to the same average and max 7-day posting metrics.***User 3:** Although User 3 made more posts than the others(3 posts), they did not reach twice the average weekly posts in their consecutive 7-day window, so they are not listed in the output.**Note:** Output table is ordered by user_id in ascending order.
To detect bursty behavior, we need to compare a user’s posting frequency in any 7-day window to their average weekly posting frequency in February 2024. We can use window functions to count posts per user per day, then use a sliding window sum to get the number of posts in each 7-day period. If this sum is at least twice the average weekly frequency, the user is bursty.
WITH feb_posts AS (
SELECT user_id, post_date
FROM Posts
WHERE post_date BETWEEN'2024-02-01'AND'2024-02-28'),
user_total AS (
SELECT user_id, COUNT(*) AS total_posts
FROM feb_posts
GROUPBY user_id
),
daily_posts AS (
SELECT user_id, post_date, COUNT(*) AS cnt
FROM feb_posts
GROUPBY user_id, post_date
),
windowed AS (
SELECT d1.user_id, d1.post_date,
(
SELECTSUM(d2.cnt)
FROM daily_posts d2
WHERE d2.user_id = d1.user_id
AND d2.post_date BETWEEN DATE_SUB(d1.post_date, INTERVAL 6DAY) AND d1.post_date
) AS week_sum
FROM daily_posts d1
)
SELECTDISTINCT w.user_id
FROM windowed w
JOIN user_total u ON w.user_id = u.user_id
WHERE w.week_sum >=2* (u.total_posts /4)
ORDERBY w.user_id;
WITH feb_posts AS (
SELECT user_id, post_date
FROM Posts
WHERE post_date BETWEEN'2024-02-01'AND'2024-02-28'),
user_total AS (
SELECT user_id, COUNT(*) AS total_posts
FROM feb_posts
GROUPBY user_id
),
daily_posts AS (
SELECT user_id, post_date, COUNT(*) AS cnt
FROM feb_posts
GROUPBY user_id, post_date
),
windowed AS (
SELECT d1.user_id, d1.post_date,
(
SELECTSUM(d2.cnt)
FROM daily_posts d2
WHERE d2.user_id = d1.user_id
AND d2.post_date BETWEEN d1.post_date - INTERVAL '6 days'AND d1.post_date
) AS week_sum
FROM daily_posts d1
)
SELECTDISTINCT w.user_id
FROM windowed w
JOIN user_total u ON w.user_id = u.user_id
WHERE w.week_sum >=2* (u.total_posts /4)
ORDERBY w.user_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
deffind_bursty_behavior(posts: 'pd.DataFrame') ->'pd.DataFrame':
posts = posts.copy()
posts['post_date'] = pd.to_datetime(posts['post_date'])
mask = (posts['post_date'] >='2024-02-01') & (posts['post_date'] <='2024-02-28')
feb = posts.loc[mask]
if feb.empty:
return pd.DataFrame(columns=['user_id'])
user_total = feb.groupby('user_id').size().rename('total_posts')
daily = feb.groupby(['user_id', 'post_date']).size().rename('cnt').reset_index()
result = set()
for user, group in daily.groupby('user_id'):
group = group.set_index('post_date').asfreq('D', fill_value=0)
week_sum = group['cnt'].rolling(7, min_periods=1).sum()
avg_week = user_total[user] /4if (week_sum >=2* avg_week).any():
result.add(user)
return pd.DataFrame({'user_id': sorted(result)})