Reported Posts II
Medium
Updated: Aug 2, 2025
Problem
Table: Actions
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| post_id       | int     |
| action_date   | date    |
| action        | enum    |
| extra         | varchar |
+---------------+---------+
This table may have duplicate rows.
The action column is an ENUM (category) type of ('view', 'like', 'reaction', 'comment', 'report', 'share').
The extra column has optional information about the action, such as a reason for the report or a type of reaction.
Table: Removals
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| post_id       | int     |
| remove_date   | date    |
+---------------+---------+
post_id is the primary key (column with unique values) of this table.
Each row in this table indicates that some post was removed due to being reported or as a result of an admin review.
Write a solution to find the average daily percentage of posts that got removed after being reported as spam, rounded to 2 decimal places.
The result format is in the following example.
Examples
Example 1:
Input:
Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 2       | 2019-07-04  | view   | null   |
| 2       | 2       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-03  | view   | null   |
| 5       | 2       | 2019-07-03  | report | racism |
| 5       | 5       | 2019-07-03  | view   | null   |
| 5       | 5       | 2019-07-03  | report | racism |
+---------+---------+-------------+--------+--------+
Removals table:
+---------+-------------+
| post_id | remove_date |
+---------+-------------+
| 2       | 2019-07-20  |
| 3       | 2019-07-18  |
+---------+-------------+
Output:
+-----------------------+
| average_daily_percent |
+-----------------------+
| 75.00                 |
+-----------------------+
Explanation:
The percentage for 2019-07-04 is 50% because only one of the two posts reported as spam was removed.
The percentage for 2019-07-02 is 100% because one post was reported as spam and it was removed.
The other days had no spam reports, so the average is (50 + 100) / 2 = 75%.
Note that the output is only one number and that we do not care about the remove dates.
Solution
Method 1 - Average Daily Percent of Spam-Reported Posts Removed (SQL & Pandas)
Intuition
We need to find, for each day, the percentage of posts reported as spam that were eventually removed, and then average these daily percentages. Only days with at least one spam report are considered. The removal date is irrelevant; we only care whether the post was ever removed.
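The same quantity can be written compactly. The symbols below are notation introduced here for illustration, not part of the problem statement: D is the set of days with at least one spam report, S_d is the set of distinct posts reported as spam on day d, and R is the set of post_id values in Removals.

```latex
% Mean over days of the per-day removal percentage
\[
\text{average\_daily\_percent}
  = \frac{1}{|D|} \sum_{d \in D} 100 \cdot \frac{|S_d \cap R|}{|S_d|}
\]
% Example 1: D = {2019-07-02, 2019-07-04}, R = {2, 3}
\[
\frac{1}{2}\left( 100 \cdot \frac{1}{1} + 100 \cdot \frac{1}{2} \right) = 75
\]
```

Note that this is a mean of per-day ratios, not a pooled ratio. Pooling Example 1 would give 2 removed out of 3 spam-reported posts, roughly 66.67%, which is not the expected answer.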
Approach
- For each day, find the set of unique posts reported as spam (action = 'report', extra = 'spam').
- For each day, count how many of those posts appear in the Removals table.
- For each day, compute the percentage: (removed / reported) * 100.
- Average these daily percentages and round to 2 decimal places.
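The steps above can be sketched in plain Python, without a database or pandas, to make the logic easy to trace on Example 1. This is a minimal illustration under stated assumptions: the function name `average_daily_percent`, the tuple layout of `actions`, and the choice to return `None` when no spam reports exist are all assumptions made here, not part of the original solution.

```python
from collections import defaultdict


def average_daily_percent(actions, removed_post_ids):
    """Hypothetical helper mirroring the four steps of the approach.

    actions: iterable of (user_id, post_id, action_date, action, extra) tuples.
    removed_post_ids: set of post_id values taken from the Removals table.
    """
    # Step 1: unique spam-reported posts per day (a set absorbs duplicate rows).
    spam_by_day = defaultdict(set)
    for _user_id, post_id, action_date, action, extra in actions:
        if action == "report" and extra == "spam":
            spam_by_day[action_date].add(post_id)

    if not spam_by_day:
        return None  # assumption: nothing to average when no day has a spam report

    # Steps 2 and 3: per-day percentage of reported posts that were ever removed.
    daily_percents = [
        100.0 * len(posts & removed_post_ids) / len(posts)
        for posts in spam_by_day.values()
    ]

    # Step 4: average the daily percentages and round to 2 decimal places.
    return round(sum(daily_percents) / len(daily_percents), 2)


# Data from Example 1.
actions = [
    (1, 1, "2019-07-01", "view", None),
    (1, 1, "2019-07-01", "like", None),
    (1, 1, "2019-07-01", "share", None),
    (2, 2, "2019-07-04", "view", None),
    (2, 2, "2019-07-04", "report", "spam"),
    (3, 4, "2019-07-04", "view", None),
    (3, 4, "2019-07-04", "report", "spam"),
    (4, 3, "2019-07-02", "view", None),
    (4, 3, "2019-07-02", "report", "spam"),
    (5, 2, "2019-07-03", "view", None),
    (5, 2, "2019-07-03", "report", "racism"),
    (5, 5, "2019-07-03", "view", None),
    (5, 5, "2019-07-03", "report", "racism"),
]
removed = {2, 3}

print(average_daily_percent(actions, removed))  # 75.0
```

Using a set per day is what makes duplicate rows in Actions harmless: feeding the same rows twice yields the same result.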
Code
MySQL
-- Requires MySQL 8.0+ for common table expressions (WITH)
WITH spam_reports AS (
    -- One row per (day, post) spam report; GROUP BY removes duplicate rows
    SELECT action_date, post_id
    FROM Actions
    WHERE action = 'report' AND extra = 'spam'
    GROUP BY action_date, post_id
),
daily_stats AS (
    -- Per day: posts reported as spam and how many of them were ever removed
    SELECT
        s.action_date,
        COUNT(*) AS reported,
        SUM(CASE WHEN r.post_id IS NOT NULL THEN 1 ELSE 0 END) AS removed
    FROM spam_reports s
    LEFT JOIN Removals r ON s.post_id = r.post_id
    GROUP BY s.action_date
)
SELECT ROUND(AVG(removed * 100.0 / reported), 2) AS average_daily_percent
FROM daily_stats;
PostgreSQL
WITH spam_reports AS (
    -- One row per (day, post) spam report; GROUP BY removes duplicate rows
    SELECT action_date, post_id
    FROM Actions
    WHERE action = 'report' AND extra = 'spam'
    GROUP BY action_date, post_id
),
daily_stats AS (
    -- Per day: posts reported as spam and how many of them were ever removed
    SELECT
        s.action_date,
        COUNT(*) AS reported,
        SUM(CASE WHEN r.post_id IS NOT NULL THEN 1 ELSE 0 END) AS removed
    FROM spam_reports s
    LEFT JOIN Removals r ON s.post_id = r.post_id
    GROUP BY s.action_date
)
SELECT ROUND(AVG(removed * 100.0 / reported)::numeric, 2) AS average_daily_percent
FROM daily_stats;
Python (pandas)
import pandas as pd

# actions and removals are pandas DataFrames matching the two tables
# Keep one row per (day, post) spam report; Actions may contain duplicates
is_spam = (actions['action'] == 'report') & (actions['extra'] == 'spam')
spam = actions[is_spam].drop_duplicates(['action_date', 'post_id'])
# Left merge; the indicator column is 'both' when the post is in Removals
merged = spam.merge(removals[['post_id']], on='post_id', how='left', indicator='removed')
merged['removed'] = (merged['removed'] == 'both').astype(int)
# Per-day counts of reported and removed posts
daily = merged.groupby('action_date').agg(
    reported=('post_id', 'count'),
    removed=('removed', 'sum'),
).reset_index()
daily['percent'] = daily['removed'] * 100 / daily['reported']
average_daily_percent = round(daily['percent'].mean(), 2)
# To output as a DataFrame:
result = pd.DataFrame({'average_daily_percent': [average_daily_percent]})
Complexity
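- ⏰ Time complexity: O(N + M) on average, where N is the number of rows in Actions and M is the number of rows in Removals, assuming hash-based grouping and joins. An engine that sorts to evaluate GROUP BY adds a logarithmic factor.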
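- 🧺 Space complexity: O(K + M), where K is the number of distinct (day, post) spam-report pairs held in the intermediate result and M accounts for a hash table over Removals. K is at least the number of unique spam-reported posts, because a post reported on several days appears once per day.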