+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| post_id       | int     |
| action_date   | date    |
| action        | enum    |
| extra         | varchar |
+---------------+---------+
This table may have duplicate rows.
The action column is an ENUM (category) type of ('view', 'like', 'reaction', 'comment', 'report', 'share').
The extra column has optional information about the action, such as a reason for the report or a type of reaction.
Table: Removals
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| post_id       | int     |
| remove_date   | date    |
+---------------+---------+
post_id is the primary key (column with unique values) of this table.
Each row in this table indicates that some post was removed due to being reported or as a result of an admin review.
Write a solution to find the average daily percentage of posts that got removed after being reported as spam, rounded to 2 decimal places.
Input:
Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 2       | 2019-07-04  | view   | null   |
| 2       | 2       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-03  | view   | null   |
| 5       | 2       | 2019-07-03  | report | racism |
| 5       | 5       | 2019-07-03  | view   | null   |
| 5       | 5       | 2019-07-03  | report | racism |
+---------+---------+-------------+--------+--------+
Removals table:
+---------+-------------+
| post_id | remove_date |
+---------+-------------+
| 2       | 2019-07-20  |
| 3       | 2019-07-18  |
+---------+-------------+
Output:
+-----------------------+
| average_daily_percent |
+-----------------------+
| 75.00                 |
+-----------------------+
Explanation:
The percentage for 2019-07-04 is 50% because only one of the two posts reported as spam was removed.
The percentage for 2019-07-02 is 100% because one post was reported as spam and it was removed.
The other days had no spam reports, so the average is (50 + 100) / 2 = 75%.
Note that the output is only one number and that we do not care about the remove dates.
For each day, we need the percentage of posts reported as spam that were eventually removed; the answer is the average of these daily percentages. Only days with at least one spam report are considered. Because the Actions table may contain duplicate rows, each (action_date, post_id) pair is counted once. The removal date is irrelevant; we only care whether the post was ever removed.
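-- Distinct (day, post) pairs reported as spam; GROUP BY removes duplicate rows.
WITH spam_reports AS (
    SELECT action_date, post_id
    FROM Actions
    WHERE action = 'report' AND extra = 'spam'
    GROUP BY action_date, post_id
),
-- Per day: number of reported posts and how many of them appear in Removals.
daily_stats AS (
    SELECT action_date,
           COUNT(*) AS reported,
           SUM(CASE WHEN r.post_id IS NOT NULL THEN 1 ELSE 0 END) AS removed
    FROM spam_reports s
    LEFT JOIN Removals r ON s.post_id = r.post_id
    GROUP BY action_date
)
-- Average the daily percentages and round to 2 decimal places.
SELECT ROUND(AVG(removed * 100.0 / reported), 2) AS average_daily_percent
FROM daily_stats;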
-- Same query with an explicit cast; the ::numeric syntax is PostgreSQL-specific.
WITH spam_reports AS (
    SELECT action_date, post_id
    FROM Actions
    WHERE action = 'report' AND extra = 'spam'
    GROUP BY action_date, post_id
),
daily_stats AS (
    SELECT action_date,
           COUNT(*) AS reported,
           SUM(CASE WHEN r.post_id IS NOT NULL THEN 1 ELSE 0 END) AS removed
    FROM spam_reports s
    LEFT JOIN Removals r ON s.post_id = r.post_id
    GROUP BY action_date
)
-- PostgreSQL's two-argument ROUND takes a numeric value, hence the cast.
SELECT ROUND(AVG(removed * 100.0 / reported)::numeric, 2) AS average_daily_percent
FROM daily_stats;
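To check the reasoning against the example, the sample rows can be loaded into scratch tables and the per-day figures inspected before they are averaged. The script below is only a sketch under a few assumptions: the action column is declared as plain VARCHAR rather than ENUM so the script stays portable, the column lengths are arbitrary, and the daily_percent alias is a name introduced here for illustration.

```sql
-- Scratch copies of the example tables.
-- Assumption: action is stored as VARCHAR instead of ENUM for portability.
CREATE TABLE Actions (
    user_id     INT,
    post_id     INT,
    action_date DATE,
    action      VARCHAR(10),
    extra       VARCHAR(20)
);

CREATE TABLE Removals (
    post_id     INT PRIMARY KEY,
    remove_date DATE
);

-- Sample rows from the problem statement.
INSERT INTO Actions (user_id, post_id, action_date, action, extra) VALUES
    (1, 1, '2019-07-01', 'view',   NULL),
    (1, 1, '2019-07-01', 'like',   NULL),
    (1, 1, '2019-07-01', 'share',  NULL),
    (2, 2, '2019-07-04', 'view',   NULL),
    (2, 2, '2019-07-04', 'report', 'spam'),
    (3, 4, '2019-07-04', 'view',   NULL),
    (3, 4, '2019-07-04', 'report', 'spam'),
    (4, 3, '2019-07-02', 'view',   NULL),
    (4, 3, '2019-07-02', 'report', 'spam'),
    (5, 2, '2019-07-03', 'view',   NULL),
    (5, 2, '2019-07-03', 'report', 'racism'),
    (5, 5, '2019-07-03', 'view',   NULL),
    (5, 5, '2019-07-03', 'report', 'racism');

INSERT INTO Removals (post_id, remove_date) VALUES
    (2, '2019-07-20'),
    (3, '2019-07-18');

-- Per-day breakdown: the same two steps as the solution, without the final AVG.
WITH spam_reports AS (
    SELECT action_date, post_id
    FROM Actions
    WHERE action = 'report' AND extra = 'spam'
    GROUP BY action_date, post_id
)
SELECT s.action_date,
       COUNT(*) AS reported,
       SUM(CASE WHEN r.post_id IS NOT NULL THEN 1 ELSE 0 END) AS removed,
       -- daily_percent is a hypothetical alias used only in this sketch
       SUM(CASE WHEN r.post_id IS NOT NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS daily_percent
FROM spam_reports s
LEFT JOIN Removals r ON s.post_id = r.post_id
GROUP BY s.action_date
ORDER BY s.action_date;
```

On the sample data this should return two rows: 2019-07-02 with 1 post reported and 1 removed (100 percent), and 2019-07-04 with 2 reported and 1 removed (50 percent). These match the figures in the explanation, and their average is the expected 75.00. The days 2019-07-01 and 2019-07-03 do not appear because they have no spam reports.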