Problem

Table: Actions

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| post_id       | int     |
| action_date   | date    |
| action        | enum    |
| extra         | varchar |
+---------------+---------+
This table may have duplicate rows.
The action column is an ENUM (category) type of ('view', 'like', 'reaction', 'comment', 'report', 'share').
The extra column has optional information about the action, such as a reason for the report or a type of reaction.
extra is never NULL.

Write a solution to report the number of posts reported yesterday for each report reason. Assume today is 2019-07-05.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Input: 
Actions table:
+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 1       | 1       | 2019-07-01  | view   | null   |
| 1       | 1       | 2019-07-01  | like   | null   |
| 1       | 1       | 2019-07-01  | share  | null   |
| 2       | 4       | 2019-07-04  | view   | null   |
| 2       | 4       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | view   | null   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 4       | 3       | 2019-07-02  | view   | null   |
| 4       | 3       | 2019-07-02  | report | spam   |
| 5       | 2       | 2019-07-04  | view   | null   |
| 5       | 2       | 2019-07-04  | report | racism |
| 5       | 5       | 2019-07-04  | view   | null   |
| 5       | 5       | 2019-07-04  | report | racism |
+---------+---------+-------------+--------+--------+
Output: 
+---------------+--------------+
| report_reason | report_count |
+---------------+--------------+
| spam          | 1            |
| racism        | 2            |
+---------------+--------------+
Explanation: Note that we only care about report reasons with non-zero number of reports.

Solution

Method 1 - Count Reports by Reason (SQL & Pandas)

Intuition

We need to count, for each report reason, how many posts were reported yesterday (2019-07-04). Only rows with action = ‘report’ and action_date = ‘2019-07-04’ matter. The ’extra’ column contains the report reason.

Approach

  1. Filter the Actions table for rows where action = 'report' and action_date = '2019-07-04'.
  2. Group by the extra column (the report reason).
  3. Count the number of unique post_id for each reason (as per the example, each post is counted once per reason, even if reported multiple times).
  4. Return the reason and the count.

Code

1
2
3
4
SELECT extra AS report_reason, COUNT(DISTINCT post_id) AS report_count
FROM Actions
WHERE action = 'report' AND action_date = '2019-07-04'
GROUP BY extra;
1
2
3
4
SELECT extra AS report_reason, COUNT(DISTINCT post_id) AS report_count
FROM Actions
WHERE action = 'report' AND action_date = '2019-07-04'
GROUP BY extra;
1
2
3
4
5
# actions is a pandas DataFrame
mask = (actions['action'] == 'report') & (actions['action_date'] == '2019-07-04')
df = actions[mask]
result = df.groupby('extra')['post_id'].nunique().reset_index()
result.columns = ['report_reason', 'report_count']

Complexity

  • ⏰ Time complexity: O(N), where N is the number of rows in Actions.
  • 🧺 Space complexity: O(R), where R is the number of unique report reasons for that day.