Problem

Table: Submissions

+---------------+----------+
| Column Name   | Type     |
+---------------+----------+
| sub_id        | int      |
| parent_id     | int      |
+---------------+----------+
This table may have duplicate rows.
Each row is either a post or a comment on a post.
parent_id is null for posts.
For comments, parent_id is the sub_id of the post being commented on.

Write a solution to find the number of comments per post. The result table should contain post_id and its corresponding number_of_comments.

The Submissions table may contain duplicate comments. You should count the number of unique comments per post.

The Submissions table may contain duplicate posts. You should treat them as one post.

The result table should be ordered by post_id in ascending order.

The result format is in the following example.

Examples

Example 1:

Input: 
Submissions table:
+---------+------------+
| sub_id  | parent_id  |
+---------+------------+
| 1       | Null       |
| 2       | Null       |
| 1       | Null       |
| 12      | Null       |
| 3       | 1          |
| 5       | 2          |
| 3       | 1          |
| 4       | 1          |
| 9       | 1          |
| 10      | 2          |
| 6       | 7          |
+---------+------------+
Output: 
+---------+--------------------+
| post_id | number_of_comments |
+---------+--------------------+
| 1       | 3                  |
| 2       | 2                  |
| 12      | 0                  |
+---------+--------------------+
Explanation: 
The post with id 1 has three comments in the table, with ids 3, 4, and 9. The comment with id 3 appears twice in the table, so it is counted only once.
The post with id 2 has two comments in the table, with ids 5 and 10.
The post with id 12 has no comments in the table.
The comment with id 6 is a comment on a deleted post with id 7, so it is ignored.

Solution

Method 1 – SQL Group By and Join

Intuition

Posts are rows with parent_id IS NULL. Comments are rows with parent_id = sub_id of a post. Count unique comments per post, ignoring duplicates and comments on deleted posts.

Approach

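  1. Select distinct post ids (rows where parent_id IS NULL).
  2. Left join them with the distinct comments on parent_id = post_id, so that posts without comments are kept and comments on deleted posts drop out.
  3. Count unique comment ids per post; a post with no matching comment gets 0.
  4. Order by post_id ascending.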
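
As a way to check the reasoning outside a database, the four steps above can be sketched in plain Python with sets. This is only an illustration, not part of the SQL solution: the function name count_comments and the list-of-tuples input format are assumptions made for this sketch.

```python
def count_comments(rows):
    """rows: list of (sub_id, parent_id) tuples; parent_id is None for posts.

    Hypothetical helper that mirrors the four steps of the approach.
    """
    # Step 1: distinct post ids (parent_id IS NULL).
    posts = {sub_id for sub_id, parent_id in rows if parent_id is None}

    # Step 2: distinct comments, kept as (comment_id, parent_id) pairs.
    comments = {(sub_id, parent_id) for sub_id, parent_id in rows
                if parent_id is not None}

    # Step 3: every post starts at 0 (the "left join" part); comments whose
    # parent is not a known post are skipped (comments on deleted posts).
    counts = {post_id: 0 for post_id in posts}
    for _, parent_id in comments:
        if parent_id in counts:
            counts[parent_id] += 1

    # Step 4: order by post_id ascending.
    return sorted(counts.items())


# Rows from Example 1.
rows = [(1, None), (2, None), (1, None), (12, None), (3, 1), (5, 2),
        (3, 1), (4, 1), (9, 1), (10, 2), (6, 7)]
print(count_comments(rows))  # [(1, 3), (2, 2), (12, 0)]
```

Starting every post at 0 before counting plays the role of the left join: post 12 stays in the result even though no comment refers to it.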

Code

SELECT p.sub_id AS post_id, COUNT(DISTINCT c.sub_id) AS number_of_comments
FROM (SELECT DISTINCT sub_id FROM Submissions WHERE parent_id IS NULL) p
LEFT JOIN (SELECT DISTINCT sub_id, parent_id FROM Submissions WHERE parent_id IS NOT NULL) c
ON p.sub_id = c.parent_id
GROUP BY p.sub_id
ORDER BY p.sub_id ASC;
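
To try the query on the data from Example 1, one option is an in-memory SQLite database driven from Python's standard sqlite3 module. The problem does not name a database engine, so SQLite is an assumption here, and the function name run_example and the column types in CREATE TABLE are made up for this sketch.

```python
import sqlite3

# The query from the solution, unchanged.
QUERY = """
SELECT p.sub_id AS post_id, COUNT(DISTINCT c.sub_id) AS number_of_comments
FROM (SELECT DISTINCT sub_id FROM Submissions WHERE parent_id IS NULL) p
LEFT JOIN (SELECT DISTINCT sub_id, parent_id FROM Submissions WHERE parent_id IS NOT NULL) c
ON p.sub_id = c.parent_id
GROUP BY p.sub_id
ORDER BY p.sub_id ASC;
"""

# Rows from Example 1; None is stored as SQL NULL.
EXAMPLE_ROWS = [(1, None), (2, None), (1, None), (12, None), (3, 1), (5, 2),
                (3, 1), (4, 1), (9, 1), (10, 2), (6, 7)]


def run_example(rows=EXAMPLE_ROWS):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Submissions (sub_id INTEGER, parent_id INTEGER)")
        conn.executemany("INSERT INTO Submissions VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(run_example())  # [(1, 3), (2, 2), (12, 0)]
```

COUNT ignores NULL values, so the unmatched row that the left join produces for post 12 yields a count of 0 rather than 1.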

Complexity

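  • ⏰ Time complexity: O(N log N), where N is the number of rows in Submissions, assuming the engine de-duplicates and orders rows by sorting.
  • 🧺 Space complexity: O(N) for the de-duplicated posts and comments.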