Problem

Table: Signups

+----------------+----------+
| Column Name    | Type     |
+----------------+----------+
| user_id        | int      |
| time_stamp     | datetime |
+----------------+----------+
user_id is the column of unique values for this table.
Each row contains information about the signup time for the user with ID user_id.

Table: Confirmations

+----------------+----------+
| Column Name    | Type     |
+----------------+----------+
| user_id        | int      |
| time_stamp     | datetime |
| action         | ENUM     |
+----------------+----------+
(user_id, time_stamp) is the primary key (combination of columns with unique values) for this table.
user_id is a foreign key (reference column) to the Signups table.
action is an ENUM (category) of the type ('confirmed', 'timeout').
Each row of this table indicates that the user with ID user_id requested a confirmation message at time_stamp and that the confirmation message was either confirmed ('confirmed') or expired without confirming ('timeout').

The confirmation rate of a user is the number of 'confirmed' messages divided by the total number of requested confirmation messages. The confirmation rate of a user that did not request any confirmation messages is 0. Round the confirmation rate to two decimal places.
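
The definition can be written compactly as a piecewise formula. The names confirmed(u) and requested(u) are shorthand introduced here for the number of 'confirmed' rows and the total number of rows in Confirmations for user u; they do not appear in the problem statement.

```latex
\text{rate}(u) =
\begin{cases}
\dfrac{\text{confirmed}(u)}{\text{requested}(u)} & \text{if } \text{requested}(u) > 0,\\[4pt]
0 & \text{otherwise.}
\end{cases}
```

For instance, a user with one confirmed message out of two requests has a rate of 1 / 2 = 0.50.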

Write a solution to find the confirmation rate of each user.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1

Input: 
Signups table:
+---------+---------------------+
| user_id | time_stamp          |
+---------+---------------------+
| 3       | 2020-03-21 10:16:13 |
| 7       | 2020-01-04 13:57:59 |
| 2       | 2020-07-29 23:09:44 |
| 6       | 2020-12-09 10:39:37 |
+---------+---------------------+
Confirmations table:
+---------+---------------------+-----------+
| user_id | time_stamp          | action    |
+---------+---------------------+-----------+
| 3       | 2021-01-06 03:30:46 | timeout   |
| 3       | 2021-07-14 14:00:00 | timeout   |
| 7       | 2021-06-12 11:57:29 | confirmed |
| 7       | 2021-06-13 12:58:28 | confirmed |
| 7       | 2021-06-14 13:59:27 | confirmed |
| 2       | 2021-01-22 00:00:00 | confirmed |
| 2       | 2021-02-28 23:59:59 | timeout   |
+---------+---------------------+-----------+
Output: 
+---------+-------------------+
| user_id | confirmation_rate |
+---------+-------------------+
| 6       | 0.00              |
| 3       | 0.00              |
| 7       | 1.00              |
| 2       | 0.50              |
+---------+-------------------+
Explanation: 
User 6 did not request any confirmation messages. The confirmation rate is 0.
User 3 made 2 requests and both timed out. The confirmation rate is 0.
User 7 made 3 requests and all were confirmed. The confirmation rate is 1.
User 2 made 2 requests where one was confirmed and the other timed out. The confirmation rate is 1 / 2 = 0.5.

Solution

Method 1 – Aggregation and Join

Intuition

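We need to compute the confirmation rate for each user, defined as the number of 'confirmed' actions divided by the total number of confirmation messages that user requested. Every user in Signups must appear in the result, including users who never requested a confirmation, so this is a left join followed by a per-user aggregation.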

Approach

  1. Left join Signups with Confirmations on user_id so that every signed-up user is kept, even those with no confirmation requests.
  2. Group the joined rows by user_id.
  3. For each user, count the total requests and the 'confirmed' requests; equivalently, average a flag that is 1 for 'confirmed' and 0 otherwise.
  4. Calculate the confirmation rate as (number of confirmed requests) / (total number of requests), using 0 for users with no requests, and round it to two decimal places.
  5. Return user_id and confirmation_rate. The problem accepts any order; sorting by user_id only makes the output deterministic.
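
As a concrete illustration of the rate defined in the problem statement (confirmed requests over total requests, with 0 for users who made none), here is a minimal plain-Python sketch. The function name `confirmation_rates` and the input shapes (a list of user ids and a list of `(user_id, action)` pairs) are assumptions made for this example, not part of the problem.

```python
from collections import defaultdict


def confirmation_rates(signup_ids, confirmations):
    """Return {user_id: rate} for every signed-up user.

    signup_ids: iterable of user ids (hypothetical input shape).
    confirmations: iterable of (user_id, action) pairs.
    """
    total = defaultdict(int)      # requests per user
    confirmed = defaultdict(int)  # 'confirmed' requests per user
    for user_id, action in confirmations:
        total[user_id] += 1
        if action == "confirmed":
            confirmed[user_id] += 1

    rates = {}
    for user_id in signup_ids:
        # Users with no requests get 0 instead of a division by zero.
        if total[user_id]:
            rates[user_id] = round(confirmed[user_id] / total[user_id], 2)
        else:
            rates[user_id] = 0.0
    return rates


# Data from Example 1.
example = confirmation_rates(
    [3, 7, 2, 6],
    [(3, "timeout"), (3, "timeout"), (7, "confirmed"), (7, "confirmed"),
     (7, "confirmed"), (2, "confirmed"), (2, "timeout")],
)
print(example)  # {3: 0.0, 7: 1.0, 2: 0.5, 6: 0.0}
```

Iterating over the signup ids rather than over the confirmation rows plays the role of the left join: user 6 appears in the result even though no confirmation row mentions them.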

Code

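-- MySQL
SELECT s.user_id,
       -- (c.action = 'confirmed') is 1 or 0, so AVG gives confirmed / total requests;
       -- users with no requests average to NULL, which IFNULL turns into 0
       ROUND(IFNULL(AVG(c.action = 'confirmed'), 0), 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c
  ON s.user_id = c.user_id
GROUP BY s.user_id
ORDER BY s.user_id;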
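-- PostgreSQL
SELECT s.user_id,
       -- The CASE maps 'confirmed' to 1 and everything else to 0, including the
       -- NULL action of a user with no requests, so AVG is confirmed / total or 0
       ROUND(AVG(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END), 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c
  ON s.user_id = c.user_id
GROUP BY s.user_id
ORDER BY s.user_id;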
# Python (pandas)
import pandas as pd

def confirmation_rate(signups: pd.DataFrame, confirmations: pd.DataFrame) -> pd.DataFrame:
    # Rate per user = mean of the boolean "confirmed" flag, i.e. confirmed / total requests
    flags = confirmations.assign(confirmed=confirmations['action'] == 'confirmed')
    rate = flags.groupby('user_id')['confirmed'].mean().reset_index(name='confirmation_rate')
    # Left merge keeps users with no requests; their missing rate becomes 0
    merged = signups[['user_id']].merge(rate, on='user_id', how='left').fillna(0)
    merged['confirmation_rate'] = merged['confirmation_rate'].round(2)
    return merged.sort_values('user_id')
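
One way to sanity-check a join-and-average query against Example 1 is to run it on an in-memory database. The sketch below uses Python's built-in `sqlite3` module, which is an assumption made for convenience: SQLite is neither MySQL nor PostgreSQL, the ENUM and datetime columns are stored as plain TEXT, and the query is written for SQLite's dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema and rows from Example 1 (types simplified to what SQLite offers).
conn.executescript("""
    CREATE TABLE Signups (user_id INTEGER PRIMARY KEY, time_stamp TEXT);
    CREATE TABLE Confirmations (
        user_id INTEGER, time_stamp TEXT, action TEXT,
        PRIMARY KEY (user_id, time_stamp)
    );
    INSERT INTO Signups VALUES
        (3, '2020-03-21 10:16:13'), (7, '2020-01-04 13:57:59'),
        (2, '2020-07-29 23:09:44'), (6, '2020-12-09 10:39:37');
    INSERT INTO Confirmations VALUES
        (3, '2021-01-06 03:30:46', 'timeout'),
        (3, '2021-07-14 14:00:00', 'timeout'),
        (7, '2021-06-12 11:57:29', 'confirmed'),
        (7, '2021-06-13 12:58:28', 'confirmed'),
        (7, '2021-06-14 13:59:27', 'confirmed'),
        (2, '2021-01-22 00:00:00', 'confirmed'),
        (2, '2021-02-28 23:59:59', 'timeout');
""")

# The comparison yields 1 or 0, so AVG is confirmed / total requests.
# A user with no requests averages to NULL, which IFNULL turns into 0.
rows = conn.execute("""
    SELECT s.user_id,
           ROUND(IFNULL(AVG(c.action = 'confirmed'), 0), 2) AS confirmation_rate
    FROM Signups s
    LEFT JOIN Confirmations c ON s.user_id = c.user_id
    GROUP BY s.user_id
    ORDER BY s.user_id
""").fetchall()
conn.close()

print(rows)  # [(2, 0.5), (3, 0.0), (6, 0.0), (7, 1.0)]
```

The rows match the expected output of Example 1 up to ordering, which the problem leaves open.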

Complexity

  • ⏰ Time complexity: O(n + m), where n is the number of signups and m is the number of confirmations, assuming a hash-based join and grouping so that each table is scanned once. The optional sort by user_id adds O(n log n).
  • 🧺 Space complexity: O(n), for storing the result per user.