Problem

Table: Users

1
2
3
4
5
6
7
8
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| user_id | int |
| user_name | varchar |
+-------------+---------+
user_id is the primary key (column with unique values) for this table.
Each row of this table contains the name and the id of a user.

Table: Register

1
2
3
4
5
6
7
8
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| contest_id | int |
| user_id | int |
+-------------+---------+
(contest_id, user_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table contains the id of a user and the contest they registered into.

Write a solution to find the percentage of the users registered in each contest rounded to two decimals.

Return the result table ordered by percentage in descending order. In case of a tie, order it by contest_id in ascending order.

The result format is in the following example.

Examples

Example 1

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
Input: 
Users table:
+---------+-----------+
| user_id | user_name |
+---------+-----------+
| 6       | Alice     |
| 2       | Bob       |
| 7       | Alex      |
+---------+-----------+
Register table:
+------------+---------+
| contest_id | user_id |
+------------+---------+
| 215        | 6       |
| 209        | 2       |
| 208        | 2       |
| 210        | 6       |
| 208        | 6       |
| 209        | 7       |
| 209        | 6       |
| 215        | 7       |
| 208        | 7       |
| 210        | 2       |
| 207        | 2       |
| 210        | 7       |
+------------+---------+
Output: 
+------------+------------+
| contest_id | percentage |
+------------+------------+
| 208        | 100.0      |
| 209        | 100.0      |
| 210        | 100.0      |
| 215        | 66.67      |
| 207        | 33.33      |
+------------+------------+
Explanation: 
All the users registered in contests 208, 209, and 210. The percentage is 100% and we sort them in the answer table by contest_id in ascending order.
Alice and Alex registered in contest 215 and the percentage is ((2/3) * 100) = 66.67%
Bob registered in contest 207 and the percentage is ((1/3) * 100) = 33.33%

Solution

Method 1 - Count Distinct per Contest (Aggregation)

Intuition

For each contest, we need to find the percentage of all users who registered, rounded to two decimals. This is a classic aggregation and join problem: for each contest_id count distinct user_ids and divide by the total number of users (total_users).

Approach

Count the number of unique user_ids per contest_id, let total_users = total users from the Users table, compute percentage = COUNT(DISTINCT user_id) * 100.0 / total_users and round to two decimals. Sort by percentage descending and contest_id ascending.

Complexity

  • ⏰ Time complexity: O(N) where N is the number of registrations
  • 🧺 Space complexity: O(C) where C is the number of contests

Code

1
2
3
4
5
6
SELECT
  contest_id,
  ROUND(COUNT(DISTINCT user_id) * 100.0 / (SELECT COUNT(*) FROM Users), 2) AS percentage
FROM Register
GROUP BY contest_id
ORDER BY percentage DESC, contest_id ASC;
1
2
3
4
5
6
SELECT
  contest_id,
  ROUND(COUNT(DISTINCT user_id) * 100.0 / (SELECT COUNT(*) FROM Users), 2) AS percentage
FROM Register
GROUP BY contest_id
ORDER BY percentage DESC, contest_id ASC;
1
2
3
4
5
6
7
import pandas as pd
def percentage_of_users_attended(users: pd.DataFrame, register: pd.DataFrame) -> pd.DataFrame:
    total_users = users['user_id'].nunique()
    df = register.groupby('contest_id')['user_id'].nunique().reset_index()
    df['percentage'] = round(df['user_id'] * 100 / total_users, 2)
    df = df[['contest_id', 'percentage']]
    return df.sort_values(['percentage', 'contest_id'], ascending=[False, True]).reset_index(drop=True)