Problem

Table: Activities

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| activity_id   | int     |
| user_id       | int     |
| activity_type | enum    |
| time_spent    | decimal |
+---------------+---------+
activity_id is column of unique values for this table.
activity_type is an ENUM (category) type of ('send', 'open').
This table contains activity id, user id, activity type and time spent.

Table: Age

+-------------+------+
| Column Name | Type |
+-------------+------+
| user_id     | int  |
| age_bucket  | enum |
+-------------+------+
user_id is the column of unique values for this table.
age_bucket is an ENUM (category) type of ('21-25', '26-30', '31-35').
This table contains user id and age group.

Write a solution to calculate the percentage of the total time spent on sending and opening snaps for each age group. Precentage should be rounded to 2 decimal places.

Return the result table inany order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
Input: 
Activities table:
+-------------+---------+---------------+------------+
| activity_id | user_id | activity_type | time_spent |
+-------------+---------+---------------+------------+
| 7274        | 123     | open          | 4.50       | 
| 2425        | 123     | send          | 3.50       | 
| 1413        | 456     | send          | 5.67       | 
| 2536        | 456     | open          | 3.00       | 
| 8564        | 456     | send          | 8.24       | 
| 5235        | 789     | send          | 6.24       | 
| 4251        | 123     | open          | 1.25       | 
| 1435        | 789     | open          | 5.25       | 
+-------------+---------+---------------+------------+
Age table:
+---------+------------+
| user_id | age_bucket | 
+---------+------------+
| 123     | 31-35      | 
| 789     | 21-25      | 
| 456     | 26-30      | 
+---------+------------+
Output: 
+------------+-----------+-----------+
| age_bucket | send_perc | open_perc |
+------------+-----------+-----------+
| 31-35      | 37.84     | 62.16     |
| 26-30      | 82.26     | 17.74     |
| 21-25      | 54.31     | 45.69     |
+------------+-----------+-----------+
Explanation: 
For age group 31-35:
- There is only one user belonging to this group with the user ID 123.
- The total time spent on sending snaps by this user is 3.50, and the time spent on opening snaps is 4.50 + 1.25 = 5.75.
- The overall time spent by this user is 3.50 + 5.75 = 9.25.
- Therefore, the sending snap percentage will be (3.50 / 9.25) * 100 = 37.84, and the opening snap percentage will be (5.75 / 9.25) * 100 = 62.16.
For age group 26-30: 
- There is only one user belonging to this group with the user ID 456. 
- The total time spent on sending snaps by this user is 5.67 + 8.24 = 13.91, and the time spent on opening snaps is 3.00. 
- The overall time spent by this user is 13.91 + 3.00 = 16.91. 
- Therefore, the sending snap percentage will be (13.91 / 16.91) * 100 = 82.26, and the opening snap percentage will be (3.00 / 16.91) * 100 = 17.74.
For age group 21-25: 
- There is only one user belonging to this group with the user ID 789. 
- The total time spent on sending snaps by this user is 6.24, and the time spent on opening snaps is 5.25. 
- The overall time spent by this user is 6.24 + 5.25 = 11.49. 
- Therefore, the sending snap percentage will be (6.24 / 11.49) * 100 = 54.31, and the opening snap percentage will be (5.25 / 11.49) * 100 = 45.69.
All percentages in output table rounded to the two decimal places.

Solution

Method 1 – Group By and Aggregate

Intuition

The main idea is to join the Activities and Age tables to associate each activity with its user’s age group, then aggregate the total time spent on each activity type per age group. By dividing the time spent on each activity type by the total time for the group, we get the required percentages.

Approach

  1. Join Activities and Age on user_id to get age group for each activity.
  2. For each age_bucket, sum the time_spent for ‘send’ and ‘open’ activities separately.
  3. For each age_bucket, calculate the total time spent (send + open).
  4. Compute the percentage for each activity type: (activity_time / total_time) * 100, rounded to 2 decimal places.
  5. Return the result with columns: age_bucket, send_perc, open_perc.

Code

1
2
3
4
5
6
7
SELECT
  a.age_bucket,
  ROUND(SUM(CASE WHEN b.activity_type = 'send' THEN b.time_spent ELSE 0 END) / SUM(b.time_spent) * 100, 2) AS send_perc,
  ROUND(SUM(CASE WHEN b.activity_type = 'open' THEN b.time_spent ELSE 0 END) / SUM(b.time_spent) * 100, 2) AS open_perc
FROM Age a
JOIN Activities b ON a.user_id = b.user_id
GROUP BY a.age_bucket;
1
2
3
4
5
6
7
SELECT
  a.age_bucket,
  ROUND(SUM(CASE WHEN b.activity_type = 'send' THEN b.time_spent ELSE 0 END) / SUM(b.time_spent) * 100.0, 2) AS send_perc,
  ROUND(SUM(CASE WHEN b.activity_type = 'open' THEN b.time_spent ELSE 0 END) / SUM(b.time_spent) * 100.0, 2) AS open_perc
FROM Age a
JOIN Activities b ON a.user_id = b.user_id
GROUP BY a.age_bucket;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class Solution:
    def snaps_analysis(self, activities: 'pd.DataFrame', age: 'pd.DataFrame') -> 'pd.DataFrame':
        df = activities.merge(age, on='user_id')
        grp = df.groupby(['age_bucket', 'activity_type'])['time_spent'].sum().unstack(fill_value=0)
        tot = grp.sum(axis=1)
        ans = pd.DataFrame({
            'age_bucket': grp.index,
            'send_perc': (grp.get('send', 0) / tot * 100).round(2),
            'open_perc': (grp.get('open', 0) / tot * 100).round(2)
        })
        return ans

Complexity

  • ⏰ Time complexity: O(n), where n is the number of rows in the Activities table. Each row is processed once for the join and aggregation.
  • 🧺 Space complexity: O(k), where k is the number of unique age buckets (groups), since we store aggregated results per group.