Problem

Table: Sessions

+---------------------+---------+
| Column Name         | Type    |
+---------------------+---------+
| session_id          | int     |
| duration            | int     |
+---------------------+---------+
session_id is the column of unique values for this table.
duration is the time in seconds that a user has visited the application.

You want to know how long a user visits your application. You decided to create bins of "[0-5>", "[5-10>", “[10-15>”, and "15 minutes or more" and count the number of sessions on it.

Write a solution to report the (bin, total).

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Input: 
Sessions table:
+-------------+---------------+
| session_id  | duration      |
+-------------+---------------+
| 1           | 30            |
| 2           | 199           |
| 3           | 299           |
| 4           | 580           |
| 5           | 1000          |
+-------------+---------------+
Output: 
+--------------+--------------+
| bin          | total        |
+--------------+--------------+
| [0-5>        | 3            |
| [5-10>       | 1            |
| [10-15>      | 0            |
| 15 or more   | 1            |
+--------------+--------------+
Explanation: 
For session_id 1, 2, and 3 have a duration greater or equal than 0 minutes and less than 5 minutes.
For session_id 4 has a duration greater or equal than 5 minutes and less than 10 minutes.
There is no session with a duration greater than or equal to 10 minutes and less than 15 minutes.
For session_id 5 has a duration greater than or equal to 15 minutes.

Solution

Method 1 – SQL Case Aggregation

Intuition

The key idea is to use SQL’s CASE statement to categorize each session’s duration into the correct bin, then count the number of sessions in each bin. This approach is efficient and ensures all bins are reported, even if some have zero sessions.

Approach

  1. Use a CASE statement to assign each session to a bin based on its duration (in seconds, so convert minutes to seconds).
  2. Group by the bin and count the number of sessions in each.
  3. Use UNION ALL to ensure all bins are present, even if their count is zero.

Code

1
2
3
4
5
6
7
8
9
SELECT 'bin', 'total' -- header for clarity
UNION ALL
SELECT '[0-5>', COUNT(*) FROM Sessions WHERE duration >= 0 AND duration < 300
UNION ALL
SELECT '[5-10>', COUNT(*) FROM Sessions WHERE duration >= 300 AND duration < 600
UNION ALL
SELECT '[10-15>', COUNT(*) FROM Sessions WHERE duration >= 600 AND duration < 900
UNION ALL
SELECT '15 or more', COUNT(*) FROM Sessions WHERE duration >= 900;
1
2
3
4
5
6
7
SELECT '[0-5>' AS bin, COUNT(*) AS total FROM Sessions WHERE duration >= 0 AND duration < 300
UNION ALL
SELECT '[5-10>' AS bin, COUNT(*) AS total FROM Sessions WHERE duration >= 300 AND duration < 600
UNION ALL
SELECT '[10-15>' AS bin, COUNT(*) AS total FROM Sessions WHERE duration >= 600 AND duration < 900
UNION ALL
SELECT '15 or more' AS bin, COUNT(*) AS total FROM Sessions WHERE duration >= 900;
1
2
3
4
5
6
7
8
def create_session_bar_chart(sessions: 'pd.DataFrame') -> 'pd.DataFrame':
    import pandas as pd
    bins = [0, 300, 600, 900, float('inf')]
    labels = ['[0-5>', '[5-10>', '[10-15>', '15 or more']
    sessions = sessions.copy()
    sessions['bin'] = pd.cut(sessions['duration'], bins=bins, labels=labels, right=False)
    res = sessions.groupby('bin').size().reindex(labels, fill_value=0).reset_index(name='total')
    return res

Complexity

  • ⏰ Time complexity: O(n), where n is the number of sessions, as each session is processed once.
  • 🧺 Space complexity: O(1) for SQL, O(n) for pandas (for the output DataFrame).