+---------------------+---------+
| Column Name | Type |
+---------------------+---------+
| session_id | int |
| duration | int |
+---------------------+---------+
session_id is the column of unique values for this table.
duration is the time in seconds that a user has visited the application.
You want to know how long a user visits your application. You decided to create bins of "[0-5>", "[5-10>", “[10-15>”, and "15 minutes or more"
and count the number of sessions on it.
Input:
Sessions table:+-------------+---------------+| session_id | duration |+-------------+---------------+|1|30||2|199||3|299||4|580||5|1000|+-------------+---------------+Output:
+--------------+--------------+| bin | total |+--------------+--------------+|[0-5>|3||[5-10>|1||[10-15>|0||15 or more |1|+--------------+--------------+Explanation:
For session_id 1,2, and 3 have a duration greater or equal than 0 minutes and less than 5 minutes.For session_id 4 has a duration greater or equal than 5 minutes and less than 10 minutes.There is no session with a duration greater than or equal to 10 minutes and less than 15 minutes.For session_id 5 has a duration greater than or equal to 15 minutes.
The key idea is to use SQL’s CASE statement to categorize each session’s duration into the correct bin, then count the number of sessions in each bin. This approach is efficient and ensures all bins are reported, even if some have zero sessions.
SELECT'bin', 'total'-- header for clarity
UNIONALLSELECT'[0-5>', COUNT(*) FROM Sessions WHERE duration >=0AND duration <300UNIONALLSELECT'[5-10>', COUNT(*) FROM Sessions WHERE duration >=300AND duration <600UNIONALLSELECT'[10-15>', COUNT(*) FROM Sessions WHERE duration >=600AND duration <900UNIONALLSELECT'15 or more', COUNT(*) FROM Sessions WHERE duration >=900;
1
2
3
4
5
6
7
SELECT'[0-5>'AS bin, COUNT(*) AS total FROM Sessions WHERE duration >=0AND duration <300UNIONALLSELECT'[5-10>'AS bin, COUNT(*) AS total FROM Sessions WHERE duration >=300AND duration <600UNIONALLSELECT'[10-15>'AS bin, COUNT(*) AS total FROM Sessions WHERE duration >=600AND duration <900UNIONALLSELECT'15 or more'AS bin, COUNT(*) AS total FROM Sessions WHERE duration >=900;
1
2
3
4
5
6
7
8
defcreate_session_bar_chart(sessions: 'pd.DataFrame') ->'pd.DataFrame':
import pandas as pd
bins = [0, 300, 600, 900, float('inf')]
labels = ['[0-5>', '[5-10>', '[10-15>', '15 or more']
sessions = sessions.copy()
sessions['bin'] = pd.cut(sessions['duration'], bins=bins, labels=labels, right=False)
res = sessions.groupby('bin').size().reindex(labels, fill_value=0).reset_index(name='total')
return res