Problem

Table: Sessions

+---------------+----------+
| Column Name   | Type     |
+---------------+----------+
| user_id       | int      |
| session_start | datetime |
| session_end   | datetime |
| session_id    | int      |
| session_type  | enum     |
+---------------+----------+
session_id is column of unique values for this table.
session_type is an ENUM (category) type of (Viewer, Streamer).
This table contains user id, session start, session end, session id and session type.

Write a solution to find the number of streaming sessions for users whose first session was as a viewer.

Return the result table ordered by count of streaming sessions, user_id indescending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Input: 
Sessions table:
+---------+---------------------+---------------------+------------+--------------+
| user_id | session_start       | session_end         | session_id | session_type | 
+---------+---------------------+---------------------+------------+--------------+
| 101     | 2023-11-06 13:53:42 | 2023-11-06 14:05:42 | 375        | Viewer       |  
| 101     | 2023-11-22 16:45:21 | 2023-11-22 20:39:21 | 594        | Streamer     |   
| 102     | 2023-11-16 13:23:09 | 2023-11-16 16:10:09 | 777        | Streamer     | 
| 102     | 2023-11-17 13:23:09 | 2023-11-17 16:10:09 | 778        | Streamer     | 
| 101     | 2023-11-20 07:16:06 | 2023-11-20 08:33:06 | 315        | Streamer     | 
| 104     | 2023-11-27 03:10:49 | 2023-11-27 03:30:49 | 797        | Viewer       | 
| 103     | 2023-11-27 03:10:49 | 2023-11-27 03:30:49 | 798        | Streamer     |  
+---------+---------------------+---------------------+------------+--------------+
Output: 
+---------+----------------+
| user_id | sessions_count | 
+---------+----------------+
| 101     | 2              | 
+---------+----------------+
**Explanation**
- user_id 101, initiated their initial session as a viewer on 2023-11-06 at 13:53:42, followed by two subsequent sessions as a Streamer, the count will be 2.
- user_id 102, although there are two sessions, the initial session was as a Streamer, so this user will be excluded.
- user_id 103 participated in only one session, which was as a Streamer, hence, it won't be considered.
- User_id 104 commenced their first session as a viewer but didn't have any subsequent sessions, therefore, they won't be included in the final count. 
Output table is ordered by sessions count and user_id in descending order.

Solution

Method 1 – SQL Window Functions and Filtering

Intuition

Find users whose first session was as a viewer, then count their streamer sessions.

Approach

  1. For each user, find the session_type of their earliest session (using window functions or subquery).
  2. Filter users whose first session was ‘Viewer’.
  3. Count their ‘Streamer’ sessions.
  4. Order by sessions_count and user_id descending.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
WITH first_session AS (
  SELECT user_id, session_type,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_start) AS rn
  FROM Sessions
)
SELECT s.user_id, COUNT(*) AS sessions_count
FROM Sessions s
JOIN (
  SELECT user_id
  FROM first_session
  WHERE rn = 1 AND session_type = 'Viewer'
) v ON s.user_id = v.user_id
WHERE s.session_type = 'Streamer'
GROUP BY s.user_id
HAVING sessions_count > 0
ORDER BY sessions_count DESC, s.user_id DESC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
WITH first_session AS (
  SELECT user_id, session_type,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_start) AS rn
  FROM Sessions
)
SELECT s.user_id, COUNT(*) AS sessions_count
FROM Sessions s
JOIN (
  SELECT user_id
  FROM first_session
  WHERE rn = 1 AND session_type = 'Viewer'
) v ON s.user_id = v.user_id
WHERE s.session_type = 'Streamer'
GROUP BY s.user_id
HAVING COUNT(*) > 0
ORDER BY sessions_count DESC, s.user_id DESC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import pandas as pd
def viewers_turned_streamers(sessions: pd.DataFrame) -> pd.DataFrame:
    # Find first session type for each user
    first = sessions.sort_values(['user_id', 'session_start']).groupby('user_id').first().reset_index()
    viewers = set(first[first['session_type'] == 'Viewer']['user_id'])
    # Count streamer sessions for these users
    filtered = sessions[(sessions['user_id'].isin(viewers)) & (sessions['session_type'] == 'Streamer')]
    result = filtered.groupby('user_id').size().reset_index(name='sessions_count')
    result = result[result['sessions_count'] > 0]
    result = result.sort_values(['sessions_count', 'user_id'], ascending=[False, False])
    return result

Complexity

  • ⏰ Time complexity: O(N) — Each row is processed a constant number of times.
  • 🧺 Space complexity: O(N) — For intermediate groupings.