Problem
Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) is the primary key (combination of columns with unique values) of this table. This table shows the activity of players of some games. Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.
Write a solution to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.
The result format is in the following example.
Examples
Example 1:
Input: Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Output:
+-----------+
| fraction |
+-----------+
| 0.33 |
+-----------+
Explanation: Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33
Solution
Method 1 - Using common table expressions
To solve this problem in SQL, we need to piece together a few steps: find the first login date for each player, determine if they logged in the next day, and calculate the fraction of such players over the total number of players. Here’s how you can do it: Here is the approach:
- FirstLogins CTE:
- This Common Table Expression (CTE) retrieves the first login date for each player using
MIN(event_date)
and groups the results byplayer_id
.
- This Common Table Expression (CTE) retrieves the first login date for each player using
- LoggedInNextDay CTE:
- This CTE selects the
player_id
of players who have a login record exactly one day after their first login date. It joins theFirstLogins
CTE with theActivity
table and checks ifevent_date
is one day after thefirst_login_date
.
- This CTE selects the
- Main Query:
- The main query counts the unique
player_id
s in theLoggedInNextDay
CTE to determine the number of players who logged in on consecutive days. - It also counts the unique
player_id
s in theFirstLogins
CTE to get the total number of players. - The final selection rounds the fraction to two decimal places using
ROUND
.
- The main query counts the unique
Code
SQL
WITH FirstLogins AS (
SELECT
player_id,
MIN(event_date) AS first_login_date
FROM
Activity
GROUP BY
player_id
),
LoggedInNextDay AS (
SELECT DISTINCT
fl.player_id
FROM
FirstLogins fl
JOIN
Activity a ON fl.player_id = a.player_id
AND a.event_date = DATE_ADD(fl.first_login_date, INTERVAL 1 DAY)
)
SELECT
ROUND(COUNT(DISTINCT ln.player_id) / COUNT(DISTINCT fl.player_id), 2) AS fraction
FROM
FirstLogins fl
LEFT JOIN
LoggedInNextDay ln ON fl.player_id = ln.player_id;
Complexity
- ⏰ Time complexity:
O(n log n)
- FirstLogins CTE: Calculating the minimum date for each player involves a full scan and aggregation on the
event_date
, leading toO(n log n)
, wheren
is the number of records in theActivity
table. - LoggedInNextDay CTE: This involves another scan and join operation, resulting in
O(n log n)
due to the date addition and comparison for each record. - Main Query: Aggregating and joining large datasets typically has a combined complexity of
O(n log n)
with efficient indexing.
- FirstLogins CTE: Calculating the minimum date for each player involves a full scan and aggregation on the
- 🧺 Space complexity:
O(n)
- Intermediate Tables and Results: The CTEs and final result set might need additional space proportional to the number of unique
player_id
s, soO(n)
when considering intermediate results.
- Intermediate Tables and Results: The CTEs and final result set might need additional space proportional to the number of unique