Problem

+--------------+---------+  
| Column Name  | Type    |  
+--------------+---------+  
| player_id    | int     |  
| device_id    | int     |  
| event_date   | date    |  
| games_played | int     |  
+--------------+---------+

(player_id, event_date) is the primary key of this table. This table shows the activity of players of some game. Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

We define the install date of a player to be the first login day of that player.

We also define day 1 retention of some date X to be the number of players whose install date is X and they logged back in on the day right after X, divided by the number of players whose install date is X, rounded to 2 decimal places.

Write an SQL query that reports for each install date, the number of players that installed the game on that day and the day 1 retention.

Solution

Method 1 - Using join

Code

SQL
SELECT
a.event_date,
ROUND((COUNT(DISTINCT CASE WHEN a.event_date = b.install_date THEN a.player_id ELSE NULL END)/
COUNT(DISTINCT CASE WHEN DATEDIFF(a.event_date, b.install_date) = 1 THEN a.player_id ELSE NULL END)),2)
FROM
(SELECT a.player_id, a.event_date, b.install_date
FROM activity a
LEFT JOIN (SELECT player_id, MIN(event_date) install_date
FROM activity
GROUP BY player_id) b ON b.player_id = a.player_id) C
WHERE a.event_date = b.install_date OR DATEDIFF(a.event_date, b.install_date) = 1
GROUP BY a.event_date