Problem

Table: Activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
(player_id, event_date) is the primary key (combination of columns with unique values) of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.

Write a solution to report, for each player and date, how many games the player has played so far. That is, the total number of games played by the player up to and including that date. Check the example for clarity.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

Input: 
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 1         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+
Output: 
+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1         | 2016-03-01 | 5                   |
| 1         | 2016-05-02 | 11                  |
| 1         | 2017-06-25 | 12                  |
| 3         | 2016-03-02 | 0                   |
| 3         | 2018-07-03 | 5                   |
+-----------+------------+---------------------+
Explanation: 
For the player with id 1, 5 + 6 = 11 games played by 2016-05-02, and 5 + 6 + 1 = 12 games played by 2017-06-25.
For the player with id 3, 0 + 5 = 5 games played by 2018-07-03.
Note that for each player we only care about the days when the player logged in.
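
The arithmetic in the explanation is a prefix (running) sum over each player's rows in date order. As a quick check, here is a minimal sketch using Python's standard `itertools.accumulate`; the variable names are illustrative only, and the values are the games_played column from the example input.

```python
from itertools import accumulate

# games_played per login, in date order, taken from the example input
player_1 = [5, 6, 1]
player_3 = [0, 5]

# accumulate yields running totals: each entry is the sum of all values so far
running_1 = list(accumulate(player_1))
running_3 = list(accumulate(player_3))

print(running_1)  # [5, 11, 12]
print(running_3)  # [0, 5]
```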

Solution

Method 1 – Window Function Running Sum

Intuition

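For each player, the value needed on a given date is the sum of games_played over that date and all of the player's earlier dates. This is a running (cumulative) sum, which a window function computes directly: partition the rows by player_id, order each partition by event_date, and sum from the first row up to the current one.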

Approach

  1. Partition the activity records by player_id and order each partition by event_date in ascending order.
  2. For each row, sum games_played over all rows of the same player up to and including the current event_date.
  3. Output player_id, event_date, and the running total as games_played_so_far.
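
As a language-neutral illustration of the running total the problem asks for, here is a minimal sketch in plain Python. The function name `running_totals` and the tuple layout are assumptions made for this example, not part of the problem statement; dates are assumed to be ISO `YYYY-MM-DD` strings so that string order matches date order.

```python
from collections import defaultdict

def running_totals(rows):
    """rows: iterable of (player_id, event_date, games_played) tuples.

    Returns a list of (player_id, event_date, games_played_so_far) tuples.
    """
    totals = defaultdict(int)  # games played so far, keyed by player_id
    result = []
    # Sort by player, then by date, so each player's rows are chronological.
    for player_id, event_date, games_played in sorted(rows, key=lambda r: (r[0], r[1])):
        totals[player_id] += games_played
        result.append((player_id, event_date, totals[player_id]))
    return result

# Example input from the problem statement (device_id omitted, it is not needed).
example = [
    (1, "2016-03-01", 5),
    (1, "2016-05-02", 6),
    (1, "2017-06-25", 1),
    (3, "2016-03-02", 0),
    (3, "2018-07-03", 5),
]

for row in running_totals(example):
    print(row)
```

The dictionary plays the role of the per-player partition, and the sort plays the role of the ORDER BY inside the window.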

Code

-- Running total of games_played per player, in date order.
SELECT
  player_id,
  event_date,
  SUM(games_played) OVER (
    PARTITION BY player_id
    ORDER BY event_date
  ) AS games_played_so_far
FROM Activity;
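
A running-sum window query for this problem can be checked locally against the example data. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; it assumes the bundled SQLite is version 3.25 or later (the first release with window functions), and it stores event_date as ISO text, which is an assumption of this example rather than something the problem specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    " player_id INTEGER, device_id INTEGER,"
    " event_date TEXT, games_played INTEGER,"
    " PRIMARY KEY (player_id, event_date))"
)
# Example input from the problem statement.
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-03-01", 5),
        (1, 2, "2016-05-02", 6),
        (1, 3, "2017-06-25", 1),
        (3, 1, "2016-03-02", 0),
        (3, 4, "2018-07-03", 5),
    ],
)

# SUM(...) OVER with ORDER BY sums from the partition start to the current row.
query = """
SELECT player_id,
       event_date,
       SUM(games_played) OVER (
           PARTITION BY player_id
           ORDER BY event_date
       ) AS games_played_so_far
FROM Activity
ORDER BY player_id, event_date
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

The outer ORDER BY is only there to make the printed output deterministic; the problem accepts the result in any order.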
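import pandas as pd

class Solution:
    def game_play_analysis_iii(self, activity: pd.DataFrame) -> pd.DataFrame:
        # Sort so that each player's rows are in chronological order.
        df = activity.sort_values(['player_id', 'event_date']).copy()
        # Running total of games_played within each player.
        df['games_played_so_far'] = df.groupby('player_id')['games_played'].cumsum()
        return df[['player_id', 'event_date', 'games_played_so_far']]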

Complexity

  • ⏰ Time complexity: O(n log n), where n is the number of activity records, due to sorting.
  • 🧺 Space complexity: O(n), for storing intermediate results.