Problem
Table: Activity
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
(player_id, event_date) is the primary key of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
Write an SQL query to report the first login date for each player.
Return the result table in any order.
Examples
Example 1:
Input: Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+
Output:
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1         | 2016-03-01  |
| 2         | 2017-06-25  |
| 3         | 2016-03-02  |
+-----------+-------------+
Solution
Method 1 - Using Group By
Code
SQL
SELECT player_id, MIN(event_date) AS first_login
FROM Activity
GROUP BY player_id;
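To check the query against Example 1, it can be run on an in-memory database. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as the engine, which the original does not specify. Dates are stored as ISO-8601 text, and an ORDER BY is added only to make the printed rows deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for the SQL engine; the problem does not name one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    " player_id INTEGER, device_id INTEGER,"
    " event_date TEXT, games_played INTEGER,"
    " PRIMARY KEY (player_id, event_date))"
)

# Rows from Example 1.
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-03-01", 5),
        (1, 2, "2016-05-02", 6),
        (2, 3, "2017-06-25", 1),
        (3, 1, "2016-03-02", 0),
        (3, 4, "2018-07-03", 5),
    ],
)

# ISO-8601 date strings sort chronologically, so MIN() picks the earliest date.
first_logins = conn.execute(
    "SELECT player_id, MIN(event_date) AS first_login "
    "FROM Activity GROUP BY player_id ORDER BY player_id"
).fetchall()
conn.close()

print(first_logins)
# [(1, '2016-03-01'), (2, '2017-06-25'), (3, '2016-03-02')]
```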
Pandas
import pandas as pd

def game_analysis(activity: pd.DataFrame) -> pd.DataFrame:
    # Group by player_id and take the minimum event_date for each player.
    # min() does not depend on row order, so no prior sort is needed.
    result_df = activity.groupby('player_id')['event_date'].min().reset_index()
    # Rename the aggregated column to match the expected output.
    result_df = result_df.rename(columns={'event_date': 'first_login'})
    return result_df
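The same aggregation can also be written with pandas named aggregation, which labels the output column directly and so needs no separate rename step. This is a sketch only: the function name first_login_named_agg is hypothetical, and the DataFrame is built by hand from the Example 1 rows.

```python
import pandas as pd


def first_login_named_agg(activity: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper name (not from the original solution).
    # Named aggregation: output column first_login = min of event_date.
    return activity.groupby("player_id", as_index=False).agg(
        first_login=("event_date", "min")
    )


# Rows from Example 1.
activity = pd.DataFrame(
    {
        "player_id": [1, 1, 2, 3, 3],
        "device_id": [2, 2, 3, 1, 4],
        "event_date": pd.to_datetime(
            ["2016-03-01", "2016-05-02", "2017-06-25", "2016-03-02", "2018-07-03"]
        ),
        "games_played": [5, 6, 1, 0, 5],
    }
)

result = first_login_named_agg(activity)
print(result)
```

With as_index=False, player_id stays a regular column, so the result already has the two columns the problem asks for.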
Method 2 - Using Window Function
Code
SQL
SELECT DISTINCT
    player_id,
    FIRST_VALUE(event_date) OVER (PARTITION BY player_id ORDER BY event_date) AS first_login
FROM Activity;
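The DISTINCT in this query matters. A window function returns one value per input row rather than one per group, so without it each player appears once per login. The sketch below illustrates this on the Example 1 rows, assuming SQLite 3.25 or newer (the first release with window functions) via Python's sqlite3 module. The engine choice is an assumption, not something the original specifies.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for the SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity ("
    " player_id INTEGER, device_id INTEGER,"
    " event_date TEXT, games_played INTEGER)"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2016-03-01", 5),
        (1, 2, "2016-05-02", 6),
        (2, 3, "2017-06-25", 1),
        (3, 1, "2016-03-02", 0),
        (3, 4, "2018-07-03", 5),
    ],
)

window_columns = (
    "player_id, FIRST_VALUE(event_date) "
    "OVER (PARTITION BY player_id ORDER BY event_date) AS first_login "
    "FROM Activity"
)

# Without DISTINCT: one output row per input row (5 rows, with repeats).
without_distinct = sorted(conn.execute("SELECT " + window_columns).fetchall())

# With DISTINCT: repeats collapse to one row per player (3 rows).
with_distinct = sorted(conn.execute("SELECT DISTINCT " + window_columns).fetchall())
conn.close()

print(len(without_distinct), len(with_distinct))
# 5 3
```

For this problem the GROUP BY form is usually the simpler choice. The window form is more useful when other columns from the first-login row are needed as well.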