Problem

Table: Traffic

1
2
3
4
5
6
7
8
9
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| activity      | enum    |
| activity_date | date    |
+---------------+---------+
This table may have duplicate rows.
The activity column is an ENUM (category) type of ('login', 'logout', 'jobs', 'groups', 'homepage').

Write a solution to reports for every date within at most 90 days from today, the number of users that logged in for the first time on that date. Assume today is 2019-06-30.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Input: 
Traffic table:
+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1       | login    | 2019-05-01    |
| 1       | homepage | 2019-05-01    |
| 1       | logout   | 2019-05-01    |
| 2       | login    | 2019-06-21    |
| 2       | logout   | 2019-06-21    |
| 3       | login    | 2019-01-01    |
| 3       | jobs     | 2019-01-01    |
| 3       | logout   | 2019-01-01    |
| 4       | login    | 2019-06-21    |
| 4       | groups   | 2019-06-21    |
| 4       | logout   | 2019-06-21    |
| 5       | login    | 2019-03-01    |
| 5       | logout   | 2019-03-01    |
| 5       | login    | 2019-06-21    |
| 5       | logout   | 2019-06-21    |
+---------+----------+---------------+
Output: 
+------------+-------------+
| login_date | user_count  |
+------------+-------------+
| 2019-05-01 | 1           |
| 2019-06-21 | 2           |
+------------+-------------+
Explanation: 
Note that we only care about dates with non zero user count.
The user with id 5 first logged in on 2019-03-01 so he's not counted on 2019-06-21.

Solution

Method 1 - First Login Aggregation with 90-Day Window

Intuition

For each user we only care about their earliest 'login' date. Once we compute that first-login date per user, we count how many users have that date within the last 90 days from 2019-06-30.

Approach

  1. Filter Traffic for activity = 'login'.
  2. For each user_id compute MIN(activity_date) as their first_login.
  3. Restrict first_login to the inclusive window from DATE_SUB('2019-06-30', INTERVAL 89 DAY) to '2019-06-30' (this is 90 days including endpoints).
  4. Group by first_login and count users.

Code

1
2
3
4
5
6
7
8
9
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
  SELECT user_id, MIN(activity_date) AS first_login
  FROM Traffic
  WHERE activity = 'login'
  GROUP BY user_id
) t
WHERE first_login BETWEEN DATE_SUB('2019-06-30', INTERVAL 89 DAY) AND '2019-06-30'
GROUP BY first_login;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
-- PostgreSQL equivalent (use INTERVAL arithmetic)
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
  SELECT user_id, MIN(activity_date) AS first_login
  FROM Traffic
  WHERE activity = 'login'
  GROUP BY user_id
) t
WHERE first_login BETWEEN (DATE '2019-06-30' - INTERVAL '89 days') AND DATE '2019-06-30'
GROUP BY first_login;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
import pandas as pd

def first_login_counts(traffic: pd.DataFrame) -> pd.DataFrame:
    """Return DataFrame with columns ['login_date','user_count'] for first logins in the 90-day window.

    Assumes traffic['activity_date'] is a datetime-like column or string in ISO format.
    """
    # Ensure activity_date is datetime
    df = traffic.copy()
    df['activity_date'] = pd.to_datetime(df['activity_date'])

    # Filter to login activities and compute first login per user
    logins = df[df['activity'] == 'login']
    first_login = logins.groupby('user_id', as_index=False)['activity_date'].min()
    first_login.rename(columns={'activity_date': 'first_login'}, inplace=True)

    # Define window (inclusive)
    end = pd.to_datetime('2019-06-30')
    start = end - pd.Timedelta(days=89)

    # Filter to the 90-day window and count by date
    mask = (first_login['first_login'] >= start) & (first_login['first_login'] <= end)
    window = first_login.loc[mask].copy()
    window['login_date'] = window['first_login'].dt.date

    result = (
        window.groupby('login_date')
        .size()
        .reset_index(name='user_count')
        .sort_values('login_date')
        .reset_index(drop=True)
    )
    return result

Complexity

  • Time complexity: O(N) – We scan Traffic once to filter and compute per-user minimums, then aggregate over distinct users.
  • 🧺 Space complexity: O(U) – We store one row per distinct user (their first login) during aggregation.