Problem

Table: Traffic

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| activity      | enum    |
| activity_date | date    |
+---------------+---------+
This table may have duplicate rows.
The activity column is an ENUM (category) type of ('login', 'logout', 'jobs', 'groups', 'homepage').

Write a solution that reports, for every date within at most 90 days from today, the number of users that logged in for the first time on that date. Assume today is 2019-06-30.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

Input: 
Traffic table:
+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1       | login    | 2019-05-01    |
| 1       | homepage | 2019-05-01    |
| 1       | logout   | 2019-05-01    |
| 2       | login    | 2019-06-21    |
| 2       | logout   | 2019-06-21    |
| 3       | login    | 2019-01-01    |
| 3       | jobs     | 2019-01-01    |
| 3       | logout   | 2019-01-01    |
| 4       | login    | 2019-06-21    |
| 4       | groups   | 2019-06-21    |
| 4       | logout   | 2019-06-21    |
| 5       | login    | 2019-03-01    |
| 5       | logout   | 2019-03-01    |
| 5       | login    | 2019-06-21    |
| 5       | logout   | 2019-06-21    |
+---------+----------+---------------+
Output: 
+------------+-------------+
| login_date | user_count  |
+------------+-------------+
| 2019-05-01 | 1           |
| 2019-06-21 | 2           |
+------------+-------------+
Explanation: 
Note that we only care about dates with a nonzero user count.
The user with id 5 first logged in on 2019-03-01, so they are not counted on 2019-06-21.

Solution

Method 1 -

Intuition

We need to count, for each date, the number of users whose first login was on that date, within the last 90 days from ‘2019-06-30’.
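The date window can be checked with a little arithmetic. The sketch below assumes that "at most 90 days from today" means a date difference of at most 90 days, with no dates after today; the names `TODAY`, `WINDOW_DAYS`, and `in_window` are illustrative and not part of the problem.

```python
from datetime import date, timedelta

TODAY = date(2019, 6, 30)   # "today" as fixed by the problem statement
WINDOW_DAYS = 90            # assumption: "at most 90 days" means difference <= 90

def in_window(d: date) -> bool:
    """True if d is not after TODAY and at most WINDOW_DAYS before it."""
    return timedelta(0) <= TODAY - d <= timedelta(days=WINDOW_DAYS)

earliest = TODAY - timedelta(days=WINDOW_DAYS)
print(earliest)                      # 2019-04-01
print(in_window(date(2019, 5, 1)))   # True
print(in_window(date(2019, 3, 1)))   # False
```

Under this reading, 2019-04-01 is the earliest date that can appear in the result.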

Approach

Select only ‘login’ activities. For each user, find their earliest login date. Then count users grouped by that date, restricting to dates within the last 90 days from ‘2019-06-30’.
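The three steps above can be sketched in plain Python, independent of any database. This is a minimal illustration, not the reference solution: the function name `new_users_daily_count` is hypothetical, the window is assumed to be a difference of at most 90 days, and `sample` is a condensed version of the example table with one duplicate row added to show that duplicates do not change the result.

```python
from collections import Counter
from datetime import date, timedelta

TODAY = date(2019, 6, 30)      # fixed by the problem statement
WINDOW = timedelta(days=90)    # assumption: a difference of at most 90 days

def new_users_daily_count(rows):
    """rows: iterable of (user_id, activity, 'YYYY-MM-DD') tuples."""
    first_login = {}
    for user_id, activity, day_text in rows:
        if activity != "login":
            continue  # step 1: keep only login rows
        day = date.fromisoformat(day_text)
        # step 2: track the earliest login per user; duplicates are harmless
        if user_id not in first_login or day < first_login[user_id]:
            first_login[user_id] = day
    # step 3: count users per first-login date, keeping only dates in the window
    return dict(Counter(
        day for day in first_login.values()
        if timedelta(0) <= TODAY - day <= WINDOW
    ))

sample = [
    (1, "login", "2019-05-01"), (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"), (2, "login", "2019-06-21"),  # duplicate row
    (3, "login", "2019-01-01"),
    (4, "login", "2019-06-21"),
    (5, "login", "2019-03-01"), (5, "login", "2019-06-21"),
]
print(new_users_daily_count(sample))
```

On this sample the function reports one new user on 2019-05-01 and two on 2019-06-21, matching the expected output; user 5 is excluded because their first login (2019-03-01) is outside the window.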

Code

-- SQL (MySQL)
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
  -- Earliest login date per user; duplicate rows do not affect MIN
  SELECT user_id, MIN(activity_date) AS first_login
  FROM Traffic
  WHERE activity = 'login'
  GROUP BY user_id
) t
-- Keep first logins at most 90 days before 2019-06-30 (2019-04-01 onward)
WHERE first_login BETWEEN DATE_SUB('2019-06-30', INTERVAL 90 DAY) AND '2019-06-30'
GROUP BY first_login;

# Python (pandas)
import pandas as pd

# Assuming traffic is a pandas DataFrame with the Traffic columns
logins = traffic[traffic['activity'] == 'login']
first_login = pd.to_datetime(logins.groupby('user_id')['activity_date'].min())
# 2019-04-01 is 90 days before 2019-06-30
mask = (first_login >= '2019-04-01') & (first_login <= '2019-06-30')
result = (first_login[mask].value_counts().sort_index()
          .rename_axis('login_date').reset_index(name='user_count'))

// Java (query string for JDBC)
String sql = "SELECT first_login AS login_date, COUNT(*) AS user_count "
    + "FROM (SELECT user_id, MIN(activity_date) AS first_login FROM Traffic "
    + "WHERE activity = 'login' GROUP BY user_id) t "
    + "WHERE first_login BETWEEN DATE_SUB('2019-06-30', INTERVAL 90 DAY) AND '2019-06-30' "
    + "GROUP BY first_login";
// Execute and fetch result
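
As a quick sanity check, the query can be run against the example table using Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite has no DATE_SUB, so the lower bound is written with SQLite's date() function instead; dates are stored as ISO text; the window is taken as 90 days; and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Rows copied from the Traffic table in Example 1
rows = [
    (1, "login", "2019-05-01"), (1, "homepage", "2019-05-01"), (1, "logout", "2019-05-01"),
    (2, "login", "2019-06-21"), (2, "logout", "2019-06-21"),
    (3, "login", "2019-01-01"), (3, "jobs", "2019-01-01"), (3, "logout", "2019-01-01"),
    (4, "login", "2019-06-21"), (4, "groups", "2019-06-21"), (4, "logout", "2019-06-21"),
    (5, "login", "2019-03-01"), (5, "logout", "2019-03-01"),
    (5, "login", "2019-06-21"), (5, "logout", "2019-06-21"),
]

# Same shape as the MySQL query; date(..., '-90 days') replaces DATE_SUB
QUERY = """
SELECT first_login AS login_date, COUNT(*) AS user_count
FROM (
  SELECT user_id, MIN(activity_date) AS first_login
  FROM Traffic
  WHERE activity = 'login'
  GROUP BY user_id
) t
WHERE first_login BETWEEN date('2019-06-30', '-90 days') AND '2019-06-30'
GROUP BY first_login
ORDER BY first_login
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Traffic (user_id INTEGER, activity TEXT, activity_date TEXT)")
conn.executemany("INSERT INTO Traffic VALUES (?, ?, ?)", rows)
result = conn.execute(QUERY).fetchall()
conn.close()
print(result)  # [('2019-05-01', 1), ('2019-06-21', 2)]
```

The printed rows match the expected output table for Example 1.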

Complexity

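  • ⏰ Time complexity: O(N) expected, where N = number of rows in Traffic: one pass to filter logins and hash-aggregate per user. An engine that sorts in order to group takes O(N log N).
  • 🧺 Space complexity: O(U), where U = number of users with at least one login: one first-login entry per user, plus at most U distinct login dates in the result.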