Problem

Table: Accounts

1
2
3
4
5
6
7
8
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| name          | varchar |
+---------------+---------+
id is the primary key (column with unique values) for this table.
This table contains the account id and the user name of each account.

Table: Logins

1
2
3
4
5
6
7
8
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| login_date    | date    |
+---------------+---------+
This table may contain duplicate rows.
This table contains the account id of the user who logged in and the login date. A user may log in multiple times in the day.

Active users are those who logged in to their accounts for five or more consecutive days.

Write a solution to find the id and the name of active users.

Return the result table ordered by id.

The result format is in the following example.

Examples

Example 1

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Input: 
Accounts table:
+----+----------+
| id | name     |
+----+----------+
| 1  | Winston  |
| 7  | Jonathan |
+----+----------+
Logins table:
+----+------------+
| id | login_date |
+----+------------+
| 7  | 2020-05-30 |
| 1  | 2020-05-30 |
| 7  | 2020-05-31 |
| 7  | 2020-06-01 |
| 7  | 2020-06-02 |
| 7  | 2020-06-02 |
| 7  | 2020-06-03 |
| 1  | 2020-06-07 |
| 7  | 2020-06-10 |
+----+------------+
Output: 
+----+----------+
| id | name     |
+----+----------+
| 7  | Jonathan |
+----+----------+
Explanation: 
User Winston with id = 1 logged in 2 times only in 2 different days, so, Winston is not an active user.
User Jonathan with id = 7 logged in 7 times in 6 different days, five of them were consecutive days, so, Jonathan is an active user.

Follow up

Could you write a general solution if the active users are those who logged in to their accounts for n or more consecutive days?

Solution

Method 1 -

Method 1 – Using Window Functions

Intuition

The key idea is to identify streaks of consecutive login dates for each user. By assigning a row number to each login date per user and comparing it to the login date, we can group consecutive days together. If any group has at least 5 consecutive days, that user is considered active.

Approach

  1. Remove duplicate login dates for each user, as multiple logins on the same day don’t count as extra.
  2. For each user, sort their login dates and assign a row number based on the order.
  3. Calculate the difference between the login date (as days since ‘1970-01-01’) and the row number. This difference will be the same for consecutive days, forming a group.
  4. Group by user and this difference, and count the number of days in each group.
  5. Select users who have any group with at least 5 consecutive days.
  6. Join with the Accounts table to get the user names.
  7. Order the result by user id.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
WITH DistinctLogins AS (
  SELECT DISTINCT id, login_date
  FROM Logins
),
LoginStreaks AS (
  SELECT
    id,
    login_date,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY login_date) AS rn
  FROM DistinctLogins
),
StreakGroups AS (
  SELECT
    id,
    DATE_SUB(login_date, INTERVAL rn DAY) AS grp
  FROM LoginStreaks
)
SELECT a.id, a.name
FROM Accounts a
JOIN (
  SELECT id
  FROM (
    SELECT id, COUNT(*) AS streak
    FROM StreakGroups
    GROUP BY id, grp
    HAVING streak >= 5
  ) t
) active
ON a.id = active.id
ORDER BY a.id;

Complexity

  • ⏰ Time complexity: O(N log N), where N is the number of login records (due to sorting per user for row numbering).
  • 🧺 Space complexity: O(N), for storing intermediate CTEs and groupings.