Problem

Table: Transactions

+------------------+------+
| Column Name      | Type |
+------------------+------+
| transaction_id   | int  |
| customer_id      | int  |
| transaction_date | date |
| amount           | int  |
+------------------+------+
transaction_id is the column with unique values of this table.
Each row contains information about transactions that includes unique (customer_id, transaction_date) along with the corresponding customer_id and amount.

Write a solution to find all customer_id who made the maximum number of transactions on consecutive days.

Return all customer_id with the maximum number of consecutive transactions. Order the result table by customer_id in ascending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Input: 
Transactions table:
+----------------+-------------+------------------+--------+
| transaction_id | customer_id | transaction_date | amount |
+----------------+-------------+------------------+--------+
| 1              | 101         | 2023-05-01       | 100    |
| 2              | 101         | 2023-05-02       | 150    |
| 3              | 101         | 2023-05-03       | 200    |
| 4              | 102         | 2023-05-01       | 50     |
| 5              | 102         | 2023-05-03       | 100    |
| 6              | 102         | 2023-05-04       | 200    |
| 7              | 105         | 2023-05-01       | 100    |
| 8              | 105         | 2023-05-02       | 150    |
| 9              | 105         | 2023-05-03       | 200    |
+----------------+-------------+------------------+--------+
Output: 
+-------------+
| customer_id | 
+-------------+
| 101         | 
| 105         | 
+-------------+
Explanation: 
- customer_id 101 has a total of 3 transactions, and all of them are consecutive.
- customer_id 102 has a total of 3 transactions, but only 2 of them are consecutive. 
- customer_id 105 has a total of 3 transactions, and all of them are consecutive.
In total, the highest number of consecutive transactions is 3, achieved by customer_id 101 and 105. The customer_id are sorted in ascending order.

Solution

Method 1 – SQL Window Functions and Grouping

Intuition

The key idea is to use window functions to identify consecutive days for each customer. By assigning a row number to each transaction per customer and comparing it to the transaction date, we can group consecutive days together. The difference between the row number and the date (as an integer) will be constant for consecutive days, allowing us to group and count them.

Approach

  1. For each transaction, assign a row number (rn) partitioned by customer_id and ordered by transaction_date.
  2. Compute a group key as DATE_SUB(transaction_date, INTERVAL rn DAY) (or transaction_date - rn in PostgreSQL) to identify consecutive sequences.
  3. For each customer, group by this key and count the length of each consecutive sequence.
  4. Find the maximum length for each customer.
  5. Select customers whose maximum length equals the global maximum.
  6. Return the result ordered by customer_id ascending.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
WITH ranked AS (
  SELECT
    customer_id,
    transaction_date,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS rn
  FROM Transactions
),
consec AS (
  SELECT
    customer_id,
    DATE_SUB(transaction_date, INTERVAL rn DAY) AS grp,
    COUNT(*) AS len
  FROM ranked
  GROUP BY customer_id, grp
),
maxlen AS (
  SELECT customer_id, MAX(len) AS max_consec FROM consec GROUP BY customer_id
),
global_max AS (
  SELECT MAX(max_consec) AS gmax FROM maxlen
)
SELECT m.customer_id
FROM maxlen m, global_max g
WHERE m.max_consec = g.gmax
ORDER BY m.customer_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
WITH ranked AS (
  SELECT
    customer_id,
    transaction_date,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS rn
  FROM Transactions
),
consec AS (
  SELECT
    customer_id,
    transaction_date::date - rn * INTERVAL '1 day' AS grp,
    COUNT(*) AS len
  FROM ranked
  GROUP BY customer_id, grp
),
maxlen AS (
  SELECT customer_id, MAX(len) AS max_consec FROM consec GROUP BY customer_id
),
global_max AS (
  SELECT MAX(max_consec) AS gmax FROM maxlen
)
SELECT m.customer_id
FROM maxlen m, global_max g
WHERE m.max_consec = g.gmax
ORDER BY m.customer_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def customers_with_max_consecutive_transactions(transactions: 'pd.DataFrame') -> 'pd.DataFrame':
    import pandas as pd
    transactions = transactions.copy()
    transactions['transaction_date'] = pd.to_datetime(transactions['transaction_date'])
    transactions = transactions.sort_values(['customer_id', 'transaction_date'])
    transactions['rn'] = transactions.groupby('customer_id').cumcount()
    transactions['grp'] = transactions['transaction_date'] - pd.to_timedelta(transactions['rn'], unit='D')
    consec = transactions.groupby(['customer_id', 'grp']).size().reset_index(name='len')
    maxlen = consec.groupby('customer_id')['len'].max().reset_index(name='max_consec')
    gmax = maxlen['max_consec'].max()
    res = maxlen[maxlen['max_consec'] == gmax][['customer_id']].sort_values('customer_id')
    return res

Complexity

  • ⏰ Time complexity: O(N log N), where N is the number of transactions, due to sorting and grouping.
  • 🧺 Space complexity: O(C), where C is the number of customers, for storing groupings and results.