Problem

Table: Transactions

+------------------+------+
| Column Name      | Type |
+------------------+------+
| transaction_id   | int  |
| customer_id      | int  |
| transaction_date | date |
| amount           | int  |
+------------------+------+
transaction_id is the primary key of this table.
Each row contains information about transactions that includes unique (customer_id, transaction_date) along with the corresponding customer_id and amount.

Write an SQL query to find the customers who have made consecutive transactions with increasing amount for at least three consecutive days. Include the customer_id, start date of the consecutive transactions period and the end date of the consecutive transactions period. There can be multiple consecutive transactions by a customer.

Return the result table ordered by customer_id, consecutive_start, consecutive_end inascending order.

The query result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
Input:  
Transactions table:
+----------------+-------------+------------------+--------+
| transaction_id | customer_id | transaction_date | amount |
+----------------+-------------+------------------+--------+
| 1              | 101         | 2023-05-01       | 100    |
| 2              | 101         | 2023-05-02       | 150    |
| 3              | 101         | 2023-05-03       | 200    |
| 4              | 102         | 2023-05-01       | 50     |
| 5              | 102         | 2023-05-03       | 100    |
| 6              | 102         | 2023-05-04       | 200    |
| 7              | 105         | 2023-05-01       | 100    |
| 8              | 105         | 2023-05-02       | 150    |
| 9              | 105         | 2023-05-03       | 200    |
| 10             | 105         | 2023-05-04       | 300    |
| 11             | 105         | 2023-05-12       | 250    |
| 12             | 105         | 2023-05-13       | 260    |
| 13             | 105         | 2023-05-14       | 270    |
+----------------+-------------+------------------+--------+
Output:  
+-------------+-------------------+-----------------+
| customer_id | consecutive_start | consecutive_end | 
+-------------+-------------------+-----------------+
| 101         |  2023-05-01       | 2023-05-03      | 
| 105         |  2023-05-01       | 2023-05-04      |
| 105         |  2023-05-12       | 2023-05-14      | 
+-------------+-------------------+-----------------+
Explanation:  
- customer_id 101 has made consecutive transactions with increasing amounts from May 1st, 2023, to May 3rd, 2023
- customer_id 102 does not have any consecutive transactions for at least 3 days. 
- customer_id 105 has two sets of consecutive transactions: from May 1st, 2023, to May 4th, 2023, and from May 12th, 2023, to May 14th, 2023. 
customer_id is sorted in ascending order.

Solution

Method 1 – Window Functions and Grouping

Intuition

We need to find customers who have at least three consecutive days of transactions with strictly increasing amounts. We can use window functions to assign a group to each consecutive sequence where the amount is increasing by 1 each day, then filter for groups of length at least 3.

Approach

  1. For each customer, order their transactions by date.
  2. Use the LAG window function to compare each transaction’s amount and date with the previous one.
  3. Assign a group number that increments when the sequence breaks (either the date is not consecutive or the amount is not increasing).
  4. For each group, if the length is at least 3, output the customer_id, start date, and end date.
  5. Order the result by customer_id, consecutive_start, consecutive_end.

Code

MySQL
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
WITH ordered_tx AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS rn
  FROM Transactions
),
marked AS (
  SELECT *,
         LAG(transaction_date) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS prev_date,
         LAG(amount) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS prev_amt
  FROM ordered_tx
),
flags AS (
  SELECT *,
         CASE WHEN prev_date IS NULL OR DATEDIFF(transaction_date, prev_date) != 1 OR amount <= prev_amt THEN 1 ELSE 0 END AS new_grp
  FROM marked
),
groups AS (
  SELECT *,
         SUM(new_grp) OVER (PARTITION BY customer_id ORDER BY transaction_date ROWS UNBOUNDED PRECEDING) AS grp
  FROM flags
),
agg AS (
  SELECT customer_id, grp,
         MIN(transaction_date) AS consecutive_start,
         MAX(transaction_date) AS consecutive_end,
         COUNT(*) AS len
  FROM groups
  GROUP BY customer_id, grp
)
SELECT customer_id, consecutive_start, consecutive_end
FROM agg
WHERE len >= 3
ORDER BY customer_id, consecutive_start, consecutive_end;
PostgreSQL
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
WITH ordered_tx AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS rn
  FROM Transactions
),
marked AS (
  SELECT *,
         LAG(transaction_date) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS prev_date,
         LAG(amount) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS prev_amt
  FROM ordered_tx
),
flags AS (
  SELECT *,
         CASE WHEN prev_date IS NULL OR transaction_date - prev_date != 1 OR amount <= prev_amt THEN 1 ELSE 0 END AS new_grp
  FROM marked
),
groups AS (
  SELECT *,
         SUM(new_grp) OVER (PARTITION BY customer_id ORDER BY transaction_date ROWS UNBOUNDED PRECEDING) AS grp
  FROM flags
),
agg AS (
  SELECT customer_id, grp,
         MIN(transaction_date) AS consecutive_start,
         MAX(transaction_date) AS consecutive_end,
         COUNT(*) AS len
  FROM groups
  GROUP BY customer_id, grp
)
SELECT customer_id, consecutive_start, consecutive_end
FROM agg
WHERE len >= 3
ORDER BY customer_id, consecutive_start, consecutive_end;
Python (Pandas)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
def consecutive_increasing_transactions(transactions: 'pd.DataFrame') -> 'pd.DataFrame':
    df = transactions.sort_values(['customer_id', 'transaction_date'])
    df['prev_date'] = df.groupby('customer_id')['transaction_date'].shift(1)
    df['prev_amt'] = df.groupby('customer_id')['amount'].shift(1)
    df['new_grp'] = (df['prev_date'].isna() |
                     (df['transaction_date'] - df['prev_date']).dt.days != 1 |
                     (df['amount'] <= df['prev_amt'])).astype(int)
    df['grp'] = df.groupby('customer_id')['new_grp'].cumsum()
    res = df.groupby(['customer_id', 'grp']).agg(
        consecutive_start=('transaction_date', 'min'),
        consecutive_end=('transaction_date', 'max'),
        len=('transaction_id', 'count')
    ).reset_index()
    res = res[res['len'] >= 3]
    return res[['customer_id', 'consecutive_start', 'consecutive_end']].sort_values(['customer_id', 'consecutive_start', 'consecutive_end'])

Complexity

  • ⏰ Time complexity: O(n), where n is the number of transactions. Each transaction is processed a constant number of times.
  • 🧺 Space complexity: O(n), for storing group information and results.