Problem#
Table: Transactions
+------------------+------+
| Column Name | Type |
+------------------+------+
| transaction_id | int |
| customer_id | int |
| transaction_date | date |
| amount | int |
+------------------+------+
transaction_id is the primary key of this table.
Each row contains information about transactions that includes unique (customer_id, transaction_date) along with the corresponding customer_id and amount.
Write an SQL query to find the customers who have made consecutive transactions with increasing amount
for at least three consecutive days.
Include the customer_id
, start date of the consecutive transactions period and the end date of the consecutive transactions period. There can be multiple consecutive transactions by a customer.
Return the result table ordered by customer_id, consecutive_start, consecutive_end
inascending order.
The query result format is in the following example.
Examples#
Example 1:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
|
Input:
Transactions table:
+----------------+-------------+------------------+--------+
| transaction_id | customer_id | transaction_date | amount |
+----------------+-------------+------------------+--------+
| 1 | 101 | 2023-05-01 | 100 |
| 2 | 101 | 2023-05-02 | 150 |
| 3 | 101 | 2023-05-03 | 200 |
| 4 | 102 | 2023-05-01 | 50 |
| 5 | 102 | 2023-05-03 | 100 |
| 6 | 102 | 2023-05-04 | 200 |
| 7 | 105 | 2023-05-01 | 100 |
| 8 | 105 | 2023-05-02 | 150 |
| 9 | 105 | 2023-05-03 | 200 |
| 10 | 105 | 2023-05-04 | 300 |
| 11 | 105 | 2023-05-12 | 250 |
| 12 | 105 | 2023-05-13 | 260 |
| 13 | 105 | 2023-05-14 | 270 |
+----------------+-------------+------------------+--------+
Output:
+-------------+-------------------+-----------------+
| customer_id | consecutive_start | consecutive_end |
+-------------+-------------------+-----------------+
| 101 | 2023-05-01 | 2023-05-03 |
| 105 | 2023-05-01 | 2023-05-04 |
| 105 | 2023-05-12 | 2023-05-14 |
+-------------+-------------------+-----------------+
Explanation:
- customer_id 101 has made consecutive transactions with increasing amounts from May 1st, 2023, to May 3rd, 2023
- customer_id 102 does not have any consecutive transactions for at least 3 days.
- customer_id 105 has two sets of consecutive transactions: from May 1st, 2023, to May 4th, 2023, and from May 12th, 2023, to May 14th, 2023.
customer_id is sorted in ascending order.
|
Solution#
Method 1 – Window Functions and Grouping#
Intuition#
We need to find customers who have at least three consecutive days of transactions with strictly increasing amounts. We can use window functions to assign a group to each consecutive sequence where the amount is increasing by 1 each day, then filter for groups of length at least 3.
Approach#
- For each customer, order their transactions by date.
- Use the LAG window function to compare each transaction’s amount and date with the previous one.
- Assign a group number that increments when the sequence breaks (either the date is not consecutive or the amount is not increasing).
- For each group, if the length is at least 3, output the customer_id, start date, and end date.
- Order the result by customer_id, consecutive_start, consecutive_end.
Code#
MySQL#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
|
WITH ordered_tx AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS rn
FROM Transactions
),
marked AS (
SELECT *,
LAG(transaction_date) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS prev_date,
LAG(amount) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS prev_amt
FROM ordered_tx
),
flags AS (
SELECT *,
CASE WHEN prev_date IS NULL OR DATEDIFF(transaction_date, prev_date) != 1 OR amount <= prev_amt THEN 1 ELSE 0 END AS new_grp
FROM marked
),
groups AS (
SELECT *,
SUM(new_grp) OVER (PARTITION BY customer_id ORDER BY transaction_date ROWS UNBOUNDED PRECEDING) AS grp
FROM flags
),
agg AS (
SELECT customer_id, grp,
MIN(transaction_date) AS consecutive_start,
MAX(transaction_date) AS consecutive_end,
COUNT(*) AS len
FROM groups
GROUP BY customer_id, grp
)
SELECT customer_id, consecutive_start, consecutive_end
FROM agg
WHERE len >= 3
ORDER BY customer_id, consecutive_start, consecutive_end;
|
PostgreSQL#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
|
WITH ordered_tx AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS rn
FROM Transactions
),
marked AS (
SELECT *,
LAG(transaction_date) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS prev_date,
LAG(amount) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS prev_amt
FROM ordered_tx
),
flags AS (
SELECT *,
CASE WHEN prev_date IS NULL OR transaction_date - prev_date != 1 OR amount <= prev_amt THEN 1 ELSE 0 END AS new_grp
FROM marked
),
groups AS (
SELECT *,
SUM(new_grp) OVER (PARTITION BY customer_id ORDER BY transaction_date ROWS UNBOUNDED PRECEDING) AS grp
FROM flags
),
agg AS (
SELECT customer_id, grp,
MIN(transaction_date) AS consecutive_start,
MAX(transaction_date) AS consecutive_end,
COUNT(*) AS len
FROM groups
GROUP BY customer_id, grp
)
SELECT customer_id, consecutive_start, consecutive_end
FROM agg
WHERE len >= 3
ORDER BY customer_id, consecutive_start, consecutive_end;
|
Python (Pandas)#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
def consecutive_increasing_transactions(transactions: 'pd.DataFrame') -> 'pd.DataFrame':
df = transactions.sort_values(['customer_id', 'transaction_date'])
df['prev_date'] = df.groupby('customer_id')['transaction_date'].shift(1)
df['prev_amt'] = df.groupby('customer_id')['amount'].shift(1)
df['new_grp'] = (df['prev_date'].isna() |
(df['transaction_date'] - df['prev_date']).dt.days != 1 |
(df['amount'] <= df['prev_amt'])).astype(int)
df['grp'] = df.groupby('customer_id')['new_grp'].cumsum()
res = df.groupby(['customer_id', 'grp']).agg(
consecutive_start=('transaction_date', 'min'),
consecutive_end=('transaction_date', 'max'),
len=('transaction_id', 'count')
).reset_index()
res = res[res['len'] >= 3]
return res[['customer_id', 'consecutive_start', 'consecutive_end']].sort_values(['customer_id', 'consecutive_start', 'consecutive_end'])
|
Complexity#
- ⏰ Time complexity:
O(n)
, where n is the number of transactions. Each transaction is processed a constant number of times.
- 🧺 Space complexity:
O(n)
, for storing group information and results.