Problem

Table: Transactions

+------------------+----------+
| Column Name      | Type     |
+------------------+----------+
| user_id          | int      |
| spend            | decimal  |
| transaction_date | datetime |
+------------------+----------+
(user_id, transaction_date) is column of unique values for this table.
This table contains user_id, spend, and transaction_date.

Write a solution to find the third transaction(if they have at least three transactions) of every user, where the spending on the preceding two transactions is lower than the spending on the third transaction.

Return the result table byuser_id inascending order .

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Input: 
Transactions table:
+---------+--------+---------------------+
| user_id | spend  | transaction_date    | 
+---------+--------+---------------------+
| 1       | 65.56  | 2023-11-18 13:49:42 | 
| 1       | 96.0   | 2023-11-30 02:47:26 |     
| 1       | 7.44   | 2023-11-02 12:15:23 | 
| 1       | 49.78  | 2023-11-12 00:13:46 | 
| 2       | 40.89  | 2023-11-21 04:39:15 |  
| 2       | 100.44 | 2023-11-20 07:39:34 | 
| 3       | 37.33  | 2023-11-03 06:22:02 | 
| 3       | 13.89  | 2023-11-11 16:00:14 | 
| 3       | 7.0    | 2023-11-29 22:32:36 | 
+---------+--------+---------------------+
**Output**
+---------+-------------------------+------------------------+
| user_id | third_transaction_spend | third_transaction_date | 
+---------+-------------------------+------------------------+
| 1       | 65.56                   | 2023-11-18 13:49:42    |  
+---------+-------------------------+------------------------+
**Explanation**
- For user_id 1, their third transaction occurred on 2023-11-18 at 13:49:42 with an amount of $65.56, surpassing the expenditures of the previous two transactions which were $7.44 on 2023-11-02 at 12:15:23 and $49.78 on 2023-11-12 at 00:13:46. Thus, this third transaction will be included in the output table.
- user_id 2 only has a total of 2 transactions, so there isn't a third transaction to consider.
- For user_id 3, the amount of $7.0 for their third transaction is less than that of the preceding two transactions, so it won't be included.
Output table is ordered by user_id in ascending order.

Solution

Method 1 – SQL: Window Functions and Filtering

Intuition

We want to find, for each user, the third transaction (by date) where the previous two spends are both less than the third. We can use window functions to rank transactions and compare each transaction’s spend with the previous two.

Approach

  1. Use ROW_NUMBER() to order transactions for each user by transaction_date.
  2. For each transaction with row_number = 3 or more, use LAG() to get the spend of the previous two transactions.
  3. Filter for transactions where both previous spends are less than the current spend.
  4. Return user_id, spend, and transaction_date for these transactions, ordered by user_id ascending.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT user_id, spend, transaction_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS rn,
           LAG(spend, 1) OVER (PARTITION BY user_id ORDER BY transaction_date) AS prev1,
           LAG(spend, 2) OVER (PARTITION BY user_id ORDER BY transaction_date) AS prev2
    FROM Transactions
) t
WHERE rn = 3 AND prev1 < spend AND prev2 < spend
ORDER BY user_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT user_id, spend, transaction_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS rn,
           LAG(spend, 1) OVER (PARTITION BY user_id ORDER BY transaction_date) AS prev1,
           LAG(spend, 2) OVER (PARTITION BY user_id ORDER BY transaction_date) AS prev2
    FROM Transactions
) t
WHERE rn = 3 AND prev1 < spend AND prev2 < spend
ORDER BY user_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import pandas as pd

def find_third_transaction(transactions: pd.DataFrame) -> pd.DataFrame:
    df = transactions.copy()
    df = df.sort_values(['user_id', 'transaction_date'])
    df['rn'] = df.groupby('user_id').cumcount() + 1
    df['prev1'] = df.groupby('user_id')['spend'].shift(1)
    df['prev2'] = df.groupby('user_id')['spend'].shift(2)
    res = df[(df['rn'] == 3) & (df['prev1'] < df['spend']) & (df['prev2'] < df['spend'])]
    return res[['user_id', 'spend', 'transaction_date']].sort_values('user_id')

Complexity

  • ⏰ Time complexity: O(n log n) — For sorting and window functions.
  • 🧺 Space complexity: O(n) — For storing intermediate results.