Problem

Table: Transactions

+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| transaction_id   | int     |
| customer_id      | int     |
| product_id       | int     |
| transaction_date | date    |
| amount           | decimal |
+------------------+---------+
transaction_id is the unique identifier for this table.
Each row of this table contains information about a transaction, including the customer ID, product ID, date, and amount spent.

Table: Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| category    | varchar |
| price       | decimal |
+-------------+---------+
product_id is the unique identifier for this table.
Each row of this table contains information about a product, including its category and price.

Write a solution to analyze customer purchasing behavior. For each customer , calculate:

  • The total amount spent.
  • The number of transactions.
  • The number of unique product categories purchased.
  • The average amount spent.
  • The most frequently purchased product category (if there is a tie, choose the one with the most recent transaction).
  • A loyalty score defined as: (Number of transactions * 10) + (Total amount spent / 100).

Round total_amount, avg_transaction_amount, and loyalty_score to 2 decimal places.

Return the result table ordered by loyalty_score indescending order, then bycustomer_id inascending order.

The query result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
Input:
`Transactions` table:
+----------------+-------------+------------+------------------+--------+
| transaction_id | customer_id | product_id | transaction_date | amount |
+----------------+-------------+------------+------------------+--------+
| 1              | 101         | 1          | 2023-01-01       | 100.00 |
| 2              | 101         | 2          | 2023-01-15       | 150.00 |
| 3              | 102         | 1          | 2023-01-01       | 100.00 |
| 4              | 102         | 3          | 2023-01-22       | 200.00 |
| 5              | 101         | 3          | 2023-02-10       | 200.00 |
+----------------+-------------+------------+------------------+--------+
`Products` table:
+------------+----------+--------+
| product_id | category | price  |
+------------+----------+--------+
| 1          | A        | 100.00 |
| 2          | B        | 150.00 |
| 3          | C        | 200.00 |
+------------+----------+--------+
Output:
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
| customer_id | total_amount | transaction_count | unique_categories | avg_transaction_amount | top_category | loyalty_score |
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
| 101         | 450.00       | 3                 | 3                 | 150.00                 | C            | 34.50         |
| 102         | 300.00       | 2                 | 2                 | 150.00                 | C            | 23.00         |
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
Explanation:
* For customer 101: 
* Total amount spent: 100.00 + 150.00 + 200.00 = 450.00
* Number of transactions: 3
* Unique categories: A, B, C (3 categories)
* Average transaction amount: 450.00 / 3 = 150.00
* Top category: C (Customer 101 made 1 purchase each in categories A, B, and C. Since the count is the same for all categories, we choose the most recent transaction, which is category C on 2023-02-10)
* Loyalty score: (3 * 10) + (450.00 / 100) = 34.50
* For customer 102: 
* Total amount spent: 100.00 + 200.00 = 300.00
* Number of transactions: 2
* Unique categories: A, C (2 categories)
* Average transaction amount: 300.00 / 2 = 150.00
* Top category: C (Customer 102 made 1 purchase each in categories A and C. Since the count is the same for both categories, we choose the most recent transaction, which is category C on 2023-01-22)
* Loyalty score: (2 * 10) + (300.00 / 100) = 23.00
**Note:** The output is ordered by loyalty_score in descending order, then by
customer_id in ascending order.

Solution

Method 1 – SQL Aggregation with Window Functions

Intuition

The key idea is to aggregate transaction data per customer, join with product categories, and use window functions to determine the most frequently purchased category, resolving ties by the most recent transaction date.

Approach

  1. Join Transactions with Products to get category for each transaction.
  2. For each customer:
    • Calculate total amount spent, number of transactions, average transaction amount, and number of unique categories.
    • For each (customer, category), count transactions and get the most recent transaction date.
    • Use window functions to rank categories by frequency and recency.
    • Select the top-ranked category as the most frequently purchased.
    • Compute loyalty score as (transactions * 10) + (total_amount / 100).
    • Round total_amount, avg_transaction_amount, and loyalty_score to 2 decimal places.
  3. Order by loyalty_score descending, then customer_id ascending.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
WITH cust_stats AS (
  SELECT
    t.customer_id,
    ROUND(SUM(t.amount), 2) AS total_amount,
    COUNT(*) AS num_transactions,
    ROUND(AVG(t.amount), 2) AS avg_transaction_amount,
    COUNT(DISTINCT p.category) AS unique_categories
  FROM Transactions t
  JOIN Products p ON t.product_id = p.product_id
  GROUP BY t.customer_id
),
cat_freq AS (
  SELECT
    t.customer_id,
    p.category,
    COUNT(*) AS freq,
    MAX(t.transaction_date) AS last_date
  FROM Transactions t
  JOIN Products p ON t.product_id = p.product_id
  GROUP BY t.customer_id, p.category
),
cat_ranked AS (
  SELECT
    customer_id,
    category,
    freq,
    last_date,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY freq DESC, last_date DESC) AS rn
  FROM cat_freq
)
SELECT
  cs.customer_id,
  cs.total_amount,
  cs.num_transactions,
  cs.unique_categories,
  cs.avg_transaction_amount,
  cr.category AS most_frequent_category,
  ROUND(cs.num_transactions * 10 + cs.total_amount / 100, 2) AS loyalty_score
FROM cust_stats cs
JOIN cat_ranked cr ON cs.customer_id = cr.customer_id AND cr.rn = 1
ORDER BY loyalty_score DESC, cs.customer_id ASC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
WITH cust_stats AS (
  SELECT
    t.customer_id,
    ROUND(SUM(t.amount), 2) AS total_amount,
    COUNT(*) AS num_transactions,
    ROUND(AVG(t.amount), 2) AS avg_transaction_amount,
    COUNT(DISTINCT p.category) AS unique_categories
  FROM Transactions t
  JOIN Products p ON t.product_id = p.product_id
  GROUP BY t.customer_id
),
cat_freq AS (
  SELECT
    t.customer_id,
    p.category,
    COUNT(*) AS freq,
    MAX(t.transaction_date) AS last_date
  FROM Transactions t
  JOIN Products p ON t.product_id = p.product_id
  GROUP BY t.customer_id, p.category
),
cat_ranked AS (
  SELECT
    customer_id,
    category,
    freq,
    last_date,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY freq DESC, last_date DESC) AS rn
  FROM cat_freq
)
SELECT
  cs.customer_id,
  cs.total_amount,
  cs.num_transactions,
  cs.unique_categories,
  cs.avg_transaction_amount,
  cr.category AS most_frequent_category,
  ROUND(cs.num_transactions * 10 + cs.total_amount / 100, 2) AS loyalty_score
FROM cust_stats cs
JOIN cat_ranked cr ON cs.customer_id = cr.customer_id AND cr.rn = 1
ORDER BY loyalty_score DESC, cs.customer_id ASC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
def customer_purchasing_behavior_analysis(transactions: 'pd.DataFrame', products: 'pd.DataFrame') -> 'pd.DataFrame':
    import pandas as pd
    df = transactions.merge(products, on='product_id')
    # Aggregate stats
    stats = df.groupby('customer_id').agg(
        total_amount=('amount', lambda x: round(x.sum(), 2)),
        num_transactions=('transaction_id', 'count'),
        unique_categories=('category', 'nunique'),
        avg_transaction_amount=('amount', lambda x: round(x.mean(), 2))
    ).reset_index()
    # Category frequency and recency
    cat = df.groupby(['customer_id', 'category']).agg(
        freq=('transaction_id', 'count'),
        last_date=('transaction_date', 'max')
    ).reset_index()
    cat = cat.sort_values(['customer_id', 'freq', 'last_date'], ascending=[True, False, False])
    most_freq = cat.groupby('customer_id').first().reset_index()[['customer_id', 'category']]
    most_freq = most_freq.rename(columns={'category': 'most_frequent_category'})
    # Merge
    res = stats.merge(most_freq, on='customer_id')
    res['loyalty_score'] = (res['num_transactions'] * 10 + res['total_amount'] / 100).round(2)
    res = res.sort_values(['loyalty_score', 'customer_id'], ascending=[False, True])
    return res

Complexity

  • ⏰ Time complexity: O(N), where N is the number of transactions, as each transaction is processed once in the join and aggregation.
  • 🧺 Space complexity: O(C), where C is the number of customers, for storing the aggregates.