Customer Purchasing Behavior Analysis
MediumUpdated: Sep 29, 2025
Practice on:
Problem
Table: Transactions
+------------------+---------+
| Column Name | Type |
+------------------+---------+
| transaction_id | int |
| customer_id | int |
| product_id | int |
| transaction_date | date |
| amount | decimal |
+------------------+---------+
transaction_id is the unique identifier for this table.
Each row of this table contains information about a transaction, including the customer ID, product ID, date, and amount spent.
Table: Products
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| product_id | int |
| category | varchar |
| price | decimal |
+-------------+---------+
product_id is the unique identifier for this table.
Each row of this table contains information about a product, including its category and price.
Write a solution to analyze customer purchasing behavior. For each customer , calculate:
- The total amount spent.
- The number of transactions.
- The number of unique product categories purchased.
- The average amount spent.
- The most frequently purchased product category (if there is a tie, choose the one with the most recent transaction).
- A loyalty score defined as: (Number of transactions * 10) + (Total amount spent / 100).
Round total_amount, avg_transaction_amount, and loyalty_score to 2
decimal places.
Return the result table ordered by loyalty_score indescending order, then bycustomer_id inascending order.
The query result format is in the following example.
Examples
Example 1:
Input:
`Transactions` table:
+----------------+-------------+------------+------------------+--------+
| transaction_id | customer_id | product_id | transaction_date | amount |
+----------------+-------------+------------+------------------+--------+
| 1 | 101 | 1 | 2023-01-01 | 100.00 |
| 2 | 101 | 2 | 2023-01-15 | 150.00 |
| 3 | 102 | 1 | 2023-01-01 | 100.00 |
| 4 | 102 | 3 | 2023-01-22 | 200.00 |
| 5 | 101 | 3 | 2023-02-10 | 200.00 |
+----------------+-------------+------------+------------------+--------+
`Products` table:
+------------+----------+--------+
| product_id | category | price |
+------------+----------+--------+
| 1 | A | 100.00 |
| 2 | B | 150.00 |
| 3 | C | 200.00 |
+------------+----------+--------+
Output:
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
| customer_id | total_amount | transaction_count | unique_categories | avg_transaction_amount | top_category | loyalty_score |
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
| 101 | 450.00 | 3 | 3 | 150.00 | C | 34.50 |
| 102 | 300.00 | 2 | 2 | 150.00 | C | 23.00 |
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
Explanation:
* For customer 101:
* Total amount spent: 100.00 + 150.00 + 200.00 = 450.00
* Number of transactions: 3
* Unique categories: A, B, C (3 categories)
* Average transaction amount: 450.00 / 3 = 150.00
* Top category: C (Customer 101 made 1 purchase each in categories A, B, and C. Since the count is the same for all categories, we choose the most recent transaction, which is category C on 2023-02-10)
* Loyalty score: (3 * 10) + (450.00 / 100) = 34.50
* For customer 102:
* Total amount spent: 100.00 + 200.00 = 300.00
* Number of transactions: 2
* Unique categories: A, C (2 categories)
* Average transaction amount: 300.00 / 2 = 150.00
* Top category: C (Customer 102 made 1 purchase each in categories A and C. Since the count is the same for both categories, we choose the most recent transaction, which is category C on 2023-01-22)
* Loyalty score: (2 * 10) + (300.00 / 100) = 23.00
**Note:** The output is ordered by loyalty_score in descending order, then by
customer_id in ascending order.
Solution
Method 1 – SQL Aggregation with Window Functions
Intuition
The key idea is to aggregate transaction data per customer, join with product categories, and use window functions to determine the most frequently purchased category, resolving ties by the most recent transaction date.
Approach
- Join
TransactionswithProductsto get category for each transaction. - For each customer:
- Calculate total amount spent, number of transactions, average transaction amount, and number of unique categories.
- For each (customer, category), count transactions and get the most recent transaction date.
- Use window functions to rank categories by frequency and recency.
- Select the top-ranked category as the most frequently purchased.
- Compute loyalty score as
(transactions * 10) + (total_amount / 100). - Round
total_amount,avg_transaction_amount, andloyalty_scoreto 2 decimal places.
- Order by
loyalty_scoredescending, thencustomer_idascending.
Code
MySQL
WITH cust_stats AS (
SELECT
t.customer_id,
ROUND(SUM(t.amount), 2) AS total_amount,
COUNT(*) AS num_transactions,
ROUND(AVG(t.amount), 2) AS avg_transaction_amount,
COUNT(DISTINCT p.category) AS unique_categories
FROM Transactions t
JOIN Products p ON t.product_id = p.product_id
GROUP BY t.customer_id
),
cat_freq AS (
SELECT
t.customer_id,
p.category,
COUNT(*) AS freq,
MAX(t.transaction_date) AS last_date
FROM Transactions t
JOIN Products p ON t.product_id = p.product_id
GROUP BY t.customer_id, p.category
),
cat_ranked AS (
SELECT
customer_id,
category,
freq,
last_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY freq DESC, last_date DESC) AS rn
FROM cat_freq
)
SELECT
cs.customer_id,
cs.total_amount,
cs.num_transactions,
cs.unique_categories,
cs.avg_transaction_amount,
cr.category AS most_frequent_category,
ROUND(cs.num_transactions * 10 + cs.total_amount / 100, 2) AS loyalty_score
FROM cust_stats cs
JOIN cat_ranked cr ON cs.customer_id = cr.customer_id AND cr.rn = 1
ORDER BY loyalty_score DESC, cs.customer_id ASC;
PostgreSQL
WITH cust_stats AS (
SELECT
t.customer_id,
ROUND(SUM(t.amount), 2) AS total_amount,
COUNT(*) AS num_transactions,
ROUND(AVG(t.amount), 2) AS avg_transaction_amount,
COUNT(DISTINCT p.category) AS unique_categories
FROM Transactions t
JOIN Products p ON t.product_id = p.product_id
GROUP BY t.customer_id
),
cat_freq AS (
SELECT
t.customer_id,
p.category,
COUNT(*) AS freq,
MAX(t.transaction_date) AS last_date
FROM Transactions t
JOIN Products p ON t.product_id = p.product_id
GROUP BY t.customer_id, p.category
),
cat_ranked AS (
SELECT
customer_id,
category,
freq,
last_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY freq DESC, last_date DESC) AS rn
FROM cat_freq
)
SELECT
cs.customer_id,
cs.total_amount,
cs.num_transactions,
cs.unique_categories,
cs.avg_transaction_amount,
cr.category AS most_frequent_category,
ROUND(cs.num_transactions * 10 + cs.total_amount / 100, 2) AS loyalty_score
FROM cust_stats cs
JOIN cat_ranked cr ON cs.customer_id = cr.customer_id AND cr.rn = 1
ORDER BY loyalty_score DESC, cs.customer_id ASC;
Python (pandas)
def customer_purchasing_behavior_analysis(transactions: 'pd.DataFrame', products: 'pd.DataFrame') -> 'pd.DataFrame':
import pandas as pd
df = transactions.merge(products, on='product_id')
# Aggregate stats
stats = df.groupby('customer_id').agg(
total_amount=('amount', lambda x: round(x.sum(), 2)),
num_transactions=('transaction_id', 'count'),
unique_categories=('category', 'nunique'),
avg_transaction_amount=('amount', lambda x: round(x.mean(), 2))
).reset_index()
# Category frequency and recency
cat = df.groupby(['customer_id', 'category']).agg(
freq=('transaction_id', 'count'),
last_date=('transaction_date', 'max')
).reset_index()
cat = cat.sort_values(['customer_id', 'freq', 'last_date'], ascending=[True, False, False])
most_freq = cat.groupby('customer_id').first().reset_index()[['customer_id', 'category']]
most_freq = most_freq.rename(columns={'category': 'most_frequent_category'})
# Merge
res = stats.merge(most_freq, on='customer_id')
res['loyalty_score'] = (res['num_transactions'] * 10 + res['total_amount'] / 100).round(2)
res = res.sort_values(['loyalty_score', 'customer_id'], ascending=[False, True])
return res
Complexity
- ⏰ Time complexity:
O(N), where N is the number of transactions, as each transaction is processed once in the join and aggregation. - 🧺 Space complexity:
O(C), where C is the number of customers, for storing the aggregates.