+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| transaction_id   | int     |
| customer_id      | int     |
| product_id       | int     |
| transaction_date | date    |
| amount           | decimal |
+------------------+---------+
transaction_id is the unique identifier for this table.
Each row of this table contains information about a transaction, including the customer ID, product ID, date, and amount spent.
Table: Products
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| category    | varchar |
| price       | decimal |
+-------------+---------+
product_id is the unique identifier for this table.
Each row of this table contains information about a product, including its category and price.
Write a solution to analyze customer purchasing behavior. For each customer, calculate:
The total amount spent.
The number of transactions.
The number of unique product categories purchased.
The average amount spent.
The most frequently purchased product category (if there is a tie, choose the one with the most recent transaction).
A loyalty score defined as: (Number of transactions * 10) + (Total amount spent / 100).
Round total_amount, avg_transaction_amount, and loyalty_score to 2 decimal places.
Return the result table ordered by loyalty_score in descending order, then by customer_id in ascending order.
The query result format is in the following example.
Input:
`Transactions` table:
+----------------+-------------+------------+------------------+--------+
| transaction_id | customer_id | product_id | transaction_date | amount |
+----------------+-------------+------------+------------------+--------+
| 1              | 101         | 1          | 2023-01-01       | 100.00 |
| 2              | 101         | 2          | 2023-01-15       | 150.00 |
| 3              | 102         | 1          | 2023-01-01       | 100.00 |
| 4              | 102         | 3          | 2023-01-22       | 200.00 |
| 5              | 101         | 3          | 2023-02-10       | 200.00 |
+----------------+-------------+------------+------------------+--------+
`Products` table:
+------------+----------+--------+
| product_id | category | price  |
+------------+----------+--------+
| 1          | A        | 100.00 |
| 2          | B        | 150.00 |
| 3          | C        | 200.00 |
+------------+----------+--------+
Output:
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
| customer_id | total_amount | transaction_count | unique_categories | avg_transaction_amount | top_category | loyalty_score |
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
| 101         | 450.00       | 3                 | 3                 | 150.00                 | C            | 34.50         |
| 102         | 300.00       | 2                 | 2                 | 150.00                 | C            | 23.00         |
+-------------+--------------+-------------------+-------------------+------------------------+--------------+---------------+
Explanation:
* For customer 101:
  * Total amount spent: 100.00 + 150.00 + 200.00 = 450.00
  * Number of transactions: 3
  * Unique categories: A, B, C (3 categories)
  * Average transaction amount: 450.00 / 3 = 150.00
  * Top category: C (Customer 101 made 1 purchase each in categories A, B, and C. Since the count is the same for all categories, we choose the most recent transaction, which is category C on 2023-02-10)
  * Loyalty score: (3 * 10) + (450.00 / 100) = 34.50
* For customer 102:
  * Total amount spent: 100.00 + 200.00 = 300.00
  * Number of transactions: 2
  * Unique categories: A, C (2 categories)
  * Average transaction amount: 300.00 / 2 = 150.00
  * Top category: C (Customer 102 made 1 purchase each in categories A and C. Since the count is the same for both categories, we choose the most recent transaction, which is category C on 2023-01-22)
  * Loyalty score: (2 * 10) + (300.00 / 100) = 23.00

**Note:** The output is ordered by loyalty_score in descending order, then by customer_id in ascending order.
The key idea is to aggregate transaction data per customer, join with product categories, and use window functions to determine the most frequently purchased category, resolving ties by the most recent transaction date.
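The tie-breaking rule is the least obvious part, so here is a minimal Python sketch of it for a single customer. The function name `top_category` and the input shape (a list of `(category, date)` pairs) are assumptions made for illustration; they are not part of the problem statement.

```python
from collections import defaultdict
from datetime import date


def top_category(purchases):
    """Return the most frequent category; ties go to the most recent purchase.

    `purchases` is a hypothetical list of (category, transaction_date) pairs
    for one customer.
    """
    freq = defaultdict(int)
    last_date = {}
    for category, when in purchases:
        freq[category] += 1
        # Track the latest purchase date seen for each category.
        last_date[category] = max(last_date.get(category, when), when)
    # Compare by (frequency, latest date): higher frequency wins,
    # and equal frequencies are broken by the later date.
    return max(freq, key=lambda c: (freq[c], last_date[c]))


# Customer 101 from the example: one purchase each in A, B, and C.
purchases_101 = [
    ("A", date(2023, 1, 1)),
    ("B", date(2023, 1, 15)),
    ("C", date(2023, 2, 10)),
]
print(top_category(purchases_101))  # C
```

Sorting by the pair (frequency, latest date) is the same ordering that a window function expresses with `ORDER BY freq DESC, last_date DESC`.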
-- Per-customer totals, counts, and averages.
WITH cust_stats AS (
    SELECT t.customer_id,
           ROUND(SUM(t.amount), 2) AS total_amount,
           COUNT(*) AS transaction_count,
           ROUND(AVG(t.amount), 2) AS avg_transaction_amount,
           COUNT(DISTINCT p.category) AS unique_categories
    FROM Transactions t
    JOIN Products p ON t.product_id = p.product_id
    GROUP BY t.customer_id
),
-- Purchase frequency and most recent purchase date per customer and category.
cat_freq AS (
    SELECT t.customer_id,
           p.category,
           COUNT(*) AS freq,
           MAX(t.transaction_date) AS last_date
    FROM Transactions t
    JOIN Products p ON t.product_id = p.product_id
    GROUP BY t.customer_id, p.category
),
-- Rank categories: highest frequency first, ties broken by most recent date.
cat_ranked AS (
    SELECT customer_id,
           category,
           freq,
           last_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY freq DESC, last_date DESC) AS rn
    FROM cat_freq
)
SELECT cs.customer_id,
       cs.total_amount,
       cs.transaction_count,
       cs.unique_categories,
       cs.avg_transaction_amount,
       cr.category AS top_category,
       ROUND(cs.transaction_count * 10 + cs.total_amount / 100, 2) AS loyalty_score
FROM cust_stats cs
JOIN cat_ranked cr ON cs.customer_id = cr.customer_id AND cr.rn = 1
ORDER BY loyalty_score DESC, cs.customer_id ASC;
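As a quick sanity check, the approach can be run against the sample data with Python's built-in `sqlite3` module. This is only a sketch under some assumptions: it uses an in-memory SQLite database rather than the judge's engine, stores amounts as `REAL` and dates as ISO-8601 text, and needs SQLite 3.25 or later for window functions. The output column names follow the expected result table, and the helper name `run_example` is made up for this illustration.

```python
import sqlite3

# The three-CTE query, with output columns named as in the expected result.
QUERY = """
WITH cust_stats AS (
    SELECT t.customer_id,
           ROUND(SUM(t.amount), 2) AS total_amount,
           COUNT(*) AS transaction_count,
           ROUND(AVG(t.amount), 2) AS avg_transaction_amount,
           COUNT(DISTINCT p.category) AS unique_categories
    FROM Transactions t
    JOIN Products p ON t.product_id = p.product_id
    GROUP BY t.customer_id
),
cat_freq AS (
    SELECT t.customer_id,
           p.category,
           COUNT(*) AS freq,
           MAX(t.transaction_date) AS last_date
    FROM Transactions t
    JOIN Products p ON t.product_id = p.product_id
    GROUP BY t.customer_id, p.category
),
cat_ranked AS (
    SELECT customer_id,
           category,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY freq DESC, last_date DESC
           ) AS rn
    FROM cat_freq
)
SELECT cs.customer_id,
       cs.total_amount,
       cs.transaction_count,
       cs.unique_categories,
       cs.avg_transaction_amount,
       cr.category AS top_category,
       ROUND(cs.transaction_count * 10 + cs.total_amount / 100, 2) AS loyalty_score
FROM cust_stats cs
JOIN cat_ranked cr ON cs.customer_id = cr.customer_id AND cr.rn = 1
ORDER BY loyalty_score DESC, cs.customer_id ASC;
"""

# Schema and rows copied from the example input.
SETUP = """
CREATE TABLE Transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    product_id INTEGER,
    transaction_date TEXT,  -- ISO-8601 text, so MAX() orders dates correctly
    amount REAL
);
CREATE TABLE Products (
    product_id INTEGER PRIMARY KEY,
    category TEXT,
    price REAL
);
INSERT INTO Transactions VALUES
    (1, 101, 1, '2023-01-01', 100.00),
    (2, 101, 2, '2023-01-15', 150.00),
    (3, 102, 1, '2023-01-01', 100.00),
    (4, 102, 3, '2023-01-22', 200.00),
    (5, 101, 3, '2023-02-10', 200.00);
INSERT INTO Products VALUES
    (1, 'A', 100.00),
    (2, 'B', 150.00),
    (3, 'C', 200.00);
"""


def run_example():
    """Load the sample data into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


rows = run_example()
for row in rows:
    print(row)
```

On the sample data this should print one row per customer, with customer 101 (loyalty score 34.5) ahead of customer 102 (loyalty score 23.0). SQLite returns floats without trailing zeros, so 450.00 appears as 450.0.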