+-------------+------+
| Column Name | Type |
+-------------+------+
| user_id | int |
| product_id | int |
| quantity | int |
+-------------+------+
(user_id, product_id) is the unique key for this table.
Each row represents a purchase of a product by a user in a specific quantity.
Table: ProductInfo
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| product_id | int |
| category | varchar |
| price | decimal |
+-------------+---------+
product_id is the primary key for this table.
Each row assigns a category and price to a product.
Amazon wants to implement the Customers who bought this also bought… feature based on co-purchase patterns. Write a solution to :
Identify distinct product pairs frequently purchased together by the same customers (where product1_id < product2_id)
For each product pair , determine how many customers purchased both products
A product pair is considered for recommendation ifat least3different customers have purchased both products.
Return the _result table ordered bycustomer_count in descending order, and in case of a tie, by _product1_id _inascending order, and then by _product2_idinascending order.
To recommend product pairs, we want to find all pairs of products from the same category that have been purchased by the same user. We can use a self-join on the purchases table, filter for same user and category, and ensure each pair is unique and ordered.
SELECT a.product_id AS product_id1, b.product_id AS product_id2, a.category
FROM ProductPurchases a
JOIN ProductPurchases b ON a.user_id = b.user_id AND a.product_id < b.product_id
JOIN ProductInfo ai ON a.product_id = ai.product_id
JOIN ProductInfo bi ON b.product_id = bi.product_id
WHERE ai.category = bi.category
GROUPBY a.product_id, b.product_id, ai.category
ORDERBY ai.category, a.product_id, b.product_id;
1
2
3
4
5
6
7
8
SELECT a.product_id AS product_id1, b.product_id AS product_id2, a.category
FROM ProductPurchases a
JOIN ProductPurchases b ON a.user_id = b.user_id AND a.product_id < b.product_id
JOIN ProductInfo ai ON a.product_id = ai.product_id
JOIN ProductInfo bi ON b.product_id = bi.product_id
WHERE ai.category = bi.category
GROUPBY a.product_id, b.product_id, ai.category
ORDERBY ai.category, a.product_id, b.product_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
classSolution:
deffind_product_recommendation_pairs(self, purchases, info):
# purchases: DataFrame with columns user_id, product_id, quantity# info: DataFrame with columns product_id, categoryimport pandas as pd
df = purchases.merge(info, on='product_id')
merged = df.merge(df, on='user_id')
merged = merged[merged['product_id_x'] < merged['product_id_y']]
merged = merged[merged['category_x'] == merged['category_y']]
result = merged[['product_id_x', 'product_id_y', 'category_x']].drop_duplicates()
result = result.rename(columns={'product_id_x': 'product_id1', 'product_id_y': 'product_id2', 'category_x': 'category'})
result = result.sort_values(['category', 'product_id1', 'product_id2'])
return result
⏰ Time complexity: O(n^2) in the worst case for the self-join, where n is the number of purchases, but typically much less due to grouping by user and category.
🧺 Space complexity: O(k), where k is the number of unique product pairs output.