Problem

Table: ProductPurchases

+-------------+------+
| Column Name | Type |
+-------------+------+
| user_id     | int  |
| product_id  | int  |
| quantity    | int  |
+-------------+------+
(user_id, product_id) is the unique key for this table.
Each row represents a purchase of a product by a user in a specific quantity.

Table: ProductInfo

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| category    | varchar |
| price       | decimal |
+-------------+---------+
product_id is the primary key for this table.
Each row assigns a category and price to a product.

Amazon wants to implement the Customers who bought this also bought… feature based on co-purchase patterns. Write a solution to :

  1. Identify distinct product pairs frequently purchased together by the same customers (where product1_id < product2_id)
  2. For each product pair , determine how many customers purchased both products

A product pair is considered for recommendation if at least 3 different customers have purchased both products.

Return the _result table ordered bycustomer_count in descending order, and in case of a tie, by _product1_id _inascending order, and then by _product2_id inascending order.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
Input:
ProductPurchases table:
+---------+------------+----------+
| user_id | product_id | quantity |
+---------+------------+----------+
| 1       | 101        | 2        |
| 1       | 102        | 1        |
| 1       | 103        | 3        |
| 2       | 101        | 1        |
| 2       | 102        | 5        |
| 2       | 104        | 1        |
| 3       | 101        | 2        |
| 3       | 103        | 1        |
| 3       | 105        | 4        |
| 4       | 101        | 1        |
| 4       | 102        | 1        |
| 4       | 103        | 2        |
| 4       | 104        | 3        |
| 5       | 102        | 2        |
| 5       | 104        | 1        |
+---------+------------+----------+
ProductInfo table:
+------------+-------------+-------+
| product_id | category    | price |
+------------+-------------+-------+
| 101        | Electronics | 100   |
| 102        | Books       | 20    |
| 103        | Clothing    | 35    |
| 104        | Kitchen     | 50    |
| 105        | Sports      | 75    |
+------------+-------------+-------+
Output:
+-------------+-------------+-------------------+-------------------+----------------+
| product1_id | product2_id | product1_category | product2_category | customer_count |
+-------------+-------------+-------------------+-------------------+----------------+
| 101         | 102         | Electronics       | Books             | 3              |
| 101         | 103         | Electronics       | Clothing          | 3              |
| 102         | 104         | Books             | Kitchen           | 3              |
+-------------+-------------+-------------------+-------------------+----------------+
Explanation:
* **Product pair (101, 102):**
* Purchased by users 1, 2, and 4 (3 customers)
* Product 101 is in Electronics category
* Product 102 is in Books category
* **Product pair (101, 103):**
* Purchased by users 1, 3, and 4 (3 customers)
* Product 101 is in Electronics category
* Product 103 is in Clothing category
* **Product pair (102, 104):**
* Purchased by users 2, 4, and 5 (3 customers)
* Product 102 is in Books category
* Product 104 is in Kitchen category
The result is ordered by customer_count in descending order. For pairs with
the same customer_count, they are ordered by product1_id and then product2_id
in ascending order.

Example 2:

Solution

Method 1 – Self Join and Grouping

Intuition

To recommend product pairs, we want to find all pairs of products from the same category that have been purchased by the same user. We can use a self-join on the purchases table, filter for same user and category, and ensure each pair is unique and ordered.

Approach

  1. Join ProductPurchases as a and b on a.user_id = b.user_id and a.product_id < b.product_id.
  2. Join both a and b to ProductInfo to ensure both products are in the same category.
  3. Filter for a.category = b.category.
  4. Group by a.product_id, b.product_id, and a.category to get unique pairs.
  5. Return the product pairs and their category, ordered by category, then product ids.

Code

1
2
3
4
5
6
7
8
SELECT a.product_id AS product_id1, b.product_id AS product_id2, a.category
FROM ProductPurchases a
JOIN ProductPurchases b ON a.user_id = b.user_id AND a.product_id < b.product_id
JOIN ProductInfo ai ON a.product_id = ai.product_id
JOIN ProductInfo bi ON b.product_id = bi.product_id
WHERE ai.category = bi.category
GROUP BY a.product_id, b.product_id, ai.category
ORDER BY ai.category, a.product_id, b.product_id;
1
2
3
4
5
6
7
8
SELECT a.product_id AS product_id1, b.product_id AS product_id2, a.category
FROM ProductPurchases a
JOIN ProductPurchases b ON a.user_id = b.user_id AND a.product_id < b.product_id
JOIN ProductInfo ai ON a.product_id = ai.product_id
JOIN ProductInfo bi ON b.product_id = bi.product_id
WHERE ai.category = bi.category
GROUP BY a.product_id, b.product_id, ai.category
ORDER BY ai.category, a.product_id, b.product_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
class Solution:
    def find_product_recommendation_pairs(self, purchases, info):
        # purchases: DataFrame with columns user_id, product_id, quantity
        # info: DataFrame with columns product_id, category
        import pandas as pd
        df = purchases.merge(info, on='product_id')
        merged = df.merge(df, on='user_id')
        merged = merged[merged['product_id_x'] < merged['product_id_y']]
        merged = merged[merged['category_x'] == merged['category_y']]
        result = merged[['product_id_x', 'product_id_y', 'category_x']].drop_duplicates()
        result = result.rename(columns={'product_id_x': 'product_id1', 'product_id_y': 'product_id2', 'category_x': 'category'})
        result = result.sort_values(['category', 'product_id1', 'product_id2'])
        return result

Complexity

  • ⏰ Time complexity: O(n^2) in the worst case for the self-join, where n is the number of purchases, but typically much less due to grouping by user and category.
  • 🧺 Space complexity: O(k), where k is the number of unique product pairs output.