Problem

Table: Customers

+---------------------+---------+
| Column Name         | Type    |
+---------------------+---------+
| customer_id         | int     |
| customer_name       | varchar |
+---------------------+---------+
customer_id is the column with unique values for this table.
customer_name is the name of the customer.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| customer_id   | int     |
| product_name  | varchar |
+---------------+---------+
order_id is the column with unique values for this table.
customer_id is the id of the customer who bought the product "product_name".

Write a solution to report the customer_id and customer_name of customers who bought products " A", " B" but did not buy the product " C" since we want to recommend them to purchase this product.

Return the result table ordered by customer_id.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Input: 
Customers table:
+-------------+---------------+
| customer_id | customer_name |
+-------------+---------------+
| 1           | Daniel        |
| 2           | Diana         |
| 3           | Elizabeth     |
| 4           | Jhon          |
+-------------+---------------+
Orders table:
+------------+--------------+---------------+
| order_id   | customer_id  | product_name  |
+------------+--------------+---------------+
| 10         |     1        |     A         |
| 20         |     1        |     B         |
| 30         |     1        |     D         |
| 40         |     1        |     C         |
| 50         |     2        |     A         |
| 60         |     3        |     A         |
| 70         |     3        |     B         |
| 80         |     3        |     D         |
| 90         |     4        |     C         |
+------------+--------------+---------------+
Output: 
+-------------+---------------+
| customer_id | customer_name |
+-------------+---------------+
| 3           | Elizabeth     |
+-------------+---------------+
Explanation: Only the customer_id with id 3 bought the product A and B but not the product C.

Solution

Method 1 – SQL Group By and Having with Conditional Aggregation

Intuition

The key idea is to group orders by customer and use conditional aggregation to check if a customer bought both products “A” and “B” but not “C”. This can be done using SUM and CASE statements in SQL, or boolean logic in pandas.

Approach

  1. Join the Customers and Orders tables on customer_id.
  2. For each customer, use conditional aggregation to count purchases of “A”, “B”, and “C”.
  3. Select customers where the count for “A” and “B” is at least 1, and the count for “C” is 0.
  4. Return the result ordered by customer_id.

Code

1
2
3
4
5
6
7
8
SELECT c.customer_id, c.customer_name
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING SUM(o.product_name = 'A') > 0
   AND SUM(o.product_name = 'B') > 0
   AND SUM(o.product_name = 'C') = 0
ORDER BY c.customer_id;
1
2
3
4
5
6
7
8
SELECT c.customer_id, c.customer_name
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING SUM(CASE WHEN o.product_name = 'A' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN o.product_name = 'B' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN o.product_name = 'C' THEN 1 ELSE 0 END) = 0
ORDER BY c.customer_id;
1
2
3
4
5
6
7
8
9
def customers_bought_a_b_not_c(customers: 'pd.DataFrame', orders: 'pd.DataFrame') -> 'pd.DataFrame':
    merged = customers.merge(orders, on='customer_id')
    agg = merged.pivot_table(index=['customer_id', 'customer_name'],
                             columns='product_name',
                             values='order_id',
                             aggfunc='count',
                             fill_value=0).reset_index()
    res = agg[(agg.get('A', 0) > 0) & (agg.get('B', 0) > 0) & (agg.get('C', 0) == 0)]
    return res[['customer_id', 'customer_name']].sort_values('customer_id')

Complexity

  • ⏰ Time complexity: O(N), where N is the number of orders, as each order is processed once in the aggregation.
  • 🧺 Space complexity: O(C), where C is the number of customers, for storing the groupings.