Problem

Table: Customers

1
2
3
4
5
6
7
8
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
id is the primary key column for this table.
Each row of this table indicates the ID and name of a customer.

Table: Orders

1
2
3
4
5
6
7
8
9
+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| customerId  | int  |
+-------------+------+
id is the primary key column for this table.
customerId is a foreign key of the ID from the Customers table.
Each row of this table indicates the ID of an order and the ID of the customer who ordered it.

Write an SQL query to report all customers who never order anything.

Examples

Example 1:

Input: Customers table:

1
2
3
4
5
6
7
8
+----+-------+
| id | name  |
+----+-------+
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |
+----+-------+

Orders table:

1
2
3
4
5
6
+----+------------+
| id | customerId |
+----+------------+
| 1  | 3          |
| 2  | 1          |
+----+------------+

Output:

1
2
3
4
5
6
+-----------+
| Customers |
+-----------+
| Henry     |
| Max       |
+-----------+

Solution

Method 1 - Using Not in

Code

1
2
SELECT c.name AS Customers FROM customers c
WHERE c.id NOT IN (SELECT o.customerId FROM Orders o)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Select the customers whose 'id' is not present in the orders DataFrame's 'customerId' column.
    df = customers[~customers['id'].isin(orders['customerId'])]

    # Build a DataFrame that only contains the 'name' column and rename it as 'Customers'.
    df = df[['name']].rename(columns={'name': 'Customers'})

    return df

Method 2 - Left Join 🏆

We need to verify if customerId is null because these will be the customers not present in the order table, which is what we are looking for.

Additionally, a right join will provide us with customers who have placed orders.

Using distinct on c.name is unnecessary since customers with identical names may have different IDs.

Code

1
2
3
SELECT c.name AS Customers FROM customers c
LEFT JOIN Orders o ON c.id = o.customerId
WHERE o.customerId IS NULL
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Merge the customers DataFrame with the orders DataFrame using a left join on 'id' and 'customerId'
    merged_df = customers.merge(orders, how='left', left_on='id', right_on='customerId')
    
    # Use the 'customerId' column to create a boolean mask for customers who never placed any orders
    mask = merged_df['customerId'].isna()
    
    # Filter the rows using the boolean mask
    result_df = merged_df[mask]
    
    # Select only the 'name' column from the result DataFrame and rename it as 'Customers'
    result_df = result_df[['name']].rename(columns={'name': 'Customers'})
    
    return result_df