Problem
Table: Customers
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| name | varchar |
+-------------+---------+
id is the primary key column for this table.
Each row of this table indicates the ID and name of a customer.
Table: Orders
+-------------+------+
| Column Name | Type |
+-------------+------+
| id | int |
| customerId | int |
+-------------+------+
id is the primary key column for this table.
customerId is a foreign key of the ID from the Customers table.
Each row of this table indicates the ID of an order and the ID of the customer who ordered it.
Write an SQL query to report all customers who never order anything.
Examples
Example 1:
Input: Customers table:
+----+-------+
| id | name |
+----+-------+
| 1 | Joe |
| 2 | Henry |
| 3 | Sam |
| 4 | Max |
+----+-------+
Orders table:
+----+------------+
| id | customerId |
+----+------------+
| 1 | 3 |
| 2 | 1 |
+----+------------+
Output:
+-----------+
| Customers |
+-----------+
| Henry |
| Max |
+-----------+
Solution
Method 1 - Using Not in
Code
SQL
SELECT c.name AS Customers FROM customers c
WHERE c.id NOT IN (SELECT o.customerId FROM Orders o)
Pandas
df = customers[~customers['id'].isin(orders['customerId'])]
: This line filters the ‘customers’ DataFrame using boolean indexing. The expression~customers['id'].isin(orders['customerId'])
determines whether the ‘id’ in ‘customers’ is absent from the ‘orders’ DataFrame’s ‘customerId’ column. The tilde (~) operator negates the result, hence selecting rows where the ‘id’ does not appear in ‘customerId’.df = df[['name']].rename(columns={'name': 'Customers'})
: Post-filtering, a new DataFramedf
is generated that includes only the ’name’ column. The[['name']]
syntax retains ’name’ as a DataFrame, and the rename function changes the column label from ’name’ to ‘Customers’. Thus, the resulting DataFramedf
lists names of customers who have never placed orders, with the column labeled ‘Customers’.
import pandas as pd
def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
# Select the customers whose 'id' is not present in the orders DataFrame's 'customerId' column.
df = customers[~customers['id'].isin(orders['customerId'])]
# Build a DataFrame that only contains the 'name' column and rename it as 'Customers'.
df = df[['name']].rename(columns={'name': 'Customers'})
return df
Method 2 - Left Join 🏆
We need to verify if customerId
is null because these will be the customers not present in the order table, which is what we are looking for.
Additionally, a right join will provide us with customers who have placed orders.
Using distinct
on c.name
is unnecessary since customers with identical names may have different IDs.
Code
SQL
SELECT c.name AS Customers FROM customers c
LEFT JOIN Orders o ON c.id = o.customerId
WHERE o.customerId IS NULL
Pandas
- We first merge the ‘customers’ DataFrame with the ‘orders’ DataFrame using a left join. This allows us to have a DataFrame where each row represents a customer and includes any orders they may have placed.
- We create a boolean mask named mask by checking if the ‘customerId’ column in the merged DataFrame is null. This mask will be True for customers who never placed any orders and False for customers who placed orders.
- We use the boolean mask mask to filter the rows from the ‘customers’ DataFrame, selecting only those customers who never placed any orders.
- We select only the ’name’ column from the filtered DataFrame and rename it as ‘Customers’.
- Finally, we return the resulting DataFrame containing the names of customers who never placed any orders, labeled as ‘Customers’.
import pandas as pd
def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
# Merge the customers DataFrame with the orders DataFrame using a left join on 'id' and 'customerId'
merged_df = customers.merge(orders, how='left', left_on='id', right_on='customerId')
# Use the 'customerId' column to create a boolean mask for customers who never placed any orders
mask = merged_df['customerId'].isna()
# Filter the rows using the boolean mask
result_df = merged_df[mask]
# Select only the 'name' column from the result DataFrame and rename it as 'Customers'
result_df = result_df[['name']].rename(columns={'name': 'Customers'})
return result_df