Problem

Table: Customer

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| customer_name | varchar |
+---------------+---------+
customer_id is the column with unique values for this table.
Each row of this table contains the information of each customer in the WebStore.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| sale_date     | date    |
| order_cost    | int     |
| customer_id   | int     |
| seller_id     | int     |
+---------------+---------+
order_id is the column with unique values for this table.
Each row of this table contains all orders made in the webstore.
sale_date is the date when the transaction was made between the customer (customer_id) and the seller (seller_id).

Table: Seller

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| seller_id     | int     |
| seller_name   | varchar |
+---------------+---------+
seller_id is the column with unique values for this table.
Each row of this table contains the information of each seller.

Write a solution to report the names of all sellers who did not make any sales in 2020.

Return the result table ordered by seller_name in ascending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Input: 
Customer table:
+--------------+---------------+
| customer_id  | customer_name |
+--------------+---------------+
| 101          | Alice         |
| 102          | Bob           |
| 103          | Charlie       |
+--------------+---------------+
Orders table:
+-------------+------------+--------------+-------------+-------------+
| order_id    | sale_date  | order_cost   | customer_id | seller_id   |
+-------------+------------+--------------+-------------+-------------+
| 1           | 2020-03-01 | 1500         | 101         | 1           |
| 2           | 2020-05-25 | 2400         | 102         | 2           |
| 3           | 2019-05-25 | 800          | 101         | 3           |
| 4           | 2020-09-13 | 1000         | 103         | 2           |
| 5           | 2019-02-11 | 700          | 101         | 2           |
+-------------+------------+--------------+-------------+-------------+
Seller table:
+-------------+-------------+
| seller_id   | seller_name |
+-------------+-------------+
| 1           | Daniel      |
| 2           | Elizabeth   |
| 3           | Frank       |
+-------------+-------------+
Output: 
+-------------+
| seller_name |
+-------------+
| Frank       |
+-------------+
Explanation: 
Daniel made 1 sale in March 2020.
Elizabeth made 2 sales in 2020 and 1 sale in 2019.
Frank made 1 sale in 2019 but no sales in 2020.

Solution

Method 1 - SQL Anti-Join and Pandas Filtering

Intuition

We want to find all sellers who did not make any sales in 2020. This means we need to find sellers whose seller_id does not appear in any order with a sale_date in 2020. This is a classic anti-join problem.

Approach

  1. Find all sellers who made at least one sale in 2020 (from the Orders table).
  2. Select all sellers from the Seller table whose seller_id is not in the set of sellers with 2020 sales.
  3. Order the result by seller_name ascending.

Code

1
2
3
4
5
6
7
8
SELECT seller_name
FROM Seller
WHERE seller_id NOT IN (
    SELECT DISTINCT seller_id
    FROM Orders
    WHERE YEAR(sale_date) = 2020
)
ORDER BY seller_name ASC;
1
2
3
4
5
6
7
8
SELECT seller_name
FROM Seller
WHERE seller_id NOT IN (
    SELECT DISTINCT seller_id
    FROM Orders
    WHERE EXTRACT(YEAR FROM sale_date) = 2020
)
ORDER BY seller_name ASC;
1
2
3
4
5
6
7
# Assume Seller and Orders are pandas DataFrames
import pandas as pd

# sellers with at least one sale in 2020
sellers_2020 = Orders[Orders['sale_date'].str.startswith('2020')]['seller_id'].unique()
result = Seller[~Seller['seller_id'].isin(sellers_2020)].copy()
result = result[['seller_name']].sort_values('seller_name').reset_index(drop=True)

Complexity

  • ⏰ Time complexity: O(N + M) where N = number of sellers, M = number of orders
  • 🧺 Space complexity: O(N) for storing the result