Problem

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| seller_id      | int     |
| join_date      | date    |
| favorite_brand | varchar |
+----------------+---------+
seller_id is column of unique values for this table.
This table contains seller id, join date, and favorite brand of sellers.

Table: Items

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| item_id       | int     |
| item_brand    | varchar |
+---------------+---------+
item_id is the column of unique values for this table.
This table contains item id and item brand.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| item_id       | int     |
| seller_id     | int     |
+---------------+---------+
order_id is the column of unique values for this table.
item_id is a foreign key to the Items table.
seller_id is a foreign key to the Users table.
This table contains order id, order date, item id and seller id.

Write a solution to find the top seller who has sold the highest number ofunique items with a different brand than their favorite brand. If there are multiple sellers with the same highest count, return all of them.

Return the result table ordered by seller_id inascending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Input: 
Users table:
+-----------+------------+----------------+
| seller_id | join_date  | favorite_brand |
+-----------+------------+----------------+
| 1         | 2019-01-01 | Lenovo         |
| 2         | 2019-02-09 | Samsung        |
| 3         | 2019-01-19 | LG             |
+-----------+------------+----------------+
Orders table:
+----------+------------+---------+-----------+
| order_id | order_date | item_id | seller_id |
+----------+------------+---------+-----------+
| 1        | 2019-08-01 | 4       | 2         |
| 2        | 2019-08-02 | 2       | 3         |
| 3        | 2019-08-03 | 3       | 3         |
| 4        | 2019-08-04 | 1       | 2         |
| 5        | 2019-08-04 | 4       | 2         |
+----------+------------+---------+-----------+
Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |
+---------+------------+
Output: 
+-----------+-----------+
| seller_id | num_items |
+-----------+-----------+
| 2         | 1         |
| 3         | 1         |
+-----------+-----------+
Explanation: 
- The user with seller_id 2 has sold three items, but only two of them are not marked as a favorite. We will include a unique count of 1 because both of these items are identical.
- The user with seller_id 3 has sold two items, but only one of them is not marked as a favorite. We will include just that non-favorite item in our count.
Since seller_ids 2 and 3 have the same count of one item each, they both will be displayed in the output.

Solution

Method 1 – SQL Group By and Join

Intuition

We need to count, for each seller, the number of unique items they sold where the item’s brand is different from their favorite brand. Then, select the seller(s) with the highest such count.

Approach

  1. Join Orders, Items, and Users to get seller_id, item_id, item_brand, and favorite_brand.
  2. Filter to only those sales where item_brand != favorite_brand.
  3. For each seller, count the number of unique item_ids sold with a non-favorite brand.
  4. Find the maximum count among all sellers.
  5. Return all sellers with this maximum count, ordered by seller_id ascending.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
WITH seller_items AS (
  SELECT o.seller_id, o.item_id
  FROM Orders o
  JOIN Items i ON o.item_id = i.item_id
  JOIN Users u ON o.seller_id = u.seller_id
  WHERE i.item_brand <> u.favorite_brand
  GROUP BY o.seller_id, o.item_id
),
counts AS (
  SELECT seller_id, COUNT(*) AS num_items
  FROM seller_items
  GROUP BY seller_id
),
max_count AS (
  SELECT MAX(num_items) AS mx FROM counts
)
SELECT seller_id, num_items
FROM counts
WHERE num_items = (SELECT mx FROM max_count)
ORDER BY seller_id ASC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
WITH seller_items AS (
  SELECT o.seller_id, o.item_id
  FROM Orders o
  JOIN Items i ON o.item_id = i.item_id
  JOIN Users u ON o.seller_id = u.seller_id
  WHERE i.item_brand <> u.favorite_brand
  GROUP BY o.seller_id, o.item_id
),
counts AS (
  SELECT seller_id, COUNT(*) AS num_items
  FROM seller_items
  GROUP BY seller_id
),
max_count AS (
  SELECT MAX(num_items) AS mx FROM counts
)
SELECT seller_id, num_items
FROM counts
WHERE num_items = (SELECT mx FROM max_count)
ORDER BY seller_id ASC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class Solution:
    def market_analysis_iii(self, users: 'pd.DataFrame', orders: 'pd.DataFrame', items: 'pd.DataFrame') -> 'pd.DataFrame':
        merged = orders.merge(items, on='item_id').merge(users, on='seller_id')
        filtered = merged[merged['item_brand'] != merged['favorite_brand']]
        unique_items = filtered.drop_duplicates(['seller_id', 'item_id'])
        counts = unique_items.groupby('seller_id').size().reset_index(name='num_items')
        if counts.empty:
            return counts[['seller_id', 'num_items']]
        mx = counts['num_items'].max()
        result = counts[counts['num_items'] == mx].sort_values('seller_id').reset_index(drop=True)
        return result

Complexity

  • ⏰ Time complexity: O(n + m + k), where n is the number of users, m is the number of orders, and k is the number of items, as we scan and join all tables.
  • 🧺 Space complexity: O(s), where s is the number of sellers, for storing the result for each seller.