Problem

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| join_date      | date    |
| favorite_brand | varchar |
+----------------+---------+
user_id is the primary key (column with unique values) of this table.
This table has the info of the users of an online shopping website where users can sell and buy items.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| item_id       | int     |
| buyer_id      | int     |
| seller_id     | int     |
+---------------+---------+
order_id is the primary key (column with unique values) of this table.
item_id is a foreign key (reference column) to the Items table.
buyer_id and seller_id are foreign keys to the Users table.

Table: Items

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| item_id       | int     |
| item_brand    | varchar |
+---------------+---------+
item_id is the primary key (column with unique values) of this table.

Write a solution to find, for each user, whether the brand of the second item (by date) they sold is their favorite brand. If a user sold fewer than two items, report the answer for that user as no. It is guaranteed that no seller sells more than one item in a day.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

Input: 
Users table:
+---------+------------+----------------+
| user_id | join_date  | favorite_brand |
+---------+------------+----------------+
| 1       | 2019-01-01 | Lenovo         |
| 2       | 2019-02-09 | Samsung        |
| 3       | 2019-01-19 | LG             |
| 4       | 2019-05-21 | HP             |
+---------+------------+----------------+
Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2019-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2019-08-04 | 1       | 4        | 2         |
| 5        | 2019-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |
+----------+------------+---------+----------+-----------+
Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |
+---------+------------+
Output: 
+-----------+--------------------+
| seller_id | 2nd_item_fav_brand |
+-----------+--------------------+
| 1         | no                 |
| 2         | yes                |
| 3         | yes                |
| 4         | no                 |
+-----------+--------------------+
Explanation: 
The answer for the user with id 1 is no because they sold nothing.
The answer for the users with id 2 and 3 is yes because the brands of their second sold items are their favorite brands.
The answer for the user with id 4 is no because the brand of their second sold item is not their favorite brand.
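
The explanation can be checked by listing, for each seller, the brands they sold in date order. The snippet below is only an illustrative sketch using the Example 1 data; the variable names (`favorite_brand`, `item_brand`, `sold_brands`) are made up for this illustration and are not part of the problem.

```python
from collections import defaultdict

# Example 1 data, copied from the tables above.
favorite_brand = {1: "Lenovo", 2: "Samsung", 3: "LG", 4: "HP"}
item_brand = {1: "Samsung", 2: "Lenovo", 3: "LG", 4: "HP"}
# (order_date, item_id, seller_id); ISO dates sort correctly as strings.
orders = [
    ("2019-08-01", 4, 2),
    ("2019-08-02", 2, 3),
    ("2019-08-03", 3, 3),
    ("2019-08-04", 1, 2),
    ("2019-08-04", 1, 4),
    ("2019-08-05", 2, 4),
]

# Collect the brands each seller sold, in date order.
sold_brands = defaultdict(list)
for order_date, item_id, seller_id in sorted(orders):
    sold_brands[seller_id].append(item_brand[item_id])

# Print each user's favorite brand next to the brands they sold.
for user_id, fav in favorite_brand.items():
    print(user_id, fav, sold_brands.get(user_id, []))
```

User 1 has an empty list, users 2 and 3 have their favorite brand in the second position, and user 4 does not, which matches the expected output.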

Solution

Method 1 – SQL Window Functions and Join

Intuition

To determine if the brand of the second item a user sold matches their favorite brand, we need to:

  • Find the second sold item for each seller (by date).
  • Join with the Items and Users tables to get the brand and favorite brand.
  • Compare and output ‘yes’ or ‘no’.

Approach

  1. Use the ROW_NUMBER() window function to rank each seller's sales by order_date.
  2. Keep only the second sale of each seller (rn = 2).
  3. Join with Items to get the brand of that sale, and with Users to get the favorite brand.
  4. For sellers with fewer than two sales, output ‘no’. Starting from Users with a LEFT JOIN keeps these users in the result; their brand is NULL, so the comparison fails.
  5. Output seller_id and ‘yes’/‘no’ for each user.
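
The steps above can be sketched in plain Python to make the logic concrete before it is expressed in SQL. This is a minimal sketch, not the reference solution: the function name `second_item_fav_brand` and the dict/tuple input layout are assumptions chosen for illustration.

```python
from collections import defaultdict


def second_item_fav_brand(users, orders, items):
    """Return {user_id: 'yes' | 'no'} (hypothetical helper, illustration only).

    users:  {user_id: favorite_brand}
    orders: iterable of (order_date, item_id, seller_id)
    items:  {item_id: item_brand}
    """
    # Steps 1-2: order each seller's sales by date and keep the second one.
    sales = defaultdict(list)
    for order_date, item_id, seller_id in orders:
        sales[seller_id].append((order_date, item_id))

    second_item = {}
    for seller_id, seller_sales in sales.items():
        if len(seller_sales) >= 2:
            # At most one sale per seller per day, so dates never tie.
            seller_sales.sort()
            second_item[seller_id] = seller_sales[1][1]

    # Steps 3-5: look up the brand and compare; no second sale means 'no'.
    result = {}
    for user_id, favorite in users.items():
        brand = items.get(second_item.get(user_id))
        result[user_id] = "yes" if brand is not None and brand == favorite else "no"
    return result


# Example 1 data from the problem statement.
users = {1: "Lenovo", 2: "Samsung", 3: "LG", 4: "HP"}
items = {1: "Samsung", 2: "Lenovo", 3: "LG", 4: "HP"}
orders = [
    ("2019-08-01", 4, 2),
    ("2019-08-02", 2, 3),
    ("2019-08-03", 3, 3),
    ("2019-08-04", 1, 2),
    ("2019-08-04", 1, 4),
    ("2019-08-05", 2, 4),
]
print(second_item_fav_brand(users, orders, items))
```

Iterating over `users` rather than over the sellers found in `orders` plays the same role as starting the SQL query from the Users table: users who never sold anything still get a row.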

Code

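-- MySQL variant: IF() is a MySQL function.
SELECT u.user_id AS seller_id,
       IF(t.item_brand = u.favorite_brand, 'yes', 'no') AS 2nd_item_fav_brand
FROM Users u
-- Keep every user; only a seller's second sale (rn = 2) is joined.
LEFT JOIN (
    SELECT seller_id, item_id,
           ROW_NUMBER() OVER (PARTITION BY seller_id ORDER BY order_date) AS rn
    FROM Orders
) o ON u.user_id = o.seller_id AND o.rn = 2
-- Without a second sale, item_brand is NULL and the comparison yields 'no'.
LEFT JOIN Items t ON o.item_id = t.item_id;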
-- Portable variant: CASE WHEN replaces IF(). The alias is double-quoted
-- because it starts with a digit, which PostgreSQL rejects when unquoted.
SELECT u.user_id AS seller_id,
       CASE WHEN t.item_brand = u.favorite_brand THEN 'yes' ELSE 'no' END AS "2nd_item_fav_brand"
FROM Users u
LEFT JOIN (
    SELECT seller_id, item_id,
           ROW_NUMBER() OVER (PARTITION BY seller_id ORDER BY order_date) AS rn
    FROM Orders
) o ON u.user_id = o.seller_id AND o.rn = 2
LEFT JOIN Items t ON o.item_id = t.item_id;
import pandas as pd


class Solution:
    def market_analysis_ii(self, users: pd.DataFrame, orders: pd.DataFrame, items: pd.DataFrame) -> pd.DataFrame:
        # Rank each seller's sales by date (1 = earliest sale).
        orders = orders.sort_values(['seller_id', 'order_date'])
        orders['rn'] = orders.groupby('seller_id').cumcount() + 1
        # Keep only each seller's second sale.
        second_sales = orders[orders['rn'] == 2][['seller_id', 'item_id']]
        # Left merges keep every user, including those with fewer than two sales.
        merged = users[['user_id', 'favorite_brand']].merge(second_sales, left_on='user_id', right_on='seller_id', how='left')
        merged = merged.merge(items[['item_id', 'item_brand']], on='item_id', how='left')
        # A missing brand (NaN) never equals the favorite brand, so it maps to 'no'.
        merged['2nd_item_fav_brand'] = (merged['item_brand'] == merged['favorite_brand']).map(lambda x: 'yes' if x else 'no')
        return merged[['user_id', '2nd_item_fav_brand']].rename(columns={'user_id': 'seller_id'})
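
As a sanity check, the CASE WHEN form of the query can be run against Example 1 with Python's built-in `sqlite3` module. This is a sketch under a few assumptions: the bundled SQLite is version 3.25 or later (needed for window functions), dates are stored as ISO text so they sort chronologically, and an `ORDER BY` is added only to make the printed output deterministic. The alias is double-quoted here because it starts with a digit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, join_date TEXT, favorite_brand TEXT);
CREATE TABLE Items (item_id INTEGER PRIMARY KEY, item_brand TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                     item_id INTEGER, buyer_id INTEGER, seller_id INTEGER);
INSERT INTO Users VALUES
    (1, '2019-01-01', 'Lenovo'), (2, '2019-02-09', 'Samsung'),
    (3, '2019-01-19', 'LG'),     (4, '2019-05-21', 'HP');
INSERT INTO Items VALUES (1, 'Samsung'), (2, 'Lenovo'), (3, 'LG'), (4, 'HP');
INSERT INTO Orders VALUES
    (1, '2019-08-01', 4, 1, 2), (2, '2019-08-02', 2, 1, 3),
    (3, '2019-08-03', 3, 2, 3), (4, '2019-08-04', 1, 4, 2),
    (5, '2019-08-04', 1, 3, 4), (6, '2019-08-05', 2, 2, 4);
""")

query = """
SELECT u.user_id AS seller_id,
       CASE WHEN t.item_brand = u.favorite_brand THEN 'yes' ELSE 'no' END
           AS "2nd_item_fav_brand"
FROM Users u
LEFT JOIN (
    SELECT seller_id, item_id,
           ROW_NUMBER() OVER (PARTITION BY seller_id ORDER BY order_date) AS rn
    FROM Orders
) o ON u.user_id = o.seller_id AND o.rn = 2
LEFT JOIN Items t ON o.item_id = t.item_id
ORDER BY u.user_id  -- added for stable output; the problem allows any order
"""

rows = conn.execute(query).fetchall()
for seller_id, answer in rows:
    print(seller_id, answer)
conn.close()
```

The rows printed correspond to the Output table of Example 1: no for users 1 and 4, yes for users 2 and 3.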

Complexity

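  • ⏰ Time complexity: O(n + m log m + k), where n is the number of users, m is the number of orders, and k is the number of items. Ranking each seller's orders by date requires sorting the orders, which gives the m log m term; the joins are assumed to be hash-based and linear. The actual cost of the SQL versions depends on the engine's query plan and indexes.
  • 🧺 Space complexity: O(n + m), for the ranked copy of the orders and the result row for each user.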