Problem

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| join_date      | date    |
| favorite_brand | varchar |
+----------------+---------+
user_id is the primary key (column with unique values) of this table.
This table has the info of the users of an online shopping website where users can sell and buy items.

Table: Orders

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| order_date    | date    |
| item_id       | int     |
| buyer_id      | int     |
| seller_id     | int     |
+---------------+---------+
order_id is the primary key (column with unique values) of this table.
item_id is a foreign key (reference column) to the Items table.
buyer_id and seller_id are foreign keys to the Users table.

Table: Items

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| item_id       | int     |
| item_brand    | varchar |
+---------------+---------+
item_id is the primary key (column with unique values) of this table.

Write a solution to find, for each user, the join date and the number of orders they made as a buyer in 2019.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1

Input: 
Users table:
+---------+------------+----------------+
| user_id | join_date  | favorite_brand |
+---------+------------+----------------+
| 1       | 2018-01-01 | Lenovo         |
| 2       | 2018-02-09 | Samsung        |
| 3       | 2018-01-19 | LG             |
| 4       | 2018-05-21 | HP             |
+---------+------------+----------------+
Orders table:
+----------+------------+---------+----------+-----------+
| order_id | order_date | item_id | buyer_id | seller_id |
+----------+------------+---------+----------+-----------+
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2018-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2018-08-04 | 1       | 4        | 2         |
| 5        | 2018-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |
+----------+------------+---------+----------+-----------+
Items table:
+---------+------------+
| item_id | item_brand |
+---------+------------+
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |
+---------+------------+
Output: 
+-----------+------------+----------------+
| buyer_id  | join_date  | orders_in_2019 |
+-----------+------------+----------------+
| 1         | 2018-01-01 | 1              |
| 2         | 2018-02-09 | 2              |
| 3         | 2018-01-19 | 0              |
| 4         | 2018-05-21 | 0              |
+-----------+------------+----------------+

Solution

Method 1 – SQL Aggregation with LEFT JOIN

Intuition

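We need to count the number of orders each user made as a buyer in 2019. A LEFT JOIN from Users keeps users who have no qualifying orders, and the 2019 filter belongs in the join condition: placing it in a WHERE clause would discard the rows of users without 2019 orders and drop those users from the result.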
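The idea can be sketched in plain Python over the example rows: count 2019 orders per buyer, then walk every user and default a missing count to 0, which is what the LEFT JOIN plus COUNT achieves in SQL. The tuple layout below (only the columns the count needs) and the function name orders_in_2019 are assumptions made for this illustration.

```python
from collections import Counter

# Example rows hand-copied from the Users and Orders tables above
# Users: (user_id, join_date)
users = [(1, "2018-01-01"), (2, "2018-02-09"), (3, "2018-01-19"), (4, "2018-05-21")]
# Orders: (order_id, order_date, buyer_id); item_id and seller_id are not needed here
orders = [
    (1, "2019-08-01", 1),
    (2, "2018-08-02", 1),
    (3, "2019-08-03", 2),
    (4, "2018-08-04", 4),
    (5, "2018-08-04", 3),
    (6, "2019-08-05", 2),
]

def orders_in_2019(users, orders):
    # Count only orders whose ISO date string falls in 2019
    counts = Counter(buyer for _, date, buyer in orders if date.startswith("2019"))
    # Every user is emitted; Counter returns 0 for buyers with no 2019 orders
    return [(uid, joined, counts[uid]) for uid, joined in users]

for row in orders_in_2019(users, orders):
    print(row)
```

The Counter plays the role of the grouped aggregate, and iterating over users rather than over counts plays the role of the LEFT JOIN.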

Approach

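  1. LEFT JOIN the Users table with Orders on user_id = buyer_id.
  2. Restrict the joined orders to 2019 by adding YEAR(order_date) = 2019 to the ON clause (not to a WHERE clause), so users without 2019 orders are still kept.
  3. Group by user_id and join_date.
  4. Count o.order_id in each group; COUNT over a column ignores NULLs, so users with no matching orders get 0.
  5. Return user_id under the alias buyer_id to match the expected output.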
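Step 2 is where this query is easiest to get wrong. The sketch below runs both placements of the year filter against the example data using Python's built-in sqlite3 module. SQLite has no YEAR() function, so the sketch assumes strftime('%Y', order_date) = '2019' as the equivalent test, with dates stored as ISO text; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory database loaded with the example Users and Orders rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, join_date TEXT, favorite_brand TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_date TEXT, item_id INTEGER,
                     buyer_id INTEGER, seller_id INTEGER);
INSERT INTO Users VALUES
  (1, '2018-01-01', 'Lenovo'), (2, '2018-02-09', 'Samsung'),
  (3, '2018-01-19', 'LG'), (4, '2018-05-21', 'HP');
INSERT INTO Orders VALUES
  (1, '2019-08-01', 4, 1, 2), (2, '2018-08-02', 2, 1, 3), (3, '2019-08-03', 3, 2, 3),
  (4, '2018-08-04', 1, 4, 2), (5, '2018-08-04', 1, 3, 4), (6, '2019-08-05', 2, 2, 4);
""")

# Year filter inside ON: a user with no 2019 order keeps one NULL-extended row, counted as 0
IN_ON = """
SELECT u.user_id AS buyer_id, u.join_date, COUNT(o.order_id) AS orders_in_2019
FROM Users u
LEFT JOIN Orders o
  ON u.user_id = o.buyer_id AND strftime('%Y', o.order_date) = '2019'
GROUP BY u.user_id, u.join_date
ORDER BY u.user_id
"""

# Year filter in WHERE: it runs after the join, so every row whose order is not from 2019
# (including NULL-extended rows) is discarded, and users 3 and 4 disappear entirely
IN_WHERE = """
SELECT u.user_id AS buyer_id, u.join_date, COUNT(o.order_id) AS orders_in_2019
FROM Users u
LEFT JOIN Orders o ON u.user_id = o.buyer_id
WHERE strftime('%Y', o.order_date) = '2019'
GROUP BY u.user_id, u.join_date
ORDER BY u.user_id
"""

on_rows = conn.execute(IN_ON).fetchall()
where_rows = conn.execute(IN_WHERE).fetchall()
print(on_rows)
print(where_rows)
```

Only the ON version reproduces the expected output; the WHERE version silently returns two rows instead of four.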

Code

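-- MySQL: the 2019 filter sits in ON so users without 2019 orders survive the LEFT JOIN
SELECT u.user_id AS buyer_id, u.join_date, COUNT(o.order_id) AS orders_in_2019
FROM Users u
LEFT JOIN Orders o
  ON u.user_id = o.buyer_id AND YEAR(o.order_date) = 2019
GROUP BY u.user_id, u.join_date;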
-- PostgreSQL: standard SQL EXTRACT instead of MySQL's YEAR()
SELECT u.user_id AS buyer_id, u.join_date, COUNT(o.order_id) AS orders_in_2019
FROM Users u
LEFT JOIN Orders o
  ON u.user_id = o.buyer_id AND EXTRACT(YEAR FROM o.order_date) = 2019
GROUP BY u.user_id, u.join_date;
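import pandas as pd

class Solution:
    def market_analysis(self, users: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
        # Keep only 2019 orders; to_datetime accepts both string and datetime columns
        orders_2019 = orders[pd.to_datetime(orders['order_date']).dt.year == 2019]
        # Number of 2019 orders per buyer
        cnt = orders_2019.groupby('buyer_id').size().reset_index(name='orders_in_2019')
        # Left merge keeps every user; users with no 2019 orders get NaN, replaced by 0
        res = users.merge(cnt, left_on='user_id', right_on='buyer_id', how='left')
        res['orders_in_2019'] = res['orders_in_2019'].fillna(0).astype(int)
        # Report user_id under the expected column name buyer_id
        return res[['user_id', 'join_date', 'orders_in_2019']].rename(columns={'user_id': 'buyer_id'})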
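YEAR() is MySQL-specific and EXTRACT is standard SQL, but a half-open date range expresses the same 2019 filter in either dialect, and because order_date is not wrapped in a function, it can let an engine use an index on that column. A minimal sketch, again assuming Python's sqlite3, ISO text dates, and only the columns the query touches, checks that the range form returns the expected output:

```python
import sqlite3

# Trimmed example tables: only the columns the query reads
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, join_date TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_date TEXT, buyer_id INTEGER);
INSERT INTO Users VALUES
  (1, '2018-01-01'), (2, '2018-02-09'), (3, '2018-01-19'), (4, '2018-05-21');
INSERT INTO Orders VALUES
  (1, '2019-08-01', 1), (2, '2018-08-02', 1), (3, '2019-08-03', 2),
  (4, '2018-08-04', 4), (5, '2018-08-04', 3), (6, '2019-08-05', 2);
""")

# Half-open range [2019-01-01, 2020-01-01): order_date is compared directly, not transformed
RANGE_QUERY = """
SELECT u.user_id AS buyer_id, u.join_date, COUNT(o.order_id) AS orders_in_2019
FROM Users u
LEFT JOIN Orders o
  ON u.user_id = o.buyer_id
 AND o.order_date >= '2019-01-01' AND o.order_date < '2020-01-01'
GROUP BY u.user_id, u.join_date
ORDER BY u.user_id
"""

rows = conn.execute(RANGE_QUERY).fetchall()
print(rows)
```

Because the dates here are ISO-8601 text, string comparison orders them chronologically; with a real DATE column the same predicate compares dates directly.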

Complexity

  • ⏰ Time complexity: O(n + m), where n is the number of users and m is the number of orders, assuming the engine performs the join and the grouping with hash tables so that each table is scanned once.
  • 🧺 Space complexity: O(n), for the per-user counts and the result rows.