Problem

Table: Product

1
2
3
4
5
6
7
8
9
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| product_id | int |
| product_name | varchar |
| unit_price | int |
+--------------+---------+
product_id is the primary key (column with unique values) of this table.
Each row of this table indicates the name and the price of each product.

Table: Sales

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| seller_id | int |
| product_id | int |
| buyer_id | int |
| sale_date | date |
| quantity | int |
| price | int |
+-------------+---------+
This table can have duplicate rows.
product_id is a foreign key (reference column) to the Product table.
Each row of this table contains some information about one sale.

Write a solution to report the products that were only sold in the first quarter of 2019. That is, between 2019-01-01 and 2019-03-31 inclusive.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Input: 
Product table:
+------------+--------------+------------+
| product_id | product_name | unit_price |
+------------+--------------+------------+
| 1          | S8           | 1000       |
| 2          | G4           | 800        |
| 3          | iPhone       | 1400       |
+------------+--------------+------------+
Sales table:
+-----------+------------+----------+------------+----------+-------+
| seller_id | product_id | buyer_id | sale_date  | quantity | price |
+-----------+------------+----------+------------+----------+-------+
| 1         | 1          | 1        | 2019-01-21 | 2        | 2000  |
| 1         | 2          | 2        | 2019-02-17 | 1        | 800   |
| 2         | 2          | 3        | 2019-06-02 | 1        | 800   |
| 3         | 3          | 4        | 2019-05-13 | 2        | 2800  |
+-----------+------------+----------+------------+----------+-------+
Output: 
+-------------+--------------+
| product_id  | product_name |
+-------------+--------------+
| 1           | S8           |
+-------------+--------------+
Explanation: 
The product with id 1 was only sold in the spring of 2019.
The product with id 2 was sold in the spring of 2019 but was also sold after the spring of 2019.
The product with id 3 was sold after spring 2019.
We return only product 1 as it is the product that was only sold in the spring of 2019.

Solution

Method 1 - Group and Filter by Date

Intuition

We want products that were sold only in Q1 2019 (2019-01-01 to 2019-03-31) and not outside this range. We can group sales by product and check the min and max sale_date for each product.

Approach

  1. For each product, check if all its sales are in Q1 2019.
  2. Select products where min(sale_date) >= ‘2019-01-01’ and max(sale_date) <= ‘2019-03-31’.
  3. Join with Product table to get product_name.

Code

1
2
3
4
5
SELECT p.product_id, p.product_name
FROM Product p
JOIN Sales s ON p.product_id = s.product_id
GROUP BY p.product_id, p.product_name
HAVING MIN(s.sale_date) >= '2019-01-01' AND MAX(s.sale_date) <= '2019-03-31';
1
2
3
4
5
SELECT p.product_id, p.product_name
FROM Product p
JOIN Sales s ON p.product_id = s.product_id
GROUP BY p.product_id, p.product_name
HAVING MIN(s.sale_date) >= '2019-01-01' AND MAX(s.sale_date) <= '2019-03-31';
1
2
3
4
5
6
7
8
9
# Product and Sales are pandas DataFrames
import pandas as pd
def sales_analysis_iii(Product, Sales):
    merged = Sales.merge(Product, on='product_id')
    grouped = merged.groupby(['product_id', 'product_name'])['sale_date']
    mask = (grouped.min() >= '2019-01-01') & (grouped.max() <= '2019-03-31')
    result = mask[mask].reset_index()[['product_id', 'product_name']]
    return result
# result = sales_analysis_iii(Product, Sales)

Complexity

  • ⏰ Time complexity: O(N) (N = number of sales)
  • 🧺 Space complexity: O(P) (P = number of products)