Problem

Table: Product

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| product_id    | int     |
| product_name  | varchar |
+---------------+---------+
product_id is the primary key (column with unique values) for this table.
product_name is the name of the product.

Table: Sales

+---------------------+---------+
| Column Name         | Type    |
+---------------------+---------+
| product_id          | int     |
| period_start        | date    |
| period_end          | date    |
| average_daily_sales | int     |
+---------------------+---------+
product_id is the primary key (column with unique values) for this table.
period_start and period_end indicate the start and end date for the sales period, and both dates are inclusive.
The average_daily_sales column holds the average daily sales amount of the items for the period.
The dates of the sales years are between 2018 to 2020.

Write a solution to report the total sales amount of each item for each year, with corresponding product_name, product_id, report_year, and total_amount.

Return the result table ordered by product_id and report_year.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
Input: 
Product table:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 1          | LC Phone     |
| 2          | LC T-Shirt   |
| 3          | LC Keychain  |
+------------+--------------+
Sales table:
+------------+--------------+-------------+---------------------+
| product_id | period_start | period_end  | average_daily_sales |
+------------+--------------+-------------+---------------------+
| 1          | 2019-01-25   | 2019-02-28  | 100                 |
| 2          | 2018-12-01   | 2020-01-01  | 10                  |
| 3          | 2019-12-01   | 2020-01-31  | 1                   |
+------------+--------------+-------------+---------------------+
Output: 
+------------+--------------+-------------+--------------+
| product_id | product_name | report_year | total_amount |
+------------+--------------+-------------+--------------+
| 1          | LC Phone     |    2019     | 3500         |
| 2          | LC T-Shirt   |    2018     | 310          |
| 2          | LC T-Shirt   |    2019     | 3650         |
| 2          | LC T-Shirt   |    2020     | 10           |
| 3          | LC Keychain  |    2019     | 31           |
| 3          | LC Keychain  |    2020     | 31           |
+------------+--------------+-------------+--------------+
Explanation: 
LC Phone was sold for the period of 2019-01-25 to 2019-02-28, and there are 35 days for this period. Total amount 35*100 = 3500. 
LC T-shirt was sold for the period of 2018-12-01 to 2020-01-01, and there are 31, 365, 1 days for years 2018, 2019 and 2020 respectively.
LC Keychain was sold for the period of 2019-12-01 to 2020-01-31, and there are 31, 31 days for years 2019 and 2020 respectively.

Solution

Method 1 – Expand Periods by Year

Intuition

For each sales period, split it by year, count the days in each year, and sum the total sales for each product and year.

Approach

For each row in Sales, for each year between period_start and period_end, calculate the number of days in that year that overlap with the period, then multiply by average_daily_sales. Aggregate by product and year.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
WITH years AS (
  SELECT 2018 AS y UNION ALL SELECT 2019 UNION ALL SELECT 2020
),
expanded AS (
  SELECT s.product_id, y.y AS report_year,
         GREATEST(period_start, MAKEDATE(y.y, 1)) AS start_date,
         LEAST(period_end, MAKEDATE(y.y+1, 1) - INTERVAL 1 DAY) AS end_date,
         s.average_daily_sales
    FROM Sales s
    JOIN years y
      ON period_start <= MAKEDATE(y.y+1, 1) - INTERVAL 1 DAY
     AND period_end >= MAKEDATE(y.y, 1)
)
SELECT p.product_id, p.product_name, e.report_year,
       SUM(DATEDIFF(e.end_date, e.start_date) + 1) * e.average_daily_sales AS total_amount
  FROM expanded e
  JOIN Product p ON p.product_id = e.product_id
 GROUP BY p.product_id, p.product_name, e.report_year
 ORDER BY p.product_id, e.report_year;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
WITH years AS (
  SELECT 2018 AS y UNION ALL SELECT 2019 UNION ALL SELECT 2020
),
expanded AS (
  SELECT s.product_id, y.y AS report_year,
         GREATEST(period_start, make_date(y.y, 1, 1)) AS start_date,
         LEAST(period_end, make_date(y.y+1, 1, 1) - INTERVAL '1 day') AS end_date,
         s.average_daily_sales
    FROM Sales s
    JOIN years y
      ON period_start <= make_date(y.y+1, 1, 1) - INTERVAL '1 day'
     AND period_end >= make_date(y.y, 1, 1)
)
SELECT p.product_id, p.product_name, e.report_year,
       SUM(EXTRACT(DAY FROM e.end_date - e.start_date) + 1) * e.average_daily_sales AS total_amount
  FROM expanded e
  JOIN Product p ON p.product_id = e.product_id
 GROUP BY p.product_id, p.product_name, e.report_year
 ORDER BY p.product_id, e.report_year;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
WITH years AS (
  SELECT 2018 AS y FROM dual UNION ALL SELECT 2019 FROM dual UNION ALL SELECT 2020 FROM dual
),
expanded AS (
  SELECT s.product_id, y.y AS report_year,
         GREATEST(period_start, TO_DATE(y.y || '-01-01', 'YYYY-MM-DD')) AS start_date,
         LEAST(period_end, TO_DATE((y.y+1) || '-01-01', 'YYYY-MM-DD') - 1) AS end_date,
         s.average_daily_sales
    FROM Sales s
    JOIN years y
      ON period_start <= TO_DATE((y.y+1) || '-01-01', 'YYYY-MM-DD') - 1
     AND period_end >= TO_DATE(y.y || '-01-01', 'YYYY-MM-DD')
)
SELECT p.product_id, p.product_name, e.report_year,
       SUM(e.end_date - e.start_date + 1) * e.average_daily_sales AS total_amount
  FROM expanded e
  JOIN Product p ON p.product_id = e.product_id
 GROUP BY p.product_id, p.product_name, e.report_year
 ORDER BY p.product_id, e.report_year;

Complexity

  • ⏰ Time complexity: O(ny) where n = sales rows, y = number of years (constant)
  • 🧺 Space complexity: O(ny)