Problem

Table: Orders

+--------------+------+
| Column Name  | Type |
+--------------+------+
| order_id     | int  |
| customer_id  | int  |
| order_date   | date |
| price        | int  |
+--------------+------+
order_id is the column with unique values for this table.
Each row contains the id of an order, the id of customer that ordered it, the date of the order, and its price.

Write a solution to report the IDs of the customers with the total purchases strictly increasing yearly.

  • The total purchases of a customer in one year is the sum of the prices of their orders in that year. If for some year the customer did not make any order, we consider the total purchases 0.
  • The first year to consider for each customer is the year of their first order.
  • The last year to consider for each customer is the year of their last order.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Input: 
Orders table:
+----------+-------------+------------+-------+
| order_id | customer_id | order_date | price |
+----------+-------------+------------+-------+
| 1        | 1           | 2019-07-01 | 1100  |
| 2        | 1           | 2019-11-01 | 1200  |
| 3        | 1           | 2020-05-26 | 3000  |
| 4        | 1           | 2021-08-31 | 3100  |
| 5        | 1           | 2022-12-07 | 4700  |
| 6        | 2           | 2015-01-01 | 700   |
| 7        | 2           | 2017-11-07 | 1000  |
| 8        | 3           | 2017-01-01 | 900   |
| 9        | 3           | 2018-11-07 | 900   |
+----------+-------------+------------+-------+
Output: 
+-------------+
| customer_id |
+-------------+
| 1           |
+-------------+
Explanation: 
Customer 1: The first year is 2019 and the last year is 2022
- 2019: 1100 + 1200 = 2300
- 2020: 3000
- 2021: 3100
- 2022: 4700
We can see that the total purchases are strictly increasing yearly, so we include customer 1 in the answer.
Customer 2: The first year is 2015 and the last year is 2017
- 2015: 700
- 2016: 0
- 2017: 1000
We do not include customer 2 in the answer because the total purchases are not strictly increasing. Note that customer 2 did not make any purchases in 2016.
Customer 3: The first year is 2017, and the last year is 2018
- 2017: 900
- 2018: 900
We do not include customer 3 in the answer because the total purchases are not strictly increasing.

Solution

Method 1 – SQL Window Functions and Yearly Aggregation

Intuition

The key idea is to aggregate each customer’s yearly purchases, fill in missing years with 0, and check if the sequence of yearly totals is strictly increasing. This can be done using window functions and self-joins in SQL, or groupby and reindex in pandas.

Approach

  1. For each customer, extract the year from order_date and sum price per year.
  2. Find the first and last order year for each customer.
  3. Generate all years in the range for each customer, filling missing years with 0 purchases.
  4. For each customer, check if the sequence of yearly totals is strictly increasing.
  5. Return the customer IDs that satisfy the condition.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
WITH yearly AS (
  SELECT customer_id, YEAR(order_date) AS y, SUM(price) AS total
  FROM Orders
  GROUP BY customer_id, y
),
minmax AS (
  SELECT customer_id, MIN(YEAR(order_date)) AS y1, MAX(YEAR(order_date)) AS y2
  FROM Orders
  GROUP BY customer_id
),
all_years AS (
  SELECT m.customer_id, y1 + seq.seq AS y
  FROM minmax m
  JOIN (
    SELECT 0 AS seq UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
  ) seq
  WHERE y1 + seq.seq <= y2
),
filled AS (
  SELECT a.customer_id, a.y, COALESCE(y.total, 0) AS total
  FROM all_years a
  LEFT JOIN yearly y ON a.customer_id = y.customer_id AND a.y = y.y
)
SELECT customer_id
FROM (
  SELECT customer_id,
         SUM(CASE WHEN total < LEAD(total) OVER (PARTITION BY customer_id ORDER BY y) THEN 1 ELSE 0 END) AS inc,
         COUNT(*)-1 AS n
  FROM filled
  GROUP BY customer_id
) t
WHERE inc = n;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
WITH yearly AS (
  SELECT customer_id, EXTRACT(YEAR FROM order_date)::int AS y, SUM(price) AS total
  FROM Orders
  GROUP BY customer_id, y
),
minmax AS (
  SELECT customer_id, MIN(EXTRACT(YEAR FROM order_date))::int AS y1, MAX(EXTRACT(YEAR FROM order_date))::int AS y2
  FROM Orders
  GROUP BY customer_id
),
all_years AS (
  SELECT m.customer_id, y1 + seq.seq AS y
  FROM minmax m
  JOIN generate_series(0, 100) seq(seq)
  WHERE y1 + seq.seq <= y2
),
filled AS (
  SELECT a.customer_id, a.y, COALESCE(y.total, 0) AS total
  FROM all_years a
  LEFT JOIN yearly y ON a.customer_id = y.customer_id AND a.y = y.y
)
SELECT customer_id
FROM (
  SELECT customer_id,
         SUM(CASE WHEN total < LEAD(total) OVER (PARTITION BY customer_id ORDER BY y) THEN 1 ELSE 0 END) AS inc,
         COUNT(*)-1 AS n
  FROM filled
  GROUP BY customer_id
) t
WHERE inc = n;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
def customers_with_strictly_increasing_purchases(orders: 'pd.DataFrame') -> 'pd.DataFrame':
    import pandas as pd
    orders = orders.copy()
    orders['year'] = pd.to_datetime(orders['order_date']).dt.year
    # Aggregate yearly totals
    yearly = orders.groupby(['customer_id', 'year'])['price'].sum().reset_index()
    res = []
    for cid, group in yearly.groupby('customer_id'):
        y1, y2 = group['year'].min(), group['year'].max()
        all_years = pd.DataFrame({'year': range(y1, y2+1)})
        merged = all_years.merge(group, on='year', how='left').fillna({'price': 0})
        totals = merged['price'].tolist()
        if all(totals[i] < totals[i+1] for i in range(len(totals)-1)):
            res.append(cid)
    return pd.DataFrame({'customer_id': res})

Complexity

  • ⏰ Time complexity: O(N + Y), where N is the number of orders and Y is the number of years per customer (usually small).
  • 🧺 Space complexity: O(C + Y), where C is the number of customers and Y is the number of years per customer.