+--------------+------+
| Column Name | Type |
+--------------+------+
| order_id | int |
| customer_id | int |
| order_date | date |
| price | int |
+--------------+------+
order_id is the column with unique values for this table.
Each row contains the id of an order, the id of customer that ordered it, the date of the order, and its price.
Write a solution to report the IDs of the customers with the total purchases strictly increasing yearly.
The total purchases of a customer in one year is the sum of the prices of their orders in that year. If for some year the customer did not make any order, we consider the total purchases 0.
The first year to consider for each customer is the year of their first order.
The last year to consider for each customer is the year of their last order.
Input:
Orders table:+----------+-------------+------------+-------+| order_id | customer_id | order_date | price |+----------+-------------+------------+-------+|1|1|2019-07-01|1100||2|1|2019-11-01|1200||3|1|2020-05-26|3000||4|1|2021-08-31|3100||5|1|2022-12-07|4700||6|2|2015-01-01|700||7|2|2017-11-07|1000||8|3|2017-01-01|900||9|3|2018-11-07|900|+----------+-------------+------------+-------+Output:
+-------------+| customer_id |+-------------+|1|+-------------+Explanation:
Customer 1: The first year is2019 and the last year is2022-2019:1100+1200=2300-2020:3000-2021:3100-2022:4700We can see that the total purchases are strictly increasing yearly, so we include customer 1in the answer.Customer 2: The first year is2015 and the last year is2017-2015:700-2016:0-2017:1000We do not include customer 2in the answer because the total purchases are not strictly increasing. Note that customer 2 did not make any purchases in2016.Customer 3: The first year is2017, and the last year is2018-2017:900-2018:900We do not include customer 3in the answer because the total purchases are not strictly increasing.
The key idea is to aggregate each customer’s yearly purchases, fill in missing years with 0, and check if the sequence of yearly totals is strictly increasing. This can be done using window functions and self-joins in SQL, or groupby and reindex in pandas.
WITH yearly AS (
SELECT customer_id, YEAR(order_date) AS y, SUM(price) AS total
FROM Orders
GROUPBY customer_id, y
),
minmax AS (
SELECT customer_id, MIN(YEAR(order_date)) AS y1, MAX(YEAR(order_date)) AS y2
FROM Orders
GROUPBY customer_id
),
all_years AS (
SELECT m.customer_id, y1 + seq.seq AS y
FROM minmax m
JOIN (
SELECT0AS seq UNIONALLSELECT1UNIONALLSELECT2UNIONALLSELECT3UNIONALLSELECT4UNIONALLSELECT5UNIONALLSELECT6UNIONALLSELECT7UNIONALLSELECT8UNIONALLSELECT9 ) seq
WHERE y1 + seq.seq <= y2
),
filled AS (
SELECT a.customer_id, a.y, COALESCE(y.total, 0) AS total
FROM all_years a
LEFTJOIN yearly y ON a.customer_id = y.customer_id AND a.y = y.y
)
SELECT customer_id
FROM (
SELECT customer_id,
SUM(CASEWHEN total < LEAD(total) OVER (PARTITION BY customer_id ORDERBY y) THEN1ELSE0END) AS inc,
COUNT(*)-1AS n
FROM filled
GROUPBY customer_id
) t
WHERE inc = n;
WITH yearly AS (
SELECT customer_id, EXTRACT(YEARFROM order_date)::int AS y, SUM(price) AS total
FROM Orders
GROUPBY customer_id, y
),
minmax AS (
SELECT customer_id, MIN(EXTRACT(YEARFROM order_date))::int AS y1, MAX(EXTRACT(YEARFROM order_date))::int AS y2
FROM Orders
GROUPBY customer_id
),
all_years AS (
SELECT m.customer_id, y1 + seq.seq AS y
FROM minmax m
JOIN generate_series(0, 100) seq(seq)
WHERE y1 + seq.seq <= y2
),
filled AS (
SELECT a.customer_id, a.y, COALESCE(y.total, 0) AS total
FROM all_years a
LEFTJOIN yearly y ON a.customer_id = y.customer_id AND a.y = y.y
)
SELECT customer_id
FROM (
SELECT customer_id,
SUM(CASEWHEN total < LEAD(total) OVER (PARTITION BY customer_id ORDERBY y) THEN1ELSE0END) AS inc,
COUNT(*)-1AS n
FROM filled
GROUPBY customer_id
) t
WHERE inc = n;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
defcustomers_with_strictly_increasing_purchases(orders: 'pd.DataFrame') ->'pd.DataFrame':
import pandas as pd
orders = orders.copy()
orders['year'] = pd.to_datetime(orders['order_date']).dt.year
# Aggregate yearly totals yearly = orders.groupby(['customer_id', 'year'])['price'].sum().reset_index()
res = []
for cid, group in yearly.groupby('customer_id'):
y1, y2 = group['year'].min(), group['year'].max()
all_years = pd.DataFrame({'year': range(y1, y2+1)})
merged = all_years.merge(group, on='year', how='left').fillna({'price': 0})
totals = merged['price'].tolist()
if all(totals[i] < totals[i+1] for i in range(len(totals)-1)):
res.append(cid)
return pd.DataFrame({'customer_id': res})