Problem

Table: user_transactions

+------------------+----------+
| Column Name      | Type     |
+------------------+----------+
| transaction_id   | integer  |
| product_id       | integer  |
| spend            | decimal  |
| transaction_date | datetime |
+------------------+----------+
The transaction_id column uniquely identifies each row in this table.
Each row of this table contains the transaction ID, product ID, the spend amount, and the transaction date.

Write a solution to calculate the year-on-year growth rate for the total spend for each product.

The result table should include the following columns:

  • year: The year of the transaction.
  • product_id: The ID of the product.
  • curr_year_spend: The total spend for the current year.
  • prev_year_spend: The total spend for the previous year.
  • yoy_rate: The year-on-year growth rate percentage, rounded to 2 decimal places.

Return the result table ordered by product_id,year inascending order.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Input:
`user_transactions` table:
+----------------+------------+---------+---------------------+
| transaction_id | product_id | spend   | transaction_date    |
+----------------+------------+---------+---------------------+
| 1341           | 123424     | 1500.60 | 2019-12-31 12:00:00 |
| 1423           | 123424     | 1000.20 | 2020-12-31 12:00:00 |
| 1623           | 123424     | 1246.44 | 2021-12-31 12:00:00 |
| 1322           | 123424     | 2145.32 | 2022-12-31 12:00:00 |
+----------------+------------+---------+---------------------+
Output:
+------+------------+----------------+----------------+----------+
| year | product_id | curr_year_spend| prev_year_spend| yoy_rate |
+------+------------+----------------+----------------+----------+
| 2019 | 123424     | 1500.60        | NULL           | NULL     |
| 2020 | 123424     | 1000.20        | 1500.60        | -33.35   |
| 2021 | 123424     | 1246.44        | 1000.20        | 24.62    |
| 2022 | 123424     | 2145.32        | 1246.44        | 72.12    |
+------+------------+----------------+----------------+----------+
Explanation:
* For product ID 123424: 
* In 2019: 
* Current year's spend is 1500.60
* No previous year's spend recorded
* YoY growth rate: NULL
* In 2020: 
* Current year's spend is 1000.20
* Previous year's spend is 1500.60
* YoY growth rate: ((1000.20 - 1500.60) / 1500.60) * 100 = -33.35%
* In 2021: 
* Current year's spend is 1246.44
* Previous year's spend is 1000.20
* YoY growth rate: ((1246.44 - 1000.20) / 1000.20) * 100 = 24.62%
* In 2022: 
* Current year's spend is 2145.32
* Previous year's spend is 1246.44
* YoY growth rate: ((2145.32 - 1246.44) / 1246.44) * 100 = 72.12%
**Note:** Output table is ordered by `product_id` and `year` in ascending
order.

Solution

Method 1 – Window Functions (SQL) and Pandas

Intuition

We need to compute the total spend for each product and year, then compare each year’s spend to the previous year’s spend for the same product. This is a classic use case for window functions (SQL) or groupby/shift (Pandas).

Approach

  1. Extract the year from the transaction date.
  2. Group by product and year, summing the spend for each group.
  3. For each (product, year), get the previous year’s spend using a window function (SQL) or shift (Pandas).
  4. Calculate the year-on-year growth rate as ((curr_year_spend - prev_year_spend) / prev_year_spend) * 100, rounded to 2 decimals.
  5. Order the result by product_id and year ascending.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
SELECT
  year, product_id,
  curr_year_spend,
  prev_year_spend,
  ROUND((curr_year_spend - prev_year_spend) / prev_year_spend * 100, 2) AS yoy_rate
FROM (
  SELECT
    YEAR(transaction_date) AS year,
    product_id,
    SUM(spend) AS curr_year_spend,
    LAG(SUM(spend)) OVER (PARTITION BY product_id ORDER BY YEAR(transaction_date)) AS prev_year_spend
  FROM user_transactions
  GROUP BY product_id, YEAR(transaction_date)
) t
ORDER BY product_id, year;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
SELECT
  year, product_id,
  curr_year_spend,
  prev_year_spend,
  ROUND((curr_year_spend - prev_year_spend) / prev_year_spend * 100.0, 2) AS yoy_rate
FROM (
  SELECT
    EXTRACT(YEAR FROM transaction_date) AS year,
    product_id,
    SUM(spend) AS curr_year_spend,
    LAG(SUM(spend)) OVER (PARTITION BY product_id ORDER BY EXTRACT(YEAR FROM transaction_date)) AS prev_year_spend
  FROM user_transactions
  GROUP BY product_id, EXTRACT(YEAR FROM transaction_date)
) t
ORDER BY product_id, year;
1
2
3
4
5
6
7
8
9
import pandas as pd
def year_on_year_growth_rate(user_transactions: pd.DataFrame) -> pd.DataFrame:
    user_transactions['year'] = pd.to_datetime(user_transactions['transaction_date']).dt.year
    df = user_transactions.groupby(['product_id', 'year'], as_index=False)['spend'].sum()
    df = df.sort_values(['product_id', 'year'])
    df['prev_year_spend'] = df.groupby('product_id')['spend'].shift(1)
    df['yoy_rate'] = ((df['spend'] - df['prev_year_spend']) / df['prev_year_spend'] * 100).round(2)
    df.rename(columns={'spend': 'curr_year_spend'}, inplace=True)
    return df[['year', 'product_id', 'curr_year_spend', 'prev_year_spend', 'yoy_rate']]

Complexity

  • ⏰ Time complexity: O(n log n) — Grouping and sorting by product and year, where n is the number of transactions.
  • 🧺 Space complexity: O(n) — For storing the grouped and intermediate results.