Problem#
Table: user_transactions
1
2
3
4
5
6
7
8
9
10
+ ------------------+----------+
| Column Name | Type |
+ ------------------+----------+
| transaction_id | integer |
| product_id | integer |
| spend | decimal |
| transaction_date | datetime |
+ ------------------+----------+
The transaction_id column uniquely identifies each row in this table .
Each row of this table contains the transaction ID, product ID, the spend amount, and the transaction date.
Write a solution to calculate the year-on-year growth rate for the total spend for each product .
The result table should include the following columns:
year
: The year of the transaction.
product_id
: The ID of the product.
curr_year_spend
: The total spend for the current year.
prev_year_spend
: The total spend for the previous year.
yoy_rate
: The year-on-year growth rate percentage, rounded to 2
decimal places.
Return the result table ordered by product_id
,year
inascending order .
The result format is in the following example.
Examples#
Example 1:#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Input:
`user_transactions` table:
+----------------+------------+---------+---------------------+
| transaction_id | product_id | spend | transaction_date |
+----------------+------------+---------+---------------------+
| 1341 | 123424 | 1500.60 | 2019 - 12 - 31 12 : 00 : 00 |
| 1423 | 123424 | 1000.20 | 2020 - 12 - 31 12 : 00 : 00 |
| 1623 | 123424 | 1246.44 | 2021 - 12 - 31 12 : 00 : 00 |
| 1322 | 123424 | 2145.32 | 2022 - 12 - 31 12 : 00 : 00 |
+----------------+------------+---------+---------------------+
Output:
+------+------------+----------------+----------------+----------+
| year | product_id | curr_year_spend| prev_year_spend| yoy_rate |
+------+------------+----------------+----------------+----------+
| 2019 | 123424 | 1500.60 | NULL | NULL |
| 2020 | 123424 | 1000.20 | 1500.60 | - 33.35 |
| 2021 | 123424 | 1246.44 | 1000.20 | 24.62 |
| 2022 | 123424 | 2145.32 | 1246.44 | 72.12 |
+------+------------+----------------+----------------+----------+
Explanation:
* For product ID 123424 :
* In 2019 :
* Current year' s spend is 1500.60
* No previous year' s spend recorded
* YoY growth rate: NULL
* In 2020 :
* Current year' s spend is 1000.20
* Previous year' s spend is 1500.60
* YoY growth rate: (( 1000.20 - 1500.60 ) / 1500.60 ) * 100 = - 33.35 %
* In 2021 :
* Current year' s spend is 1246.44
* Previous year' s spend is 1000.20
* YoY growth rate: (( 1246.44 - 1000.20 ) / 1000.20 ) * 100 = 24.62 %
* In 2022 :
* Current year' s spend is 2145.32
* Previous year' s spend is 1246.44
* YoY growth rate: (( 2145.32 - 1246.44 ) / 1246.44 ) * 100 = 72.12 %
** Note:** Output table is ordered by `product_id` and `year` in ascending
order.
Solution#
Method 1 – Window Functions (SQL) and Pandas#
Intuition#
We need to compute the total spend for each product and year, then compare each year’s spend to the previous year’s spend for the same product. This is a classic use case for window functions (SQL) or groupby/shift (Pandas).
Approach#
Extract the year from the transaction date.
Group by product and year, summing the spend for each group.
For each (product, year), get the previous year’s spend using a window function (SQL) or shift (Pandas).
Calculate the year-on-year growth rate as ((curr_year_spend - prev_year_spend) / prev_year_spend) * 100, rounded to 2 decimals.
Order the result by product_id and year ascending.
Code#
Sql
Sql
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SELECT
year , product_id,
curr_year_spend,
prev_year_spend,
ROUND((curr_year_spend - prev_year_spend) / prev_year_spend * 100 , 2 ) AS yoy_rate
FROM (
SELECT
YEAR (transaction_date) AS year ,
product_id,
SUM (spend) AS curr_year_spend,
LAG(SUM (spend)) OVER (PARTITION BY product_id ORDER BY YEAR (transaction_date)) AS prev_year_spend
FROM user_transactions
GROUP BY product_id, YEAR (transaction_date)
) t
ORDER BY product_id, year ;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SELECT
year , product_id,
curr_year_spend,
prev_year_spend,
ROUND((curr_year_spend - prev_year_spend) / prev_year_spend * 100 .0 , 2 ) AS yoy_rate
FROM (
SELECT
EXTRACT (YEAR FROM transaction_date) AS year ,
product_id,
SUM (spend) AS curr_year_spend,
LAG(SUM (spend)) OVER (PARTITION BY product_id ORDER BY EXTRACT (YEAR FROM transaction_date)) AS prev_year_spend
FROM user_transactions
GROUP BY product_id, EXTRACT (YEAR FROM transaction_date)
) t
ORDER BY product_id, year ;
1
2
3
4
5
6
7
8
9
import pandas as pd
def year_on_year_growth_rate (user_transactions: pd. DataFrame) -> pd. DataFrame:
user_transactions['year' ] = pd. to_datetime(user_transactions['transaction_date' ]). dt. year
df = user_transactions. groupby(['product_id' , 'year' ], as_index= False )['spend' ]. sum()
df = df. sort_values(['product_id' , 'year' ])
df['prev_year_spend' ] = df. groupby('product_id' )['spend' ]. shift(1 )
df['yoy_rate' ] = ((df['spend' ] - df['prev_year_spend' ]) / df['prev_year_spend' ] * 100 ). round(2 )
df. rename(columns= {'spend' : 'curr_year_spend' }, inplace= True )
return df[['year' , 'product_id' , 'curr_year_spend' , 'prev_year_spend' , 'yoy_rate' ]]
Complexity#
⏰ Time complexity: O(n log n)
— Grouping and sorting by product and year, where n is the number of transactions.
🧺 Space complexity: O(n)
— For storing the grouped and intermediate results.