Problem

Table: Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| store       | varchar |
| price       | int     |
+-------------+---------+
(product_id, store) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates the price of product_id in store.
There will be at most 30 different stores in the table.
price is the price of the product at this store.

Important note: This problem targets those who have a good experience with SQL. If you are a beginner, we recommend that you skip it for now.

Implement the procedure PivotProducts to reorganize the Products table so that each row has the id of one product and its price in each store. The price should be null if the product is not sold in a store. The columns of the table should contain each store and they should be sorted in lexicographical order.

The procedure should return the table after reorganizing it.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

Input: 
Products table:
+------------+----------+-------+
| product_id | store    | price |
+------------+----------+-------+
| 1          | Shop     | 110   |
| 1          | LC_Store | 100   |
| 2          | Nozama   | 200   |
| 2          | Souq     | 190   |
| 3          | Shop     | 1000  |
| 3          | Souq     | 1900  |
+------------+----------+-------+
Output: 
+------------+----------+--------+------+------+
| product_id | LC_Store | Nozama | Shop | Souq |
+------------+----------+--------+------+------+
| 1          | 100      | null   | 110  | null |
| 2          | null     | 200    | null | 190  |
| 3          | null     | null   | 1000 | 1900 |
+------------+----------+--------+------+------+
Explanation: 
We have 4 stores: Shop, LC_Store, Nozama, and Souq. We first order them lexicographically to be: LC_Store, Nozama, Shop, and Souq.
Now, for product 1, the price in LC_Store is 100 and in Shop is 110. For the other two stores, the product is not sold so we set the price as null.
Similarly, product 2 has a price of 200 in Nozama and 190 in Souq. It is not sold in the other two stores.
For product 3, the price is 1000 in Shop and 1900 in Souq. It is not sold in the other two stores.
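
The transformation in Example 1 can be traced outside SQL as well. The following is a minimal sketch in plain Python, assuming the rows are available as (product_id, store, price) tuples; the names ROWS and pivot_rows are hypothetical and used only for illustration. Python's default string ordering stands in for the lexicographical order the problem asks for.

```python
from typing import Dict, List, Optional, Tuple

# Rows of the Products table from Example 1 as (product_id, store, price).
ROWS: List[Tuple[int, str, int]] = [
    (1, "Shop", 110),
    (1, "LC_Store", 100),
    (2, "Nozama", 200),
    (2, "Souq", 190),
    (3, "Shop", 1000),
    (3, "Souq", 1900),
]


def pivot_rows(
    rows: List[Tuple[int, str, int]],
) -> Tuple[List[str], List[List[Optional[int]]]]:
    """Return (header, table): one table row per product, one price per store."""
    # Distinct stores in lexicographical order; these become the columns.
    stores = sorted({store for _, store, _ in rows})

    # Collect the known prices per product.
    prices: Dict[int, Dict[str, int]] = {}
    for product_id, store, price in rows:
        prices.setdefault(product_id, {})[store] = price

    header = ["product_id"] + stores
    # dict.get returns None for a store that does not sell the product.
    table = [
        [product_id] + [by_store.get(store) for store in stores]
        for product_id, by_store in sorted(prices.items())
    ]
    return header, table


header, table = pivot_rows(ROWS)
print(header)
for row in table:
    print(row)
```

Here None plays the role of null in the expected output. The rows are sorted by product_id only to make the printout stable; the problem accepts any row order.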

Solution

Method 1 – Dynamic SQL Pivot

Intuition

The set of stores is not known in advance, so both the column list and the pivot query have to be generated at runtime. Dynamic SQL, available in SQL Server, MySQL, and PostgreSQL, is the usual tool for this. The query groups rows by product_id and produces one column per store with a conditional aggregate. Because (product_id, store) is the primary key, each group contains at most one row per store, so MAX simply returns that price, or null when the product is not sold there.

Approach

  1. Get the list of unique stores, sorted lexicographically.
  2. Build a dynamic SQL query that selects product_id and, for each store, uses an aggregate function (e.g., MAX(CASE WHEN store = 'store_name' THEN price END)) as a column.
  3. Group by product_id.
  4. Execute the dynamic SQL and return the result.
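
The four steps above can be sketched end to end with Python's built-in sqlite3 module. This is an illustration under assumptions, not one of the dialects targeted by the solutions in this article: SQLite stands in for the database, the helper names quote_identifier and pivot_products are hypothetical, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3
from typing import List, Tuple


def quote_identifier(name: str) -> str:
    """Quote a SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot_products(conn: sqlite3.Connection) -> Tuple[List[str], List[tuple]]:
    # Step 1: distinct stores, sorted in Python so the order does not
    # depend on the database collation.
    stores = sorted(
        row[0] for row in conn.execute("SELECT DISTINCT store FROM Products")
    )

    # Step 2: one conditional aggregate per store. The store value is bound
    # as a parameter; only the column alias is spliced into the SQL text.
    select_list = ["product_id"] + [
        f"MAX(CASE WHEN store = ? THEN price END) AS {quote_identifier(s)}"
        for s in stores
    ]

    # Step 3: group by product_id.
    query = (
        f"SELECT {', '.join(select_list)} FROM Products "
        "GROUP BY product_id ORDER BY product_id"
    )

    # Step 4: execute the generated statement and return header plus rows.
    cursor = conn.execute(query, stores)
    header = [column[0] for column in cursor.description]
    return header, cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (product_id INTEGER, store TEXT, price INTEGER, "
    "PRIMARY KEY (product_id, store))"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        (1, "Shop", 110),
        (1, "LC_Store", 100),
        (2, "Nozama", 200),
        (2, "Souq", 190),
        (3, "Shop", 1000),
        (3, "Souq", 1900),
    ],
)
header, rows = pivot_products(conn)
print(header)
for row in rows:
    print(row)
```

Binding the store values as parameters keeps quoting concerns limited to the column aliases, which is one way to avoid breaking the generated query when a store name contains a quote character.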

Code

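MySQL

CREATE PROCEDURE PivotProducts()
BEGIN
  -- Raise the GROUP_CONCAT limit (default 1024 bytes) so the column list is not truncated.
  SET SESSION group_concat_max_len = 1000000;
  -- Build one MAX(CASE ...) column per distinct store, in lexicographical order.
  SELECT GROUP_CONCAT(DISTINCT CONCAT(
    'MAX(CASE WHEN store = ''', store, ''' THEN price END) AS `', store, '`')
    ORDER BY store
  ) INTO @cols
  FROM Products;
  SET @sql = CONCAT('SELECT product_id, ', @cols, ' FROM Products GROUP BY product_id');
  -- Prepare and run the generated statement; its result set is what the procedure returns.
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;
END
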
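PostgreSQL

-- A DO block cannot return rows, so the pivot is written to a temporary table
-- (pivot_result, a name chosen here for illustration) and read back afterwards.
DROP TABLE IF EXISTS pivot_result;
DO $$
DECLARE
    col_list text;
BEGIN
    -- %L quotes the store as a string literal, %I quotes it as an identifier.
    SELECT string_agg(
        format('MAX(CASE WHEN store = %L THEN price END) AS %I', store, store), ', '
        ORDER BY store
    ) INTO col_list
    FROM (SELECT DISTINCT store FROM Products) s;
    EXECUTE format(
        'CREATE TEMP TABLE pivot_result AS SELECT product_id, %s FROM Products GROUP BY product_id',
        col_list);
END $$;
SELECT * FROM pivot_result;
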
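Python (pandas)

import pandas as pd

def pivot_products(products: pd.DataFrame) -> pd.DataFrame:
    # One row per product_id and one column per store; missing pairs become NaN.
    df = products.pivot(index='product_id', columns='store', values='price')
    # Sort the store columns explicitly, then turn product_id back into a regular column.
    df = df.reindex(columns=sorted(df.columns)).reset_index()
    # Drop the leftover 'store' label on the column axis.
    df.columns.name = None
    return df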

Complexity

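  • ⏰ Time complexity: O(r * s) for the SQL versions, where r is the number of rows in Products and s is the number of stores, because the generated query evaluates s CASE expressions for every row. Since s is at most 30, this is linear in r in practice.
  • 🧺 Space complexity: O(n * s), where n is the number of products, for the output table.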