Problem

Table: Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| store_name1 | int     |
| store_name2 | int     |
|      :      | int     |
|      :      | int     |
|      :      | int     |
| store_namen | int     |
+-------------+---------+
product_id is the primary key for this table.
Each row in this table indicates the product's price in n different stores.
If the product is not available in a store, the price will be null in that store's column.
The names of the stores may change from one testcase to another. There will be at least 1 store and at most 30 stores.

Important note: This problem targets those who have a good experience with SQL. If you are a beginner, we recommend that you skip it for now.

Implement the procedure UnpivotProducts to reorganize the Products table so that each row has the id of one product, the name of a store where it is sold, and its price in that store. If a product is not available in a store, do not include a row with that product_id and store combination in the result table. There should be three columns: product_id, store, and price.

The procedure should return the table after reorganizing it.

Return the result table in any order.

The query result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Input: 
Products table:
+------------+----------+--------+------+------+
| product_id | LC_Store | Nozama | Shop | Souq |
+------------+----------+--------+------+------+
| 1          | 100      | null   | 110  | null |
| 2          | null     | 200    | null | 190  |
| 3          | null     | null   | 1000 | 1900 |
+------------+----------+--------+------+------+
Output: 
+------------+----------+-------+
| product_id | store    | price |
+------------+----------+-------+
| 1          | LC_Store | 100   |
| 1          | Shop     | 110   |
| 2          | Nozama   | 200   |
| 2          | Souq     | 190   |
| 3          | Shop     | 1000  |
| 3          | Souq     | 1900  |
+------------+----------+-------+
Explanation: 
Product 1 is sold in LC_Store and Shop with prices of 100 and 110 respectively.
Product 2 is sold in Nozama and Souq with prices of 200 and 190.
Product 3 is sold in Shop and Souq with prices of 1000 and 1900.

Solution

Method 1 – Dynamic SQL Unpivot

Intuition

To unpivot a table with an unknown set of store columns, we need to dynamically generate SQL that selects each store column as a row, skipping nulls. In pandas, we can use melt and drop nulls.

Approach

  1. Get the list of store columns (all columns except product_id).
  2. For SQL, dynamically generate a UNION ALL query for each store column, selecting only non-null prices.
  3. For pandas, use melt to unpivot, then drop rows with null prices.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SET SESSION group_concat_max_len = 1000000;
SET @sql = NULL;
SELECT GROUP_CONCAT(
  CONCAT('SELECT product_id, ''', COLUMN_NAME, ''' AS store, ', COLUMN_NAME, ' AS price FROM Products WHERE ', COLUMN_NAME, ' IS NOT NULL')
  SEPARATOR ' UNION ALL '
) INTO @sql
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Products' AND COLUMN_NAME != 'product_id';
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
DO $$
DECLARE
    col_list text;
    dyn_sql text;
BEGIN
    SELECT string_agg(
        format('SELECT product_id, %L AS store, %I AS price FROM Products WHERE %I IS NOT NULL', column_name, column_name, column_name),
        ' UNION ALL '
    ) INTO col_list
    FROM information_schema.columns
    WHERE table_name = 'products' AND column_name != 'product_id';
    dyn_sql := col_list;
    EXECUTE dyn_sql;
END $$;
1
2
3
4
5
import pandas as pd

def unpivot_products(products: pd.DataFrame) -> pd.DataFrame:
    df = products.melt(id_vars=['product_id'], var_name='store', value_name='price')
    return df.dropna(subset=['price'])

Complexity

  • ⏰ Time complexity: O(n * s), where n is the number of products and s is the number of stores.
  • 🧺 Space complexity: O(n * s), for the output table.