Problem

Table: OrdersDetails

+-------------+------+
| Column Name | Type |
+-------------+------+
| order_id    | int  |
| product_id  | int  |
| quantity    | int  |
+-------------+------+
(order_id, product_id) is the primary key (combination of columns with unique values) for this table.
A single order is represented as multiple rows, one row for each product in the order.
Each row of this table contains the quantity ordered of the product product_id in the order order_id.

You are running an e-commerce site that is looking for imbalanced orders. An imbalanced order is one whose maximum quantity is strictly greater than the average quantity of every order (including itself).

The average quantity of an order is calculated as (total quantity of all products in the order) / (number of different products in the order). The maximum quantity of an order is the highest quantity of any single product in the order.

Write a solution to find the order_id of all imbalanced orders.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
Input: 
OrdersDetails table:
+----------+------------+----------+
| order_id | product_id | quantity |
+----------+------------+----------+
| 1        | 1          | 12       |
| 1        | 2          | 10       |
| 1        | 3          | 15       |
| 2        | 1          | 8        |
| 2        | 4          | 4        |
| 2        | 5          | 6        |
| 3        | 3          | 5        |
| 3        | 4          | 18       |
| 4        | 5          | 2        |
| 4        | 6          | 8        |
| 5        | 7          | 9        |
| 5        | 8          | 9        |
| 3        | 9          | 20       |
| 2        | 9          | 4        |
+----------+------------+----------+
Output: 
+----------+
| order_id |
+----------+
| 1        |
| 3        |
+----------+
Explanation: 
The average quantity of each order is:
- order_id=1: (12+10+15)/3 = 12.3333333
- order_id=2: (8+4+6+4)/4 = 5.5
- order_id=3: (5+18+20)/3 = 14.333333
- order_id=4: (2+8)/2 = 5
- order_id=5: (9+9)/2 = 9
The maximum quantity of each order is:
- order_id=1: max(12, 10, 15) = 15
- order_id=2: max(8, 4, 6, 4) = 8
- order_id=3: max(5, 18, 20) = 20
- order_id=4: max(2, 8) = 8
- order_id=5: max(9, 9) = 9
Orders 1 and 3 are imbalanced because they have a maximum quantity that exceeds the average quantity of every order.

Solution

Method 1 – Aggregation and Comparison

Intuition

We need to find orders whose maximum quantity is strictly greater than the average quantity of every order. This means, for each order, its max quantity must be greater than the max average among all orders. We can precompute the average for each order, then compare each order’s max quantity to the maximum of all averages.

Approach

  1. Compute the average quantity for each order (sum/number of products).
  2. Compute the maximum quantity for each order.
  3. Find the maximum average among all orders.
  4. Select orders whose maximum quantity is strictly greater than the maximum average.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
WITH avg_table AS (
  SELECT order_id, AVG(quantity) AS avg_q, MAX(quantity) AS max_q
  FROM OrdersDetails
  GROUP BY order_id
),
max_avg AS (
  SELECT MAX(avg_q) AS max_avg_q FROM avg_table
)
SELECT order_id
FROM avg_table, max_avg
WHERE max_q > max_avg_q;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
WITH avg_table AS (
  SELECT order_id, AVG(quantity) AS avg_q, MAX(quantity) AS max_q
  FROM OrdersDetails
  GROUP BY order_id
),
max_avg AS (
  SELECT MAX(avg_q) AS max_avg_q FROM avg_table
)
SELECT order_id
FROM avg_table, max_avg
WHERE max_q > max_avg_q;
1
2
3
4
5
6
7
class Solution:
    def imbalanced_orders(self, orders: 'pd.DataFrame') -> 'pd.DataFrame':
        avg = orders.groupby('order_id')['quantity'].mean()
        mx = orders.groupby('order_id')['quantity'].max()
        max_avg = avg.max()
        ans = mx[mx > max_avg].index.to_list()
        return pd.DataFrame({'order_id': ans})

Complexity

  • ⏰ Time complexity: O(N), where N is the number of rows in OrdersDetails. Each aggregation and comparison is linear in the number of rows.
  • 🧺 Space complexity: O(M), where M is the number of unique orders, for storing averages and maximums.