Problem
Table: Orders
+-----------------+----------+
| Column Name     | Type     |
+-----------------+----------+
| order_number    | int      |
| customer_number | int      |
+-----------------+----------+
order_number is the primary key for this table.
This table contains information about the order ID and the customer ID.
Write an SQL query to find the customer_number for the customer who has placed the largest number of orders.
The test cases are generated so that exactly one customer will have placed more orders than any other customer.
Examples
Example 1:
Input: Orders table:
+--------------+-----------------+
| order_number | customer_number |
+--------------+-----------------+
| 1            | 1               |
| 2            | 2               |
| 3            | 3               |
| 4            | 3               |
+--------------+-----------------+
Output:
+-----------------+
| customer_number |
+-----------------+
| 3               |
+-----------------+
Explanation:
The customer with number 3 has two orders, which is greater than either customer 1 or 2 because each of them only has one order.
So the result is customer_number 3.
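The counting in the explanation can be reproduced with a small sketch. It is not part of the original solution; it uses plain Python with the example rows hard-coded, and the variable names are illustrative.

```python
from collections import Counter

# Rows of the example Orders table as (order_number, customer_number) pairs.
orders = [(1, 1), (2, 2), (3, 3), (4, 3)]

# Count how many orders each customer placed.
order_counts = Counter(customer for _, customer in orders)

# most_common(1) returns a one-element list holding the
# (customer_number, count) pair with the highest count.
top_customer, top_count = order_counts.most_common(1)[0]
print(top_customer, top_count)  # prints "3 2"
```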
Follow up
What if more than one customer has the largest number of orders? Can you find all the customer_number values in this case?
Solution
Method 1 - Using Count, Sorting Count and Limit 1
Code
SQL
SELECT customer_number
FROM Orders
GROUP BY customer_number
ORDER BY COUNT(*) DESC
LIMIT 1;
The query above fails for the follow-up: when several customers tie for the most orders, LIMIT 1 returns only one of them. Method 2 handles that case.
Python
import pandas as pd

def largest_orders(orders: pd.DataFrame) -> pd.DataFrame:
    # mode() returns the most frequent customer_number (all of them if there is a tie)
    return orders['customer_number'].mode().to_frame()
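To see how the LIMIT 1 query in Method 1 behaves on the follow-up, here is a minimal sketch that runs it against an in-memory SQLite database. The tied data set is made up for illustration (customers 2 and 3 each have two orders), and SQLite stands in for whatever database the judge uses.

```python
import sqlite3

# In-memory database with a hypothetical tie:
# customers 2 and 3 both have two orders, customer 1 has one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_number INTEGER PRIMARY KEY, customer_number INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 3), (5, 3)],
)

method_1 = """
    SELECT customer_number
    FROM Orders
    GROUP BY customer_number
    ORDER BY COUNT(*) DESC
    LIMIT 1
"""
limit_rows = conn.execute(method_1).fetchall()
conn.close()

# Only one row comes back even though two customers share the top count.
# Which of the tied customers is returned is not determined by the query.
print(len(limit_rows))  # prints "1"
```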
Method 2 - Using Subquery to Get Max and Then Filter by count=max
Code
SQL
SELECT customer_number
FROM Orders
GROUP BY customer_number
HAVING COUNT(order_number) = (
    SELECT COUNT(order_number) AS cnt
    FROM Orders
    GROUP BY customer_number
    ORDER BY cnt DESC
    LIMIT 1
);
Pandas
import pandas as pd

def largest_orders(orders: pd.DataFrame) -> pd.DataFrame:
    # Group by customer_number and count the number of orders for each customer
    customer_order_counts = orders.groupby('customer_number')['order_number'].count().reset_index()
    # Keep every customer whose order count equals the maximum (ties included)
    max_count = customer_order_counts['order_number'].max()
    max_orders_customer = customer_order_counts[customer_order_counts['order_number'] == max_count][['customer_number']]
    return max_orders_customer
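As a quick check of the Method 2 SQL on the follow-up case, the sketch below runs it against an in-memory SQLite database. The tied data set is an assumption made for illustration (customers 2 and 3 each have two orders), and SQLite is used only because it ships with Python.

```python
import sqlite3

# In-memory database with a hypothetical tie:
# customers 2 and 3 both have two orders, customer 1 has one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_number INTEGER PRIMARY KEY, customer_number INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 3), (5, 3)],
)

method_2 = """
    SELECT customer_number
    FROM Orders
    GROUP BY customer_number
    HAVING COUNT(order_number) = (
        SELECT COUNT(order_number) AS cnt
        FROM Orders
        GROUP BY customer_number
        ORDER BY cnt DESC
        LIMIT 1
    )
"""
# Sort in Python so the printed order does not depend on the database.
tied_customers = sorted(row[0] for row in conn.execute(method_2))
conn.close()

print(tied_customers)  # prints "[2, 3]"
```

The subquery yields the single highest order count, and the HAVING clause keeps every customer whose count matches it, so all tied customers are returned.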