Problem

Table Activities:

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| sell_date   | date    |
| product     | varchar |
+-------------+---------+

This table has no primary key, so it may contain duplicate rows. Each row contains a product name and the date on which it was sold in a market.

Write an SQL query to find, for each date, the number of different products sold and their names.

The names of the products sold on each date should be sorted lexicographically.

Return the result table ordered by sell_date.
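
To make the requirements concrete, here is a minimal plain-Python sketch of the expected transformation. The sample rows and the helper name summarize are hypothetical, chosen only for illustration, and the output column order (sell_date, num_sold, products) follows the solution below.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows (sell_date, product); not taken from the problem statement.
rows = [
    (date(2020, 6, 2), "Mask"),
    (date(2020, 5, 30), "Headphone"),
    (date(2020, 5, 30), "Basketball"),
    (date(2020, 5, 30), "Headphone"),  # duplicate row, must be counted once
    (date(2020, 6, 2), "Pencil"),
]

def summarize(rows):
    """Return (sell_date, num_sold, products) tuples ordered by sell_date."""
    by_date = defaultdict(set)
    for sell_date, product in rows:
        by_date[sell_date].add(product)  # a set drops duplicate products
    return [
        (sell_date, len(by_date[sell_date]), ",".join(sorted(by_date[sell_date])))
        for sell_date in sorted(by_date)
    ]

result = summarize(rows)
for row in result:
    print(row)
```

Note that Python sorts strings by code point, so uppercase letters sort before lowercase ones; whether the SQL side behaves the same way depends on the column's collation.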

Solution

Method 1 - Group Concat

Code

SQL
SELECT sell_date, 
       COUNT(DISTINCT product) AS num_sold,
       GROUP_CONCAT(DISTINCT product ORDER BY product) AS products
FROM Activities
GROUP BY sell_date
ORDER BY sell_date;
Pandas
  • Group the activities by sell_date; for each date, count the distinct products and join their sorted unique names with commas
  • Rename the columns to sell_date, num_sold, and products
  • Sort the result table by sell_date
import pandas as pd

def categorize_products(activities: pd.DataFrame) -> pd.DataFrame:
    # Per date: count distinct products and build a sorted, comma-separated list of unique names
    grouped = activities.groupby('sell_date')['product'].agg(['nunique', lambda x: ','.join(sorted(set(x)))]).reset_index()

    # Rename the aggregated columns to match the expected output
    grouped.columns = ['sell_date', 'num_sold', 'products']

    # Order the result by sell_date
    result = grouped.sort_values(by='sell_date')
    
    return result
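
As a quick check, the function can be run on a small DataFrame. The sketch below repeats the function in compact form so that it runs on its own; the sample data is hypothetical and includes one duplicate row to show that it is counted only once.

```python
import pandas as pd

def categorize_products(activities: pd.DataFrame) -> pd.DataFrame:
    # Same approach as above: distinct count plus sorted, comma-joined unique names per date
    grouped = (
        activities.groupby('sell_date')['product']
        .agg(['nunique', lambda x: ','.join(sorted(set(x)))])
        .reset_index()
    )
    grouped.columns = ['sell_date', 'num_sold', 'products']
    return grouped.sort_values(by='sell_date')

# Hypothetical sample data; the second 'Headphone' row on 2020-05-30 is a duplicate.
activities = pd.DataFrame({
    'sell_date': pd.to_datetime(
        ['2020-06-02', '2020-05-30', '2020-05-30', '2020-05-30', '2020-06-02']
    ),
    'product': ['Mask', 'Headphone', 'Basketball', 'Headphone', 'Pencil'],
})

result = categorize_products(activities)
print(result.to_string(index=False))
```

With this input, each of the two dates should report two distinct products, with the names in sorted order.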