Problem

Table: Experiments

+-----------------+------+
| Column Name     | Type |
+-----------------+------+
| experiment_id   | int  |
| platform        | enum |
| experiment_name | enum |
+-----------------+------+
experiment_id is the column with unique values for this table.
platform is an enum (category) type of values ('Android', 'IOS', 'Web').
experiment_name is an enum (category) type of values ('Reading', 'Sports', 'Programming').
This table contains information about the ID of an experiment done with a random person, the platform used to do the experiment, and the name of the experiment.

Write a solution to report the number of experiments done on each of the three platforms for each of the three given experiments. Notice that all the pairs of (platform, experiment) should be included in the output including the pairs with zero experiments.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Input:
Experiments table:
+---------------+----------+-----------------+
| experiment_id | platform | experiment_name |
+---------------+----------+-----------------+
| 4             | IOS      | Programming     |
| 13            | IOS      | Sports          |
| 14            | Android  | Reading         |
| 8             | Web      | Reading         |
| 12            | Web      | Reading         |
| 18            | Web      | Programming     |
+---------------+----------+-----------------+
Output: 
+----------+-----------------+-----------------+
| platform | experiment_name | num_experiments |
+----------+-----------------+-----------------+
| Android  | Reading         | 1               |
| Android  | Sports          | 0               |
| Android  | Programming     | 0               |
| IOS      | Reading         | 0               |
| IOS      | Sports          | 1               |
| IOS      | Programming     | 1               |
| Web      | Reading         | 2               |
| Web      | Sports          | 0               |
| Web      | Programming     | 1               |
+----------+-----------------+-----------------+
Explanation: 
On the platform "Android", we had only one "Reading" experiment.
On the platform "IOS", we had one "Sports" experiment and one "Programming" experiment.
On the platform "Web", we had two "Reading" experiments and one "Programming" experiment.

Solution

Method 1 – Cross Join and Group By (SQL, Pandas)

Intuition

We need to report the count of experiments for every (platform, experiment_name) pair, including those with zero experiments. This is a classic case for a cross join (to generate all possible pairs) and a left join/group by to count the actual experiments.

Approach

  1. Generate all possible pairs of (platform, experiment_name) using a cross join.
  2. Left join this with the Experiments table to count the number of experiments for each pair.
  3. Group by platform and experiment_name, and count the number of experiments (use 0 for missing pairs).

Code

1
2
3
4
5
6
SELECT p.platform, e.experiment_name, COUNT(experiment_id) AS num_experiments
FROM (SELECT 'Android' AS platform UNION ALL SELECT 'IOS' UNION ALL SELECT 'Web') p
CROSS JOIN (SELECT 'Reading' AS experiment_name UNION ALL SELECT 'Sports' UNION ALL SELECT 'Programming') e
LEFT JOIN Experiments x
  ON x.platform = p.platform AND x.experiment_name = e.experiment_name
GROUP BY p.platform, e.experiment_name;
1
2
3
4
5
6
SELECT p.platform, e.experiment_name, COUNT(x.experiment_id) AS num_experiments
FROM (VALUES ('Android'), ('IOS'), ('Web')) AS p(platform)
CROSS JOIN (VALUES ('Reading'), ('Sports'), ('Programming')) AS e(experiment_name)
LEFT JOIN Experiments x
  ON x.platform = p.platform AND x.experiment_name = e.experiment_name
GROUP BY p.platform, e.experiment_name;
1
2
3
4
5
6
7
8
9
import pandas as pd

def count_experiments(experiments: pd.DataFrame) -> pd.DataFrame:
    platforms = ['Android', 'IOS', 'Web']
    experiment_names = ['Reading', 'Sports', 'Programming']
    all_pairs = pd.MultiIndex.from_product([platforms, experiment_names], names=['platform', 'experiment_name'])
    df = experiments.groupby(['platform', 'experiment_name']).size().reindex(all_pairs, fill_value=0).reset_index()
    df.columns = ['platform', 'experiment_name', 'num_experiments']
    return df

Complexity

  • ⏰ Time complexity: O(n), where n is the number of rows in Experiments, since we scan the table once and join with a constant set.
  • 🧺 Space complexity: O(1), since the set of platforms and experiment names is constant.