Problem
Table: ActorDirector
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| actor_id | int |
| director_id | int |
| timestamp | int |
+-------------+---------+
timestamp is the primary key column for this table.
Write a SQL query for a report that provides the pairs (actor_id, director_id)
where the actor has cooperated with the director at least three times.
Return the result table in any order.
Examples
Example 1:
Input:
ActorDirector
table:
+-------------+-------------+-------------+
| actor_id | director_id | timestamp |
+-------------+-------------+-------------+
| 1 | 1 | 0 |
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 2 | 1 | 5 |
| 2 | 1 | 6 |
+-------------+-------------+-------------+
Output:
+-------------+-------------+
| actor_id | director_id |
+-------------+-------------+
| 1 | 1 |
+-------------+-------------+
Explanation: The only pair is (1, 1) where they cooperated exactly 3 times.
Solution
Method 1 - Using Group by
Code
SQL
Using count(1):
SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(1) >= 3;
Using count(timestamp):
SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(timestamp) >= 3;
Using sum:
SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING SUM(1) >= 3;
Pandas
import pandas as pd
def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:
stats = actor_director.groupby(['actor_id', 'director_id']).size().reset_index(name='cooperation_count')
filtered_pairs = stats[stats['cooperation_count'] >= 3]
return filtered_pairs[['actor_id', 'director_id']]
Method 2 - Using Window Function and CTE 🏆
Code
Sql
WITH WorkRank AS (
SELECT actor_id,director_id
,ROW_NUMBER()OVER(PARTITION BY actor_id,director_id ORDER BY timestamp) AS RN
FROM ActorDirector
)
SELECT DISTINCT actor_id,director_id
FROM WorkRank WHERE RN >=3