Problem

Table: ActorDirector

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| actor_id    | int     |
| director_id | int     |
| timestamp   | int     |
+-------------+---------+
timestamp is the primary key column for this table.

Write a SQL query for a report that provides the pairs (actor_id, director_id) where the actor has cooperated with the director at least three times.

Return the result table in any order.

Examples

Example 1:

Input:

ActorDirector table:

+-------------+-------------+-------------+
| actor_id    | director_id | timestamp   |
+-------------+-------------+-------------+
| 1           | 1           | 0           |
| 1           | 1           | 1           |
| 1           | 1           | 2           |
| 1           | 2           | 3           |
| 1           | 2           | 4           |
| 2           | 1           | 5           |
| 2           | 1           | 6           |
+-------------+-------------+-------------+

Output:

+-------------+-------------+
| actor_id    | director_id |
+-------------+-------------+
| 1           | 1           |
+-------------+-------------+

Explanation: The only pair is (1, 1) where they cooperated exactly 3 times.

Solution

Method 1 - Using Group by

Code

SQL

Using count(1):

SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(1) >= 3;

Using count(timestamp):

SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(timestamp) >= 3;

Using sum:

SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING SUM(1) >= 3;
Pandas
import pandas as pd

def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:
	stats = actor_director.groupby(['actor_id', 'director_id']).size().reset_index(name='cooperation_count')
	filtered_pairs = stats[stats['cooperation_count'] >= 3]
    return filtered_pairs[['actor_id', 'director_id']]

Method 2 - Using Window Function and CTE 🏆

Code

Sql
WITH WorkRank AS (
SELECT actor_id,director_id
,ROW_NUMBER()OVER(PARTITION BY actor_id,director_id ORDER BY timestamp) AS RN
FROM ActorDirector
    )
    SELECT DISTINCT actor_id,director_id
    FROM WorkRank WHERE RN >=3