Leetcodify Similar Friends

Problem

Table: Listens

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| song_id     | int     |
| day         | date    |
+-------------+---------+
This table may contain duplicate rows.
Each row of this table indicates that the user user_id listened to the song song_id on the day day.

Table: Friendship

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user1_id      | int     |
| user2_id      | int     |
+---------------+---------+
(user1_id, user2_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates that the users user1_id and user2_id are friends.
Note that user1_id < user2_id.

Write a solution to report the similar friends of Leetcodify users. A user x and user y are similar friends if:

Users x and y are friends, and
Users x and y listened to the same three or more different songs on the same day.

Return the result table in any order. Note that you must return the similar pairs of friends the same way they were represented in the input (i.e., always user1_id < user2_id).

The result format is in the following example.

Examples

Example 1:

Input: 
Listens table:
+---------+---------+------------+
| user_id | song_id | day        |
+---------+---------+------------+
| 1       | 10      | 2021-03-15 |
| 1       | 11      | 2021-03-15 |
| 1       | 12      | 2021-03-15 |
| 2       | 10      | 2021-03-15 |
| 2       | 11      | 2021-03-15 |
| 2       | 12      | 2021-03-15 |
| 3       | 10      | 2021-03-15 |
| 3       | 11      | 2021-03-15 |
| 3       | 12      | 2021-03-15 |
| 4       | 10      | 2021-03-15 |
| 4       | 11      | 2021-03-15 |
| 4       | 13      | 2021-03-15 |
| 5       | 10      | 2021-03-16 |
| 5       | 11      | 2021-03-16 |
| 5       | 12      | 2021-03-16 |
+---------+---------+------------+
Friendship table:
+----------+----------+
| user1_id | user2_id |
+----------+----------+
| 1        | 2        |
| 2        | 4        |
| 2        | 5        |
+----------+----------+
Output: 
+----------+----------+
| user1_id | user2_id |
+----------+----------+
| 1        | 2        |
+----------+----------+
Explanation: 
Users 1 and 2 are friends, and they listened to songs 10, 11, and 12 on the same day. They are similar friends.
Users 1 and 3 listened to songs 10, 11, and 12 on the same day, but they are not friends.
Users 2 and 4 are friends, but they did not listen to the same three different songs.
Users 2 and 5 are friends and listened to songs 10, 11, and 12, but they did not listen to them on the same day.
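
The definition can be checked directly against Example 1 with a short plain-Python sketch. This is not the SQL method used in the solution below. It builds, for each user and day, the set of songs heard, then intersects those sets for each friendship. The function name `similar_friends_bruteforce` and the tuple-based input layout are assumptions made for illustration.

```python
from collections import defaultdict

def similar_friends_bruteforce(listens, friendships):
    """listens: (user_id, song_id, day) tuples; friendships: (user1_id, user2_id) tuples."""
    songs = defaultdict(set)         # (user_id, day) -> set of song_ids (sets absorb duplicate rows)
    days_by_user = defaultdict(set)  # user_id -> set of days with at least one listen
    for user_id, song_id, day in listens:
        songs[(user_id, day)].add(song_id)
        days_by_user[user_id].add(day)

    result = []
    for u1, u2 in friendships:
        shared_days = days_by_user.get(u1, set()) & days_by_user.get(u2, set())
        # Similar friends: at least 3 songs in common on at least one shared day.
        if any(len(songs[(u1, d)] & songs[(u2, d)]) >= 3 for d in shared_days):
            result.append((u1, u2))
    return result

# Rows of Example 1, written compactly.
day = "2021-03-15"
listens = [(u, s, day) for u in (1, 2, 3) for s in (10, 11, 12)]
listens += [(4, 10, day), (4, 11, day), (4, 13, day)]
listens += [(5, s, "2021-03-16") for s in (10, 11, 12)]
friendships = [(1, 2), (2, 4), (2, 5)]

print(similar_friends_bruteforce(listens, friendships))  # [(1, 2)]
```

Because the songs are stored in sets, repeated rows in Listens cannot inflate the count of common songs.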

Solution

Method 1 – Self Join, Grouping, and Join with Friendship

Intuition

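We want to find pairs of friends who listened to at least three of the same songs on the same day. A self-join of the Listens table on day and song_id yields every pair of users who shared a song on a given day. Because Listens may contain duplicate rows, each (pair, day, song) combination must be counted only once. Grouping by user pair and day then gives the number of distinct common songs, and a final join with Friendship keeps only the pairs who are friends.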

Approach

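Self-join the Listens table on day and song_id to find all user pairs who listened to the same song on the same day.
Keep only pairs where the first user_id is smaller than the second. This avoids mirrored duplicates and matches the user1_id < user2_id ordering in Friendship.
Collapse duplicates so that each (pair, day, song_id) combination appears once, since Listens may contain duplicate rows.
Group by user pair and day, and count the number of common songs.
Keep the groups with at least 3 common songs.
Join with the Friendship table to keep only pairs who are friends.
Output each qualifying (user1_id, user2_id) pair once, even if the pair qualifies on several days.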
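
The steps above can be sketched in plain Python, with a dictionary keyed by (day, song_id) standing in for the self-join. The helper name `similar_friends_by_join` and the tuple-based inputs are hypothetical and used only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def similar_friends_by_join(listens, friendships):
    """listens: (user_id, song_id, day) tuples; friendships: (user1_id, user2_id) tuples."""
    # Self-join key: the distinct users who heard each song on each day.
    # Using a set also removes duplicate Listens rows.
    listeners = defaultdict(set)
    for user_id, song_id, day in listens:
        listeners[(day, song_id)].add(user_id)

    # For every shared song, emit each pair once (smaller id first)
    # and count common songs per (pair, day).
    common = Counter()
    for (day, _song_id), users in listeners.items():
        for u1, u2 in combinations(sorted(users), 2):
            common[(u1, u2, day)] += 1

    # Pairs with at least 3 common songs on some day.
    qualifying = {(u1, u2) for (u1, u2, _day), cnt in common.items() if cnt >= 3}

    # Keep only friends; the set intersection reports each pair once.
    return sorted(qualifying & {tuple(f) for f in friendships})

# Rows of Example 1, written compactly.
day = "2021-03-15"
listens = [(u, s, day) for u in (1, 2, 3) for s in (10, 11, 12)]
listens += [(4, 10, day), (4, 11, day), (4, 13, day)]
listens += [(5, s, "2021-03-16") for s in (10, 11, 12)]

print(similar_friends_by_join(listens, [(1, 2), (2, 4), (2, 5)]))  # [(1, 2)]
```

Pairs (1, 3) and (2, 3) also reach three common songs here, but they are dropped by the friendship filter, which matches the explanation of Example 1.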

Code

MySQL

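-- Requires MySQL 8.0+ for common table expressions (WITH).
WITH common_songs AS (
  -- One row per (pair, day, song): the GROUP BY collapses duplicate Listens rows.
  SELECT a.user_id AS user1, b.user_id AS user2, a.day
  FROM Listens a
  JOIN Listens b ON a.day = b.day AND a.song_id = b.song_id AND a.user_id < b.user_id
  GROUP BY a.user_id, b.user_id, a.day, a.song_id
),
user_pairs AS (
  -- Pairs sharing at least 3 distinct songs on a given day.
  SELECT user1, user2, day, COUNT(*) AS cnt
  FROM common_songs
  GROUP BY user1, user2, day
  HAVING cnt >= 3
)
-- DISTINCT: a pair may qualify on several days but must be reported once.
SELECT DISTINCT f.user1_id, f.user2_id
FROM user_pairs u
JOIN Friendship f ON u.user1 = f.user1_id AND u.user2 = f.user2_id;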

PostgreSQL

WITH common_songs AS (
  -- One row per (pair, day, song): the GROUP BY collapses duplicate Listens rows.
  SELECT a.user_id AS user1, b.user_id AS user2, a.day
  FROM Listens a
  JOIN Listens b ON a.day = b.day AND a.song_id = b.song_id AND a.user_id < b.user_id
  GROUP BY a.user_id, b.user_id, a.day, a.song_id
),
user_pairs AS (
  -- Pairs sharing at least 3 distinct songs on a given day.
  SELECT user1, user2, day, COUNT(*) AS cnt
  FROM common_songs
  GROUP BY user1, user2, day
  HAVING COUNT(*) >= 3
)
-- DISTINCT: a pair may qualify on several days but must be reported once.
SELECT DISTINCT f.user1_id, f.user2_id
FROM user_pairs u
JOIN Friendship f ON u.user1 = f.user1_id AND u.user2 = f.user2_id;

Python (Pandas)

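import pandas as pd

def similar_friends(listens_df: pd.DataFrame, friendship_df: pd.DataFrame) -> pd.DataFrame:
    # Listens may contain duplicate rows; keep one row per (user, song, day).
    listens = listens_df[['user_id', 'song_id', 'day']].drop_duplicates()
    # Self-join on (day, song_id): one row per ordered pair of users sharing a song that day.
    merged = listens.merge(listens, on=['day', 'song_id'])
    # Keep each pair once, smaller id first (same ordering as Friendship).
    merged = merged[merged['user_id_x'] < merged['user_id_y']]
    # Count common songs per (pair, day) and keep groups with at least 3.
    grouped = merged.groupby(['user_id_x', 'user_id_y', 'day']).size().reset_index(name='cnt')
    filtered = grouped[grouped['cnt'] >= 3]
    # Keep only pairs that are friends; report each pair once even if it qualifies on several days.
    result = filtered.merge(friendship_df, left_on=['user_id_x', 'user_id_y'],
                            right_on=['user1_id', 'user2_id'])
    return result[['user1_id', 'user2_id']].drop_duplicates().reset_index(drop=True)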

Complexity

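⏰ Time complexity: O(n^2) in the worst case, where n is the number of rows in Listens. If many users listen to the same song on the same day, the self-join produces a row for every pair of them. Grouping and filtering run over that join output, so they are also O(n^2) in the worst case.
🧺 Space complexity: O(n^2) in the worst case for the intermediate join result and the per-pair, per-day counts.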
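
To see where the quadratic bound comes from, consider k users who all listen to the same song on the same day. The self-join restricted to user_id < other_user_id yields k(k-1)/2 rows from only k input rows. A small sketch that counts those rows, where the function name `self_join_pair_count` is hypothetical:

```python
from itertools import combinations

def self_join_pair_count(k):
    """Rows produced by the self-join when k distinct users share one (day, song_id)."""
    users = range(1, k + 1)
    # combinations(users, 2) enumerates exactly the pairs with user_id < other_user_id.
    return sum(1 for _ in combinations(users, 2))

for k in (10, 100, 1000):
    print(k, self_join_pair_count(k))
# 10 45
# 100 4950
# 1000 499500
```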