Problem

Table: Friendship

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user1_id      | int     |
| user2_id      | int     |
+---------------+---------+
(user1_id, user2_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates that the users user1_id and user2_id are friends.

Table: Likes

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| page_id     | int     |
+-------------+---------+
(user_id, page_id) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates that user_id likes page_id.

You are implementing a page recommendation system for a social media website. Your system will recommend a page to user_id if the page is liked by at least one friend of user_id and is not liked by user_id.

Write a solution to find all the possible page recommendations for every user. Each recommendation should appear as a row in the result table with these columns:

  • user_id: The ID of the user that your system is making the recommendation to.
  • page_id: The ID of the page that will be recommended to user_id.
  • friends_likes: The number of the friends of user_id that like page_id.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
Input: 
Friendship table:
+----------+----------+
| user1_id | user2_id |
+----------+----------+
| 1        | 2        |
| 1        | 3        |
| 1        | 4        |
| 2        | 3        |
| 2        | 4        |
| 2        | 5        |
| 6        | 1        |
+----------+----------+
Likes table:
+---------+---------+
| user_id | page_id |
+---------+---------+
| 1       | 88      |
| 2       | 23      |
| 3       | 24      |
| 4       | 56      |
| 5       | 11      |
| 6       | 33      |
| 2       | 77      |
| 3       | 77      |
| 6       | 88      |
+---------+---------+
Output: 
+---------+---------+---------------+
| user_id | page_id | friends_likes |
+---------+---------+---------------+
| 1       | 77      | 2             |
| 1       | 23      | 1             |
| 1       | 24      | 1             |
| 1       | 56      | 1             |
| 1       | 33      | 1             |
| 2       | 24      | 1             |
| 2       | 56      | 1             |
| 2       | 11      | 1             |
| 2       | 88      | 1             |
| 3       | 88      | 1             |
| 3       | 23      | 1             |
| 4       | 88      | 1             |
| 4       | 77      | 1             |
| 4       | 23      | 1             |
| 5       | 77      | 1             |
| 5       | 23      | 1             |
+---------+---------+---------------+
Explanation: 
Take user 1 as an example:
- User 1 is friends with users 2, 3, 4, and 6.
- Recommended pages are 23 (user 2 liked it), 24 (user 3 liked it), 56 (user 3 liked it), 33 (user 6 liked it), and 77 (user 2 and user 3 liked it).
- Note that page 88 is not recommended because user 1 already liked it.
Another example is user 6:
- User 6 is friends with user 1.
- User 1 only liked page 88, but user 6 already liked it. Hence, user 6 has no recommendations.
You can recommend pages for users 2, 3, 4, and 5 using a similar process.

Solution

Method 1 – SQL Join and GroupBy

Intuition

The key idea is to find, for each user, the pages liked by their friends but not by themselves. We count how many friends liked each such page. This can be done by joining the friendship and likes tables, then filtering out pages already liked by the user.

Approach

  1. For each user, find all their friends (friendship is bidirectional).
  2. For each friend, get the pages they like.
  3. Exclude pages already liked by the user.
  4. Group by user and page, and count the number of friends who liked each page.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
SELECT
  u.user_id,
  l.page_id,
  COUNT(DISTINCT l.user_id) AS friends_likes
FROM
  (
    SELECT user1_id AS user_id, user2_id AS friend_id FROM Friendship
    UNION ALL
    SELECT user2_id AS user_id, user1_id AS friend_id FROM Friendship
  ) u
  JOIN Likes l ON u.friend_id = l.user_id
  LEFT JOIN Likes self_likes ON u.user_id = self_likes.user_id AND l.page_id = self_likes.page_id
WHERE
  self_likes.user_id IS NULL
GROUP BY
  u.user_id, l.page_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
  u.user_id,
  l.page_id,
  COUNT(DISTINCT l.user_id) AS friends_likes
FROM (
  SELECT user1_id AS user_id, user2_id AS friend_id FROM Friendship
  UNION ALL
  SELECT user2_id AS user_id, user1_id AS friend_id FROM Friendship
) u
JOIN Likes l ON u.friend_id = l.user_id
LEFT JOIN Likes self_likes ON u.user_id = self_likes.user_id AND l.page_id = self_likes.page_id
WHERE self_likes.user_id IS NULL
GROUP BY u.user_id, l.page_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
class Solution:
    def page_recommendations(self, friendship: 'pd.DataFrame', likes: 'pd.DataFrame') -> 'pd.DataFrame':
        # Build bidirectional friendship
        f1 = friendship.rename(columns={'user1_id': 'user_id', 'user2_id': 'friend_id'})
        f2 = friendship.rename(columns={'user2_id': 'user_id', 'user1_id': 'friend_id'})
        friends = pd.concat([f1, f2], ignore_index=True)
        # Merge friends with likes
        merged = friends.merge(likes, left_on='friend_id', right_on='user_id')
        # Exclude pages already liked by the user
        self_likes = likes.rename(columns={'user_id': 'user_id', 'page_id': 'page_id'})
        merged = merged.merge(self_likes, how='left', left_on=['user_id', 'page_id'], right_on=['user_id', 'page_id'], indicator=True)
        merged = merged[merged['_merge'] == 'left_only']
        # Count friends_likes
        ans = merged.groupby(['user_id', 'page_id']).agg(friends_likes=('friend_id', 'nunique')).reset_index()
        return ans

Complexity

  • ⏰ Time complexity: O(F + L + N log N) where F is the number of friendships, L is the number of likes, and N is the number of users. Each join and groupby operation is linearithmic in the number of rows.
  • 🧺 Space complexity: O(F + L + R) where R is the number of recommendations output. We need space for the expanded friendship pairs, intermediate joins, and the result.