Problem

Table: Movies

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| title         | varchar |
+---------------+---------+
movie_id is the primary key (column with unique values) for this table.
title is the name of the movie.

Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
+---------------+---------+
user_id is the primary key (column with unique values) for this table.
The column 'name' has unique values.

Table: MovieRating

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| user_id       | int     |
| rating        | int     |
| created_at    | date    |
+---------------+---------+
(movie_id, user_id) is the primary key (column with unique values) for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user's review date.

Write a solution to:

  • Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
  • Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.

The result format is in the following example.

Examples

Example 1

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67

    
    
    Input: 
    Movies table:
    +-------------+--------------+
    | movie_id    |  title       |
    +-------------+--------------+
    | 1           | Avengers     |
    | 2           | Frozen 2     |
    | 3           | Joker        |
    +-------------+--------------+
    Users table:
    +-------------+--------------+
    | user_id     |  name        |
    +-------------+--------------+
    | 1           | Daniel       |
    | 2           | Monica       |
    | 3           | Maria        |
    | 4           | James        |
    +-------------+--------------+
    MovieRating table:
    +-------------+--------------+--------------+-------------+
    | movie_id    | user_id      | rating       | created_at  |
    +-------------+--------------+--------------+-------------+
    | 1           | 1            | 3            | 2020-01-12  |
    | 1           | 2            | 4            | 2020-02-11  |
    | 1           | 3            | 2            | 2020-02-12  |
    | 1           | 4            | 1            | 2020-01-01  |
    | 2           | 1            | 5            | 2020-02-17  | 
    | 2           | 2            | 2            | 2020-02-01  | 
    | 2           | 3            | 2            | 2020-03-01  |
    | 3           | 1            | 3            | 2020-02-22  | 
    | 3           | 2            | 4            | 2020-02-25  | 
    +-------------+--------------+--------------+-------------+
    Output: 
    +--------------+
    | results      |
    +--------------+
    | Daniel       |
    | Frozen 2     |
    +--------------+
    Explanation: 
    Daniel and Monica have rated 3 movies ("Avengers", "Frozen 2" and "Joker") but Daniel is smaller lexicographically.
    Frozen 2 and Joker have a rating average of 3.5 in February but Frozen 2 is smaller lexicographically.
    

## Solution

### Method 1 -

#### Intuition

We need two queries: one to find the user who rated the most movies (with lexicographical tie-break), and one to find the movie with the highest average rating in February 2020 (with lexicographical tie-break).

#### Approach

1. For the user: Group MovieRating by user_id, count ratings, join with Users, order by count desc and name asc, limit 1.
2. For the movie: Filter MovieRating for February 2020, group by movie_id, calculate average, join with Movies, order by average desc and title asc, limit 1.

#### Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- User with most ratings
SELECT name AS results
FROM Users u
JOIN (
  SELECT user_id, COUNT(*) AS cnt
  FROM MovieRating
  GROUP BY user_id
) r ON u.user_id = r.user_id
ORDER BY cnt DESC, name ASC
LIMIT 1
UNION ALL
-- Movie with highest average rating in Feb 2020
SELECT title AS results
FROM Movies m
JOIN (
  SELECT movie_id, AVG(rating) AS avg_rating
  FROM MovieRating
  WHERE created_at BETWEEN '2020-02-01' AND '2020-02-29'
  GROUP BY movie_id
) r ON m.movie_id = r.movie_id
ORDER BY avg_rating DESC, title ASC
LIMIT 1;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# User with most ratings
user_counts = movierating.groupby('user_id').size()
max_count = user_counts.max()
user_id = user_counts[user_counts == max_count].index.min()
user_name = users.loc[users['user_id'] == user_id, 'name'].values[0]
# Movie with highest average rating in Feb 2020
feb = movierating[(movierating['created_at'] >= '2020-02-01') & (movierating['created_at'] <= '2020-02-29')]
movie_avg = feb.groupby('movie_id')['rating'].mean()
max_avg = movie_avg.max()
movie_id = movie_avg[movie_avg == max_avg].index.min()
movie_title = movies.loc[movies['movie_id'] == movie_id, 'title'].values[0]
1
2
String sql = "..."; // Use the SQL above
// Execute and fetch result
#### Complexity * Time complexity: `O(N)` for each query (group by, join, sort). * 🧺 Space complexity: `O(U + M)` for intermediate results.