+---------------+---------+
| Column Name | Type |
+---------------+---------+
| movie_id | int |
| title | varchar |
+---------------+---------+
movie_id is the primary key (column with unique values) for this table.
title is the name of the movie.
Table: Users
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| name | varchar |
+---------------+---------+
user_id is the primary key (column with unique values) for this table.
The column 'name' has unique values.
Table: MovieRating
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| movie_id | int |
| user_id | int |
| rating | int |
| created_at | date |
+---------------+---------+
(movie_id, user_id) is the primary key (column with unique values) for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user's review date.
Write a solution to:
Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.
Input:
Movies table:+-------------+--------------+| movie_id | title |+-------------+--------------+|1| Avengers ||2| Frozen 2||3| Joker |+-------------+--------------+ Users table:+-------------+--------------+| user_id | name |+-------------+--------------+|1| Daniel ||2| Monica ||3| Maria ||4| James |+-------------+--------------+ MovieRating table:+-------------+--------------+--------------+-------------+| movie_id | user_id | rating | created_at |+-------------+--------------+--------------+-------------+|1|1|3|2020-01-12||1|2|4|2020-02-11||1|3|2|2020-02-12||1|4|1|2020-01-01||2|1|5|2020-02-17||2|2|2|2020-02-01||2|3|2|2020-03-01||3|1|3|2020-02-22||3|2|4|2020-02-25|+-------------+--------------+--------------+-------------+ Output:+--------------+| results |+--------------+| Daniel || Frozen 2|+--------------+ Explanation: Daniel and Monica have rated 3movies("Avengers","Frozen 2" and "Joker") but Daniel is smaller lexicographically. Frozen 2 and Joker have a rating average of 3.5in February but Frozen 2is smaller lexicographically.## Solution
### Method 1-#### Intuition
We need two queries: one to find the user who rated the most movies(with lexicographical tie-break), and one to find the movie with the highest average rating in February 2020(with lexicographical tie-break).#### Approach
1. For the user: Group MovieRating by user_id, count ratings, join with Users, order by count desc and name asc, limit 1.2. For the movie: Filter MovieRating for February 2020, group by movie_id, calculate average, join with Movies, order by average desc and title asc, limit 1.#### Code
-- User with most ratings
SELECT name AS results
FROM Users u
JOIN (
SELECT user_id, COUNT(*) AS cnt
FROM MovieRating
GROUPBY user_id
) r ON u.user_id = r.user_id
ORDERBY cnt DESC, name ASCLIMIT1UNIONALL-- Movie with highest average rating in Feb 2020
SELECT title AS results
FROM Movies m
JOIN (
SELECT movie_id, AVG(rating) AS avg_rating
FROM MovieRating
WHERE created_at BETWEEN'2020-02-01'AND'2020-02-29'GROUPBY movie_id
) r ON m.movie_id = r.movie_id
ORDERBY avg_rating DESC, title ASCLIMIT1;
1
2
3
4
5
6
7
8
9
10
11
# User with most ratingsuser_counts = movierating.groupby('user_id').size()
max_count = user_counts.max()
user_id = user_counts[user_counts == max_count].index.min()
user_name = users.loc[users['user_id'] == user_id, 'name'].values[0]
# Movie with highest average rating in Feb 2020feb = movierating[(movierating['created_at'] >='2020-02-01') & (movierating['created_at'] <='2020-02-29')]
movie_avg = feb.groupby('movie_id')['rating'].mean()
max_avg = movie_avg.max()
movie_id = movie_avg[movie_avg == max_avg].index.min()
movie_title = movies.loc[movies['movie_id'] == movie_id, 'title'].values[0]
1
2
String sql ="..."; // Use the SQL above// Execute and fetch result
#### Complexity
*⏰ Time complexity:`O(N)`for each query(group by, join, sort).*🧺 Space complexity:`O(U + M)`for intermediate results.