+---------------+---------+
|Column Name |Type|+---------------+---------+
| movie_id | int || title | varchar |+---------------+---------+
movie_id is the primarykey (columnwithuniquevalues) for this table.
title is the name of the movie.
Table: `Users`+---------------+---------+
|Column Name |Type|+---------------+---------+
| user_id | int || name | varchar |+---------------+---------+
user_id is the primarykey (columnwithuniquevalues) for this table.
The column'name' has uniquevalues.
Table: `MovieRating`+---------------+---------+
|Column Name |Type|+---------------+---------+
| movie_id | int || user_id | int || rating | int || created_at | date |+---------------+---------+
(movie_id, user_id) is the primarykey (columnwithuniquevalues) for this table.
This tablecontains the rating of a movie by a userin their review.
created_at is the user's review date.
Write a solution to:
Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.
Input:
Movies table:+-------------+--------------+| movie_id | title |+-------------+--------------+|1| Avengers ||2| Frozen 2||3| Joker |+-------------+--------------+Users table:+-------------+--------------+| user_id | name |+-------------+--------------+|1| Daniel ||2| Monica ||3| Maria ||4| James |+-------------+--------------+MovieRating table:+-------------+--------------+--------------+-------------+| movie_id | user_id | rating | created_at |+-------------+--------------+--------------+-------------+|1|1|3|2020-01-12||1|2|4|2020-02-11||1|3|2|2020-02-12||1|4|1|2020-01-01||2|1|5|2020-02-17||2|2|2|2020-02-01||2|3|2|2020-03-01||3|1|3|2020-02-22||3|2|4|2020-02-25|+-------------+--------------+--------------+-------------+Output:
+--------------+| results |+--------------+| Daniel || Frozen 2|+--------------+Explanation:
Daniel and Monica have rated 3movies("Avengers","Frozen 2" and "Joker") but Daniel is smaller lexicographically.Frozen 2 and Joker have a rating average of 3.5in February but Frozen 2is smaller lexicographically.
We need two queries: one to find the user who rated the most movies (with lexicographical tie-break), and one to find the movie with the highest average rating in February 2020 (with lexicographical tie-break).
For the user: Group MovieRating by user_id, count ratings, join with Users, order by count desc and name asc, limit 1.
For the movie: Filter MovieRating for February 2020, group by movie_id, calculate average, join with Movies, order by average desc and title asc, limit 1.
-- User with most ratings
SELECT name AS results
FROM Users u
JOIN (
SELECT user_id, COUNT(*) AS cnt
FROM MovieRating
GROUPBY user_id
) r ON u.user_id = r.user_id
ORDERBY cnt DESC, name ASCLIMIT1UNIONALL-- Movie with highest average rating in Feb 2020
SELECT title AS results
FROM Movies m
JOIN (
SELECT movie_id, AVG(rating) AS avg_rating
FROM MovieRating
WHERE created_at BETWEEN'2020-02-01'AND'2020-02-29'GROUPBY movie_id
) r ON m.movie_id = r.movie_id
ORDERBY avg_rating DESC, title ASCLIMIT1;
1
2
3
4
5
6
7
8
9
10
11
# User with most ratingsuser_counts = movierating.groupby('user_id').size()
max_count = user_counts.max()
user_id = user_counts[user_counts == max_count].index.min()
user_name = users.loc[users['user_id'] == user_id, 'name'].values[0]
# Movie with highest average rating in Feb 2020feb = movierating[(movierating['created_at'] >='2020-02-01') & (movierating['created_at'] <='2020-02-29')]
movie_avg = feb.groupby('movie_id')['rating'].mean()
max_avg = movie_avg.max()
movie_id = movie_avg[movie_avg == max_avg].index.min()
movie_title = movies.loc[movies['movie_id'] == movie_id, 'title'].values[0]