Problem

Table: Enrollments

1
2
3
4
5
6
7
8
9
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| course_id     | int     |
| grade         | int     |
+---------------+---------+
(student_id, course_id) is the primary key (combination of columns with unique values) of this table.
grade is never NULL.

Write a solution to find the highest grade with its corresponding course for each student. In case of a tie, you should find the course with the smallest course_id.

Return the result table ordered by student_id in ascending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
Input: 
Enrollments table:
+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 2          | 2         | 95    |
| 2          | 3         | 95    |
| 1          | 1         | 90    |
| 1          | 2         | 99    |
| 3          | 1         | 80    |
| 3          | 2         | 75    |
| 3          | 3         | 82    |
+------------+-----------+-------+
Output: 
+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 1          | 2         | 99    |
| 2          | 2         | 95    |
| 3          | 3         | 82    |
+------------+-----------+-------+

Solution

Method 1 – Window Functions (SQL Standard)

Intuition

We want, for each student, the course with the highest grade. If there are ties, we want the course with the smallest course_id. Window functions let us rank each row within each student by grade and course_id.

Approach

  1. Use RANK() or ROW_NUMBER() window function partitioned by student_id, ordered by grade (descending) and course_id (ascending).
  2. Select only the rows where the rank is 1 (i.e., the highest grade and, in case of tie, the smallest course_id).
  3. Return the student_id, course_id, and grade.

Code

1
2
3
4
5
6
7
SELECT student_id, course_id, grade
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY grade DESC, course_id ASC) as rn
  FROM Enrollments
) t
WHERE rn = 1;
1
2
3
4
5
6
7
SELECT student_id, course_id, grade
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY grade DESC, course_id ASC) as rn
  FROM Enrollments
) t
WHERE rn = 1;
1
2
3
4
def highest_grade_for_each_student(df):
    df = df.sort_values(['student_id', 'grade', 'course_id'], ascending=[True, False, True])
    ans = df.groupby('student_id', as_index=False).first()
    return ans[['student_id', 'course_id', 'grade']]

Complexity

  • ⏰ Time complexity: O(N log N), due to sorting by grade and course_id for each student.
  • 🧺 Space complexity: O(N), for storing the intermediate results and output.