Problem

Table: Scores

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| student_id  | int     |
| subject     | varchar |
| score       | int     |
| exam_date   | varchar |
+-------------+---------+
(student_id, subject, exam_date) is the primary key for this table.
Each row contains information about a student's score in a specific subject on a particular exam date. score is between 0 and 100 (inclusive).

Write a solution to find the students who have shown improvement. A student is considered to have shown improvement if they meet both of these conditions:

  • Have taken exams in the same subject on at least two different dates
  • Their latest score in that subject is higher than their first score

Return the result table ordered by student_id, subject inascending order.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Input:
Scores table:
+------------+----------+-------+------------+
| student_id | subject  | score | exam_date  |
+------------+----------+-------+------------+
| 101        | Math     | 70    | 2023-01-15 |
| 101        | Math     | 85    | 2023-02-15 |
| 101        | Physics  | 65    | 2023-01-15 |
| 101        | Physics  | 60    | 2023-02-15 |
| 102        | Math     | 80    | 2023-01-15 |
| 102        | Math     | 85    | 2023-02-15 |
| 103        | Math     | 90    | 2023-01-15 |
| 104        | Physics  | 75    | 2023-01-15 |
| 104        | Physics  | 85    | 2023-02-15 |
+------------+----------+-------+------------+
Output:
+------------+----------+-------------+--------------+
| student_id | subject  | first_score | latest_score |
+------------+----------+-------------+--------------+
| 101        | Math     | 70          | 85           |
| 102        | Math     | 80          | 85           |
| 104        | Physics  | 75          | 85           |
+------------+----------+-------------+--------------+
Explanation:
* Student 101 in Math: Improved from 70 to 85
* Student 101 in Physics: No improvement (dropped from 65 to 60)
* Student 102 in Math: Improved from 80 to 85
* Student 103 in Math: Only one exam, not eligible
* Student 104 in Physics: Improved from 75 to 85
Result table is ordered by student_id, subject.

Example 2:

Solution

Method 1 – Window Functions and Grouping

Intuition

To find students who improved, we need to compare their first and latest scores in each subject. We can use window functions to get the first and last score per (student, subject), and then filter for those where the latest score is higher and there are at least two exams.

Approach

  1. Use window functions (ROW_NUMBER or MIN/MAX with PARTITION BY) to get the first and last exam date and score for each (student_id, subject).
  2. Count the number of exams per (student_id, subject).
  3. Filter for those with at least two exams and where the latest score is greater than the first score.
  4. Return the result ordered by student_id and subject ascending.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
WITH ranked AS (
  SELECT student_id, subject, score,
         exam_date,
         ROW_NUMBER() OVER (PARTITION BY student_id, subject ORDER BY exam_date) AS rn_first,
         ROW_NUMBER() OVER (PARTITION BY student_id, subject ORDER BY exam_date DESC) AS rn_last,
         COUNT(*) OVER (PARTITION BY student_id, subject) AS cnt
  FROM Scores
)
SELECT student_id, subject
FROM (
  SELECT student_id, subject,
         MAX(CASE WHEN rn_first = 1 THEN score END) AS first_score,
         MAX(CASE WHEN rn_last = 1 THEN score END) AS last_score,
         MAX(cnt) AS cnt
  FROM ranked
  GROUP BY student_id, subject
) t
WHERE cnt >= 2 AND last_score > first_score
ORDER BY student_id, subject;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
WITH ranked AS (
  SELECT student_id, subject, score,
         exam_date,
         ROW_NUMBER() OVER (PARTITION BY student_id, subject ORDER BY exam_date) AS rn_first,
         ROW_NUMBER() OVER (PARTITION BY student_id, subject ORDER BY exam_date DESC) AS rn_last,
         COUNT(*) OVER (PARTITION BY student_id, subject) AS cnt
  FROM Scores
)
SELECT student_id, subject
FROM (
  SELECT student_id, subject,
         MAX(CASE WHEN rn_first = 1 THEN score END) AS first_score,
         MAX(CASE WHEN rn_last = 1 THEN score END) AS last_score,
         MAX(cnt) AS cnt
  FROM ranked
  GROUP BY student_id, subject
) t
WHERE cnt >= 2 AND last_score > first_score
ORDER BY student_id, subject;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Solution:
    def find_students_who_improved(self, df):
        # df has columns: student_id, subject, score, exam_date (as string)
        import pandas as pd
        df = df.copy()
        df['exam_date'] = pd.to_datetime(df['exam_date'])
        grouped = df.sort_values('exam_date').groupby(['student_id', 'subject'])
        first = grouped.first().reset_index()
        last = grouped.last().reset_index()
        cnt = grouped.size().reset_index(name='cnt')
        merged = first.merge(last, on=['student_id', 'subject'], suffixes=('_first', '_last'))
        merged = merged.merge(cnt, on=['student_id', 'subject'])
        result = merged[(merged['cnt'] >= 2) & (merged['score_last'] > merged['score_first'])]
        return result[['student_id', 'subject']].sort_values(['student_id', 'subject'])

Complexity

  • ⏰ Time complexity: O(n log n), where n is the number of rows, due to sorting by exam_date.
  • 🧺 Space complexity: O(n), for storing intermediate results.