Design a Database for Storing Students’ Grades

Problem

Design a simple relational database to store students, courses, and course enrollments (grades). In addition to the schema, provide a SQL query that returns the honor roll students — the top 10% of students by grade point average (GPA) — along with their GPA.

Solution

1. Requirements Analysis

Functional Requirements:

  • Store student information (name, birthdate, contact info).
  • Store course information (course name, description and optionally instructor).
  • Record enrollments: which student took which course and the grade received.
  • Compute per-student GPA across courses and query top performers (honor roll).

Data Requirements:

  • Students: unique identifier, first/last name, birthdate, optional contact fields.
  • Courses: unique identifier, title, description, optional instructor/professor.
  • Enrollments: links between students and courses with a grade and timestamp.

Assumptions & constraints:

  • Grades are stored as numeric (e.g., integer 0–100 or decimal grade points). For GPA calculations, we assume numeric grades and use NUMERIC(4,2) for averages.
  • A student may enroll in the same course multiple times (e.g., repeats); each attempt is a separate enrollment record.

2. Conceptual Design (Entities & Relationships)

Entities:

  • Student — stores personal details about a student.
  • Course — stores course metadata.
  • Enrollment — associative entity linking Student and Course and storing the grade.

Relationships:

  • A Student can have many Enrollments (one-to-many).
  • A Course can have many Enrollments (one-to-many).

Optional entity:

  • Professor — if you need to track instructor details and associate them with courses.

3. Logical Design (ER Diagram)

erDiagram
	STUDENT {
		int student_id PK
		varchar first_name
		varchar last_name
		date birthdate
		varchar email
	}
	COURSE {
		int course_id PK
		varchar course_code
		varchar title
		text description
	}
	ENROLLMENT {
		int enrollment_id PK
		int student_id FK
		int course_id FK
		numeric grade
		timestamp enrolled_at
	}

	STUDENT ||--o{ ENROLLMENT : "has"
	COURSE ||--o{ ENROLLMENT : "is_taken_in"
  

4. Physical Design (SQL Schema)

The following PostgreSQL DDL implements the above logical model. It uses SERIAL/BIGSERIAL for primary keys and appropriate constraints.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Students table
CREATE TABLE students (
	student_id SERIAL PRIMARY KEY,
	first_name VARCHAR(60) NOT NULL,
	last_name VARCHAR(60) NOT NULL,
	birthdate DATE,
	email VARCHAR(255) UNIQUE
);

-- Courses table
CREATE TABLE courses (
	course_id SERIAL PRIMARY KEY,
	course_code VARCHAR(20),
	title VARCHAR(255) NOT NULL,
	description TEXT
);

-- Enrollments table: links students and courses and stores grade
CREATE TABLE enrollments (
	enrollment_id SERIAL PRIMARY KEY,
	student_id INTEGER NOT NULL REFERENCES students(student_id) ON DELETE CASCADE,
	course_id INTEGER NOT NULL REFERENCES courses(course_id) ON DELETE CASCADE,
	grade NUMERIC(5,2) NOT NULL,
	enrolled_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
	CONSTRAINT uq_student_course_attempt UNIQUE(student_id, course_id, enrolled_at)
);

-- Useful indexes
CREATE INDEX idx_enrollments_student ON enrollments(student_id);
CREATE INDEX idx_enrollments_course ON enrollments(course_id);
CREATE INDEX idx_enrollments_grade ON enrollments(grade);

Honor-roll query (top 10% by GPA). This example computes per-student GPA and selects the top 10% using row numbers relative to the total student count:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
WITH student_gpa AS (
	SELECT
		student_id,
		AVG(grade)::NUMERIC(5,2) AS gpa
	FROM enrollments
	GROUP BY student_id
), totals AS (
	SELECT COUNT(*) AS total_students FROM student_gpa
), ranked AS (
	SELECT
		sg.student_id,
		sg.gpa,
		ROW_NUMBER() OVER (ORDER BY sg.gpa DESC) AS rn,
		t.total_students
	FROM student_gpa sg CROSS JOIN totals t
)
SELECT s.student_id, s.first_name, s.last_name, r.gpa
FROM ranked r
JOIN students s ON s.student_id = r.student_id
WHERE r.rn <= CEIL(r.total_students * 0.10)
ORDER BY r.gpa DESC;

Notes:

  • The query ranks students by GPA and selects the top 10% (CEIL used to include at least one student when totals are small).
  • If you prefer a percentile threshold (e.g., GPA >= 90th percentile), use percentile_cont(0.9) WITHIN GROUP (ORDER BY gpa) on student_gpa.

5. Design Rationale & Indexing

Normalization and integrity:

  • The schema is in 3rd Normal Form (3NF). Students, Courses, and Enrollments are separate entities with no redundant storage of non-key data.
  • Referential integrity is enforced with foreign keys; ON DELETE CASCADE is used on enrollments to remove dependent records when a student or course is deleted.

Data types and precision:

  • NUMERIC(5,2) allows grades and GPAs with two decimal places (range up to 999.99); adjust precision based on grading scale (e.g., NUMERIC(4,2) for 0–100 scale).

Indexing strategy:

  • Create indexes on enrollments(student_id) and enrollments(course_id) to optimize joins and group-by queries (GPA aggregation and per-course reporting).
  • Consider a composite index (student_id, grade) if queries frequently filter or sort by both.
  • If the honor-roll query becomes frequent, maintain a materialized view student_gpa refreshed on a schedule for fast reads.

Performance and scaling:

  • For large datasets, partition enrollments by year or course to improve query locality.
  • Use connection pooling and efficient batch loads when ingesting grades.

Additional operational considerations:

  • Enforce application-level checks for business rules such as preventing overlapping enrollments where inappropriate.
  • Store audit fields (created_at, updated_at, created_by) if traceability is required.

Source