Design a Database for Storing Students' Grades
Design a Database for Storing Students' Grades
Problem
Design a simple relational database to store students, courses, and course enrollments (grades). In addition to the schema, provide a SQL query that returns the honor roll students — the top 10% of students by grade point average (GPA) — along with their GPA.
Solution
1. Requirements Analysis
Functional Requirements:
- Store student information (name, birthdate, contact info).
- Store course information (course name, description and optionally instructor).
- Record enrollments: which student took which course and the grade received.
- Compute per-student GPA across courses and query top performers (honor roll).
Data Requirements:
- Students: unique identifier, first/last name, birthdate, optional contact fields.
- Courses: unique identifier, title, description, optional instructor/professor.
- Enrollments: links between students and courses with a grade and timestamp.
Assumptions & constraints:
- Grades are stored as numeric (e.g., integer 0–100 or decimal grade points). For GPA calculations, we assume numeric grades and use
NUMERIC(4,2)for averages. - A student may enroll in the same course multiple times (e.g., repeats); each attempt is a separate enrollment record.
2. Conceptual Design (Entities & Relationships)
Entities:
- Student — stores personal details about a student.
- Course — stores course metadata.
- Enrollment — associative entity linking Student and Course and storing the
grade.
Relationships:
- A Student can have many Enrollments (one-to-many).
- A Course can have many Enrollments (one-to-many).
Optional entity:
- Professor — if you need to track instructor details and associate them with courses.
3. Logical Design (ER Diagram)
erDiagram STUDENT { int student_id PK varchar first_name varchar last_name date birthdate varchar email } COURSE { int course_id PK varchar course_code varchar title text description } ENROLLMENT { int enrollment_id PK int student_id FK int course_id FK numeric grade timestamp enrolled_at } STUDENT ||--o{ ENROLLMENT : "has" COURSE ||--o{ ENROLLMENT : "is_taken_in"
4. Physical Design (SQL Schema)
The following PostgreSQL DDL implements the above logical model. It uses SERIAL/BIGSERIAL for primary keys and appropriate constraints.
-- Students table
CREATE TABLE students (
student_id SERIAL PRIMARY KEY,
first_name VARCHAR(60) NOT NULL,
last_name VARCHAR(60) NOT NULL,
birthdate DATE,
email VARCHAR(255) UNIQUE
);
-- Courses table
CREATE TABLE courses (
course_id SERIAL PRIMARY KEY,
course_code VARCHAR(20),
title VARCHAR(255) NOT NULL,
description TEXT
);
-- Enrollments table: links students and courses and stores grade
CREATE TABLE enrollments (
enrollment_id SERIAL PRIMARY KEY,
student_id INTEGER NOT NULL REFERENCES students(student_id) ON DELETE CASCADE,
course_id INTEGER NOT NULL REFERENCES courses(course_id) ON DELETE CASCADE,
grade NUMERIC(5,2) NOT NULL,
enrolled_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
CONSTRAINT uq_student_course_attempt UNIQUE(student_id, course_id, enrolled_at)
);
-- Useful indexes
CREATE INDEX idx_enrollments_student ON enrollments(student_id);
CREATE INDEX idx_enrollments_course ON enrollments(course_id);
CREATE INDEX idx_enrollments_grade ON enrollments(grade);
Honor-roll query (top 10% by GPA). This example computes per-student GPA and selects the top 10% using row numbers relative to the total student count:
WITH student_gpa AS (
SELECT
student_id,
AVG(grade)::NUMERIC(5,2) AS gpa
FROM enrollments
GROUP BY student_id
), totals AS (
SELECT COUNT(*) AS total_students FROM student_gpa
), ranked AS (
SELECT
sg.student_id,
sg.gpa,
ROW_NUMBER() OVER (ORDER BY sg.gpa DESC) AS rn,
t.total_students
FROM student_gpa sg CROSS JOIN totals t
)
SELECT s.student_id, s.first_name, s.last_name, r.gpa
FROM ranked r
JOIN students s ON s.student_id = r.student_id
WHERE r.rn <= CEIL(r.total_students * 0.10)
ORDER BY r.gpa DESC;
Notes:
- The query ranks students by GPA and selects the top 10% (CEIL used to include at least one student when totals are small).
- If you prefer a percentile threshold (e.g., GPA >= 90th percentile), use
percentile_cont(0.9) WITHIN GROUP (ORDER BY gpa)onstudent_gpa.
5. Design Rationale & Indexing
Normalization and integrity:
- The schema is in 3rd Normal Form (3NF). Students, Courses, and Enrollments are separate entities with no redundant storage of non-key data.
- Referential integrity is enforced with foreign keys;
ON DELETE CASCADEis used on enrollments to remove dependent records when a student or course is deleted.
Data types and precision:
NUMERIC(5,2)allows grades and GPAs with two decimal places (range up to 999.99); adjust precision based on grading scale (e.g.,NUMERIC(4,2)for 0–100 scale).
Indexing strategy:
- Create indexes on
enrollments(student_id)andenrollments(course_id)to optimize joins and group-by queries (GPA aggregation and per-course reporting). - Consider a composite index
(student_id, grade)if queries frequently filter or sort by both. - If the honor-roll query becomes frequent, maintain a materialized view
student_gparefreshed on a schedule for fast reads.
Performance and scaling:
- For large datasets, partition
enrollmentsby year or course to improve query locality. - Use connection pooling and efficient batch loads when ingesting grades.
Additional operational considerations:
- Enforce application-level checks for business rules such as preventing overlapping enrollments where inappropriate.
- Store audit fields (created_at, updated_at, created_by) if traceability is required.