Problem

Table: Courses

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| student     | varchar |
| class       | varchar |
+-------------+---------+

(student, class) is the primary key of this table.
Each row of this table indicates the name of a student and the class in which they are enrolled.

Write an SQL query to report all the classes that have at least five students.

Return the result table in any order.

Examples

Example 1:

Input: Courses table:

+---------+----------+
| student | class    |
+---------+----------+
| A       | Math     |
| B       | English  |
| C       | Math     |
| D       | Biology  |
| E       | Math     |
| F       | Computer |
| G       | Math     |
| H       | Math     |
| I       | Math     |
+---------+----------+

Output:

+---------+
| class   |
+---------+
| Math    |
+---------+

Explanation:

  • Math has 6 students, so we include it.
  • English has 1 student, so we do not include it.
  • Biology has 1 student, so we do not include it.
  • Computer has 1 student, so we do not include it.
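
The counts in the explanation can be reproduced by hand. The following is a minimal sketch in plain Python; the names rows, counts and big_classes are illustrative only and do not come from the problem statement.

```python
from collections import Counter

# Example 1 input as (student, class) pairs
rows = [
    ("A", "Math"), ("B", "English"), ("C", "Math"),
    ("D", "Biology"), ("E", "Math"), ("F", "Computer"),
    ("G", "Math"), ("H", "Math"), ("I", "Math"),
]

# Tally how many students are enrolled in each class
counts = Counter(cls for _, cls in rows)

# Keep only the classes with at least five students
big_classes = [cls for cls, n in counts.items() if n >= 5]

print(dict(counts))
print(big_classes)
```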

Solution

Method 1 - Group by and Count

Code

SQL
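-- (student, class) is the primary key, so each row is a distinct enrollment
-- and COUNT(*) gives the number of students per class.
SELECT class
FROM Courses
GROUP BY class
HAVING COUNT(*) >= 5;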
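One way to try the query against Example 1 is Python's built-in sqlite3 module. This is only a sketch under assumptions: the in-memory database, the TEXT column types and the variable names are choices made for illustration, and other database engines may differ in details.

```python
import sqlite3

# In-memory SQLite database holding the Example 1 data (assumed setup)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Courses (student TEXT, class TEXT, "
    "PRIMARY KEY (student, class))"
)
rows = [
    ("A", "Math"), ("B", "English"), ("C", "Math"),
    ("D", "Biology"), ("E", "Math"), ("F", "Computer"),
    ("G", "Math"), ("H", "Math"), ("I", "Math"),
]
conn.executemany("INSERT INTO Courses VALUES (?, ?)", rows)

# Same query as above: group by class, keep groups with >= 5 rows
query = """
SELECT class
FROM Courses
GROUP BY class
HAVING COUNT(*) >= 5
"""
result = [row[0] for row in conn.execute(query)]
conn.close()

print(result)
```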
Pandas
  • Group the DataFrame by class using the groupby function.
  • Count the students in each class by calling count on the grouped DataFrame, then reset the index so that class becomes a regular column again.
  • Filter the resulting stats DataFrame to keep only the rows where the student count is greater than or equal to five.
  • Return the class column of the filtered DataFrame, i.e. the classes with at least five students.
import pandas as pd

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    # count() tallies the non-null student values in each class
    stats = courses.groupby(['class']).count().reset_index()
    # keep only the classes with at least 5 students
    return stats[stats['student'] >= 5][['class']]
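
As a possible variant, the grouping can use size instead of count. size counts rows per group, while count counts non-null values per column, so the two differ only if student can be missing. The sketch below is not the solution above; find_classes_by_size and the n_students column are hypothetical names, and the data is the Example 1 input.

```python
import pandas as pd

def find_classes_by_size(courses: pd.DataFrame) -> pd.DataFrame:
    # size() counts rows per class, regardless of missing values
    sizes = courses.groupby('class').size().reset_index(name='n_students')
    # keep only the classes with at least 5 rows
    return sizes[sizes['n_students'] >= 5][['class']]

# Example 1 input from the problem statement
courses = pd.DataFrame({
    'student': list('ABCDEFGHI'),
    'class': ['Math', 'English', 'Math', 'Biology', 'Math',
              'Computer', 'Math', 'Math', 'Math'],
})

result = find_classes_by_size(courses)
print(result['class'].tolist())
```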