Problem
Table: Courses
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| student     | varchar |
| class       | varchar |
+-------------+---------+
(student, class) is the primary key for this table, so each student appears at most once per class.
Each row of this table indicates the name of a student and the class in which they are enrolled.
Write an SQL query to report all the classes that have at least five students.
Return the result table in any order.
Examples
Example 1:
Input: Courses table:
+---------+----------+
| student | class    |
+---------+----------+
| A       | Math     |
| B       | English  |
| C       | Math     |
| D       | Biology  |
| E       | Math     |
| F       | Computer |
| G       | Math     |
| H       | Math     |
| I       | Math     |
+---------+----------+
Output:
+---------+
| class   |
+---------+
| Math    |
+---------+
Explanation:
- Math has 6 students, so we include it.
- English has 1 student, so we do not include it.
- Biology has 1 student, so we do not include it.
- Computer has 1 student, so we do not include it.
Solution
Method 1 - Group by and Count
Code
SQL
SELECT class
FROM Courses
GROUP BY class
HAVING COUNT(*) >= 5;
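To check the query against Example 1, it can be run on an in-memory SQLite database from Python. This is only a sketch: the table definition and the use of SQLite are assumptions made for illustration, and other database engines may differ in details such as type names.

```python
import sqlite3

# In-memory database; the schema mirrors the Courses table from the problem
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE Courses ('
    'student VARCHAR, class VARCHAR, PRIMARY KEY (student, class))'
)

# Rows from Example 1
rows = [
    ('A', 'Math'), ('B', 'English'), ('C', 'Math'),
    ('D', 'Biology'), ('E', 'Math'), ('F', 'Computer'),
    ('G', 'Math'), ('H', 'Math'), ('I', 'Math'),
]
conn.executemany('INSERT INTO Courses VALUES (?, ?)', rows)

# Group rows by class and keep only groups with at least five rows
classes = conn.execute(
    'SELECT class FROM Courses GROUP BY class HAVING COUNT(*) >= 5'
).fetchall()
conn.close()

print(classes)  # [('Math',)]
```

Because (student, class) is the primary key, COUNT(*) per class equals the number of distinct students in that class.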
Pandas
- Group the DataFrame by class using the groupby function.
- Count the students in each class with the count function, then call reset_index so that class becomes a regular column again.
- Filter the resulting stats DataFrame to keep only the rows where the student count is greater than or equal to five.
- Return only the class column of the filtered DataFrame, which lists the classes with at least five students.
import pandas as pd

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    # Count the students enrolled in each class
    stats = courses.groupby('class').count().reset_index()
    # Keep only the classes with at least 5 students
    return stats[stats['student'] >= 5][['class']]
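As a quick check, the function can be called on the data from Example 1. The snippet below repeats the function so that it runs on its own; the DataFrame is rebuilt by hand from the example table and the variable names are only illustrative.

```python
import pandas as pd

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the solution, repeated so this snippet is self-contained
    stats = courses.groupby('class').count().reset_index()
    return stats[stats['student'] >= 5][['class']]

# Example 1 input, rebuilt by hand from the table in the problem statement
courses = pd.DataFrame({
    'student': list('ABCDEFGHI'),
    'class': ['Math', 'English', 'Math', 'Biology', 'Math',
              'Computer', 'Math', 'Math', 'Math'],
})

result = find_classes(courses)
print(result['class'].tolist())  # ['Math']
```

Only Math reaches the threshold, since it has 6 students while every other class has 1.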