Problem

Table: Departments

1
2
3
4
5
6
7
8
+-------------+---------+
| Column name | Type    |
+-------------+---------+
| dept_id     | int     |
| dept_name   | varchar |
+-------------+---------+

dept_id is the primary key.

Table: Employees

1
2
3
4
5
6
7
8
9
+-------------+---------+
| Column name | Type    |
+-------------+---------+
| emp_id      | int     |
| name        | varchar |
| dept_id     | int     |
+-------------+---------+

emp_id is the primary key. `dept_id` is a foreign key referencing `Departments.dept_id` (nullable if an employee may be unassigned).

Problem: Write an SQL query to return the department id, department name and the number of employees in each department. Departments with zero employees should still appear with count 0.

Examples

Example 1

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
Departments:
dept_id | dept_name
1       | Sales
2       | Engineering
3       | HR

Employees:
emp_id | name   | dept_id
101    | Alice  | 1
102    | Bob    | 2
103    | Carol  | 2

Output:
dept_id | dept_name   | num_employees
1       | Sales       | 1
2       | Engineering | 2
3       | HR          | 0

Solution

Method 1 - Aggregation with LEFT JOIN

Intuition

We want every department listed even if no employees belong to it. A LEFT JOIN from Departments to Employees returns NULLs for missing employees; counting a nullable employee id (COUNT(e.emp_id)) yields zero for those departments.

Approach

  1. Start with the Departments table as the left table so every department appears.
  2. Left-join Employees on dept_id.
  3. Group by Departments.dept_id and dept_name and compute COUNT(e.emp_id) to get the employee count (ignoring NULLs).

Code

1
2
3
4
5
6
7
8
SELECT d.dept_id,
	   d.dept_name,
	   COUNT(e.emp_id) AS num_employees
FROM Departments AS d
LEFT JOIN Employees AS e
	ON e.dept_id = d.dept_id
GROUP BY d.dept_id, d.dept_name
ORDER BY d.dept_id;
1
2
3
4
5
6
7
8
SELECT d.dept_id,
	   d.dept_name,
	   COUNT(e.emp_id) AS num_employees
FROM departments d
LEFT JOIN employees e
  ON e.dept_id = d.dept_id
GROUP BY d.dept_id, d.dept_name
ORDER BY d.dept_id;
1
2
3
4
5
6
7
8
SELECT d.dept_id,
	   d.dept_name,
	   COUNT(e.emp_id) AS num_employees
FROM Departments d
LEFT JOIN Employees e
	ON e.dept_id = d.dept_id
GROUP BY d.dept_id, d.dept_name
ORDER BY d.dept_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import pandas as pd

# departments: DataFrame with columns ['dept_id', 'dept_name']
# employees: DataFrame with columns ['emp_id', 'name', 'dept_id']

def count_employees_by_department(departments: pd.DataFrame, employees: pd.DataFrame) -> pd.DataFrame:
	merged = departments.merge(employees[['emp_id','dept_id']], how='left', left_on='dept_id', right_on='dept_id')
	result = (merged.groupby(['dept_id','dept_name'], as_index=False)
					.agg(num_employees=('emp_id', 'count'))
					.sort_values('dept_id'))
	return result

Performance Analysis

  • 📈 Theoretical Complexity: O(N + M) – we scan the Employees table (size N) and Departments table (size M) once for the join/aggregation; grouping work across N rows dominates.
  • ⚡️ Execution & Trade-offs: The query is efficient when indexed: an index on Employees.dept_id will speed access during the join/aggregation and Departments.dept_id should be the primary key; databases typically perform the aggregation with a hash or sort-based group and the LEFT JOIN + GROUP BY pattern is idiomatic and readable. For very large Employees, ensure sufficient memory for aggregation or use partial/parallel aggregation supported by the engine.

Notes

  • If you want departments ordered by num_employees (descending), add ORDER BY num_employees DESC.
  • If employees may have invalid/missing dept_id, filter or handle them explicitly depending on business rules.