Problem

Table: Project

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| project_id  | int     |
| employee_id | int     |
| workload    | int     |
+-------------+---------+
employee_id is the primary key (column with unique values) of this table.
employee_id is a foreign key (reference column) to the Employees table.
Each row of this table indicates that the employee with employee_id is working on the project with project_id and the workload of the project.

Table: Employees

+------------------+---------+
| Column Name      | Type    |
+------------------+---------+
| employee_id      | int     |
| name             | varchar |
| team             | varchar |
+------------------+---------+
employee_id is the primary key (column with unique values) of this table.
Each row of this table contains information about one employee.

Write a solution to find the employees who are allocated to projects with a workload that exceeds the average workload of all employees in their respective teams.

Return the result table ordered by employee_id, project_id in ascending order.

The result format is in the following example.

Examples

Example 1:

Input: 
Project table:
+-------------+-------------+----------+
| project_id  | employee_id | workload |
+-------------+-------------+----------+
| 1           | 1           |  45      |
| 1           | 2           |  90      | 
| 2           | 3           |  12      |
| 2           | 4           |  68      |
+-------------+-------------+----------+
Employees table:
+-------------+--------+------+
| employee_id | name   | team |
+-------------+--------+------+
| 1           | Khaled | A    |
| 2           | Ali    | B    |
| 3           | John   | B    |
| 4           | Doe    | A    |
+-------------+--------+------+
Output: 
+-------------+------------+---------------+------------------+
| employee_id | project_id | employee_name | project_workload |
+-------------+------------+---------------+------------------+  
| 2           | 1          | Ali           | 90               | 
| 4           | 2          | Doe           | 68               | 
+-------------+------------+---------------+------------------+
Explanation: 
- Employee with ID 1 has a project workload of 45 and belongs to Team A, where the average workload is 56.50. Since his project workload does not exceed the team's average workload, he will be excluded.
- Employee with ID 2 has a project workload of 90 and belongs to Team B, where the average workload is 51.00. Since his project workload does exceed the team's average workload, he will be included.
- Employee with ID 3 has a project workload of 12 and belongs to Team B, where the average workload is 51.00. Since his project workload does not exceed the team's average workload, he will be excluded.
- Employee with ID 4 has a project workload of 68 and belongs to Team A, where the average workload is 56.50. Since his project workload does exceed the team's average workload, he will be included.
The result table is ordered by employee_id, then project_id, in ascending order.
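The team averages quoted in the explanation can be checked with a few lines of Python. This is only an arithmetic sanity check on the Example 1 data; the dictionary name `team_workloads` is an illustrative choice, not part of the problem statement.

```python
from statistics import mean

# Workloads from Example 1, grouped by the team of the assigned employee.
team_workloads = {
    "A": [45, 68],  # employees 1 (Khaled) and 4 (Doe)
    "B": [90, 12],  # employees 2 (Ali) and 3 (John)
}

# Average workload per team.
team_avg = {team: mean(loads) for team, loads in team_workloads.items()}
print(team_avg)
```

Team A averages 56.5 and Team B averages 51, so only the workloads 68 (Team A) and 90 (Team B) lie strictly above their team's average.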

Solution

Method 1 – Teamwise Average and Join (SQL, Pandas)

Intuition

For each employee, compare their project workload to the average workload of all employees in their team. If their workload is greater, include them in the result.

Approach

  1. Join Project and Employees tables on employee_id to get team info for each project allocation.
  2. For each team, compute the average workload of all employees in that team.
  3. For each project allocation, check if the workload exceeds the team average.
  4. Return the employee and project details where the condition is met, ordered by employee_id, project_id.
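The four steps can also be sketched in plain Python, without a database or pandas, which makes the data flow explicit. This is a minimal sketch under stated assumptions: the function name `above_team_average` and the tuple layouts are hypothetical, and every employee_id in the project rows is assumed to exist in the employee rows (as the foreign key implies).

```python
from collections import defaultdict

def above_team_average(project_rows, employee_rows):
    """project_rows: (project_id, employee_id, workload) tuples;
    employee_rows: (employee_id, name, team) tuples (assumed layouts)."""
    # Step 1: look up each employee's name and team (the join on employee_id).
    info = {emp_id: (name, team) for emp_id, name, team in employee_rows}

    # Step 2: accumulate workload totals and row counts per team.
    total = defaultdict(int)
    count = defaultdict(int)
    for _, emp_id, workload in project_rows:
        team = info[emp_id][1]
        total[team] += workload
        count[team] += 1

    # Step 3: keep allocations whose workload is strictly above the team average.
    result = []
    for project_id, emp_id, workload in project_rows:
        name, team = info[emp_id]
        if workload > total[team] / count[team]:
            result.append((emp_id, project_id, name, workload))

    # Step 4: order by employee_id, then project_id.
    return sorted(result, key=lambda row: (row[0], row[1]))


# Example 1 data.
projects = [(1, 1, 45), (1, 2, 90), (2, 3, 12), (2, 4, 68)]
employees = [(1, "Khaled", "A"), (2, "Ali", "B"), (3, "John", "B"), (4, "Doe", "A")]
rows = above_team_average(projects, employees)
print(rows)
```

On the Example 1 data this keeps the allocations of Ali (workload 90) and Doe (workload 68), matching the expected output. Keeping running totals and counts per team means the averages are computed in a single pass over the allocations.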

Code

WITH team_avg AS (
  -- Average workload per team across all project allocations
  SELECT e.team, AVG(p.workload) AS avg_workload
  FROM Employees e
  JOIN Project p ON e.employee_id = p.employee_id
  GROUP BY e.team
)
-- Return all four columns required by the expected output
SELECT p.employee_id,
       p.project_id,
       e.name AS employee_name,
       p.workload AS project_workload
FROM Project p
JOIN Employees e ON p.employee_id = e.employee_id
JOIN team_avg t ON e.team = t.team
WHERE p.workload > t.avg_workload
ORDER BY p.employee_id, p.project_id;
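import pandas as pd

def employees_project_allocation(project: pd.DataFrame, employees: pd.DataFrame) -> pd.DataFrame:
    # Attach each employee's name and team to their project allocation.
    df = project.merge(employees, on='employee_id')
    # Mean workload per team, joined back onto every allocation row.
    team_avg = df.groupby('team')['workload'].mean().rename('avg_workload')
    df = df.join(team_avg, on='team')
    # Keep rows strictly above the team average, with the four required columns.
    res = df.loc[df['workload'] > df['avg_workload'],
                 ['employee_id', 'project_id', 'name', 'workload']]
    res = res.rename(columns={'name': 'employee_name', 'workload': 'project_workload'})
    return res.sort_values(['employee_id', 'project_id'])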

Complexity

  • ⏰ Time complexity: O(n log n), where n is the number of project allocations. The join and per-team aggregation take O(n) with hashing, and the final sort dominates.
  • 🧺 Space complexity: O(n) for storing intermediate results.