Department Highest Salary Problem

Problem

Table: Employee

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| id           | int     |
| name         | varchar |
| salary       | int     |
| departmentId | int     |
+--------------+---------+

id is the primary key column for this table. departmentId is a foreign key referencing id in the Department table. Each row of this table indicates the ID, name, and salary of an employee, as well as the ID of their department.

Table: Department

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+

id is the primary key column for this table. It is guaranteed that department name is not NULL. Each row of this table indicates the ID of a department and its name.

Write an SQL query to find employees who have the highest salary in each of the departments.

Return the result table in any order.

The query result format is in the following example.

Examples

Example 1:

Input: Employee table:

+----+-------+--------+--------------+
| id | name  | salary | departmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+

Department table:

+----+-------+
| id | name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+

Output:

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
| IT         | Max      | 90000  |
+------------+----------+--------+

Explanation: Max and Jim both have the highest salary in the IT department, and Henry has the highest salary in the Sales department.
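The tie-handling rule in the explanation can be checked with a plain-Python sketch that needs neither a database nor pandas: compute each department's maximum salary first, then keep every employee whose salary equals that maximum. The tuples simply restate Example 1, and the function name `highest_paid` is hypothetical.

```python
# Example 1 restated as plain tuples: (id, name, salary, departmentId).
employees = [
    (1, "Joe", 70000, 1),
    (2, "Jim", 90000, 1),
    (3, "Henry", 80000, 2),
    (4, "Sam", 60000, 2),
    (5, "Max", 90000, 1),
]
departments = {1: "IT", 2: "Sales"}


def highest_paid(employees, departments):
    """Return (Department, Employee, Salary) rows, keeping ties."""
    # Pass 1: maximum salary per department id.
    max_salary = {}
    for _, _, salary, dept_id in employees:
        if dept_id not in max_salary or salary > max_salary[dept_id]:
            max_salary[dept_id] = salary
    # Pass 2: keep every employee at that maximum, skipping unknown
    # department ids (the same effect as an inner join).
    return [
        (departments[dept_id], name, salary)
        for _, name, salary, dept_id in employees
        if dept_id in departments and salary == max_salary[dept_id]
    ]


print(highest_paid(employees, departments))
# [('IT', 'Jim', 90000), ('Sales', 'Henry', 80000), ('IT', 'Max', 90000)]
```

Because the comparison is an equality against the maximum rather than "pick one row", Jim and Max are both returned.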

Solution

Method 1 - Join and Subquery

Code

SQL

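The subquery finds the maximum salary for each departmentId. The outer query joins each employee to their department and keeps only the rows whose (departmentId, salary) pair appears in that result, so every employee tied for a department's top salary is returned.

    SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary
    FROM Department d
    JOIN Employee e ON e.departmentId = d.id
    WHERE (e.departmentId, e.salary) IN (
        SELECT departmentId, MAX(salary)
        FROM Employee
        GROUP BY departmentId
    );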
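To try the query without a MySQL server, the sketch below loads Example 1 into an in-memory SQLite database through Python's built-in `sqlite3` module and runs the same statement. It assumes SQLite 3.15 or later, which is required for the row-value `(a, b) IN (subquery)` comparison; the `CREATE TABLE` statements are a minimal stand-in for the schema described above, and `run_example` is a hypothetical helper name.

```python
import sqlite3

QUERY = """
SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary
FROM Department d
JOIN Employee e ON e.departmentId = d.id
WHERE (e.departmentId, e.salary) IN (
    SELECT departmentId, MAX(salary)
    FROM Employee
    GROUP BY departmentId
)
"""


def run_example():
    conn = sqlite3.connect(":memory:")
    # Minimal stand-ins for the two tables described in the problem.
    conn.execute("CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, "
        "salary INTEGER, departmentId INTEGER REFERENCES Department(id))"
    )
    conn.executemany("INSERT INTO Department VALUES (?, ?)", [(1, "IT"), (2, "Sales")])
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?, ?, ?)",
        [
            (1, "Joe", 70000, 1),
            (2, "Jim", 90000, 1),
            (3, "Henry", 80000, 2),
            (4, "Sam", 60000, 2),
            (5, "Max", 90000, 1),
        ],
    )
    # Sort in Python so the printed order is stable; the problem accepts any order.
    rows = sorted(conn.execute(QUERY).fetchall())
    conn.close()
    return rows


print(run_example())
# [('IT', 'Jim', 90000), ('IT', 'Max', 90000), ('Sales', 'Henry', 80000)]
```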

Pandas

Check whether either the employee or the department DataFrame is empty. If either is empty, return an empty DataFrame with the columns ['Department', 'Employee', 'Salary'].

    if employee.empty or department.empty:
        return pd.DataFrame(columns=['Department', 'Employee', 'Salary'])

Merge the employee and department DataFrames on the 'departmentId' and 'id' columns, respectively, using the merge function. Because both tables have id and name columns, the suffixes keep the two copies apart.

    merged_df = employee.merge(department, left_on='departmentId', right_on='id', suffixes=('_employee', '_department'))
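The later steps rely on the suffixed column names, so it helps to see them. This sketch assumes pandas is installed and rebuilds a hypothetical two-row slice of Example 1 as DataFrames before applying the same merge call.

```python
import pandas as pd

# Hypothetical two-row slice of Example 1.
employee = pd.DataFrame(
    {"id": [1, 3], "name": ["Joe", "Henry"], "salary": [70000, 80000], "departmentId": [1, 2]}
)
department = pd.DataFrame({"id": [1, 2], "name": ["IT", "Sales"]})

merged_df = employee.merge(
    department, left_on="departmentId", right_on="id", suffixes=("_employee", "_department")
)

# "id" and "name" exist in both inputs, so each copy gets a suffix;
# "salary" and "departmentId" appear only in employee and keep their names.
print(list(merged_df.columns))
```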

Use groupby to group merged_df by 'departmentId', and apply a lambda that keeps every row whose salary equals the group's maximum, so all tied employees are kept.

    highest_salary_df = merged_df.groupby('departmentId').apply(lambda x: x[x['salary'] == x['salary'].max()])
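A common alternative, swapped in here rather than taken from this solution, is to broadcast each department's maximum back onto every row with `groupby(...)['salary'].transform('max')` and filter with a boolean mask. It keeps ties the same way, returns rows under their original flat index (so there are no group labels to strip), and sidesteps the deprecation warning that pandas 2.2+ raises when `apply` operates on the grouping column. The `merged_df` below is a hypothetical stand-in holding only the columns this step needs.

```python
import pandas as pd

# Hypothetical stand-in for merged_df, reduced to the columns this step uses.
merged_df = pd.DataFrame(
    {
        "name_employee": ["Joe", "Jim", "Henry", "Sam", "Max"],
        "salary": [70000, 90000, 80000, 60000, 90000],
        "departmentId": [1, 1, 2, 2, 1],
        "name_department": ["IT", "IT", "Sales", "Sales", "IT"],
    }
)

# Each row gets its department's maximum salary (same length as merged_df).
dept_max = merged_df.groupby("departmentId")["salary"].transform("max")

# Keep every row that matches its department maximum, so ties (Jim, Max) survive.
highest_salary_df = merged_df[merged_df["salary"] == dept_max]

print(highest_salary_df["name_employee"].tolist())  # ['Jim', 'Henry', 'Max']
```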

Reset the index of highest_salary_df to remove the group labels and obtain a flat DataFrame.

    highest_salary_df = highest_salary_df.reset_index(drop=True)

Select the columns 'name_department', 'name_employee', and 'salary' from highest_salary_df to get the department name, employee name, and salary of each top earner.

    result_df = highest_salary_df[['name_department', 'name_employee', 'salary']]

Rename the columns of the resulting DataFrame to ['Department', 'Employee', 'Salary'] as specified.

    result_df.columns = ['Department', 'Employee', 'Salary']
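Assigning a list to `columns` depends on the selected columns staying in the same order. A slightly more explicit variant, sketched here as an option rather than the original approach, selects and renames in one chain with `DataFrame.rename(columns=...)`, so each new name is tied to its source column. The input is a hypothetical two-row stand-in for `highest_salary_df`.

```python
import pandas as pd

# Hypothetical stand-in for highest_salary_df after reset_index(drop=True).
highest_salary_df = pd.DataFrame(
    {
        "id_employee": [2, 3],
        "name_employee": ["Jim", "Henry"],
        "salary": [90000, 80000],
        "name_department": ["IT", "Sales"],
    }
)

# Select the output columns in the required order, then map old names to new ones.
result_df = highest_salary_df[["name_department", "name_employee", "salary"]].rename(
    columns={"name_department": "Department", "name_employee": "Employee", "salary": "Salary"}
)

print(list(result_df.columns))  # ['Department', 'Employee', 'Salary']
```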

Finally, return result_df, which holds the employees with the highest salary in each department. Putting the steps together:

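    import pandas as pd

    def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
        # Nothing to match: return an empty frame with the expected columns.
        if employee.empty or department.empty:
            return pd.DataFrame(columns=['Department', 'Employee', 'Salary'])

        # Attach department names; the overlapping id/name columns get suffixes.
        merged_df = employee.merge(department, left_on='departmentId', right_on='id',
                                   suffixes=('_employee', '_department'))

        # Within each department, keep every row at the maximum salary (ties included).
        highest_salary_df = merged_df.groupby('departmentId').apply(
            lambda x: x[x['salary'] == x['salary'].max()])

        # Drop the group labels that groupby().apply() adds to the index.
        highest_salary_df = highest_salary_df.reset_index(drop=True)

        # Keep the output columns in the required order and rename them.
        result_df = highest_salary_df[['name_department', 'name_employee', 'salary']]
        result_df.columns = ['Department', 'Employee', 'Salary']

        return result_df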
