Problem

Table: Employees

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| employee_id | int |
| name | varchar |
+-------------+---------+
employee_id is the column with unique values for this table.
Each row of this table indicates the name of the employee whose ID is employee_id.

Table: `Salaries`

+-------------+---------+
| Column Name | Type |
+-------------+---------+
| employee_id | int |
| salary | int |
+-------------+---------+
employee_id is the column with unique values for this table.
Each row of this table indicates the salary of the employee whose ID is employee_id.

Write a solution to report the IDs of all the employees with missing information. The information of an employee is missing if:

  • The employee’s name is missing, or
  • The employee’s salary is missing.

Return the result table ordered by employee_id in ascending order.

The result format is in the following example.

Examples

Example 1

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
Input: 
Employees table:
+-------------+----------+
| employee_id | name     |
+-------------+----------+
| 2           | Crew     |
| 4           | Haven    |
| 5           | Kristian |
+-------------+----------+
Salaries table:
+-------------+--------+
| employee_id | salary |
+-------------+--------+
| 5           | 76071  |
| 1           | 22517  |
| 4           | 63539  |
+-------------+--------+
Output: 
+-------------+
| employee_id |
+-------------+
| 1           |
| 2           |
+-------------+
Explanation: 
Employees 1, 2, 4, and 5 are working at this company.
The name of employee 1 is missing.
The salary of employee 2 is missing.

Solution

Method 1 – Full Outer Join and Null Check (SQL, Pandas)

Intuition

An employee has missing information if their name or salary is missing. This can be found by checking for employees present in one table but not the other, or with null values after a full outer join.

Approach

  1. Perform a full outer join (or union of left and right joins) between Employees and Salaries on employee_id.
  2. Select rows where either name or salary is null.
  3. Return the employee_id in ascending order.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
SELECT employee_id
FROM Employees
WHERE name IS NULL
UNION
SELECT employee_id
FROM Salaries
WHERE salary IS NULL
UNION
SELECT e.employee_id
FROM Employees e
LEFT JOIN Salaries s ON e.employee_id = s.employee_id
WHERE s.salary IS NULL
UNION
SELECT s.employee_id
FROM Salaries s
LEFT JOIN Employees e ON s.employee_id = e.employee_id
WHERE e.name IS NULL
ORDER BY employee_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
SELECT employee_id
FROM Employees
WHERE name IS NULL
UNION
SELECT employee_id
FROM Salaries
WHERE salary IS NULL
UNION
SELECT e.employee_id
FROM Employees e
LEFT JOIN Salaries s ON e.employee_id = s.employee_id
WHERE s.salary IS NULL
UNION
SELECT s.employee_id
FROM Salaries s
LEFT JOIN Employees e ON s.employee_id = e.employee_id
WHERE e.name IS NULL
ORDER BY employee_id;
1
2
3
4
5
6
import pandas as pd

def employees_with_missing_information(employees: pd.DataFrame, salaries: pd.DataFrame) -> pd.DataFrame:
    df = pd.merge(employees, salaries, on='employee_id', how='outer')
    res = df[df['name'].isnull() | df['salary'].isnull()][['employee_id']]
    return res.sort_values('employee_id')

Complexity

  • ⏰ Time complexity: O(n + m) where n and m are the number of rows in Employees and Salaries.
  • 🧺 Space complexity: O(n + m) for the join result.