Problem

Table: Candidates

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| candidate_id | int     |
| skill        | varchar |
+--------------+---------+
(candidate_id, skill) is the primary key (columns with unique values) for this table.
Each row includes candidate_id and skill.

Write a query to find the candidates best suited for a Data Scientist position. The candidate must be proficient in Python , Tableau , and PostgreSQL.

Return the result table ordered bycandidate_id inascending order.

The result format is in the following example.

Examples

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Input: 
Candidates table:
+---------------+--------------+
| candidate_id  | skill        | 
+---------------+--------------+
| 123           | Python       |
| 234           | R            | 
| 123           | Tableau      | 
| 123           | PostgreSQL   | 
| 234           | PowerBI      | 
| 234           | SQL Server   | 
| 147           | Python       | 
| 147           | Tableau      | 
| 147           | Java         |
| 147           | PostgreSQL   |
| 256           | Tableau      |
| 102           | DataAnalysis |
+---------------+--------------+
Output: 
+--------------+
| candidate_id |  
+--------------+
| 123          |  
| 147          | 
+--------------+
Explanation: 
- Candidates 123 and 147 possess the necessary skills in Python, Tableau, and PostgreSQL for the data scientist position.
- Candidates 234 and 102 do not possess any of the required skills for this position.
- Candidate 256 has proficiency in Tableau but is missing skills in Python and PostgreSQL.
The output table is sorted by candidate_id in ascending order.

Solution

Method 1 – SQL Group By and Having

Intuition

We need to find candidates who have all three required skills: Python, Tableau, and PostgreSQL. We can group by candidate_id and count the number of required skills each candidate has, then select those with all three.

Approach

  1. Filter the Candidates table for rows where skill is one of (‘Python’, ‘Tableau’, ‘PostgreSQL’).
  2. Group by candidate_id.
  3. Count the number of distinct required skills for each candidate.
  4. Select candidate_ids where the count is 3.
  5. Order the result by candidate_id ascending.

Code

1
2
3
4
5
6
SELECT candidate_id
FROM Candidates
WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(DISTINCT skill) = 3
ORDER BY candidate_id;
1
2
3
4
5
6
SELECT candidate_id
FROM Candidates
WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(DISTINCT skill) = 3
ORDER BY candidate_id;
1
2
3
4
5
6
def find_candidates(candidates: 'pd.DataFrame') -> 'pd.DataFrame':
    required = {'Python', 'Tableau', 'PostgreSQL'}
    filtered = candidates[candidates['skill'].isin(required)]
    grouped = filtered.groupby('candidate_id')['skill'].nunique().reset_index()
    result = grouped[grouped['skill'] == 3][['candidate_id']].sort_values('candidate_id')
    return result

Complexity

  • ⏰ Time complexity: O(n), where n is the number of rows in Candidates. Each row is processed once.
  • 🧺 Space complexity: O(n), for storing filtered and grouped data.