Find Candidates for Data Scientist Position
EasyUpdated: Aug 2, 2025
Practice on:
Problem
Table: Candidates
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| candidate_id | int |
| skill | varchar |
+--------------+---------+
(candidate_id, skill) is the primary key (columns with unique values) for this table.
Each row includes candidate_id and skill.
Write a query to find the candidates best suited for a Data Scientist position. The candidate must be proficient in Python , Tableau , and PostgreSQL.
Return the result table ordered bycandidate_id inascending order.
The result format is in the following example.
Examples
Example 1:
Input:
Candidates table:
+---------------+--------------+
| candidate_id | skill |
+---------------+--------------+
| 123 | Python |
| 234 | R |
| 123 | Tableau |
| 123 | PostgreSQL |
| 234 | PowerBI |
| 234 | SQL Server |
| 147 | Python |
| 147 | Tableau |
| 147 | Java |
| 147 | PostgreSQL |
| 256 | Tableau |
| 102 | DataAnalysis |
+---------------+--------------+
Output:
+--------------+
| candidate_id |
+--------------+
| 123 |
| 147 |
+--------------+
Explanation:
- Candidates 123 and 147 possess the necessary skills in Python, Tableau, and PostgreSQL for the data scientist position.
- Candidates 234 and 102 do not possess any of the required skills for this position.
- Candidate 256 has proficiency in Tableau but is missing skills in Python and PostgreSQL.
The output table is sorted by candidate_id in ascending order.
Solution
Method 1 – SQL Group By and Having
Intuition
We need to find candidates who have all three required skills: Python, Tableau, and PostgreSQL. We can group by candidate_id and count the number of required skills each candidate has, then select those with all three.
Approach
- Filter the Candidates table for rows where skill is one of ('Python', 'Tableau', 'PostgreSQL').
- Group by candidate_id.
- Count the number of distinct required skills for each candidate.
- Select candidate_ids where the count is 3.
- Order the result by candidate_id ascending.
Code
MySQL
SELECT candidate_id
FROM Candidates
WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(DISTINCT skill) = 3
ORDER BY candidate_id;
PostgreSQL
SELECT candidate_id
FROM Candidates
WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(DISTINCT skill) = 3
ORDER BY candidate_id;
Python (pandas)
def find_candidates(candidates: 'pd.DataFrame') -> 'pd.DataFrame':
required = {'Python', 'Tableau', 'PostgreSQL'}
filtered = candidates[candidates['skill'].isin(required)]
grouped = filtered.groupby('candidate_id')['skill'].nunique().reset_index()
result = grouped[grouped['skill'] == 3][['candidate_id']].sort_values('candidate_id')
return result
Complexity
- ⏰ Time complexity:
O(n), where n is the number of rows in Candidates. Each row is processed once. - 🧺 Space complexity:
O(n), for storing filtered and grouped data.