Find Candidates for Data Scientist Position II

Problem

Table: Candidates

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| candidate_id | int     |
| skill        | varchar |
| proficiency  | int     |
+--------------+---------+
(candidate_id, skill) is the unique key for this table.
Each row includes candidate_id, skill, and proficiency level (1-5).

Table: Projects

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| project_id   | int     |
| skill        | varchar |
| importance   | int     |
+--------------+---------+
(project_id, skill) is the primary key for this table.
Each row includes project_id, required skill, and its importance (1-5) for the project.

Leetcode is staffing for multiple data science projects. Write a solution to find the best candidate foreach project based on the following criteria:

Candidates must have all the skills required for a project.
Calculate a score for each candidate-project pair as follows:
- Start with 100 points
- Add 10 points for each skill where proficiency > importance
- Subtract 5 points for each skill where proficiency < importance
- If the candidate’s skill proficiency equal to the project’s skill importance, the score remains unchanged

Include only the top candidate (highest score) for each project. If there’s a tie , choose the candidate with the lower candidate_id. If there is no suitable candidate for a project, do not return that project.

Return a result table ordered by project_id in ascending order.

The result format is in the following example.

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40


Input:
`Candidates` table:
+--------------+-----------+-------------+
| candidate_id | skill     | proficiency |
+--------------+-----------+-------------+
| 101          | Python    | 5           |
| 101          | Tableau   | 3           |
| 101          | PostgreSQL| 4           |
| 101          | TensorFlow| 2           |
| 102          | Python    | 4           |
| 102          | Tableau   | 5           |
| 102          | PostgreSQL| 4           |
| 102          | R         | 4           |
| 103          | Python    | 3           |
| 103          | Tableau   | 5           |
| 103          | PostgreSQL| 5           |
| 103          | Spark     | 4           |
+--------------+-----------+-------------+
`Projects` table:
+-------------+-----------+------------+
| project_id  | skill     | importance |
+-------------+-----------+------------+
| 501         | Python    | 4          |
| 501         | Tableau   | 3          |
| 501         | PostgreSQL| 5          |
| 502         | Python    | 3          |
| 502         | Tableau   | 4          |
| 502         | R         | 2          |
+-------------+-----------+------------+
Output:
+-------------+--------------+-------+
| project_id  | candidate_id | score |
+-------------+--------------+-------+
| 501         | 101          | 105   |
| 502         | 102          | 130   |
+-------------+--------------+-------+
Explanation:
* For Project 501, Candidate 101 has the highest score of 105. All other candidates have the same score but Candidate 101 has the lowest candidate_id among them.
* For Project 502, Candidate 102 has the highest score of 130.
The output table is ordered by project_id in ascending order.

Solution

Method 1 – Skill Matching and Scoring

Intuition

The key idea is to match each candidate to projects only if they possess all required skills. For each candidate-project pair, we calculate a score based on the difference between the candidate’s proficiency and the project’s skill importance. We then select the candidate with the highest score for each project, breaking ties by candidate_id.

Approach

For each project, list all required skills.
For each candidate, check if they have all the required skills for a project.
For each matching skill, calculate the score:
- Start with 100 points.
- For each skill:
  - Add 10 if proficiency > importance.
  - Subtract 5 if proficiency < importance.
  - No change if equal.
For each project, select the candidate with the highest score (lowest candidate_id in case of tie).
Return the results ordered by project_id.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34


WITH ProjectSkills AS (
  SELECT project_id, COUNT(*) AS skill_count
  FROM Projects
  GROUP BY project_id
),
CandidateSkills AS (
  SELECT p.project_id, c.candidate_id,
         SUM(
           CASE
             WHEN c.proficiency > p.importance THEN 10
             WHEN c.proficiency < p.importance THEN -5
             ELSE 0
           END
         ) + 100 AS score,
         COUNT(*) AS matched_skills
  FROM Projects p
  JOIN Candidates c
    ON p.skill = c.skill
  GROUP BY p.project_id, c.candidate_id
)
SELECT cs.project_id, cs.candidate_id, cs.score
FROM CandidateSkills cs
JOIN ProjectSkills ps
  ON cs.project_id = ps.project_id
WHERE cs.matched_skills = ps.skill_count
  -- Only candidates with all required skills
  AND cs.score = (
    SELECT MAX(score)
    FROM CandidateSkills cs2
    WHERE cs2.project_id = cs.project_id
      AND cs2.matched_skills = ps.skill_count
  )
ORDER BY cs.project_id, cs.candidate_id
;

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33


WITH ProjectSkills AS (
  SELECT project_id, COUNT(*) AS skill_count
  FROM Projects
  GROUP BY project_id
),
CandidateSkills AS (
  SELECT p.project_id, c.candidate_id,
         SUM(
           CASE
             WHEN c.proficiency > p.importance THEN 10
             WHEN c.proficiency < p.importance THEN -5
             ELSE 0
           END
         ) + 100 AS score,
         COUNT(*) AS matched_skills
  FROM Projects p
  JOIN Candidates c
    ON p.skill = c.skill
  GROUP BY p.project_id, c.candidate_id
)
SELECT cs.project_id, cs.candidate_id, cs.score
FROM CandidateSkills cs
JOIN ProjectSkills ps
  ON cs.project_id = ps.project_id
WHERE cs.matched_skills = ps.skill_count
  AND cs.score = (
    SELECT MAX(score)
    FROM CandidateSkills cs2
    WHERE cs2.project_id = cs.project_id
      AND cs2.matched_skills = ps.skill_count
  )
ORDER BY cs.project_id, cs.candidate_id
;

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28


class Solution:
    def find_best_candidates(
        self,
        candidates: 'pd.DataFrame',
        projects: 'pd.DataFrame'
    ) -> 'pd.DataFrame':
        ans = []
        for pid, pgroup in projects.groupby('project_id'):
            required = pgroup[['skill', 'importance']].set_index('skill')
            best_score = float('-inf')
            best_cid = None
            for cid, cgroup in candidates.groupby('candidate_id'):
                cskills = cgroup.set_index('skill')['proficiency']
                if not required.index.isin(cskills.index).all():
                    continue
                score = 100
                for skill, imp in required['importance'].items():
                    prof = cskills[skill]
                    if prof > imp:
                        score += 10
                    elif prof < imp:
                        score -= 5
                if (score > best_score) or (score == best_score and (best_cid is None or cid < best_cid)):
                    best_score = score
                    best_cid = cid
            if best_cid is not None:
                ans.append({'project_id': pid, 'candidate_id': best_cid, 'score': best_score})
        return pd.DataFrame(ans).sort_values(['project_id', 'candidate_id'])

Complexity

⏰ Time complexity: O(P * C * S) Where P = number of projects, C = number of candidates, S = average number of skills per project. For each project, we check all candidates and their skills.
🧺 Space complexity: O(P + C) For storing intermediate groupings and the result.

Problem#

Example 1:#

Solution#

Method 1 – Skill Matching and Scoring#

Intuition#

Approach#

Code#

Complexity#