Problem
A gene string can be represented by an 8-character long string, with choices from 'A'
, 'C'
, 'G'
, and 'T'
.
Suppose we need to investigate a mutation from a gene string startGene
to a gene string endGene
where one mutation is defined as one single character changed in the gene string.
- For example,
"AACCGGTT" --> "AACCGGTA"
is one mutation.
There is also a gene bank bank
that records all the valid gene mutations. A gene must be in bank
to make it a valid gene string.
Given the two gene strings startGene
and endGene
and the gene bank bank
, return the minimum number of mutations needed to mutate from startGene
to endGene
. If there is no such a mutation, return -1
.
Note that the starting point is assumed to be valid, so it might not be included in the bank.
Examples
Example 1:
|
|
Example 2:
|
|
Solution
Method 1 – BFS for Minimum Mutation Steps
Intuition
The problem asks for the minimum number of mutations needed to change the start gene to the end gene, where each mutation changes one character and must result in a gene present in the bank. This is a classic shortest path problem in an unweighted graph, which can be solved using BFS.
Approach
- Treat each gene as a node and connect nodes if they differ by exactly one character and the mutated gene is in the bank.
- Use BFS starting from the start gene, trying all possible single-character mutations at each step.
- For each mutation, if it matches the end gene, return the number of steps.
- Use a set for the bank for O(1) lookup and a set for visited genes to avoid cycles.
- If the end gene is not reachable, return -1.
Code
|
|
Complexity
- ⏰ Time complexity:
O(n * m * 4)
, wheren
is the number of genes in the bank andm
is the length of the gene string (usually 8). - 🧺 Space complexity:
O(n)
, for the bank and visited sets.
Method 2 – Bidirectional BFS
Intuition
Bidirectional BFS can speed up the search by simultaneously expanding from both the start and end genes, meeting in the middle.
Approach
- Use two sets to represent the frontiers from the start and end genes.
- At each step, expand the smaller frontier by mutating each gene and checking if it appears in the opposite frontier.
- Use a set for the bank and visited genes.
- If the frontiers meet, return the number of steps; otherwise, return -1 if no path exists.
Code
|
|
Complexity
- ⏰ Time complexity:
O(n * m * 4)
, but often faster in practice due to bidirectional search. - 🧺 Space complexity:
O(n)
, for the sets.