Problem

A gene string can be represented by an 8-character long string, with choices from 'A''C''G', and 'T'.

Suppose we need to investigate a mutation from a gene string startGene to a gene string endGene where one mutation is defined as one single character changed in the gene string.

  • For example, "AACCGGTT" --> "AACCGGTA" is one mutation.

There is also a gene bank bank that records all the valid gene mutations. A gene must be in bank to make it a valid gene string.

Given the two gene strings startGene and endGene and the gene bank bank, return the minimum number of mutations needed to mutate from startGene to endGene. If there is no such a mutation, return -1.

Note that the starting point is assumed to be valid, so it might not be included in the bank.

Examples

Example 1:

1
2
3
4
Input:
startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]
Output:
 1

Example 2:

1
2
3
4
Input:
startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
Output:
 2

Solution

Method 1 – BFS for Minimum Mutation Steps

Intuition

The problem asks for the minimum number of mutations needed to change the start gene to the end gene, where each mutation changes one character and must result in a gene present in the bank. This is a classic shortest path problem in an unweighted graph, which can be solved using BFS.

Approach

  1. Treat each gene as a node and connect nodes if they differ by exactly one character and the mutated gene is in the bank.
  2. Use BFS starting from the start gene, trying all possible single-character mutations at each step.
  3. For each mutation, if it matches the end gene, return the number of steps.
  4. Use a set for the bank for O(1) lookup and a set for visited genes to avoid cycles.
  5. If the end gene is not reachable, return -1.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
class Solution {
    public int minMutation(String start, String end, String[] bank) {
        Set<String> bset = new HashSet<>(Arrays.asList(bank));
        if (!bset.contains(end)) return -1;
        char[] genes = {'A','C','G','T'};
        Queue<String> q = new LinkedList<>();
        Set<String> vis = new HashSet<>();
        q.offer(start);
        vis.add(start);
        int steps = 0;
        while (!q.isEmpty()) {
            int sz = q.size();
            for (int i = 0; i < sz; i++) {
                String cur = q.poll();
                if (cur.equals(end)) return steps;
                char[] arr = cur.toCharArray();
                for (int j = 0; j < arr.length; j++) {
                    char old = arr[j];
                    for (char g : genes) {
                        if (g == old) continue;
                        arr[j] = g;
                        String next = new String(arr);
                        if (bset.contains(next) && !vis.contains(next)) {
                            vis.add(next);
                            q.offer(next);
                        }
                    }
                    arr[j] = old;
                }
            }
            steps++;
        }
        return -1;
    }
}

Complexity

  • ⏰ Time complexity: O(n * m * 4), where n is the number of genes in the bank and m is the length of the gene string (usually 8).
  • 🧺 Space complexity: O(n), for the bank and visited sets.

Method 2 – Bidirectional BFS

Intuition

Bidirectional BFS can speed up the search by simultaneously expanding from both the start and end genes, meeting in the middle.

Approach

  1. Use two sets to represent the frontiers from the start and end genes.
  2. At each step, expand the smaller frontier by mutating each gene and checking if it appears in the opposite frontier.
  3. Use a set for the bank and visited genes.
  4. If the frontiers meet, return the number of steps; otherwise, return -1 if no path exists.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
class Solution {
    public int minMutation(String start, String end, String[] bank) {
        Set<String> bset = new HashSet<>(Arrays.asList(bank));
        if (!bset.contains(end)) return -1;
        char[] genes = {'A','C','G','T'};
        Set<String> begin = new HashSet<>();
        Set<String> endset = new HashSet<>();
        Set<String> vis = new HashSet<>();
        begin.add(start);
        endset.add(end);
        int steps = 0;
        while (!begin.isEmpty() && !endset.isEmpty()) {
            if (begin.size() > endset.size()) {
                Set<String> tmp = begin;
                begin = endset;
                endset = tmp;
            }
            Set<String> next = new HashSet<>();
            for (String cur : begin) {
                char[] arr = cur.toCharArray();
                for (int i = 0; i < arr.length; i++) {
                    char old = arr[i];
                    for (char g : genes) {
                        if (g == old) continue;
                        arr[i] = g;
                        String mut = new String(arr);
                        if (endset.contains(mut)) return steps + 1;
                        if (bset.contains(mut) && !vis.contains(mut)) {
                            vis.add(mut);
                            next.add(mut);
                        }
                    }
                    arr[i] = old;
                }
            }
            begin = next;
            steps++;
        }
        return -1;
    }
}

Complexity

  • ⏰ Time complexity: O(n * m * 4), but often faster in practice due to bidirectional search.
  • 🧺 Space complexity: O(n), for the sets.