Minimum Number of Deletions Of a String

Problem

Given a dictionary and a word, find the minimum number of deletions needed on the word in order to make it a valid word in the dictionary.

For example:

The string “catn” needs one deletion to make it a valid word “cat” in the dictionary.
The string “bcatn” needs two deletions.

Constraints:

The dictionary is a list of valid words (strings).
The word consists of lowercase English letters.

Examples

Example 1:

Input: word = “catn”, dictionary = [“cat”, “bat”, “rat”] Output: 1 Explanation: Delete ’n’ to get “cat”.

Example 2:

Input: word = “bcatn”, dictionary = [“cat”, “bat”, “rat”] Output: 2 Explanation: Delete ‘b’ and ’n’ to get “cat”.

Solution

Method 1 – Traverse Dictionary (Subsequence Check)

Intuition: For each word in the dictionary, check if it is a subsequence of the input word. The minimum number of deletions is the difference in length.

Approach:

For each word in the dictionary:
- Check if it is a subsequence of the input word.
- If yes, compute deletions as len(word) - len(dict_word).
Return the minimum deletions found.

Code

1
2
3
4
5
6
7
8
9


def min_deletions(word, dictionary):
    def is_subsequence(s, t):
        it = iter(s)
        return all(c in it for c in t)
    min_del = float('inf')
    for dict_word in dictionary:
        if is_subsequence(word, dict_word):
            min_del = min(min_del, len(word) - len(dict_word))
    return min_del if min_del != float('inf') else -1

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21


#include <vector>
#include <string>
#include <algorithm>
using namespace std;
bool isSubsequence(const string& s, const string& t) {
    int i = 0, j = 0;
    while (i < s.size() && j < t.size()) {
        if (s[i] == t[j]) ++j;
        ++i;
    }
    return j == t.size();
}
int minDeletions(const string& word, const vector<string>& dictionary) {
    int minDel = word.size() + 1;
    for (const auto& dictWord : dictionary) {
        if (isSubsequence(word, dictWord)) {
            minDel = min(minDel, (int)word.size() - (int)dictWord.size());
        }
    }
    return minDel == word.size() + 1 ? -1 : minDel;
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


public int minDeletions(String word, List<String> dictionary) {
    int minDel = Integer.MAX_VALUE;
    for (String dictWord : dictionary) {
        if (isSubsequence(word, dictWord)) {
            minDel = Math.min(minDel, word.length() - dictWord.length());
        }
    }
    return minDel == Integer.MAX_VALUE ? -1 : minDel;
}
private boolean isSubsequence(String s, String t) {
    int i = 0, j = 0;
    while (i < s.length() && j < t.length()) {
        if (s.charAt(i) == t.charAt(j)) j++;
        i++;
    }
    return j == t.length();
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


func isSubsequence(s, t string) bool {
    i, j := 0, 0
    for i < len(s) && j < len(t) {
        if s[i] == t[j] {
            j++
        }
        i++
    }
    return j == len(t)
}
func minDeletions(word string, dictionary []string) int {
    minDel := len(word) + 1
    for _, dictWord := range dictionary {
        if isSubsequence(word, dictWord) {
            if len(word)-len(dictWord) < minDel {
                minDel = len(word) - len(dictWord)
            }
        }
    }
    if minDel == len(word)+1 {
        return -1
    }
    return minDel
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


fun minDeletions(word: String, dictionary: List<String>): Int {
    fun isSubsequence(s: String, t: String): Boolean {
        var i = 0; var j = 0
        while (i < s.length && j < t.length) {
            if (s[i] == t[j]) j++
            i++
        }
        return j == t.length
    }
    var minDel = Int.MAX_VALUE
    for (dictWord in dictionary) {
        if (isSubsequence(word, dictWord)) {
            minDel = minOf(minDel, word.length - dictWord.length)
        }
    }
    return if (minDel == Int.MAX_VALUE) -1 else minDel
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


pub fn min_deletions(word: &str, dictionary: &[&str]) -> i32 {
    fn is_subsequence(s: &str, t: &str) -> bool {
        let (mut i, mut j) = (0, 0);
        let s_bytes = s.as_bytes();
        let t_bytes = t.as_bytes();
        while i < s_bytes.len() && j < t_bytes.len() {
            if s_bytes[i] == t_bytes[j] { j += 1; }
            i += 1;
        }
        j == t_bytes.len()
    }
    let mut min_del = word.len() + 1;
    for &dict_word in dictionary {
        if is_subsequence(word, dict_word) {
            min_del = min_del.min(word.len() - dict_word.len());
        }
    }
    if min_del == word.len() + 1 { -1 } else { min_del as i32 }
}

Complexity

Time: O(MN) where M = number of words in dictionary, N = length of input word
Space: O(1)

Method 2 – Generate All Deletions (DFS, Exponential)

Intuition: Try all possible deletions recursively and check if the resulting word is in the dictionary.

Approach:

Use DFS to generate all possible strings by deleting characters.
If a generated string is in the dictionary, track the number of deletions.
Return the minimum deletions found.

Python

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


def min_deletions_dfs(word, dictionary):
    dictionary = set(dictionary)
    memo = {}
    def dfs(s):
        if s in memo:
            return memo[s]
        if s in dictionary:
            return len(word) - len(s)
        min_del = float('inf')
        for i in range(len(s)):
            t = s[:i] + s[i+1:]
            min_del = min(min_del, dfs(t))
        memo[s] = min_del
        return min_del
    res = dfs(word)
    return res if res != float('inf') else -1

Complexity

Time: O(2^N)
Space: O(2^N)

Dictionary

Given that dictionary is so common in coding interview questions that I’d like to briefly summarize few strategies/techniques here.

To store a dictionary, usually people will use data structures including Hash set, Trie or maybe just array. You’d better understand pros and cons of each of them.
You may choose to have a pre-processing step to read the whole dictionary and store into your preferred data structure. Since once it’s loaded, you can use it as many times as you want.
If the dictionary is not too large, you may take the dictionary traverse time as a constant.

Traverse dictionary

Coming back to this problem, if we assume the dictionary can be traversed quickly (not too many entries), one approach is to go through each word in the dictionary, calculate the number deletions required, and return the minimum one. To calculate the number of deletions efficiently, we’ll use the common technique here. One fact is that if a longer string can be transformed to a shorter one by deleting characters, the longer string must contain all the characters of the smaller one in order. If you have noticed this fact, then you should know that we only need to traverse the two strings once in order to get the deletion number. More specifically, we put two indices (L for the longer string, S for the shorter string) pointing to beginning of each string. If the two characters under the indices are different, move L forward by one character. If the two characters are same, move both forward. If S comes to the end, it means the longer string contains all the characters in order, so the number of deletion needed is just len(longer) – len(shorter). Assuming the size of the dictionary is M and length of the given word is N, the time complexity is O(MN) because for each word in the dictionary, we may need to iterate over the given word.

Traverse the word

What if the dictionary is really large? Actually we can solve this problem from the other side – traverse all the possible words generated from deletion of the given word. So for the given word, we try to delete each of the characters and check if the new word exists in the dictionary. Since we need to quickly check the existence of a work in dictionary, we need to load the dictionary into a hash set. So the time complexity for pre-processing is O(M) (traverse the whole dictionary once) and for the rest of the algorithm is O(2^N) because we need to get all the possible words generated from the given word. It’s also worth to note that once the dictionary is loaded, we don’t need to do the pre-processing again and that’s why sometimes we can ignore the time spent here. So which solution is better? It depends on the size of the dictionary and length of the given word.

Takeaways

To sum up some techniques in this question:

You should be aware of common data structures for dictionary and pros and cons of each of them.
Given that the size of the dictionary is fixed, it’s not a bad idea to just iterate over it.
Having two indices to traverse/compare two string/arrays is quite common. For example, we use the same approach to merging two sorted arrays.

You may notice that it’s not easy to write the code for “traverse the word” solution. So please try to finish the code for this part.

Problem#

Examples#

Solution#

Method 1 – Traverse Dictionary (Subsequence Check)#

Code#

Complexity#

Method 2 – Generate All Deletions (DFS, Exponential)#

Python#

Complexity#

Dictionary#

Traverse dictionary#

Traverse the word#

Takeaways#

Problem

Examples

Solution

Method 1 – Traverse Dictionary (Subsequence Check)

Code

Complexity

Method 2 – Generate All Deletions (DFS, Exponential)

Python

Complexity

Dictionary

Traverse dictionary

Traverse the word

Takeaways