Problem
A valid encoding of an array of words
is any reference string s
and array of indices indices
such that:
words.length == indices.length
- The reference string
s
ends with the'#'
character. - For each index
indices[i]
, the substring ofs
starting fromindices[i]
and up to (but not including) the next'#'
character is equal towords[i]
.
Given an array of words
, return the length of the shortest reference string s
possible of any valid encoding of words
.
Examples
Example 1:
Input: words = ["time", "me", "bell"]
Output: 10
Explanation: A valid encoding would be s = "time#bell#" and indices = [0, 2, 5].
words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#"
words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#"
words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"
Example 2:
Input: words = ["t"]
Output: 2
Explanation: A valid encoding would be s = "t#" and indices = [0].
Solution
The problem has horrible description. Let’s get the deacription clear: Given string list L1, construct a another string list L2, making every word in L1 be a suffix of a word in L2. Return the minimum possible total length of words in L2
Input L1: [“time”,“me”,“bell”]
L2: [“time”,“bell”]
Method 1 - Sort Strings in Desc Order by Length and Generate All Suffices and Check Uniqueness by Set
Algo
- Create a tuple of each word and its length from the input list and store them in a new list called
nodeList
. - Sort
nodeList
in descending order based on the length of the words. - For each word in
nodeList
, generate all possible suffixes:- If a suffix does not exist in the set, add it to the set and increment the result length.
- If a suffix already exists, continue to the next word.
Code
Java
public int minimumLengthEncoding(String[] words) {
class Node {
String s;
int len;
Node(String s, int len) {
this.s = s;
this.len = len;
}
}
List<Node> list = new ArrayList<>();
for (String w : words) {
list.add(new Node(w, w.length()));
}
list.sort((o1, o2) -> Integer.compare(o2.len, o1.len));
Set<String> set = new HashSet<>();
int ans = 0;
for (Node node : list) {
String str = node.s;
if (!set.contains(str)) {
for (int i = 0, l = str.length(); i < l; i++) {
set.add(str.substring(i, l));
}
ans += (str.length() + 1);// +1 for `#`
}
}
return ans;
}
Python
class Solution:
def minimumLengthEncoding(self, words: List[str]) -> int:
# Create a list of tuples (word, length)
nodeList = sorted([(word, len(word)) for word in words], key=lambda x: -x[1])
seen = set()
ans = 0
for word, length in nodeList:
if word not in seen:
# Add all suffixes to the set
for i in range(length):
seen.add(word[i:])
# Increment the result length by word length + 1 (for '#')
ans += length + 1
return ans
Complexity
- ⏰ Time complexity:
O(n * m * log n)
, wheren
is the number of words andm
is the average length of words. Sorting dominates the time complexity. - 🧺 Space complexity:
O(n * m)
for storing all suffixes and the set.
Method 2 - Using Set and Remove Suffices 🏆
Algo
- Create a set containing all the words.
- For each word, remove all its suffixes from the set.
- The remaining words in the set constitute all the encoding words.
- Sum up the lengths of the words in the set, adding one for each to account for the
'#'
character, and return the total.
Code
Java
class Solution {
public int minimumLengthEncoding(String[] words) {
Set<String> set = new HashSet<>(Arrays.asList(words));
for (String w: words) {
for (int i = 1; i<w.length(); ++i) {
set.remove(w.substring(i));
}
}
int ans = 0;
for (String w: set) {
ans += w.length() + 1;
}
return ans;
}
}
Python
class Solution:
def minimumLengthEncoding(self, words: List[str]) -> int:
word_set = set(words)
for word in words:
for i in range(1, len(word)):
word_set.discard(word[i:])
return sum(len(word) + 1 for word in word_set)
Complexity
- ⏰ Time complexity:
O(n * m^2)
, wheren
is the number of words andm
is the maximum length of a word. This is because removing suffixes requires iterating over each word and its suffixes. - 🧺 Space complexity:
O(n * m)
, for storing the words and their suffixes in the set.
Method 3 - Using Trie 🏆
The concept of using a Trie is quite straightforward and becomes even clearer when dealing with suffixes. The approach involves inserting all reversed words into a Trie.
Code
Java
TrieNode:
class TrieNode {
Map<Character, TrieNode> children = new HashMap<>();
}
public boolean add(TrieNode root, String word){
boolean added = false;
for(int i = word.length() -1; i>=0; --i){
char c = word.charAt(i);
if(!root.children.containsKey(c)) {
added = true;
root.children.put(c, new TrieNode());
}
root = root.children.get(c);
}
return added;
}
Actual Code:
public int minimumLengthEncoding(String[] words) {
if(words == null || words.length < 1) return 0;
Arrays.sort(words, (a, b)->b.length() - a.length());
TrieNode root = new TrieNode();
int ans = 0 ;
for(String word : words){
if(add(root, word)){
ans+=word.length()+1;
}
}
return ans;
}
Python
TrieNode:
class TrieNode:
def __init__(self):
self.children = {}
def add(root: TrieNode, word: str) -> bool:
added = False
for char in reversed(word):
if char not in root.children:
added = True
root.children[char] = TrieNode()
root = root.children[char]
return added
Actual Code:
class Solution:
def minimumLengthEncoding(self, words: List[str]) -> int:
if not words:
return 0
words.sort(key=len, reverse=True)
root = TrieNode()
ans = 0
for word in words:
if add(root, word):
ans += len(word) + 1
return ans
Complexity
- ⏰ Time complexity:
O(n * m)
, wheren
is the number of words andm
is the maximum length of a word. Sorting takesO(n log n)
, and inserting into the Trie takesO(n * m)
. - 🧺 Space complexity:
O(n * m)
for storing all nodes in the Trie.