Short Encoding of Words Problem

Problem

A valid encoding of an array of words is any reference string s and array of indices indices such that:

words.length == indices.length
The reference string s ends with the '#' character.
For each index indices[i], the substring of s starting from indices[i] and up to (but not including) the next '#' character is equal to words[i].

Given an array of words, return the length of the shortest reference string s possible of any valid encoding of words.

Examples

Example 1:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: A valid encoding would be s = "time#bell#" and indices = [0, 2, 5].
words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#"
words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#"
words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"

Example 2:

Input: words = ["t"]
Output: 2
Explanation: A valid encoding would be s = "t#" and indices = [0].

Solution

The problem has horrible description. Let’s get the deacription clear: Given string list L1, construct a another string list L2, making every word in L1 be a suffix of a word in L2. Return the minimum possible total length of words in L2

Input L1: [“time”,“me”,“bell”]
L2: [“time”,“bell”]

Method 1 - Sort Strings in Desc Order by Length and Generate All Suffices and Check Uniqueness by Set

Algo

Create a tuple of each word and its length from the input list and store them in a new list called nodeList.
Sort nodeList in descending order based on the length of the words.
For each word in nodeList, generate all possible suffixes:
- If a suffix does not exist in the set, add it to the set and increment the result length.
- If a suffix already exists, continue to the next word.

Code

Java

public int minimumLengthEncoding(String[] words) {
	class Node {
		String s;
		int len;
	
		Node(String s, int len) {
			this.s = s;
			this.len = len;
		}
	}
	List<Node> list = new ArrayList<>();
	for (String w : words) {
		list.add(new Node(w, w.length()));
	}
	list.sort((o1, o2) -> Integer.compare(o2.len, o1.len));

	Set<String> set = new HashSet<>();

	int ans = 0;
	for (Node node : list) {
		String str = node.s;
		if (!set.contains(str)) {
			for (int i = 0, l = str.length(); i < l; i++) {
				set.add(str.substring(i, l));
			}
			ans += (str.length() + 1);// +1 for `#`
		}
	}
	return ans;
}

Python

class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        # Create a list of tuples (word, length)
        nodeList = sorted([(word, len(word)) for word in words], key=lambda x: -x[1])
        
        seen = set()
        ans = 0
        
        for word, length in nodeList:
            if word not in seen:
                # Add all suffixes to the set
                for i in range(length):
                    seen.add(word[i:])
                # Increment the result length by word length + 1 (for '#')
                ans += length + 1
        
        return ans

Complexity

⏰ Time complexity: O(n * m * log n), where n is the number of words and m is the average length of words. Sorting dominates the time complexity.
🧺 Space complexity: O(n * m) for storing all suffixes and the set.

Method 2 - Using Set and Remove Suffices 🏆

Algo

Create a set containing all the words.
For each word, remove all its suffixes from the set.
The remaining words in the set constitute all the encoding words.
Sum up the lengths of the words in the set, adding one for each to account for the '#' character, and return the total.

Code

Java

class Solution {
	public int minimumLengthEncoding(String[] words) {
		Set<String> set = new HashSet<>(Arrays.asList(words));
		for (String w: words) {
			for (int i = 1; i<w.length(); ++i) {
				set.remove(w.substring(i));
			}
		}
		int ans = 0;
		for (String w: set) {
			ans += w.length() + 1;
		}
		return ans;
	}
}

Python

class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        word_set = set(words)
        
        for word in words:
            for i in range(1, len(word)):
                word_set.discard(word[i:])
        
        return sum(len(word) + 1 for word in word_set)

Complexity

⏰ Time complexity: O(n * m^2), where n is the number of words and m is the maximum length of a word. This is because removing suffixes requires iterating over each word and its suffixes.
🧺 Space complexity: O(n * m), for storing the words and their suffixes in the set.

Method 3 - Using Trie 🏆

The concept of using a Trie is quite straightforward and becomes even clearer when dealing with suffixes. The approach involves inserting all reversed words into a Trie.

Code

Java

TrieNode:

class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
}

public boolean add(TrieNode root, String word){
	boolean added = false; 
	for(int i = word.length() -1; i>=0; --i){
		char c = word.charAt(i); 
		if(!root.children.containsKey(c)) {
			added = true;
			root.children.put(c, new TrieNode());
		}
		root = root.children.get(c);
	}
	return added; 
}

Actual Code:

public int minimumLengthEncoding(String[] words) {
	if(words == null || words.length < 1) return 0; 
	Arrays.sort(words, (a, b)->b.length() - a.length()); 
	TrieNode root = new TrieNode(); 
	int ans = 0 ; 
	for(String word : words){
		if(add(root, word)){
			ans+=word.length()+1; 
		}
	}
	return ans; 
}

Python

TrieNode:

class TrieNode:
    def __init__(self):
        self.children = {}

def add(root: TrieNode, word: str) -> bool:
    added = False
    for char in reversed(word):
        if char not in root.children:
            added = True
            root.children[char] = TrieNode()
        root = root.children[char]
    return added

Actual Code:

class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        if not words:
            return 0
        
        words.sort(key=len, reverse=True)
        root = TrieNode()
        ans = 0
        
        for word in words:
            if add(root, word):
                ans += len(word) + 1
        
        return ans

Complexity

⏰ Time complexity: O(n * m) , where n is the number of words and m is the maximum length of a word. Sorting takes O(n log n), and inserting into the Trie takes O(n * m).
🧺 Space complexity: O(n * m) for storing all nodes in the Trie.

Problem#

Examples#

Solution#

Method 1 - Sort Strings in Desc Order by Length and Generate All Suffices and Check Uniqueness by Set#

Algo#

Code#

Java#

Python#

Complexity#

Method 2 - Using Set and Remove Suffices 🏆#

Algo#

Code#

Java#

Python#

Complexity#

Method 3 - Using Trie 🏆#

Code#

Java#

Python#

Complexity#

Problem

Examples

Solution

Method 1 - Sort Strings in Desc Order by Length and Generate All Suffices and Check Uniqueness by Set

Algo

Code

Java

Python

Complexity

Method 2 - Using Set and Remove Suffices 🏆

Algo

Code

Java

Python

Complexity

Method 3 - Using Trie 🏆

Code

Java

Python

Complexity