Design Search Autocomplete System
Problem
Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#'). For each character they type except '#', you need to return the top 3 historical hot sentences that have prefix the same as the part of sentence already typed. Here are the specific rules:
- The hot degree for a sentence is defined as the number of times a user typed the exactly same sentence before.
- The returned top 3 hot sentences should be sorted by hot degree (The first is the hottest one). If several sentences have the same degree of hot, you need to use ASCII-code order (smaller one appears first).
- If less than 3 hot sentences exist, then just return as many as you can.
- When the input is a special character, it means the sentence ends, and in this case, you need to return an empty list.
Your job is to implement the following functions:
The constructor function:
AutocompleteSystem(String[] sentences, int[] times): This is the constructor. The input is historical data. Sentences is a string array consists of previously typed sentences. Times is the corresponding times a sentence has been typed. Your system should record these historical data.
Now, the user wants to input a new sentence. The following function will provide the next character the user types:
List<String> input(char c): The input c is the next character typed by the user. The character will only be lower-case letters ('a' to 'z'), blank space (' ') or a special character ('#'). Also, the previously typed sentence should be recorded in your system. The output will be the top 3 historical hot sentences that have prefix the same as the part of sentence already typed.
Examples
**Operation:** AutocompleteSystem(["i love you", "island","ironman", "i love leetcode"], [5,3,2,2])
The system have already tracked down the following sentences and their corresponding times:
`"i love you"` : `5` times
`"island"` : `3` times
`"ironman"` : `2` times
`"i love leetcode"` : `2` times
Now, the user begins another search:
**Operation:** input('i')
Output:
["i love you", "island","i love leetcode"]
Explanation:
There are four sentences that have prefix `"i"`. Among them, "ironman" and "i love leetcode" have same hot degree. Since `' '` has ASCII code 32 and `'r'` has ASCII code 114, "i love leetcode" should be in front of "ironman". Also we only need to output top 3 hot sentences, so "ironman" will be ignored.
**Operation:** input(' ')
Output:
["i love you","i love leetcode"]
Explanation:
There are only two sentences that have prefix `"i "`.
**Operation:** input('a')
Output:
[]
Explanation:
There are no sentences that have prefix `"i a"`.
**Operation:** input('#')
Output:
[]
Explanation:
The user finished the input, the sentence `"i a"` should be saved as a historical sentence in system. And the following input will be counted as a new search.
Note
- The input sentence will always start with a letter and end with '#', and only one blank space will exist between two words.
- The number of complete sentences that to be searched won't exceed 100. The length of each sentence including those in the historical data won't exceed 100.
- Please use double-quote instead of single-quote when you write test cases even for a character input.
- Please remember to RESET your class variables declared in class AutocompleteSystem, as static/class variables are persisted across multiple test cases. Please see here for more details.
Solution
Method 1 - Using Trie
Here is the approach:
-
TrieNode Class:
- Each node in the Trie contains:
- A
freqdictionary mapping a sentence to its frequency. - A
childrendictionary mapping characters to their respective Trie nodes. - An
is_endflag to mark the end of a sentence.
- A
- Each node in the Trie contains:
-
AutocompleteSystem Class:
- Initialise the Trie with the
sentencesand their correspondingtimesby calling_inserton each sentence. - Maintain a sentence buffer (
sb) and current Trie pointer (curr) during input processing.
- Initialise the Trie with the
-
Input Method:
- Append characters to the buffer while traversing or creating Trie nodes.
- When encountering
'#', store the completed sentence in the Trie with a frequency of1and reset the input state. - Fetch top 3 sentences matching the prefix using
_find_top_k.
-
Top-K Search:
- Use a min heap (
heapq) to keep track of the top 3 sentences based on both frequency and lexicographical order. - Sort and deliver results in descending order after popping from the heap.
- Use a min heap (
-
Reset:
- Reset the sentence buffer and Trie pointer after a completed sentence is saved.
-
Insert:
- Traverse the Trie while creating missing nodes and updating frequency maps for nodes along the sentence's path.
Code
Java
class AutocompleteSystem {
TrieNode root, curr;
StringBuffer sb;
public AutocompleteSystem(String[] sentences, int[] times) {
if (sentences == null || times == null || sentences.length != times.length) {
return;
}
reset();
root = new TrieNode();
for (int i = 0; i < times.length; i++) {
insert(sentences[i], times[i]);
}
}
public List<String> input(char c) {
List<String> result = new ArrayList<>();
if (curr == null) curr = root;
if (c == '#') { // save sentence and reset state
insert(sb.toString(), 1);
reset();
return result;
}
// Update global variable (curr TrieNode and string buffer); or append new character if not exist.
sb.append(c);
curr.children.putIfAbsent(c, new TrieNode());
curr = curr.children.get(c);
// MinHeap to find top 3.
result.addAll(findTopK(curr, 3));
return result;
}
private List<String> findTopK(TrieNode node, int k) {
List<String> result = new ArrayList<>();
if (node.freq.isEmpty()) {
return result;
}
PriorityQueue<Pair> queue = new PriorityQueue<>(
(a, b) -> a.count == b.count ? b.s.compareTo(a.s) : a.count - b.count);
for (Map.Entry<String, Integer> entry : node.freq.entrySet()) {
if (queue.size() < 3 || entry.getValue() >= queue.peek().count) {
queue.offer(new Pair(entry.getKey(), entry.getValue()));
}
if (queue.size() > 3) queue.poll();
}
while (!queue.isEmpty()) {
result.add(0, queue.poll().s);
}
return result;
}
private void reset() {
curr = null;
sb = new StringBuffer();
}
private void insert(String sentence, int count) {
if (sentence == null || sentence.length() == 0) {
return;
}
TrieNode node = root;
for (char c : sentence.toCharArray()) {
node.children.putIfAbsent(c, new TrieNode());
node = node.children.get(c);
node.freq.put(sentence, node.freq.getOrDefault(sentence, 0) + count);
}
node.isEnd = true; // can set word to node as well, if needed
}
}
class TrieNode {
public boolean isEnd;
public Map<String, Integer> freq;
public Map<Character, TrieNode> children; // Map is more applicable to all chars, not limited to 256 ASCII
public TrieNode() {
this.freq = new HashMap<>();
this.children = new HashMap<>();
}
}
class Pair {
String s;
int count;
public Pair(String s, int count) {
this.s = s;
this.count = count;
}
}
Python
class TrieNode:
def __init__(self):
self.is_end: bool = False
self.freq: Dict[str, int] = {}
self.children: Dict[str, TrieNode] = {}
class Pair:
def __init__(self, sentence: str, freq: int):
self.sentence: str = sentence
self.freq: int = freq
def __lt__(self, other):
# Compare frequencies first, then lexicographical order
if self.freq == other.freq:
return self.sentence > other.sentence
return self.freq < other.freq
class AutocompleteSystem:
def __init__(self, sentences: List[str], times: List[int]) -> None:
self.root = TrieNode()
self.sb = "" # string buffer to store the current sentence
self.curr = None # pointer to current TrieNode
if sentences and times and len(sentences) == len(times):
for i in range(len(sentences)):
self._insert(sentences[i], times[i])
def input(self, c: str) -> List[str]:
result = []
if not self.curr:
self.curr = self.root
if c == '#': # save sentence and reset state
self._insert(self.sb, 1)
self._reset()
return result
# Update global variables to maintain state
self.sb += c
if c not in self.curr.children:
self.curr.children[c] = TrieNode()
self.curr = self.curr.children[c]
# Find top K sentences using min-heap
result = self._find_top_k(self.curr, 3)
return result
def _find_top_k(self, node: TrieNode, k: int) -> List[str]:
result: List[str] = []
if not node.freq:
return result
queue: List[Pair] = []
for sentence, freq in node.freq.items():
heapq.heappush(queue, Pair(sentence, freq))
if len(queue) > k:
heapq.heappop(queue)
while queue:
result.insert(0, heapq.heappop(queue).sentence) # insert in reverse order
return result
def _reset(self) -> None:
self.sb = ""
self.curr = None
def _insert(self, sentence: str, count: int) -> None:
if not sentence:
return
node = self.root
for char in sentence:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.freq[sentence] = node.freq.get(sentence, 0) + count
node.is_end = True
Time Complexity
-
Insertion (
_insert):- Each sentence of length
mrequiresO(m)traversal of the Trie, with an additional update to the frequency map (search/update:O(log(k))forksentences). Hence, the complexity isO(m * log(k))
- Each sentence of length
-
Input Query (
input):- Traversing prefix length
presults inO(p). - Finding the top 3 sentences via
heapqrequiresO(k * log(k))forksentences. Hence, the complexity isO(p * log(k)).
- Traversing prefix length