Problem
Given a document (or stream) of words. Find the top k most frequent words in the document (or stream).
OR
Given an array of strings words
and an integer k
, return the k
most frequent strings.
Return the answer sorted by the frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.
Examples
Example 1:
Input:
words = ["i","love","leetcode","i","love","coding"], k = 2
Output:
["i","love"]
Explanation: "i" and "love" are the two most frequent words.
Note that "i" comes before "love" due to a lower alphabetical order.
Example 2:
Input:
words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4
Output:
["the","is","sunny","day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively.
Solution
Method 1 - Basic Solution Using Hashtable
Here is the basic solution:
- One quick solution would be to create a pair object with the word and its frequency. This takes O(n) time.
- then sort the pair array with respect to the frequency of the pair. This takes O(nlgn).
- Now, take the first k pairs from the sorted array of pairs. This takes O(k) time.
Complexity
the total time is O(n+nlg(n)+K) = O(n lgn).
Method 2 - Using MaxHeap (storing All the elements)
Here is the solution:
- build a heap of (word, word-frequency) pair with “word-frequency” as key. It takes O(n) time to build a heap;
- extract top K words from the heap. Each extraction is O(lg(n)). So, total time is
O(k*lg(n))
.
Complexity
- Time Complexity -
O(n+k*lg(n)
- Space Complexity -
O(n)
Method 3 - Using MinHeap with K Elements- O(nlgk) Solution with O(n) Space
But we can improve this solution. Note that we are only concern about the top k elements. Sorting the array means we are sorting all the n elements which is unnecessary as we are only concerned for first k. Any idea popping in? Yeah, I am sure you have the same feeling that we could use a Min Heap of size k to keep top k most frequent words. That’s right. We also need to use a hashmap to keep frequency of each word.
- Calculated frequency of all the words in a hashmap from the word to its frequency.
- Start adding pair object of word and its frequency into a min heap where we use the frequency as the key for the min heap.
- If the heap is full then remove the minimum element (top) form the heap and add add the new word-frequency pair only if the frequency of this word has frequency greater than the top word in the heap.
- Once we scanned all the words in the map and the heap is properly updated then the elements contained in the min heap are the top k most frequents.
This code is similar to Kth Largest Element in an Array.
Code
Java
Below is a simple implementation of the above idea using Java Priority Queue or creating our own heap.
Minheap with WordFreq implementing Comparable
public String[] topKWords(String[] words, int k) {
final class WordFreq implements Comparable<WordFreq> {
String word;
int freq;
public WordFreq(final String w, final int c) {
word = w;
freq = c;
}
@Override
public int compareTo(final WordFreq other) {
return Integer.compare(this.freq, other.freq);
}
}
final Map<String, Integer> frequencyMap = new HashMap<String, Integer>();
for (final String word: words) {
frequencyMap.put(word, frequencyMap.getOrDefault(word, 0) + 1);
}
// Build the topK heap
final PriorityQueue<WordFreq> minHeap = new PriorityQueue<WordFreq> (k);
for (final java.util.Map.Entry<String, Integer> entry: frequencyMap.entrySet()) {
if (minHeap.size()<k) {
minHeap.add(new WordFreq(entry.getKey(), entry.getValue()));
} else if (entry.getValue() > topKHeap.peek().freq) {
minHeap.remove();
minHeap.add(new WordFreq(entry.getKey(), entry.getValue()));
}
}
LinkedList<String> ans = new LinkedList<>(); // nOt using arraylist as it doesnt provide addFirst, otherwise we have to do collection reverse
while (!minHeap.isEmpty()) {
ans.addFirst(minHeap.poll().word);
}
return ans;
}
Minheap with WordFreq adding Comparator in PriorityQueue
public List<String> topKFrequent(String[] words, int k) {
Map<String, Integer> map = new HashMap<>();
for (String w : words) {
map.put(w, map.getOrDefault(w, 0) + 1);
}
Queue<Pair> minHeap = new PriorityQueue<>((a, b) -> (a.freq == b.freq) ? b.word.compareTo(a.word) : Integer.compare(a.freq, b.freq));
for (String w : map.keySet()) {
int f = map.get(w);
minHeap.offer(new Pair(w, f));
if (minHeap.size() > k) {
minHeap.poll();
}
}
LinkedList<String> result = new LinkedList<>(); // nOt using arraylist as it doesnt provide addFirst, otherwise we have to do collection reverse
while (!minHeap.isEmpty()) {
result.addFirst(minHeap.poll().word);
}
return result;
}
static class WordFreq {
String word;
int freq;
Pair(String word, int freq) {
this.word = word;
this.freq = freq;
}
}
Complexity
- Time Complexity - O(nlgk) time
- Space Complexity - O(n) space.
Method 4 - Use Bucketsort
We can apply bucketsort on the frequency.