Problem
Given a string paragraph
and a string array of the banned words banned
, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.
The words in paragraph
are case-insensitive and the answer should be returned in lowercase.
Examples
Example 1:
Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation:
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph.
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"),
and that "hit" isn't the answer even though it occurs more because it is banned.
Example 2:
Input: paragraph = "a.", banned = []
Output: "a"
Solution
Method 1 - Using Set
- Normalize the Paragraph:
- Convert the paragraph to lowercase to handle case-insensitivity.
- Replace all punctuation marks with spaces.
- Split the paragraph into words using whitespace.
- Use a Set for Banned Words:
- Convert the list of banned words into a set for O(1) time complexity checks.
- Count Frequencies of Non-Banned Words:
- Use a dictionary (hashmap) to count the occurrences of each word that is not in the banned set.
- Find the Most Frequent Word:
- Iterate through the dictionary to find the word with the highest count.
Code
Java
public class Solution {
public String mostCommonWord(String paragraph, String[] banned) {
Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
Map<String, Integer> freq = new HashMap<>();
paragraph = paragraph.replaceAll("[^a-zA-Z]", " ").toLowerCase();
String[] words = paragraph.split("\\s+");
for (String word : words) {
if (!bannedSet.contains(word)) {
freq.put(word, freq.getOrDefault(word, 0) + 1);
}
}
String ans = "";
int maxCount = 0;
for (Map.Entry<String, Integer> entry : freq.entrySet()) {
if (entry.getValue() > maxCount) {
maxCount = entry.getValue();
ans = entry.getKey();
}
}
return ans;
}
}
Python
class Solution:
def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
banned_set = set(banned)
words = re.findall(r'\w+', paragraph.lower())
freq = defaultdict(int)
for word in words:
if word not in banned_set:
freq[word] += 1
ans = ''
max_count = 0
for word, count in freq.items():
if count > max_count:
max_count = count
ans = word
return ans
Complexity
- ⏰ Time complexity:
O(n + m)
, wheren
is the length of the paragraph andm
is the number of banned words. This accounts for processing the paragraph, and for checking and counting word frequencies. - 🧺 Space complexity:
O(n + m)
for storing the words, their frequencies, and the set of banned words.