Problem

A string s is called good if there are no two different characters in s that have the same frequency.

Given a string s, return the minimum number of characters you need to delete to make s good.

The frequency of a character in a string is the number of times it appears in the string. For example, in the string "aab", the frequency of 'a' is 2, while the frequency of 'b' is 1.

Examples

Example 1:

1
2
3
Input: s = "aab"
Output: 0
Explanation: `s` is already good.

Example 2:

1
2
3
4
Input: s = "aaabbbcc"
Output: 2
Explanation: You can delete two 'b's resulting in the good string "aaabcc".
Another way it to delete one 'b' and one 'c' resulting in the good string "aaabbc".

Example 3:

1
2
3
4
Input: s = "ceabaacb"
Output: 2
Explanation: You can delete both 'c's resulting in the good string "eabaab".
Note that we only care about characters that are still in the string at the end (i.e. frequency of 0 is ignored).

Constraints

  • 1 <= s.length <= 10^5
  • s contains only lowercase English letters.

Solution

Method 1 - Using Frequency array and Set

A greedy approach is effective here because we are limited to deleting characters, not adding or replacing them.

First, count the frequency of each character. Then, for each of the 26 letters, if its frequency has already been used, keep deleting characters until you find an unused frequency or reach zero.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
public int minDeletions(String s) {
	int cnt[] = new int[26], ans = 0;
	Set<Integer> used = new HashSet<>();
	for (int i = 0; i<s.length(); ++i) {
		++cnt[s.charAt(i) - 'a'];
	}
	for (int i = 0; i<26; ++i) {
		while (cnt[i] > 0 && !used.add(cnt[i])) {
			--cnt[i];
			++ans;
		}
	}
	return ans;
}

Complexity

  • ⏰ Time complexity: O(n + k^2) = O(n). where n is length of the string and k is number of unique characters (at most 26 for lowercase English letters)
    • Counting character frequencies takes O(n).
    • For each of the 26 characters, we may decrement the frequency until it becomes unique, but since the sum of all frequencies is n and each decrement is counted toward the answer, the total number of decrements is at most n.
    • The set operations (add/check) are O(1) on average for each frequency.
  • 🧺 Space complexity: O(1), since the frequency array and set size are bounded by 26, the number of lowercase letters

Method 2 - Using Frequency array and Max Heap

Build a frequency map as before and insert all nonzero frequencies into a max heap. If the top two frequencies are equal, increment the answer and decrease one of them; otherwise, continue as normal.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
public int minDeletions(String s) {
	int cnt[] = new int[26], ans = 0;
	
	for (int i = 0; i<s.length(); ++i) {
		++cnt[s.charAt(i) - 'a'];
	}
	// maxHeap
	PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
	for(int freq: cnt) {
		// we shouldn't add 0 value freq
		if(freq != 0) {
			pq.add(freq);
		}
	}
	 while (pq.size() > 0) {
		int mostFreq = pq.remove();
		if (pq.size() == 0 ) {
			return ans;
		}
	
		if (mostFreq == pq.peek()) {
			if (mostFreq > 1) {
				pq.add(mostFreq - 1);
			}
			ans++;
		}
	}
	return ans;
}

Complexity

  • ⏰ Time complexity: O(n + klogk) = O(n).
    • Counting character frequencies takes O(n), where n is the length of the string.
    • Building the max heap with up to k elements (k ≤ 26 for lowercase letters) takes O(k).
    • Each heap operation (add/remove) is O(log k), and in the worst case, we may perform up to O(n) operations (since each deletion reduces the frequency).
    • Since k is at most 26, O(k log k) is effectively O(1), so the dominant term is O(n).
  • 🧺 Space complexity: O(1). The frequency array and the heap both use O(k) space, where k ≤ 26 (number of lowercase letters), so this is O(1) in practice.