Count the Number of Good Subsequences
Problem
A subsequence of a string is good if it is not empty and the frequency of each one of its characters is the same.
Given a string s, return the number of good subsequences of s. Since the answer may be too large, return it modulo 109 + 7.
A subsequence is a string that can be derived from another string by deleting some or no characters without changing the order of the remaining characters.
Examples
Example 1:
Input: s = "aabb"
Output: 11
Explanation: The total number of subsequences is 24. There are five subsequences which are not good: "**_aab_** b", "a _**abb**_ ", "**_a_** a _**bb**_ ", "_**aa**_ b** _b_** ", and the empty subsequence. Hence, the number of good subsequences is 24-5 = 11.
Example 2:
Input: s = "leet"
Output: 12
Explanation: There are four subsequences which are not good: "**_l_ _ee_** t", "l _**eet**_ ", "**_leet_** ", and the empty subsequence. Hence, the number of good subsequences is 24-4 = 12.
Example 3:
Input: s = "abcd"
Output: 15
Explanation: All of the non-empty subsequences are good subsequences. Hence, the number of good subsequences is 24-1 = 15.
Constraints:
1 <= s.length <= 10^4sconsists of only lowercase English letters.
Solution
Method 1 – Combinatorics with Inclusion-Exclusion
Intuition
A good subsequence is one where all characters appear the same number of times. For each subset of unique characters, we can choose a frequency f (from 1 up to the minimum count among the chosen characters), and count the number of ways to pick f occurrences for each character. The answer is the sum over all non-empty subsets of unique characters and all possible f.
Approach
- Count the frequency of each character in s.
- For every non-empty subset of unique characters:
- Let the subset be of size k.
- For each possible frequency f (from 1 up to the minimum count among the chosen characters):
- The number of ways to pick f occurrences for each character is the product of C(cnt[c], f) for all c in the subset.
- Sum over all f for this subset.
- Sum over all non-empty subsets.
- Return the answer modulo 10^9+7.
Code
C++
class Solution {
public:
int countGoodSubsequences(string s) {
const int MOD = 1e9+7;
vector<int> cnt(26, 0);
for (char c : s) cnt[c-'a']++;
vector<int> chars;
for (int i = 0; i < 26; ++i) if (cnt[i]) chars.push_back(cnt[i]);
int m = chars.size();
vector<vector<int>> C(10001, vector<int>(m+1, 1));
for (int i = 1; i <= 10000; ++i)
for (int j = 1; j <= m; ++j)
C[i][j] = (1LL * C[i-1][j] + C[i-1][j-1]) % MOD;
int ans = 0;
for (int mask = 1; mask < (1<<m); ++mask) {
int minf = 10000;
for (int i = 0; i < m; ++i) if (mask & (1<<i)) minf = min(minf, chars[i]);
for (int f = 1; f <= minf; ++f) {
int prod = 1;
for (int i = 0; i < m; ++i) if (mask & (1<<i)) prod = 1LL * prod * C[chars[i]][f] % MOD;
ans = (ans + prod) % MOD;
}
}
return ans;
}
};
Python
class Solution:
def countGoodSubsequences(self, s: str) -> int:
MOD = 10**9 + 7
from math import comb
from collections import Counter
cnt = Counter(s)
chars = list(cnt.values())
m = len(chars)
ans = 0
for mask in range(1, 1<<m):
minf = min(chars[i] for i in range(m) if (mask>>i)&1)
for f in range(1, minf+1):
prod = 1
for i in range(m):
if (mask>>i)&1:
prod = prod * comb(chars[i], f) % MOD
ans = (ans + prod) % MOD
return ans
Complexity
- ⏰ Time complexity:
O(2^k * k * n), where k is the number of unique characters and n is the length of s. - 🧺 Space complexity:
O(k)for character counts.