Problem
Given two strings s
and p
, return an array of all the start indices of p
’s anagrams in s
. You may return the answer in any order.
An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.
Examples
Example 1:
Input: s = "cbaebabacd", p = "abc"
Output: [0,6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".
Example 2:
Input: s = "abab", p = "ab"
Output: [0,1,2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".
Solution
This problem differs from standard pattern searching because we also need to find anagrams. Hence, we cannot use traditional pattern searching algorithms such as Knuth Morris Pratt KMP Algorithm, Rabin-Karp Algorithm Explained, Boyer-Moore String Searching, etc.
Method 1 - Brute Force
A straightforward brute force approach is to generate all anagrams of the pattern and check if any are substrings of the given string. This method is inefficient due to the exponential cost of generating all anagrams.
Method 2 - Sliding Histogram Approach – O(n) Time O(1) Space 🏆
This is similar to Rabin-Karp Algorithm Explained#Algorithm with Rolling Hash algorithm OR Count Occurrences of Anagram problem.
Notice that when we find an anagram starting from i
, we might potentially find another starting from i+1
if the character count distribution remains the same.
For example, consider the text “abaab” and pattern “aba”. Both strings have the same character count distribution (histogram: [a=2, b=1, c=0, ..]
for the first m
characters, where m
is the length of the pattern). If we remove the first character ‘a’ and add the fourth character ‘a’, the histogram remains the same ([a=2, b=1, c=0, ..]
).
Thus, we can create a fixed-length sliding window histogram for characters in the text and check if the window equals the pattern histogram at each position of the text. We keep sliding the window to the right by one position and check if the window equals the pattern histogram. This is a linear time algorithm if the alphabet size of the strings is finite (for example, a 256 character set).
Algorithm Steps:
- Store counts of frequencies of characters in the pattern in an array
countP[]
. Also, store counts of frequencies of characters in the first window of text in an arraycountTW[]
. - Run a loop from
i = M
toN-1
(whereM
is the length of pattern andN
is the length of the text). In the loop: a. If the two count arrays are identical, we’ve found an occurrence. b. Increment the count of the current character of text incountTW[]
. c. Decrement the count of the first character in the previous window incountTW[]
. - Explicitly check the last window outside the loop, as it is not checked by the loop.
Code
Java
public class Solution {
public List<Integer> findAnagrams(String s, String p) {
List<Integer> result = new ArrayList<>();
int ns = s.length(), np = p.length();
if (ns < np) return result;
int[] pCount = new int[26];
int[] sCount = new int[26];
for (char c : p.toCharArray()) {
pCount[c - 'a']++;
}
for (int i = 0; i < ns; i++) {
sCount[s.charAt(i) - 'a']++;
if (i >= np) {
sCount[s.charAt(i - np) - 'a']--;
}
if (Arrays.equals(pCount, sCount)) {
result.add(i - np + 1);
}
}
return result;
}
}
Python
class Solution:
def findAnagrams(self, s: str, p: str):
ns, np = len(s), len(p)
if ns < np:
return []
p_count = Counter(p)
s_count = Counter()
result = []
for i in range(ns):
s_count[s[i]] += 1
if i >= np:
if s_count[s[i - np]] == 1:
del s_count[s[i - np]]
else:
s_count[s[i - np]] -= 1
if p_count == s_count:
result.append(i - np + 1)
return result
Complexity
- ⏰ Time complexity:
O(n)
, wheren
is the length of strings
. This is because we use a sliding window approach, which processes each character ofs
a constant number of times. - 🧺 Space complexity:
O(1)
, as the size of the hash maps used is constant (26 letters of the alphabet).