Find anagrams of a string in another string

Problem

Given two strings s and p, return an array of all the start indices of p’s anagrams in s. You may return the answer in any order.

An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

Examples

Example 1:

Input: s = "cbaebabacd", p = "abc"
Output: [0,6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input: s = "abab", p = "ab"
Output: [0,1,2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

Solution

This problem differs from standard pattern searching because we also need to find anagrams. Hence, we cannot use traditional pattern searching algorithms such as Knuth Morris Pratt KMP Algorithm, Rabin-Karp Algorithm Explained, Boyer-Moore String Searching, etc.

Method 1 - Brute Force

A straightforward brute force approach is to generate all anagrams of the pattern and check if any are substrings of the given string. This method is inefficient due to the exponential cost of generating all anagrams.

Method 2 - Sliding Histogram Approach – O(n) Time O(1) Space 🏆

This is similar to Rabin-Karp Algorithm Explained#Algorithm with Rolling Hash algorithm OR Count Occurrences of Anagram problem.

Notice that when we find an anagram starting from i, we might potentially find another starting from i+1 if the character count distribution remains the same.

For example, consider the text “abaab” and pattern “aba”. Both strings have the same character count distribution (histogram: [a=2, b=1, c=0, ..] for the first m characters, where m is the length of the pattern). If we remove the first character ‘a’ and add the fourth character ‘a’, the histogram remains the same ([a=2, b=1, c=0, ..]).

Thus, we can create a fixed-length sliding window histogram for characters in the text and check if the window equals the pattern histogram at each position of the text. We keep sliding the window to the right by one position and check if the window equals the pattern histogram. This is a linear time algorithm if the alphabet size of the strings is finite (for example, a 256 character set).

Algorithm Steps:

Store counts of frequencies of characters in the pattern in an array countP[]. Also, store counts of frequencies of characters in the first window of text in an array countTW[].
Run a loop from i = M to N-1 (where M is the length of pattern and N is the length of the text). In the loop: a. If the two count arrays are identical, we’ve found an occurrence. b. Increment the count of the current character of text in countTW[]. c. Decrement the count of the first character in the previous window in countTW[].
Explicitly check the last window outside the loop, as it is not checked by the loop.

Code

Java

public class Solution {
    public List<Integer> findAnagrams(String s, String p) {
        List<Integer> result = new ArrayList<>();
        int ns = s.length(), np = p.length();
        
        if (ns < np) return result;
        
        int[] pCount = new int[26];
        int[] sCount = new int[26];
        
        for (char c : p.toCharArray()) {
            pCount[c - 'a']++;
        }
        
        for (int i = 0; i < ns; i++) {
            sCount[s.charAt(i) - 'a']++;
            if (i >= np) {
                sCount[s.charAt(i - np) - 'a']--;
            }
            if (Arrays.equals(pCount, sCount)) {
                result.add(i - np + 1);
            }
        }
        
        return result;
    }
}

Python

class Solution:
    def findAnagrams(self, s: str, p: str):
        ns, np = len(s), len(p)
        if ns < np:
            return []
        
        p_count = Counter(p)
        s_count = Counter()
        
        result = []
        for i in range(ns):
            s_count[s[i]] += 1
            if i >= np:
                if s_count[s[i - np]] == 1:
                    del s_count[s[i - np]]
                else:
                    s_count[s[i - np]] -= 1
            if p_count == s_count:
                result.append(i - np + 1)
        
        return result

Complexity

⏰ Time complexity: O(n), where n is the length of string s. This is because we use a sliding window approach, which processes each character of s a constant number of times.
🧺 Space complexity: O(1), as the size of the hash maps used is constant (26 letters of the alphabet).

Problem#

Examples#

Solution#

Method 1 - Brute Force#

Method 2 - Sliding Histogram Approach – O(n) Time O(1) Space 🏆#

Algorithm Steps:#

Code#

Java#

Python#

Complexity#