Given a text string and a dictionary of M patterns, find all occurrences of any pattern in the text. The patterns may have different lengths. The goal is to efficiently find all matches of any pattern in the text, even when the number of patterns is large.
Input:
patterns =["he","she","his","hers"]text ="ahishers"Output:
Matches at positions:[1,2,3,4,5]Explanation:
The patterns "he","she","his", and "hers" all appear as substrings in the text at various positions.
When searching for multiple patterns in a long text, running a single-pattern search for each pattern is too slow. The Aho-Corasick algorithm builds a trie of all patterns and augments it with suffix links (like KMP) to allow simultaneous search for all patterns in a single pass over the text. This enables efficient multi-pattern matching in linear time.
classTrieNode:
def__init__(self):
self.children = {}
self.fail =None self.word_ids = []
classSolution:
defaho_corasick(self, patterns: list[str], text: str) -> list[int]:
root = TrieNode()
for idx, pat in enumerate(patterns):
node = root
for c in pat:
if c notin node.children:
node.children[c] = TrieNode()
node = node.children[c]
node.word_ids.append(idx)
# Build fail linksfrom collections import deque
root.fail = root
q = deque()
for child in root.children.values():
child.fail = root
q.append(child)
while q:
node = q.popleft()
for c, child in node.children.items():
f = node.fail
while f != root and c notin f.children:
f = f.fail
if c in f.children and f.children[c] != child:
child.fail = f.children[c]
else:
child.fail = root
child.word_ids += child.fail.word_ids
q.append(child)
# Search ans = []
node = root
for i, c in enumerate(text):
while node != root and c notin node.children:
node = node.fail
if c in node.children:
node = node.children[c]
for _ in node.word_ids:
ans.append(i)
return ans
⏰ Time complexity: O(N + L + Z), where N is the text length, L is the total length of all patterns, and Z is the number of matches. Each character and trie edge is processed at most once.
🧺 Space complexity: O(L + Q), where Q is the alphabet size times the number of trie nodes (for child pointers).