Problem
Given a string s
and an array of smaller strings t
, design a method to search s for each small string in T.
Examples
Example 1:
Input: s = "hello world", T = ["hello", "world", "test"]
Output: [0, 6, -1]
Explanation:
- "hello" is found at index 0 in "hello world"
- "world" is found at index 6 in "hello world"
- "test" is not found in "hello world", so return -1
Example 2:
Input: s = "algorithm design", T = ["algo", "design", "thesis"]
Output: [0, 10, -1]
Explanation:
- "algo" is found at index 0 in "algorithm design"
- "design" is found at index 10 in "algorithm design"
- "thesis" is not found in "algorithm design", so return -1
Solution
We have plethora of options - String Pattern Search/Matching Algorithm Index. Assuming you have a significant number of smaller strings, Rabin-Karp is the standard way to search for multiple small strings in a very very large string. if You only have a few smaller strings, simply repeating Boyer-Moore for each one might be a better alternative.
Method 1 - Naive using indexOf
To solve this problem, we will iterate through each string in the array T
and check if it is a substring of s
. If it is found, we will return the starting index; otherwise, return -1
.
Approach
- Iterate through each string
t
inT
. - Use a method to get the starting index of
t
ins
. Ift
is found, store the starting index; if not, store-1
. - Return the resulting list of indices.
Code
Java
class Solution {
public List<Integer> searchSubstrings(String s, String[] T) {
List<Integer> result = new ArrayList<>();
for (String t : T) {
int index = s.indexOf(t);
result.add(index);
}
return result;
}
public static void main(String[] args) {
Solution sol = new Solution();
String s = "hello world";
String[] T = {"hello", "world", "test"};
System.out.println(sol.searchSubstrings(s, T)); // [0, 6, -1]
}
}
Python
class Solution:
def search_substrings(self, s: str, T: List[str]) -> List[int]:
return [s.find(t) for t in T]
# Example usage
sol = Solution()
s = "hello world"
T = ["hello", "world", "test"]
print(sol.search_substrings(s, T)) # [0, 6, -1]
Complexity
- ⏰ Time complexity:
O(n * m)
, wheren
is the number of strings inT
andm
is the length of the strings
. Finding a substring takesO(m)
time in the worst case. - 🧺 Space complexity:
O(n)
as we need to store the result for each string inT
.
Method 2 - Using suffix tree
Read more about suffix tree. We can first get all the suffices of s
and check for each element t
in T
to see if t
is the beginning part of any suffix. To make this more efficient, we can build a suffix tree and perform the operations.
For example, s = "mississippi"; T = { "is", "sip", "hi", "sis" }
.
The suffices of s
[0] | “mississippi” |
[1] | “ississippi” |
[2] | “ssissippi” |
[3] | “sissippi” |
[4] | “issippi” |
[5] | “ssippi” |
[6] | “sippi” |
[7] | “ippi” |
[8] | “ppi” |
[9] | “pi” |
[10] | “i” |
Then for “is” in T , we see it starts at the beginning of suffixes [1] and [4] . For “sip”, we see it in [6] . We do not see “hi” in the suffixes, but we do see “sis” in [3] . |
Lets see how to build suffix tree - Suffix Tree ADT.
Code
Java
public class SuffixTree {
private static final int ALPHABET_SIZE = 256;
private Node root;
private class Node {
Node[] children = new Node[ALPHABET_SIZE];
ArrayList<Integer> indices = new ArrayList<>();
}
public SuffixTree(String s) {
root = new Node();
for (int i = 0; i < s.length(); i++) {
addSuffix(s.substring(i), i);
}
}
private void addSuffix(String s, int index) {
Node current = root;
current.indices.add(index);
for (char c : s.toCharArray()) {
if (current.children[c] == null) {
current.children[c] = new Node();
}
current = current.children[c];
current.indices.add(index);
}
}
public List<Integer> searchSubstrings(String s, String[] T) {
List<Integer> result = new ArrayList<>();
for (String t : T) {
result.add(search(t));
}
return result;
}
private int search(String t) {
Node current = root;
for (char c : t.toCharArray()) {
if (current.children[c] == null) {
return -1;
}
current = current.children[c];
}
return current.indices.get(0);
}
public static void main(String[] args) {
SuffixTree suffixTree = new SuffixTree("mississippi");
String[] T = {"is", "sip", "hi", "sis"};
System.out.println(suffixTree.searchSubstrings("mississippi", T)); // [1, 6, -1, 3]
}
}
Python
class SuffixTreeNode:
def __init__(self):
self.children = {}
self.indices = []
class SuffixTree:
def __init__(self, s: str):
self.root = SuffixTreeNode()
for i in range(len(s)):
self._add_suffix(s[i:], i)
def _add_suffix(self, suffix: str, index: int):
current = self.root
current.indices.append(index)
for char in suffix:
if char not in current.children:
current.children[char] = SuffixTreeNode()
current = current.children[char]
current.indices.append(index)
def search_substrings(self, s: str, T: [str]) -> [int]:
results = []
for t in T:
results.append(self._search(t))
return results
def _search(self, t: str) -> int:
current = self.root
for char in t:
if char not in current.children:
return -1
current = current.children[char]
return current.indices[0] if current.indices else -1
# Example usage
s = "mississippi"
T = ["is", "sip", "hi", "sis"]
suffix_tree = SuffixTree(s)
print(suffix_tree.search_substrings(s, T)) # [1, 6, -1, 3]
Complexity
- ⏰ Time complexity:
O(n + m + k)
, wheren
is the length of the strings
,m
is the number of strings inT
, andk
is the total length of all strings inT
. Building the suffix tree takesO(n)
, and each search operation takesO(m + k)
. - 🧺 Space complexity:
O(n)
for storing the suffix tree.