Search a long string for small strings in an array
Problem
Given a string s and an array of smaller strings t, design a method to search s for each small string in T.
Examples
Example 1:
Input: s = "hello world", T = ["hello", "world", "test"]
Output: [0, 6, -1]
Explanation:
- "hello" is found at index 0 in "hello world"
- "world" is found at index 6 in "hello world"
- "test" is not found in "hello world", so return -1
Example 2:
Input: s = "algorithm design", T = ["algo", "design", "thesis"]
Output: [0, 10, -1]
Explanation:
- "algo" is found at index 0 in "algorithm design"
- "design" is found at index 10 in "algorithm design"
- "thesis" is not found in "algorithm design", so return -1
Solution
We have plethora of options - [String Pattern Search/Matching Algorithm Index](find-the-index-of-the-first-occurrence-in-a-string). Assuming you have a significant number of smaller strings, [Rabin-Karp](rabin-karp-algorithm-explained) is the standard way to search for multiple small strings in a very very large string. if You only have a few smaller strings, simply repeating [Boyer-Moore](boyer-moore-string-searching) for each one might be a better alternative.
Method 1 - Naive using indexOf
To solve this problem, we will iterate through each string in the array T and check if it is a substring of s. If it is found, we will return the starting index; otherwise, return -1.
Approach
- Iterate through each string
tinT. - Use a method to get the starting index of
tins. Iftis found, store the starting index; if not, store-1. - Return the resulting list of indices.
Code
Java
class Solution {
public List<Integer> searchSubstrings(String s, String[] T) {
List<Integer> result = new ArrayList<>();
for (String t : T) {
int index = s.indexOf(t);
result.add(index);
}
return result;
}
public static void main(String[] args) {
Solution sol = new Solution();
String s = "hello world";
String[] T = {"hello", "world", "test"};
System.out.println(sol.searchSubstrings(s, T)); // [0, 6, -1]
}
}
Python
class Solution:
def search_substrings(self, s: str, T: List[str]) -> List[int]:
return [s.find(t) for t in T]
# Example usage
sol = Solution()
s = "hello world"
T = ["hello", "world", "test"]
print(sol.search_substrings(s, T)) # [0, 6, -1]
Complexity
- ⏰ Time complexity:
O(n * m), wherenis the number of strings inTandmis the length of the strings. Finding a substring takesO(m)time in the worst case. - 🧺 Space complexity:
O(n)as we need to store the result for each string inT.
Method 2 - Using suffix tree
Read more about suffix tree. We can first get all the suffices of s and check for each element t in T to see if t is the beginning part of any suffix. To make this more efficient, we can build a suffix tree and perform the operations.
For example, s = "mississippi"; T = { "is", "sip", "hi", "sis" }.
The suffices of s
| [0] | “mississippi” |
| [1] | “ississippi” |
| [2] | “ssissippi” |
| [3] | “sissippi” |
| [4] | “issippi” |
| [5] | “ssippi” |
| [6] | “sippi” |
| [7] | “ippi” |
| [8] | “ppi” |
| [9] | “pi” |
| [10] | “i” |
Then for "is" in T, we see it starts at the beginning of suffixes [1] and [4]. For "sip", we see it in [6]. We do not see "hi" in the suffixes, but we do see "sis" in [3].
Lets see how to build suffix tree - Suffix Tree ADT.
Code
Java
public class SuffixTree {
private static final int ALPHABET_SIZE = 256;
private Node root;
private class Node {
Node[] children = new Node[ALPHABET_SIZE];
ArrayList<Integer> indices = new ArrayList<>();
}
public SuffixTree(String s) {
root = new Node();
for (int i = 0; i < s.length(); i++) {
addSuffix(s.substring(i), i);
}
}
private void addSuffix(String s, int index) {
Node current = root;
current.indices.add(index);
for (char c : s.toCharArray()) {
if (current.children[c] == null) {
current.children[c] = new Node();
}
current = current.children[c];
current.indices.add(index);
}
}
public List<Integer> searchSubstrings(String s, String[] T) {
List<Integer> result = new ArrayList<>();
for (String t : T) {
result.add(search(t));
}
return result;
}
private int search(String t) {
Node current = root;
for (char c : t.toCharArray()) {
if (current.children[c] == null) {
return -1;
}
current = current.children[c];
}
return current.indices.get(0);
}
public static void main(String[] args) {
SuffixTree suffixTree = new SuffixTree("mississippi");
String[] T = {"is", "sip", "hi", "sis"};
System.out.println(suffixTree.searchSubstrings("mississippi", T)); // [1, 6, -1, 3]
}
}
Python
class SuffixTreeNode:
def __init__(self):
self.children = {}
self.indices = []
class SuffixTree:
def __init__(self, s: str):
self.root = SuffixTreeNode()
for i in range(len(s)):
self._add_suffix(s[i:], i)
def _add_suffix(self, suffix: str, index: int):
current = self.root
current.indices.append(index)
for char in suffix:
if char not in current.children:
current.children[char] = SuffixTreeNode()
current = current.children[char]
current.indices.append(index)
def search_substrings(self, s: str, T: [str]) -> [int]:
results = []
for t in T:
results.append(self._search(t))
return results
def _search(self, t: str) -> int:
current = self.root
for char in t:
if char not in current.children:
return -1
current = current.children[char]
return current.indices[0] if current.indices else -1
# Example usage
s = "mississippi"
T = ["is", "sip", "hi", "sis"]
suffix_tree = SuffixTree(s)
print(suffix_tree.search_substrings(s, T)) # [1, 6, -1, 3]
Complexity
- ⏰ Time complexity:
O(n + m + k), wherenis the length of the strings,mis the number of strings inT, andkis the total length of all strings inT. Building the suffix tree takesO(n), and each search operation takesO(m + k). - 🧺 Space complexity:
O(n)for storing the suffix tree.