Problem

Given a string s and an array of smaller strings t, design a method to search s for each small string in T.

Examples

Example 1:

Input: s = "hello world", T = ["hello", "world", "test"]
Output: [0, 6, -1]
Explanation: 
- "hello" is found at index 0 in "hello world"
- "world" is found at index 6 in "hello world"
- "test" is not found in "hello world", so return -1

Example 2:

Input: s = "algorithm design", T = ["algo", "design", "thesis"]
Output: [0, 10, -1]
Explanation: 
- "algo" is found at index 0 in "algorithm design"
- "design" is found at index 10 in "algorithm design"
- "thesis" is not found in "algorithm design", so return -1

Solution

We have plethora of options - String Pattern Search/Matching Algorithm Index. Assuming you have a significant number of smaller strings, Rabin-Karp is the standard way to search for multiple small strings in a very very large string. if You only have a few smaller strings, simply repeating Boyer-Moore for each one might be a better alternative.

Method 1 - Naive using indexOf

To solve this problem, we will iterate through each string in the array T and check if it is a substring of s. If it is found, we will return the starting index; otherwise, return -1.

Approach

  1. Iterate through each string t in T.
  2. Use a method to get the starting index of t in s. If t is found, store the starting index; if not, store -1.
  3. Return the resulting list of indices.

Code

Java
class Solution {
    public List<Integer> searchSubstrings(String s, String[] T) {
        List<Integer> result = new ArrayList<>();
        for (String t : T) {
            int index = s.indexOf(t);
            result.add(index);
        }
        return result;
    }

    public static void main(String[] args) {
        Solution sol = new Solution();
        String s = "hello world";
        String[] T = {"hello", "world", "test"};
        System.out.println(sol.searchSubstrings(s, T)); // [0, 6, -1]
    }
}
Python
class Solution:
    def search_substrings(self, s: str, T: List[str]) -> List[int]:
        return [s.find(t) for t in T]

# Example usage
sol = Solution()
s = "hello world"
T = ["hello", "world", "test"]
print(sol.search_substrings(s, T))  # [0, 6, -1]

Complexity

  • ⏰ Time complexity: O(n * m), where n is the number of strings in T and m is the length of the string s. Finding a substring takes O(m) time in the worst case.
  • 🧺 Space complexity: O(n) as we need to store the result for each string in T.

Method 2 - Using suffix tree

Read more about suffix tree. We can first get all the suffices of s and check for each element t in T to see if t is the beginning part of any suffix. To make this more efficient, we can build a suffix tree and perform the operations.

For example, s = "mississippi"; T = { "is", "sip", "hi", "sis" }.

The suffices of s

[0]“mississippi”
[1]“ississippi”
[2]“ssissippi”
[3]“sissippi”
[4]“issippi”
[5]“ssippi”
[6]“sippi”
[7]“ippi”
[8]“ppi”
[9]“pi”
[10]“i”
Then for “is” in T, we see it starts at the beginning of suffixes [1] and [4]. For “sip”, we see it in [6]. We do not see “hi” in the suffixes, but we do see “sis” in [3].

Lets  see how to build suffix tree - Suffix Tree ADT.

Code

Java
public class SuffixTree {
    private static final int ALPHABET_SIZE = 256;
    private Node root;
    
    private class Node {
        Node[] children = new Node[ALPHABET_SIZE];
        ArrayList<Integer> indices = new ArrayList<>();
    }
    
    public SuffixTree(String s) {
        root = new Node();
        for (int i = 0; i < s.length(); i++) {
            addSuffix(s.substring(i), i);
        }
    }
    
    private void addSuffix(String s, int index) {
        Node current = root;
        current.indices.add(index);
        for (char c : s.toCharArray()) {
            if (current.children[c] == null) {
                current.children[c] = new Node();
            }
            current = current.children[c];
            current.indices.add(index);
        }
    }
    
    public List<Integer> searchSubstrings(String s, String[] T) {
        List<Integer> result = new ArrayList<>();
        for (String t : T) {
            result.add(search(t));
        }
        return result;
    }
    
    private int search(String t) {
        Node current = root;
        for (char c : t.toCharArray()) {
            if (current.children[c] == null) {
                return -1;
            }
            current = current.children[c];
        }
        return current.indices.get(0);
    }

    public static void main(String[] args) {
        SuffixTree suffixTree = new SuffixTree("mississippi");
        String[] T = {"is", "sip", "hi", "sis"};
        System.out.println(suffixTree.searchSubstrings("mississippi", T)); // [1, 6, -1, 3]
    }
}
Python
class SuffixTreeNode:
    def __init__(self):
        self.children = {}
        self.indices = []

class SuffixTree:
    def __init__(self, s: str):
        self.root = SuffixTreeNode()
        for i in range(len(s)):
            self._add_suffix(s[i:], i)
    
    def _add_suffix(self, suffix: str, index: int):
        current = self.root
        current.indices.append(index)
        for char in suffix:
            if char not in current.children:
                current.children[char] = SuffixTreeNode()
            current = current.children[char]
            current.indices.append(index)
            
    def search_substrings(self, s: str, T: [str]) -> [int]:
        results = []
        for t in T:
            results.append(self._search(t))
        return results
    
    def _search(self, t: str) -> int:
        current = self.root
        for char in t:
            if char not in current.children:
                return -1
            current = current.children[char]
        return current.indices[0] if current.indices else -1

# Example usage
s = "mississippi"
T = ["is", "sip", "hi", "sis"]
suffix_tree = SuffixTree(s)
print(suffix_tree.search_substrings(s, T))  # [1, 6, -1, 3]

Complexity

  • ⏰ Time complexity: O(n + m + k), where n is the length of the string sm is the number of strings in T, and k is the total length of all strings in T. Building the suffix tree takes O(n), and each search operation takes O(m + k).
  • 🧺 Space complexity: O(n) for storing the suffix tree.