Approach

We have already seen Naive Brute Force String Search Algorithm. Now, let’s consider a special scenario where all the characters in the pattern are different. In such a case, we can optimize the original naive string matching algorithm to work more efficiently.

Question: In the naive string matching algorithm, we always slide the pattern by one character. Can we modify the naive algorithm to make it more efficient when all characters of the pattern are distinct?

Solution: Yes, we can modify the naive algorithm to be more efficient in this scenario. When all characters in the pattern are different, we can slide the pattern by more than one character after a mismatch. Here’s the rationale: after j matches, a mismatch indicates that the first character of the pattern won’t match any of the j matched characters in the haystack. Therefore, we can safely slide the pattern by j positions without missing any potential match

Code

Java
public class Solution {
    public int strStr(String haystack, String needle) {
        if (needle == null || needle.isEmpty()) {
            return 0;
        }
        if (haystack == null) {
            return -1;
        }

        int n = haystack.length();
        int m = needle.length();
        int i = 0;
		// notice: equality in loop here,eg: haystack='a', needle='a'
        while (i <= n - m) {
            int j;
            for (j = 0; j < m; j++) {
                if (needle.charAt(j) != haystack.charAt(i + j)) {
                    break;
                }
            }
            if (j == m) {  // found a match
                return i;
            } else if (j == 0) {
                i++;
            } else {
                i += j;  // skip more characters as all characters in needle are distinct
            }
        }

        return -1;
    }

    public static void main(String[] args) {
        Solution solution = new Solution();
        System.out.println(solution.strStr("hello", "ll"));  // Output: 2
        System.out.println(solution.strStr("aaaaa", "bba")); // Output: -1
        System.out.println(solution.strStr("", ""));         // Output: 0
    }
}
Python
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        if needle == "" or needle is None:
            return 0
        if haystack is None:
            return -1

        n = len(haystack)
        m = len(needle)
        i = 0

        while i <= n - m:
            j = 0
            while j < m:
                if needle[j] != haystack[i + j]:
                    break
                j += 1
            if j == m:
                return i
            elif j == 0:
                i += 1
            else:
                i += j  # skip more characters as all characters in needle are distinct

        return -1

Complexity

  • ⏰ Time complexity: O(n * m), in the worst case, where n is the length of haystack and m is the length of needle. The optimized sliding reduces the number of comparisons, but the complexity remains asymptotically the same.
  • 🧺 Space complexity: O(1)

Pros and Cons

Pros

  1. No Pre-processing Required: Similar to the basic brute force approach, this method does not require any pre-processing of the text.
  2. Reduced Sliding: By exploiting the property that all characters in the pattern are unique, we can potentially skip more characters and thus reduce the number of comparisons.
  3. Effective for Short Texts and Patterns: This method remains efficient for small-sized texts and patterns.

Cons

  1. Still Inefficient for Long Texts and Patterns: This approach, while optimized for specific patterns, is still not suitable for very large texts and patterns when compared to advanced algorithms like KMP or Boyer-Moore.
  2. Requires Distinct Characters in Needle for Optimization: The benefit of this optimization is valid only when all characters in the needle are distinct.