Longest Common Substring

Problem

Solve the longest common substring problem is to find the longest string that is a substring of two or more given strings.

Examples

Example 1:

Input: a = "ABABC", b = "BABCA"
Output: "BABC"
Explanation: 
- "BABC" is the longest common substring of "ABABC" and "BABCA".

Example 2:

Input: a = "XYZ", b = "ABC"
Output: ""
Explanation: 
- There is no common substring between "XYZ" and "ABC".

Solution

we have already seen - Longest Common Subsequence Problem

Method 1 - Naive

Evaluate every substring from the first string against the second string and record the longest matching substring. The time complexity is O(n^2*m), where O(n^2) comes from generating substrings, and O(m) comes from checking each substring against the second string. This represents the naive approach to the problem.

Method 2 - Dynamic Programming

To solve the longest common substring problem, we can use dynamic programming in bottom-up manner. The idea is to create a 2D DP table where dp[i][j] represents the length of the longest common substring ending at positions i in a and j in b.

Here are the steps:

Define the DP Table:
- Let dp[i][j] represent the length of the longest common substring ending at i in a and j in b.
Base Cases:
- Initialize dp[i][0] = 0 for all i and dp[0][j] = 0 for all j because any substring ending at index 0 has a length of 0.
Fill the DP Table:
- For each character pair (i, j):
  - If a[i-1] == b[j-1], then dp[i][j] = 1 + dp[i-1][j-1].
  - If a[i-1] != b[j-1], then dp[i][j] = 0.
- Keep track of the maximum length found during this process.
Extract the Longest Substring:
- Once the table is filled, find the maximum value in the table, which indicates the length of the longest common substring and the corresponding indices tell us where the substring ends in the original strings.

Dry Run

Lets take two strings a and b, a = “abc”, b = “abcd”.

Let’s populate the DP table row by row.

Initial Empty DP Table

 0  1  2  3  4
 -  a  b  c  d
0 [ 0, 0, 0, 0, 0 ]
1 [ 0, 0, 0, 0, 0 ]
2 [ 0, 0, 0, 0, 0 ]
3 [ 0, 0, 0, 0, 0 ]

Iterations

For i = 1, a[0] = 'a'
- For j = 1, b[0] = 'a':
  - a[0] == b[0] ⇒ dp[1][1] = dp[0][0] + 1 = 1
- For j = 2, b[1] = 'b':
  - a[0] != b[1] ⇒ dp[1][2] = 0
- For j = 3, b[2] = 'c':
  - a[0] != b[2] ⇒ dp[1][3] = 0
- For j = 4, b[3] = 'd':
  - a[0] != b[3] ⇒ dp[1][4] = 0
For i = 2, a[1] = 'b'
- For j = 1, b[0] = 'a':
  - a[1] != b[0] ⇒ dp[2][1] = 0
- For j = 2, b[1] = 'b':
  - a[1] == b[1] ⇒ dp[2][2] = dp[1][1] + 1 = 2
- For j = 3, b[2] = 'c':
  - a[1] != b[2] ⇒ dp[2][3] = 0
- For j = 4, b[3] = 'd':
  - a[1] != b[3] ⇒ dp[2][4] = 0
For i = 3, a[2] = 'c'
- For j = 1, b[0] = 'a':
  - a[2] != b[0] ⇒ dp[3][1] = 0
- For j = 2, b[1] = 'b':
  - a[2] != b[1] ⇒ dp[3][2] = 0
- For j = 3, b[2] = 'c':
  - a[2] == b[2] ⇒ dp[3][3] = dp[2][2] + 1 = 3
- For j = 4, b[3] = 'd':
  - a[2] != b[3] ⇒ dp[3][4] = 0

$$ \begin{array}{c|c|c|c|c} & 0 & \text{a} & \text{b} & \text{c} & \text{d} \ \hline 0 & 0 & 0 & 0 & 0 & 0 \ \hline \text{a} & 0 & 1 & 0 & 0 & 0 \ \hline \text{b} & 0 & 0 & 2 & 0 & 0 \ \hline \text{c} & 0 & 0 & 0 & 3 & 0 \ \end{array} $$

Code

Java

class Solution {
    public String longestCommonSubstring(String a, String b) {
        int n = a.length();
        int m = b.length();
        
        int[][] dp = new int[n + 1][m + 1];
        int maxLength = 0;
        int endIndex = -1;
        
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    if (dp[i][j] > maxLength) {
                        maxLength = dp[i][j];
                        endIndex = i;
                    }
                } else {
                    dp[i][j] = 0;
                }
            }
        }
        
        return maxLength == 0 ? "" : a.substring(endIndex - maxLength, endIndex);
    }
}

Python

class Solution:
    def longest_common_substring(self, a: str, b: str) -> str:
        n = len(a)
        m = len(b)
        
        # Initialize the DP table
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        maxLength = 0
        endIndex = -1
        
        # Fill the DP table
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if a[i - 1] == b[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                    if dp[i][j] > maxLength:
                        maxLength = dp[i][j]
                        endIndex = i
                else:
                    dp[i][j] = 0
        
        return "" if maxLength == 0 else a[endIndex - maxLength:endIndex]

Problem#

Examples#

Solution#

Method 1 - Naive#

Method 2 - Dynamic Programming#

Dry Run#

Initial Empty DP Table#

Iterations#

Code#

Java#

Python#