Problem

A shortest common supersequence (SCS) is a common supersequence of two given sequences that is of minimal length. In the shortest common supersequence problem, the sequences $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Y = \langle y_1, \ldots, y_n \rangle$ are given and the task is to find the shortest possible common supersequence of these sequences. In general, an SCS is not unique.

Given two sequences X and Y, a sequence U is a common supersequence if both X and Y are subsequences of U. In other words, the shortest common supersequence of two strings x and y is the shortest string z such that both x and y are subsequences of z.

Examples

Example 1:

Input: X = "abac", Y = "cab"
Output: "cabac"
Explanation: 
- "cabac" is the shortest string that contains both "abac" and "cab" as subsequences.

Example 2:

Input: X = "k5kc", Y = "5k"
Output: "k5kc"

Solution

Method 1 - DP

Forming the shortest common supersequence (SCS) can be effectively achieved using the longest common subsequence (LCS). Here’s how:

  1. Calculate the LCS: Identify the longest common subsequence between the two sequences. The LCS helps structure the common elements of both sequences.
  2. Integrate Non-LCS Characters: Construct the SCS by merging characters from both sequences around the LCS, while maintaining their relative order.

For example, consider:

$$ X = \text{“abcbdab”} \text{ and } Y = \text{“bdcaba”} $$

The LCS of X and Y is $Z = \text{“bcba”}$

To build the SCS, we include characters from X and Y that are not part of the LCS while preserving their order. Thus, the SCS U is: $$ U = \text{“abdcabdab”} $$

This demonstrates that the length of the SCS t satisfies r + t = m + n, where r is the LCS length, and m and n are the lengths of X and Y, respectively. However, this property does not hold for more than two input sequences.

Approach

  1. Define LCS Helper Function: Utilize a helper function to calculate the longest common subsequence.
  2. Construct SCS Using LCS: Build the supersequence by incorporating characters from both sequences around the characters of the LCS, ensuring minimal length and proper order.

Code

Java
class Solution {

    public String shortestCommonSupersequence(String X, String Y) {
        int n = X.length();
        int m = Y.length();
        
        // Calculate LCS
        String lcs = getLCS(X, Y, n, m);
        
        StringBuilder sb = new StringBuilder();
        int i = 0, j = 0;
        
        // Build the shortest common supersequence based on LCS
        for (char ch : lcs.toCharArray()) {
            while (i < n && X.charAt(i) != ch) {
                sb.append(X.charAt(i++));
            }
            while (j < m && Y.charAt(j) != ch) {
                sb.append(Y.charAt(j++));
            }
            sb.append(ch);
            i++;
            j++;
        }
        
        // Append remaining characters
        sb.append(X.substring(i)).append(Y.substring(j));
        
        return sb.toString();
    }
    
    private String getLCS(String X, String Y, int n, int m) {
        int[][] dp = new int[n+1][m+1];
        
        // Fill DP table
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (X.charAt(i-1) == Y.charAt(j-1)) {
                    dp[i][j] = dp[i-1][j-1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
                }
            }
        }
        
        // Build LCS from DP table
        StringBuilder lcs = new StringBuilder();
        int i = n, j = m;
        while (i > 0 && j > 0) {
            if (X.charAt(i-1) == Y.charAt(j-1)) {
                lcs.append(X.charAt(i-1));
                i--; j--;
            } else if (dp[i-1][j] > dp[i][j-1]) {
                i--;
            } else {
                j--;
            }
        }
        
        return lcs.reverse().toString();
    }
}
Python
class Solution:
    def shortest_common_supersequence(self, X: str, Y: str) -> str:
        n, m = len(X), len(Y)
        
        # Calculate LCS
        lcs = self.get_lcs(X, Y, n, m)
        
        i, j = 0, 0
        result = []
        
        # Build the shortest common supersequence based on LCS
        for ch in lcs:
            while i < n and X[i] != ch:
                result.append(X[i])
                i += 1
            while j < m and Y[j] != ch:
                result.append(Y[j])
                j += 1
            result.append(ch)
            i += 1
            j += 1
        
        # Append remaining characters
        result.append(X[i:])
        result.append(Y[j:])
        
        return ''.join(result)
    
    def get_lcs(self, X: str, Y: str, n: int, m: int) -> str:
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        
        # Fill DP table
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if X[i-1] == Y[j-1]:
                    dp[i][j] = dp[i-1][j-1] + 1
                else:
                    dp[i][j] = max(dp[i-1][j], dp[i][j-1])
        
        # Build LCS from DP table
        lcs = []
        i, j = n, m
        while i > 0 and j > 0:
            if X[i-1] == Y[j-1]:
                lcs.append(X[i-1])
                i -= 1
                j -= 1
            elif dp[i-1][j] > dp[i][j-1]:
                i -= 1
            else:
                j -= 1
        
        return ''.join(reversed(lcs))

Complexity

  • ⏰ Time complexity: O(n * m) for both calculating the LCS and constructing the SCS, where n and m are the lengths of X and Y.
  • 🧺 Space complexity: O(n * m) due to the storage of the DP table.