Problem

Examples

Example 1

1
2
3
Input: "abc"
Output: ["a", "ab", "abc", "b", "bc", "c"]
Explanation: Since all substrings of the string "abc" are already unique, the output contains all substrings as a unique list.

Example 2

1
2
3
Input: "aaa"
Output: ["a", "aa", "aaa"]
Explanation: For "aaa", only the substrings "a", "aa", and "aaa" are unique. Even though "a" is repeated multiple times in the string, it appears only once in the result as we are only considering unique substrings.

Solution

Method 1 - Using the set

  • substring is any contiguous sequence of characters within the string. Duplicates occur if identical substrings can be generated from the same or different starting positions.
  • To ensure we only return unique substrings, we store each substring in a data structure (like a set) that inherently avoids duplicates.
  • The uniqueness is checked as substrings are being generated. Any repeated substring is ignored.

Approach

Steps to generate unique substrings:

  1. Use two nested loops:
    • The outer loop selects starting positions (start) for substrings.
    • The inner loop generates substrings of various lengths, starting from that position (end).
  2. Each substring is added to a set (Python’s set or Java’s HashSet) rather than a list, as sets ensure all elements are unique.
  3. Finally, if an ordered collection is desired as the output (e.g., an ordered list of substrings), convert the set to a list (Python) or use appropriate collection methods in Java.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
public class Solution {
    public Set<String> generateUniqueSubstrings(String inputString) {
        Set<String> substrings = new HashSet<>();  // Use a HashSet to store unique substrings
        int n = inputString.length();
        for (int start = 0; start < n; start++) {
            for (int end = start; end < n; end++) {
                substrings.add(inputString.substring(start, end + 1)); // Add substring to the set
            }
        }
        return substrings;
    }

    // Example usage:
    public static void main(String[] args) {
        Solution solution = new Solution();
        System.out.println(solution.generateUniqueSubstrings("aaa")); // Output: [a, aa, aaa]
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Solution:
    def generate_unique_substrings(self, input_string: str):
        substrings = set()  # Use a set to store unique substrings
        n = len(input_string)
        for start in range(n):
            for end in range(start, n):
                substrings.add(input_string[start:end + 1])  # Add substring to the set
        return list(substrings)  # Convert set back to a list for output

# Example usage:
solution = Solution()
print(solution.generate_unique_substrings("aaa"))  # Output: ['a', 'aa', 'aaa']

Complexity

  • ⏰ Time complexity: O(n^2). Generating substrings involves two nested loops:
    • Outer loop runs O(n) times for each starting position in the string.
    • Inner loop generates substrings for the remaining characters, running O(n) in the worst case.
    • Combined, the time complexity is O(n^2).
  • 🧺 Space complexity: O(u), where u is the number of unique substrings:
    • set is used to store unique substrings. The number of unique substrings depends on the input size and character distribution.

Method 2 - Using Suffix Tree

We can build suffix tree using Ukkonen’s algorithm:

  1. Input Encoding:
    • Append a unique end character "$" to ensure all suffixes terminate uniquely.
    • This guarantees the tree represents full suffixes.
  2. Edge Representation:
    • Each edge holds start and end indices (startend) rather than storing substrings directly.
    • This is compact and avoids unnecessary space usage.
  3. Adding Suffixes:
    • Each suffix is added incrementally to the tree. If a character is already present, the edge’s child nodes are traversed further.
  4. Depth-First Traversal:
    • The collect_substrings() method traverses the tree, combining edges into substrings using the indices.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SuffixTreeEdge {
    int start; // Start index of the substring
    int end;   // End index of the substring
    Map<Character, SuffixTreeEdge> children; // Children edges

    public SuffixTreeEdge(int start, int end) {
        this.start = start;
        this.end = end;
        this.children = new HashMap<>();
    }
}

class SuffixTree {
    private final String text; // Input string
    private final SuffixTreeEdge root; // Root of the tree

    public SuffixTree(String text) {
        this.text = text + "$"; // Appending a unique terminator to ensure all suffixes are unique
        this.root = new SuffixTreeEdge(-1, -1); // Root node (no edge created initially)
        buildTree();
    }

    private void buildTree() {
        int n = text.length();
        for (int i = 0; i < n; i++) {
            insertSuffix(i); // Insert suffix starting at position `i`
        }
    }

    private void insertSuffix(int suffixStart) {
        SuffixTreeEdge currentEdge = root;

        // Iterate over characters in the suffix starting at `suffixStart`
        for (int i = suffixStart; i < text.length(); i++) {
            char currentChar = text.charAt(i);

            // If this character doesn't exist in current edge, create a new edge
            if (!currentEdge.children.containsKey(currentChar)) {
                currentEdge.children.put(currentChar, new SuffixTreeEdge(i, text.length()));
                return;
            }

            // Traverse to the next edge (existing child)
            currentEdge = currentEdge.children.get(currentChar);
        }
    }

    public Set<String> collectSubstrings() {
        Set<String> substrings = new HashSet<>(); // Set to collect unique substrings
        collectSubstringsHelper(root, "", substrings);
        return substrings;
    }

    private void collectSubstringsHelper(SuffixTreeEdge edge, String currentPath, Set<String> substrings) {
        if (edge.start != -1) {
            // Append the substring represented by this edge
            currentPath += text.substring(edge.start, edge.end);
            substrings.add(currentPath); // Add the substring to the unique set
        }

        // Recursively collect substrings from child edges
        for (SuffixTreeEdge childEdge : edge.children.values()) {
            collectSubstringsHelper(childEdge, currentPath, substrings);
        }
    }
}

public class Main {
    public static void main(String[] args) {
        String text1 = "abc";
        SuffixTree suffixTree1 = new SuffixTree(text1);
        Set<String> uniqueSubstrings1 = suffixTree1.collectSubstrings();
        System.out.println("Unique Substrings: " + uniqueSubstrings1);

        String text2 = "aaa";
        SuffixTree suffixTree2 = new SuffixTree(text2);
        Set<String> uniqueSubstrings2 = suffixTree2.collectSubstrings();
        System.out.println("Unique Substrings: " + uniqueSubstrings2);
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
class SuffixTreeEdge:
    def __init__(self, start, end, child=None):
        self.start = start  # Start index of the substring
        self.end = end      # End index of the substring
        self.child = child or {}  # Dictionary for child nodes


class SuffixTree:
    def __init__(self, text):
        self.text = text + "$"  # Append a unique end character to ensure uniqueness of substrings
        self.root = {}
        self.build_tree()

    def build_tree(self):
        """
        Build the Suffix Tree using Ukkonen's Algorithm.
        This implementation avoids redundant storage by using indices.
        """
        n = len(self.text)
        for i in range(n):  # Start building suffixes
            suffix = self.text[i:]
            self.insert_suffix(suffix, i)

    def insert_suffix(self, suffix, suffix_start):
        """
        Insert a suffix into the suffix tree.
        Use indices for substrings to ensure compact representation.
        """
        current_node = self.root
        for i, char in enumerate(suffix):
            if char not in current_node:
                # Create a new edge with start and end indices
                current_node[char] = SuffixTreeEdge(suffix_start + i, len(self.text))
            current_node = current_node[char].child

    def collect_substrings(self):
        """
        Collect all unique substrings using depth-first traversal.
        Substring edges are represented using indices for efficiency.
        """
        substrings = set()
        stack = [(self.root, "")]
        while stack:
            current_node, path = stack.pop()
            for char, edge in current_node.items():
                new_path = path + self.text[edge.start:edge.end]
                substrings.add(new_path)  # Add the substring represented by the edge
                stack.append((edge.child, new_path))  # Continue traversal
        return substrings


# Example Usage
text = "abc"
suffix_tree = SuffixTree(text)
unique_substrings = suffix_tree.collect_substrings()  # Collect all unique substrings
print("Unique Substrings:", unique_substrings)

text2 = "aaa"
suffix_tree2 = SuffixTree(text2)
unique_substrings2 = suffix_tree2.collect_substrings()  # Collect all unique substrings
print("Unique Substrings:", unique_substrings2)

Complexity

  • ⏰ Time complexity: O(n)
    • Ukkonen’s algorithm processes the entire input string incrementally in phases, ensuring each character is handled in constant time. Note that, if the construction of tree was naive, then time complexity would have been O(n^2).
  • 🧺 Space complexity: O(n)
    • Compact representation of edges using indices (startend) ensures storage is proportional to the length of the input string. Total space required for the suffix tree: O(n).
    • HashSet is used to store unique substrings, requiring space proportional to the number of unique substrings. Total space required for substring storage: O(u), where u is the number of unique substrings.