Problem

Due to a bug, there are many duplicate folders in a file system. You are given a 2D array paths, where paths[i] is an array representing an absolute path to the ith folder in the file system.

  • For example, ["one", "two", "three"] represents the path "/one/two/three".

Two folders (not necessarily on the same level) are identical if they contain the same non-empty set of identical subfolders and underlying subfolder structure. The folders do not need to be at the root level to be identical. If two or more folders are identical , then mark the folders as well as all their subfolders.

  • For example, folders "/a" and "/b" in the file structure below are identical. They (as well as their subfolders) should all be marked:
  • /a
  • /a/x
  • /a/x/y
  • /a/z
  • /b
  • /b/x
  • /b/x/y
  • /b/z
    • However, if the file structure also included the path "/b/w", then the folders "/a" and "/b" would not be identical. Note that "/a/x" and "/b/x" would still be considered identical even with the added folder.

Once all the identical folders and their subfolders have been marked, the file system will delete all of them. The file system only runs the deletion once, so any folders that become identical after the initial deletion are not deleted.

Return the 2D arrayans containing the paths of theremaining folders after deleting all the marked folders. The paths may be returned in any order.

Examples

Example 1

graph TD
    Z("/") --- A(a) & C(c) & D(d)
    D --- A1(a)
    
subgraph S1[" "]
    A --- B(b)
end

subgraph S2[" "]
    C --- B1(b)
end



classDef redBorderClass stroke:red,stroke-width:3px,fill:#f9f9f9;
class S1 redBorderClass
class S2 redBorderClass
  
1
2
3
4
5
Input: paths = [["a"],["c"],["d"],["a","b"],["c","b"],["d","a"]]
Output: [["d"],["d","a"]]
Explanation: The file structure is as shown.
Folders "/a" and "/c" (and their subfolders) are marked for deletion because they both contain an empty
folder named "b".

Example 2

graph TD
    Z("/") --- A(a) & C(c) & W(w)
    A --- B(b)
    C --- B1(b)
    B --- X(x)
    B1 --- X1(x)
    X --- Y(y)
    X1 --- Y1(y)
    W --- Y2(y)

subgraph S1[" "]
    X
    Y
end

subgraph S2[" "]
    W
    Y2
end

classDef redBorderClass stroke:red,stroke-width:3px,fill:#f9f9f9;
class S1 redBorderClass
class S2 redBorderClass
  
1
2
3
4
5
Input: paths = [["a"],["c"],["a","b"],["c","b"],["a","b","x"],["a","b","x","y"],["w"],["w","y"]]
Output: [["c"],["c","b"],["a"],["a","b"]]
Explanation: The file structure is as shown. 
Folders "/a/b/x" and "/w" (and their subfolders) are marked for deletion because they both contain an empty folder named "y".
Note that folders "/a" and "/c" are identical after the deletion, but they are not deleted because they were not marked beforehand.

Example 3

graph TD
    Z("/") --- A(a) & C(c)
    A --- B(b)
    C --- D(d)
  
1
2
3
4
Input: paths = [["a","b"],["c","d"],["c"],["a"]]
Output: [["c"],["c","d"],["a"],["a","b"]]
Explanation: All folders are unique in the file system.
Note that the returned array can be in a different order as the order does not matter.

Constraints

  • 1 <= paths.length <= 2 * 10^4
  • 1 <= paths[i].length <= 500
  • 1 <= paths[i][j].length <= 10
  • 1 <= sum(paths[i][j].length) <= 2 * 10^5
  • path[i][j] consists of lowercase English letters.
  • No two paths lead to the same folder.
  • For any folder not at the root level, its parent folder will also be in the input.

Solution

Method 1 – TrieNode Serialization and Marking

Intuition

Build a trie for the folder structure. Serialize each subtree so identical subtrees have the same string. Count serializations to find duplicates, then mark all duplicate subtrees for deletion. Finally, collect all remaining paths.

Approach

  1. Build a trie from the paths, each node representing a folder.
  2. Serialize each subtree (post-order), counting each serialization in a hash map.
  3. Mark all nodes whose serialization occurs more than once for deletion.
  4. Collect all remaining paths by traversing the trie and skipping deleted nodes.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
#include <vector>
#include <string>
#include <unordered_map>
#include <algorithm>
#include <numeric>
using namespace std;
class Solution {
public:
    struct TrieNode {
        unordered_map<string, TrieNode*> children;
        string name;
        bool toDelete = false;
    };
    unordered_map<string, int> freq;
    void insert(TrieNode* root, const vector<string>& path) {
        TrieNode* node = root;
        for (const string& folder : path) {
            if (!node->children.count(folder)) {
                node->children[folder] = new TrieNode();
                node->children[folder]->name = folder;
            }
            node = node->children[folder];
        }
    }
    string serialize(TrieNode* node) {
        if (node->children.empty()) return "";
        vector<string> subs;
        for (auto& [name, child] : node->children) {
            subs.push_back(name + "(" + serialize(child) + ")");
        }
        sort(subs.begin(), subs.end());
        string serial = accumulate(subs.begin(), subs.end(), string());
        freq[serial]++;
        return serial;
    }
    string mark(TrieNode* node) {
        if (node->children.empty()) return "";
        vector<string> subs;
        for (auto& [name, child] : node->children) {
            subs.push_back(name + "(" + mark(child) + ")");
        }
        sort(subs.begin(), subs.end());
        string serial = accumulate(subs.begin(), subs.end(), string());
        if (freq[serial] > 1) node->toDelete = true;
        return serial;
    }
    void collect(TrieNode* node, vector<string>& currentPath, vector<vector<string>>& result) {
        for (auto& [name, child] : node->children) {
            if (child->toDelete) continue;
            currentPath.push_back(name);
            result.push_back(currentPath);
            collect(child, currentPath, result);
            currentPath.pop_back();
        }
    }
    vector<vector<string>> deleteDuplicateFolder(vector<vector<string>>& paths) {
        TrieNode* root = new TrieNode();
        for (const auto& path : paths) insert(root, path);
        for (auto& [name, child] : root->children) serialize(child);
        for (auto& [name, child] : root->children) mark(child);
        vector<vector<string>> result;
        vector<string> currentPath;
        collect(root, currentPath, result);
        return result;
    }
};
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
import (
    "sort"
    "strings"
)
type TrieNode struct {
    children map[string]*TrieNode
    name     string
    toDelete bool
}
func insert(root *TrieNode, path []string) {
    node := root
    for _, folder := range path {
        if node.children == nil {
            node.children = map[string]*TrieNode{}
        }
        if _, ok := node.children[folder]; !ok {
            node.children[folder] = &TrieNode{name: folder}
        }
        node = node.children[folder]
    }
}
func serialize(node *TrieNode, freq map[string]int) string {
    if len(node.children) == 0 {
        return ""
    }
    subs := []string{}
    for name, child := range node.children {
        subs = append(subs, name+"("+serialize(child, freq)+")")
    }
    sort.Strings(subs)
    serial := strings.Join(subs, "")
    freq[serial]++
    return serial
}
func mark(node *TrieNode, freq map[string]int) string {
    if len(node.children) == 0 {
        return ""
    }
    subs := []string{}
    for name, child := range node.children {
        subs = append(subs, name+"("+mark(child, freq)+")")
    }
    sort.Strings(subs)
    serial := strings.Join(subs, "")
    if freq[serial] > 1 {
        node.toDelete = true
    }
    return serial
}
func collect(node *TrieNode, path []string, result *[][]string) {
    for name, child := range node.children {
        if child.toDelete {
            continue
        }
        path = append(path, name)
        *result = append(*result, append([]string{}, path...))
        collect(child, path, result)
        path = path[:len(path)-1]
    }
}
func deleteDuplicateFolder(paths [][]string) [][]string {
    root := &TrieNode{children: map[string]*TrieNode{}}
    for _, path := range paths {
        insert(root, path)
    }
    freq := map[string]int{}
    for _, child := range root.children {
        serialize(child, freq)
    }
    for _, child := range root.children {
        mark(child, freq)
    }
    result := [][]string{}
    collect(root, []string{}, &result)
    return result
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
import java.util.*;
class Solution {
    static class TrieNode {
        Map<String, TrieNode> children = new HashMap<>();
        String name;
        boolean toDelete = false;
    }
    Map<String, Integer> freq = new HashMap<>();
    void insert(TrieNode root, List<String> path) {
        TrieNode node = root;
        for (String folder : path) {
            node.children.putIfAbsent(folder, new TrieNode());
            node.children.get(folder).name = folder;
            node = node.children.get(folder);
        }
    }
    String serialize(TrieNode node) {
        if (node.children.isEmpty()) return "";
        List<String> subs = new ArrayList<>();
        for (var entry : node.children.entrySet()) {
            subs.add(entry.getKey() + "(" + serialize(entry.getValue()) + ")");
        }
        Collections.sort(subs);
        String serial = String.join("", subs);
        freq.put(serial, freq.getOrDefault(serial, 0) + 1);
        return serial;
    }
    String mark(TrieNode node) {
        if (node.children.isEmpty()) return "";
        List<String> subs = new ArrayList<>();
        for (var entry : node.children.entrySet()) {
            subs.add(entry.getKey() + "(" + mark(entry.getValue()) + ")");
        }
        Collections.sort(subs);
        String serial = String.join("", subs);
        if (freq.getOrDefault(serial, 0) > 1) node.toDelete = true;
        return serial;
    }
    void collect(TrieNode node, List<String> currentPath, List<List<String>> result) {
        for (var entry : node.children.entrySet()) {
            TrieNode child = entry.getValue();
            if (child.toDelete) continue;
            currentPath.add(entry.getKey());
            result.add(new ArrayList<>(currentPath));
            collect(child, currentPath, result);
            currentPath.remove(currentPath.size() - 1);
        }
    }
    public List<List<String>> deleteDuplicateFolder(List<List<String>> paths) {
        TrieNode root = new TrieNode();
        for (List<String> path : paths) insert(root, path);
        for (var entry : root.children.entrySet()) serialize(entry.getValue());
        for (var entry : root.children.entrySet()) mark(entry.getValue());
        List<List<String>> result = new ArrayList<>();
        collect(root, new ArrayList<>(), result);
        return result;
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
class Solution {
    class TrieNode {
        val children = mutableMapOf<String, TrieNode>()
        var name: String = ""
        var toDelete = false
    }
    val freq = mutableMapOf<String, Int>()
    fun insert(root: TrieNode, path: List<String>) {
        var node = root
        for (folder in path) {
            node.children.putIfAbsent(folder, TrieNode())
            node.children[folder]!!.name = folder
            node = node.children[folder]!!
        }
    }
    fun serialize(node: TrieNode): String {
        if (node.children.isEmpty()) return ""
        val subs = node.children.map { it.key + "(" + serialize(it.value) + ")" }.sorted()
        val serial = subs.joinToString("")
        freq[serial] = freq.getOrDefault(serial, 0) + 1
        return serial
    }
    fun mark(node: TrieNode): String {
        if (node.children.isEmpty()) return ""
        val subs = node.children.map { it.key + "(" + mark(it.value) + ")" }.sorted()
        val serial = subs.joinToString("")
        if ((freq[serial] ?: 0) > 1) node.toDelete = true
        return serial
    }
    fun collect(node: TrieNode, currentPath: MutableList<String>, result: MutableList<List<String>>) {
        for ((name, child) in node.children) {
            if (child.toDelete) continue
            currentPath.add(name)
            result.add(currentPath.toList())
            collect(child, currentPath, result)
            currentPath.removeAt(currentPath.size - 1)
        }
    }
    fun deleteDuplicateFolder(paths: List<List<String>>): List<List<String>> {
        val root = TrieNode()
        for (path in paths) insert(root, path)
        for ((_, child) in root.children) serialize(child)
        for ((_, child) in root.children) mark(child)
        val result = mutableListOf<List<String>>()
        collect(root, mutableListOf(), result)
        return result
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
class Solution:
    class TrieNode:
        def __init__(self):
            self.children = {}
            self.name = ''
            self.toDelete = False
    def insert(self, root, path):
        node = root
        for folder in path:
            if folder not in node.children:
                node.children[folder] = self.TrieNode()
                node.children[folder].name = folder
            node = node.children[folder]
    def serialize(self, node, freq):
        if not node.children:
            return ''
        subs = [name + '(' + self.serialize(child, freq) + ')' for name, child in node.children.items()]
        subs.sort()
        serial = ''.join(subs)
        freq[serial] = freq.get(serial, 0) + 1
        return serial
    def mark(self, node, freq):
        if not node.children:
            return ''
        subs = [name + '(' + self.mark(child, freq) + ')' for name, child in node.children.items()]
        subs.sort()
        serial = ''.join(subs)
        if freq.get(serial, 0) > 1:
            node.toDelete = True
        return serial
    def collect(self, node, currentPath, result):
        for name, child in node.children.items():
            if child.toDelete:
                continue
            currentPath.append(name)
            result.append(currentPath[:])
            self.collect(child, currentPath, result)
            currentPath.pop()
    def deleteDuplicateFolder(self, paths: list[list[str]]) -> list[list[str]]:
        root = self.TrieNode()
        for path in paths:
            self.insert(root, path)
        freq = {}
        for child in root.children.values():
            self.serialize(child, freq)
        for child in root.children.values():
            self.mark(child, freq)
        result = []
        self.collect(root, [], result)
        return result
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
use std::collections::HashMap;
struct TrieNode {
    children: HashMap<String, Box<TrieNode>>,
    name: String,
    to_delete: bool,
}
impl TrieNode {
    fn new() -> Self {
        TrieNode { children: HashMap::new(), name: String::new(), to_delete: false }
    }
}
impl Solution {
    pub fn delete_duplicate_folder(paths: Vec<Vec<String>>) -> Vec<Vec<String>> {
        fn insert(root: &mut TrieNode, path: &[String]) {
            let mut node = root;
            for folder in path {
                node = node.children.entry(folder.clone()).or_insert_with(|| Box::new(TrieNode::new()));
                node.name = folder.clone();
            }
        }
        fn serialize(node: &mut TrieNode, freq: &mut HashMap<String, i32>) -> String {
            if node.children.is_empty() { return String::new(); }
            let mut subs: Vec<String> = node.children.iter_mut().map(|(name, child)| format!("{}({})", name, serialize(child, freq))).collect();
            subs.sort();
            let serial = subs.concat();
            *freq.entry(serial.clone()).or_insert(0) += 1;
            serial
        }
        fn mark(node: &mut TrieNode, freq: &HashMap<String, i32>) -> String {
            if node.children.is_empty() { return String::new(); }
            let mut subs: Vec<String> = node.children.iter_mut().map(|(name, child)| format!("{}({})", name, mark(child, freq))).collect();
            subs.sort();
            let serial = subs.concat();
            if *freq.get(&serial).unwrap_or(&0) > 1 { node.to_delete = true; }
            serial
        }
        fn collect(node: &TrieNode, current_path: &mut Vec<String>, result: &mut Vec<Vec<String>>) {
            for (name, child) in &node.children {
                if child.to_delete { continue; }
                current_path.push(name.clone());
                result.push(current_path.clone());
                collect(child, current_path, result);
                current_path.pop();
            }
        }
        let mut root = TrieNode::new();
        for path in &paths { insert(&mut root, path); }
        let mut freq = HashMap::new();
        for child in root.children.values_mut() { serialize(child, &mut freq); }
        for child in root.children.values_mut() { mark(child, &freq); }
        let mut result = vec![];
        let mut current_path = vec![];
        collect(&root, &mut current_path, &mut result);
        result
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
class Solution {
    deleteDuplicateFolder(paths: string[][]): string[][] {
        class TrieNode {
            children: Record<string, TrieNode> = {};
            name: string = '';
            toDelete: boolean = false;
        }
        function insert(root: TrieNode, path: string[]) {
            let node = root;
            for (const folder of path) {
                if (!(folder in node.children)) {
                    node.children[folder] = new TrieNode();
                    node.children[folder].name = folder;
                }
                node = node.children[folder];
            }
        }
        function serialize(node: TrieNode, freq: Map<string, number>): string {
            if (Object.keys(node.children).length === 0) return '';
            const subs: string[] = [];
            for (const name in node.children) {
                subs.push(name + '(' + serialize(node.children[name], freq) + ')');
            }
            subs.sort();
            const serial = subs.join('');
            freq.set(serial, (freq.get(serial) ?? 0) + 1);
            return serial;
        }
        function mark(node: TrieNode, freq: Map<string, number>): string {
            if (Object.keys(node.children).length === 0) return '';
            const subs: string[] = [];
            for (const name in node.children) {
                subs.push(name + '(' + mark(node.children[name], freq) + ')');
            }
            subs.sort();
            const serial = subs.join('');
            if ((freq.get(serial) ?? 0) > 1) node.toDelete = true;
            return serial;
        }
        function collect(node: TrieNode, currentPath: string[], result: string[][]) {
            for (const name in node.children) {
                const child = node.children[name];
                if (child.toDelete) continue;
                currentPath.push(name);
                result.push([...currentPath]);
                collect(child, currentPath, result);
                currentPath.pop();
            }
        }
        const root = new TrieNode();
        for (const path of paths) insert(root, path);
        const freq = new Map<string, number>();
        for (const name in root.children) serialize(root.children[name], freq);
        for (const name in root.children) mark(root.children[name], freq);
        const result: string[][] = [];
        collect(root, [], result);
        return result;
    }
}

Complexity

  • ⏰ Time complexity: O(N * L * log L) — N = number of paths, L = average path length, due to trie construction and sorting children.
  • 🧺 Space complexity: O(N * L) — for the trie and hash maps.