Problem

Due to a bug, there are many duplicate folders in a file system. You are given a 2D array paths, where paths[i] is an array representing an absolute path to the ith folder in the file system.

  • For example, ["one", "two", "three"] represents the path "/one/two/three".

Two folders (not necessarily on the same level) are identical if they contain the same non-empty set of identical subfolders and underlying subfolder structure. The folders do not need to be at the root level to be identical. If two or more folders are identical , then mark the folders as well as all their subfolders.

  • For example, folders "/a" and "/b" in the file structure below are identical. They (as well as their subfolders) should all be marked:
  • /a
  • /a/x
  • /a/x/y
  • /a/z
  • /b
  • /b/x
  • /b/x/y
  • /b/z
    • However, if the file structure also included the path "/b/w", then the folders "/a" and "/b" would not be identical. Note that "/a/x" and "/b/x" would still be considered identical even with the added folder.

Once all the identical folders and their subfolders have been marked, the file system will delete all of them. The file system only runs the deletion once, so any folders that become identical after the initial deletion are not deleted.

Return the 2D arrayans containing the paths of theremaining folders after deleting all the marked folders. The paths may be returned in any order.

Examples

Example 1

1
2
3
4
5
6
7
8

![](https://assets.leetcode.com/uploads/2021/07/19/lc-dupfolder1.jpg)

Input: paths = [["a"],["c"],["d"],["a","b"],["c","b"],["d","a"]]
Output: [["d"],["d","a"]]
Explanation: The file structure is as shown.
Folders "/a" and "/c" (and their subfolders) are marked for deletion because they both contain an empty
folder named "b".

Example 2

1
2
3
4
5
6
7
8

![](https://assets.leetcode.com/uploads/2021/07/19/lc-dupfolder2.jpg)

Input: paths = [["a"],["c"],["a","b"],["c","b"],["a","b","x"],["a","b","x","y"],["w"],["w","y"]]
Output: [["c"],["c","b"],["a"],["a","b"]]
Explanation: The file structure is as shown. 
Folders "/a/b/x" and "/w" (and their subfolders) are marked for deletion because they both contain an empty folder named "y".
Note that folders "/a" and "/c" are identical after the deletion, but they are not deleted because they were not marked beforehand.

Example 3

1
2
3
4
5
6
7

![](https://assets.leetcode.com/uploads/2021/07/19/lc-dupfolder3.jpg)

Input: paths = [["a","b"],["c","d"],["c"],["a"]]
Output: [["c"],["c","d"],["a"],["a","b"]]
Explanation: All folders are unique in the file system.
Note that the returned array can be in a different order as the order does not matter.

Constraints

  • 1 <= paths.length <= 2 * 10^4
  • 1 <= paths[i].length <= 500
  • 1 <= paths[i][j].length <= 10
  • 1 <= sum(paths[i][j].length) <= 2 * 10^5
  • path[i][j] consists of lowercase English letters.
  • No two paths lead to the same folder.
  • For any folder not at the root level, its parent folder will also be in the input.

Solution

Method 1 – Trie Serialization and Hashing

Intuition

To detect duplicate folders, we can build a trie representing the folder structure. By serializing each subtree, we can identify identical subtrees (duplicate folders) using a hash map. We mark all nodes with duplicate serializations for deletion, then collect all remaining paths.

Approach

  1. Build a trie from the list of paths, where each node represents a folder.
  2. Serialize each subtree (post-order traversal), and use a hash map to count the frequency of each serialization.
  3. In a second traversal, mark all nodes whose serialization occurs more than once for deletion.
  4. Collect all remaining paths by traversing the trie and skipping deleted nodes.

Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
class Solution {
    struct Trie {
        unordered_map<string, Trie*> ch;
        bool del = false;
    };
    unordered_map<string, int> mp;
    string serialize(Trie* node) {
        if (node->ch.empty()) return "";
        vector<pair<string, string>> v;
        for (auto& [k, child] : node->ch) {
            v.push_back({k, serialize(child)});
        }
        sort(v.begin(), v.end());
        string s;
        for (auto& [k, sub] : v) {
            s += "(" + k + sub + ")";
        }
        mp[s]++;
        return s;
    }
    void mark(Trie* node) {
        if (node->ch.empty()) return;
        vector<pair<string, string>> v;
        for (auto& [k, child] : node->ch) {
            v.push_back({k, serialize(child)});
        }
        sort(v.begin(), v.end());
        string s;
        for (auto& [k, sub] : v) {
            s += "(" + k + sub + ")";
        }
        if (mp[s] > 1) node->del = true;
        for (auto& [k, child] : node->ch) mark(child);
    }
    void collect(Trie* node, vector<string>& path, vector<vector<string>>& ans) {
        for (auto& [k, child] : node->ch) {
            if (child->del) continue;
            path.push_back(k);
            ans.push_back(path);
            collect(child, path, ans);
            path.pop_back();
        }
    }
public:
    vector<vector<string>> deleteDuplicateFolder(vector<vector<string>>& paths) {
        Trie* root = new Trie();
        for (auto& p : paths) {
            Trie* cur = root;
            for (auto& s : p) cur = cur->ch[s] = cur->ch.count(s) ? cur->ch[s] : new Trie();
        }
        serialize(root);
        mark(root);
        vector<vector<string>> ans;
        vector<string> path;
        collect(root, path, ans);
        return ans;
    }
};
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
import "sort"
type Trie struct {
    ch map[string]*Trie
    del bool
}
func serialize(node *Trie, mp map[string]int) string {
    if len(node.ch) == 0 {
        return ""
    }
    v := make([][2]string, 0, len(node.ch))
    for k, child := range node.ch {
        v = append(v, [2]string{k, serialize(child, mp)})
    }
    sort.Slice(v, func(i, j int) bool { return v[i][0] < v[j][0] })
    s := ""
    for _, kv := range v {
        s += "(" + kv[0] + kv[1] + ")"
    }
    mp[s]++
    return s
}
func mark(node *Trie, mp map[string]int) {
    if len(node.ch) == 0 {
        return
    }
    v := make([][2]string, 0, len(node.ch))
    for k, child := range node.ch {
        v = append(v, [2]string{k, serialize(child, mp)})
    }
    sort.Slice(v, func(i, j int) bool { return v[i][0] < v[j][0] })
    s := ""
    for _, kv := range v {
        s += "(" + kv[0] + kv[1] + ")"
    }
    if mp[s] > 1 {
        node.del = true
    }
    for _, child := range node.ch {
        mark(child, mp)
    }
}
func collect(node *Trie, path []string, ans *[][]string) {
    for k, child := range node.ch {
        if child.del {
            continue
        }
        path = append(path, k)
        *ans = append(*ans, append([]string{}, path...))
        collect(child, path, ans)
        path = path[:len(path)-1]
    }
}
func deleteDuplicateFolder(paths [][]string) [][]string {
    root := &Trie{ch: map[string]*Trie{}}
    for _, p := range paths {
        cur := root
        for _, s := range p {
            if cur.ch[s] == nil {
                cur.ch[s] = &Trie{ch: map[string]*Trie{}}
            }
            cur = cur.ch[s]
        }
    }
    mp := map[string]int{}
    serialize(root, mp)
    mark(root, mp)
    ans := [][]string{}
    collect(root, []string{}, &ans)
    return ans
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
class Solution {
    static class Trie {
        Map<String, Trie> ch = new HashMap<>();
        boolean del = false;
    }
    Map<String, Integer> mp = new HashMap<>();
    String serialize(Trie node) {
        if (node.ch.isEmpty()) return "";
        List<Map.Entry<String, String>> v = new ArrayList<>();
        for (var e : node.ch.entrySet()) {
            v.add(Map.entry(e.getKey(), serialize(e.getValue())));
        }
        v.sort(Map.Entry.comparingByKey());
        StringBuilder sb = new StringBuilder();
        for (var e : v) sb.append("(").append(e.getKey()).append(e.getValue()).append(")");
        String s = sb.toString();
        mp.put(s, mp.getOrDefault(s, 0) + 1);
        return s;
    }
    void mark(Trie node) {
        if (node.ch.isEmpty()) return;
        List<Map.Entry<String, String>> v = new ArrayList<>();
        for (var e : node.ch.entrySet()) {
            v.add(Map.entry(e.getKey(), serialize(e.getValue())));
        }
        v.sort(Map.Entry.comparingByKey());
        StringBuilder sb = new StringBuilder();
        for (var e : v) sb.append("(").append(e.getKey()).append(e.getValue()).append(")");
        String s = sb.toString();
        if (mp.getOrDefault(s, 0) > 1) node.del = true;
        for (var e : node.ch.values()) mark(e);
    }
    void collect(Trie node, List<String> path, List<List<String>> ans) {
        for (var e : node.ch.entrySet()) {
            if (e.getValue().del) continue;
            path.add(e.getKey());
            ans.add(new ArrayList<>(path));
            collect(e.getValue(), path, ans);
            path.remove(path.size() - 1);
        }
    }
    public List<List<String>> deleteDuplicateFolder(List<List<String>> paths) {
        Trie root = new Trie();
        for (var p : paths) {
            Trie cur = root;
            for (var s : p) cur = cur.ch.computeIfAbsent(s, k -> new Trie());
        }
        serialize(root);
        mark(root);
        List<List<String>> ans = new ArrayList<>();
        collect(root, new ArrayList<>(), ans);
        return ans;
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
class Solution {
    class Trie {
        val ch = mutableMapOf<String, Trie>()
        var del = false
    }
    val mp = mutableMapOf<String, Int>()
    fun serialize(node: Trie): String {
        if (node.ch.isEmpty()) return ""
        val v = node.ch.map { it.key to serialize(it.value) }.sortedBy { it.first }
        val s = v.joinToString("") { "(${it.first}${it.second})" }
        mp[s] = mp.getOrDefault(s, 0) + 1
        return s
    }
    fun mark(node: Trie) {
        if (node.ch.isEmpty()) return
        val v = node.ch.map { it.key to serialize(it.value) }.sortedBy { it.first }
        val s = v.joinToString("") { "(${it.first}${it.second})" }
        if ((mp[s] ?: 0) > 1) node.del = true
        for (child in node.ch.values) mark(child)
    }
    fun collect(node: Trie, path: MutableList<String>, ans: MutableList<List<String>>) {
        for ((k, child) in node.ch) {
            if (child.del) continue
            path.add(k)
            ans.add(path.toList())
            collect(child, path, ans)
            path.removeAt(path.size - 1)
        }
    }
    fun deleteDuplicateFolder(paths: List<List<String>>): List<List<String>> {
        val root = Trie()
        for (p in paths) {
            var cur = root
            for (s in p) cur = cur.ch.getOrPut(s) { Trie() }
        }
        serialize(root)
        mark(root)
        val ans = mutableListOf<List<String>>()
        collect(root, mutableListOf(), ans)
        return ans
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
class Solution:
    class Trie:
        def __init__(self):
            self.ch = {}
            self.del_ = False
    def deleteDuplicateFolder(self, paths: list[list[str]]) -> list[list[str]]:
        root = self.Trie()
        for p in paths:
            cur = root
            for s in p:
                if s not in cur.ch:
                    cur.ch[s] = self.Trie()
                cur = cur.ch[s]
        mp: dict[str, int] = {}
        def serialize(node) -> str:
            if not node.ch:
                return ''
            v = [(k, serialize(child)) for k, child in node.ch.items()]
            v.sort()
            s = ''.join(f'({k}{sub})' for k, sub in v)
            mp[s] = mp.get(s, 0) + 1
            return s
        serialize(root)
        def mark(node):
            if not node.ch:
                return
            v = [(k, serialize(child)) for k, child in node.ch.items()]
            v.sort()
            s = ''.join(f'({k}{sub})' for k, sub in v)
            if mp.get(s, 0) > 1:
                node.del_ = True
            for child in node.ch.values():
                mark(child)
        mark(root)
        ans: list[list[str]] = []
        def collect(node, path):
            for k, child in node.ch.items():
                if child.del_:
                    continue
                path.append(k)
                ans.append(path[:])
                collect(child, path)
                path.pop()
        collect(root, [])
        return ans
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
use std::collections::HashMap;
struct Trie {
    ch: HashMap<String, Box<Trie>>,
    del: bool,
}
impl Trie {
    fn new() -> Self {
        Trie { ch: HashMap::new(), del: false }
    }
}
impl Solution {
    pub fn delete_duplicate_folder(paths: Vec<Vec<String>>) -> Vec<Vec<String>> {
        fn serialize(node: &mut Trie, mp: &mut HashMap<String, i32>) -> String {
            if node.ch.is_empty() { return String::new(); }
            let mut v: Vec<_> = node.ch.iter_mut().map(|(k, child)| (k.clone(), serialize(child, mp))).collect();
            v.sort();
            let s = v.iter().map(|(k, sub)| format!("({}{})", k, sub)).collect::<String>();
            *mp.entry(s.clone()).or_insert(0) += 1;
            s
        }
        fn mark(node: &mut Trie, mp: &HashMap<String, i32>) {
            if node.ch.is_empty() { return; }
            let mut v: Vec<_> = node.ch.iter_mut().map(|(k, child)| (k.clone(), serialize(child, &mut mp.clone()))).collect();
            v.sort();
            let s = v.iter().map(|(k, sub)| format!("({}{})", k, sub)).collect::<String>();
            if *mp.get(&s).unwrap_or(&0) > 1 { node.del = true; }
            for child in node.ch.values_mut() { mark(child, mp); }
        }
        fn collect(node: &Trie, path: &mut Vec<String>, ans: &mut Vec<Vec<String>>) {
            for (k, child) in &node.ch {
                if child.del { continue; }
                path.push(k.clone());
                ans.push(path.clone());
                collect(child, path, ans);
                path.pop();
            }
        }
        let mut root = Trie::new();
        for p in &paths {
            let mut cur = &mut root;
            for s in p {
                cur = cur.ch.entry(s.clone()).or_insert_with(|| Box::new(Trie::new()));
            }
        }
        let mut mp = HashMap::new();
        serialize(&mut root, &mut mp);
        mark(&mut root, &mp);
        let mut ans = vec![];
        let mut path = vec![];
        collect(&root, &mut path, &mut ans);
        ans
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
class Solution {
    deleteDuplicateFolder(paths: string[][]): string[][] {
        class Trie {
            ch: Record<string, Trie> = {};
            del = false;
        }
        const root = new Trie();
        for (const p of paths) {
            let cur = root;
            for (const s of p) {
                if (!(s in cur.ch)) cur.ch[s] = new Trie();
                cur = cur.ch[s];
            }
        }
        const mp = new Map<string, number>();
        function serialize(node: Trie): string {
            if (Object.keys(node.ch).length === 0) return '';
            const v = Object.entries(node.ch).map(([k, child]) => [k, serialize(child)] as [string, string]);
            v.sort((a, b) => a[0].localeCompare(b[0]));
            const s = v.map(([k, sub]) => `(${k}${sub})`).join('');
            mp.set(s, (mp.get(s) ?? 0) + 1);
            return s;
        }
        serialize(root);
        function mark(node: Trie) {
            if (Object.keys(node.ch).length === 0) return;
            const v = Object.entries(node.ch).map(([k, child]) => [k, serialize(child)] as [string, string]);
            v.sort((a, b) => a[0].localeCompare(b[0]));
            const s = v.map(([k, sub]) => `(${k}${sub})`).join('');
            if ((mp.get(s) ?? 0) > 1) node.del = true;
            for (const child of Object.values(node.ch)) mark(child);
        }
        mark(root);
        const ans: string[][] = [];
        function collect(node: Trie, path: string[]) {
            for (const [k, child] of Object.entries(node.ch)) {
                if (child.del) continue;
                path.push(k);
                ans.push([...path]);
                collect(child, path);
                path.pop();
            }
        }
        collect(root, []);
        return ans;
    }
}

Complexity

  • ⏰ Time complexity: O(N * L * log L), where N is the number of paths and L is the average path length, due to trie construction and sorting children.
  • 🧺 Space complexity: O(N * L), for the trie and hash maps.