Problem

Two strings, X and Y, are considered similar if either they are identical or we can make them equivalent by swapping at most two letters (in distinct positions) within the string X.

For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars""rats", or "arts".

Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}.  Notice that "tars" and "arts" are in the same group even though they are not similar.  Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.

We are given a list strs of strings where every string in strs is an anagram of every other string in strs. How many groups are there?

Examples

Example 1:

Input: strs = ["tars","rats","arts","star"]
Output: 2

Example 2:

Input: strs = ["omv","ovm"]
Output: 1

Solution

Method 1 - Using Union Find Data structure

The problem is to determine the number of groups of similar strings in a list where each string is an anagram of the others. Two strings are considered similar if they are identical or can be made equivalent by swapping at most two characters.

To solve this problem, we can use depth-first search (DFS) or union-find to group the strings based on their similarities. We will use the union-find approach for this solution, also known as Disjoint Set Union (DSU).

  1. Union-Find Data Structure:
    • Initialize a parent array where each index points to itself initially.
    • Define find and union functions to manage the disjoint sets.
  2. Checking Similarity:
    • Two strings are similar if they can become identical by swapping at most two characters. We will write a helper function to check this condition.
  3. Processing the Strings:
    • Iterate through all pairs of strings. For each pair, if they are similar, union their sets.
  4. Count the Number of Groups:
    • After processing all pairs, count the number of unique parents to get the number of groups.

Code

Java
public class Solution {
    public int numSimilarGroups(String[] strs) {
        int n = strs.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }

        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (areSimilar(strs[i], strs[j])) {
                    union(parent, i, j);
                }
            }
        }

        int ans = 0;
        for (int i = 0; i < n; i++) {
            if (find(parent, i) == i) {
                ans++;
            }
        }

        return ans;
    }

    private boolean areSimilar(String s1, String s2) {
        int diffCount = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (s1.charAt(i) != s2.charAt(i)) {
                diffCount++;
                if (diffCount > 2) {
                    return false;
                }
            }
        }
        return true;
    }

    private int find(int[] parent, int x) {
        if (parent[x] != x) {
            parent[x] = find(parent, parent[x]);
        }
        return parent[x];
    }

    private void union(int[] parent, int x, int y) {
        int rootX = find(parent, x);
        int rootY = find(parent, y);
        if (rootX != rootY) {
            parent[rootX] = rootY;
        }
    }
}
Python
class Solution:
    def numSimilarGroups(self, strs: List[str]) -> int:
        n = len(strs)
        parent = list(range(n))
        
        def find(x: int) -> int:
            if parent[x] != x:
                parent[x] = find(parent[x])
            return parent[x]
        
        def union(x: int, y: int) -> None:
            root_x = find(x)
            root_y = find(y)
            if root_x != root_y:
                parent[root_x] = root_y
        
        def are_similar(s1: str, s2: str) -> bool:
            diff_count = 0
            for a, b in zip(s1, s2):
                if a != b:
                    diff_count += 1
                    if diff_count > 2:
                        return False
            return True
        
        for i in range(n):
            for j in range(i + 1, n):
                if are_similar(strs[i], strs[j]):
                    union(i, j)
        
        ans = sum(parent[i] == i for i in range(n))
        return ans

Complexity

  • ⏰ Time complexity: O(n^2 * m), where n is the number of strings and m is the length of each string. This accounts for comparing all pairs of strings.
  • 🧺 Space complexity: O(n), for the union-find data structure.