Given two strings, compute their similarity using the Jaro-Winkler distance metric. The Jaro-Winkler distance is a string similarity measure that rewards common prefixes and penalizes transpositions. It is widely used for record linkage, spelling correction, and fuzzy string matching.
Input:
s1 ="martha"s2 ="marhta"Output:
Jaro distance =0.944Jaro-Winkler distance =0.961Explanation:
The two strings have all characters matching, but two transpositions(t/h and h/t). The Jaro distance is computed as shown below, and the Jaro-Winkler distance further boosts the score due to the common prefix "mar".
The Jaro distance measures similarity by considering matching characters and transpositions. The Jaro-Winkler distance extends this by giving extra weight to common prefixes, making it especially useful when similar strings share the same start.
classSolution {
public:double jaroWinkler(string s1, string s2) {
int n1 = s1.size(), n2 = s2.size();
if (n1 ==0&& n2 ==0) return1.0;
int match_dist = max(n1, n2) /2-1;
vector<bool> s1_match(n1, false), s2_match(n2, false);
int m =0;
for (int i =0; i < n1; ++i) {
int start = max(0, i - match_dist), end = min(n2 -1, i + match_dist);
for (int j = start; j <= end; ++j) {
if (!s2_match[j] && s1[i] == s2[j]) {
s1_match[i] = s2_match[j] = true;
++m;
break;
}
}
}
if (m ==0) return0.0;
int t =0, k =0;
for (int i =0; i < n1; ++i) {
if (s1_match[i]) {
while (!s2_match[k]) ++k;
if (s1[i] != s2[k]) ++t;
++k;
}
}
double dj = (m / (double)n1 + m / (double)n2 + (m - t /2.0) / m) /3.0;
int l =0;
while (l < min(4, min(n1, n2)) && s1[l] == s2[l]) ++l;
double p =0.1;
return dj + l * p * (1- dj);
}
};
classSolution:
defjaro_winkler(self, s1: str, s2: str) -> float:
n1, n2 = len(s1), len(s2)
if n1 ==0and n2 ==0:
return1.0 match_dist = max(n1, n2) //2-1 s1_match = [False] * n1
s2_match = [False] * n2
m =0for i in range(n1):
start, end = max(0, i - match_dist), min(n2 -1, i + match_dist)
for j in range(start, end +1):
ifnot s2_match[j] and s1[i] == s2[j]:
s1_match[i] = s2_match[j] =True m +=1breakif m ==0:
return0.0 t =0 k =0for i in range(n1):
if s1_match[i]:
whilenot s2_match[k]:
k +=1if s1[i] != s2[k]:
t +=1 k +=1 dj = (m / n1 + m / n2 + (m - t /2) / m) /3 l =0while l < min(4, n1, n2) and s1[l] == s2[l]:
l +=1 p =0.1return dj + l * p * (1- dj)
⏰ Time complexity: O(n + m), where n and m are the lengths of the two strings. Each character is compared within a window, and prefix and transpositions are checked in linear time.
🧺 Space complexity: O(n + m), for the match marker arrays.