Like the Naive Algorithm, the Rabin-Karp algorithm slides the pattern over the text one character at a time. However, unlike the Naive algorithm, the Rabin-Karp algorithm first compares the hash value of the pattern with the hash value of the current substring in the text. If the hash values match, it then proceeds to match the characters individually. Thus, the Rabin-Karp algorithm computes hash values for the following:
The pattern itself.
All substrings of the text with length equal to the pattern.
To efficiently compute hash values for all substrings of the text of size m, a suitable hash function is essential.
The concept is straightforward, but it requires a hash function that provides unique hashes for different substrings. Such a hash function can use ASCII codes for characters, but care must be taken to ensure it supports multiple languages.
The hash function can be based on various criteria, such as converting ASCII characters to numbers or other methods. The main requirement is to convert a string (pattern) into a hash that is quicker to compare. For example, consider the string “hello world” with a hash value of hash(‘hello world’) = 12345. If hash(‘he’) = 1, it indicates that the pattern “he” is present in the text “hello world”. At each step, we extract a substring from the text with the same length as the pattern, compute its hash, and directly compare it to the pattern’s hash, as demonstrated in the image.
Now lets use use |string hash calculation to write simple rabin karp algorithm without rolling hash. In this basic example, we will use a simple hash function to convert characters into integers and then calculate the hash values for the pattern and the text’s substrin
publicclassSolution {
// Function to create a hash table and compute hash valueprivateinthashString(String str, int len) {
Map<Character, Integer> hashTable =new HashMap<>();
hashTable.put('h', 1);
hashTable.put('e', 2);
hashTable.put('l', 3);
hashTable.put('o', 4);
hashTable.put('w', 5);
hashTable.put('r', 6);
hashTable.put('d', 7);
int hash = 0;
for (int i = 0; i < len; i++) {
hash = hash * 10 + hashTable.getOrDefault(str.charAt(i), 0);
}
return hash;
}
publicintrabinKarp(String text, String pattern) {
int n = text.length();
int m = pattern.length();
if (m > n) {
return-1;
}
int patternHash = hashString(pattern, m);
int textHash = hashString(text.substring(0, m), m);
for (int i = 0; i <= n - m; i++) {
if (patternHash == textHash) {
if (text.substring(i, i + m).equals(pattern)) {
return i;
}
}
if (i < n - m) {
textHash = hashString(text.substring(i + 1, i + 1 + m), m);
}
}
return-1;
}
}
classSolution:
def __init__(self):
self.hash_table = {
'h': 1,
'e': 2,
'l': 3,
'o': 4,
'w': 5,
'r': 6,
'd': 7 }
defhash_string(self, s: str, length: int) -> int:
h =0for i in range(length):
h = h *10+ self.hash_table.get(s[i], 0)
return h
defrabin_karp(self, text: str, pattern: str) -> int:
n = len(text)
m = len(pattern)
if m > n:
return-1 pattern_hash = self.hash_string(pattern, m)
text_hash = self.hash_string(text[:m], m)
for i in range(n - m +1):
if pattern_hash == text_hash:
if text[i:i + m] == pattern:
return i
if i < n - m:
text_hash = self.hash_string(text[i +1:i +1+ m], m)
return-1
⏰ Time complexity: O(m * n), where n is the length of the text and m is the length of the pattern. This is due to the repeated calculation of the hash for each substring.
🧺 Space complexity: O(1), as the extra space used does not depend on the input size.
We have already discussed some part of the problem here: #Algorithm without rolling hash. The main drawback of the previous solution was having to recalculate the hash for each substring. This led to time complexity similar to the naive approach.
publicclassRabinKarp {
privatefinal String pat; // pattern to search forprivatefinallong patHash; // hash value of the patternprivatefinalint m; // length of the patternprivatestaticfinallong R = 256; // radixprivatestaticfinallong Q = (long) 1e9 + 7; // large prime for modulus to avoid overflowprivatefinallong windowPower; // R^(m-1) % QpublicRabinKarp(String pat) {
this.pat= pat;
this.m= pat.length();
this.patHash= hash(this.pat);
// precompute R^(m-1) % Q for use in removing leading digitlong temp = 1;
for (int i = 1; i <= m - 1; i++) {
temp = (R * temp) % Q;
}
windowPower = temp;
}
// Hash function for a stringprivatelonghash(String s) {
long h = 0;
for (int i = 0; i < s.length(); i++) {
h = (h * R + s.charAt(i)) % Q;
}
return h;
}
// Compute next hash value for the rolling hashprivatelonggetNextHash(long currHash, char prevCh, char currCh) {
long nextHash = currHash;
nextHash = (nextHash + Q - (windowPower * prevCh) % Q) % Q;
nextHash = (nextHash * R + currCh) % Q;
return nextHash;
}
// Check if the pattern matches the text starting from index iprivatebooleancheck(String txt, int i) {
for (int j = 0; j < m; j++) {
if (pat.charAt(j) != txt.charAt(i + j)) {
returnfalse;
}
}
returntrue;
}
// Search for the pattern in the given textpublicintsearch(String txt) {
int n = txt.length();
if (n < m) {
return-1;
}
long txtHash = hash(txt.substring(0, m));
// Check for match at offset 0if ((patHash == txtHash) && check(txt, 0)) {
return 0;
}
// Check for hash match; if hash match, confirm with exact matchint j = m - 1;
for (int i = 1; i <= n - m; i++) {
txtHash = getNextHash(txtHash, txt.charAt(i - 1), txt.charAt(i + j));
if ((patHash == txtHash) && check(txt, i)) {
return i;
}
}
// No match foundreturn-1;
}
}
classRabinKarp:
R =256# radix Q = int(1e9+7) # large prime for modulus to avoid overflowdef __init__(self, pat: str):
self.pat = pat
self.m = len(pat)
self.patHash = self.hash(pat)
self.windowPower = self.compute_window_power()
defcompute_window_power(self):
temp =1for i in range(1, self.m):
temp = (RabinKarp.R * temp) % RabinKarp.Q
return temp
defhash(self, s: str):
h =0for char in s:
h = (h * RabinKarp.R + ord(char)) % RabinKarp.Q
return h
defget_next_hash(self, curr_hash, prev_ch, curr_ch):
next_hash = curr_hash
next_hash = (next_hash + RabinKarp.Q - (self.windowPower * ord(prev_ch)) % RabinKarp.Q) % RabinKarp.Q
next_hash = (next_hash * RabinKarp.R + ord(curr_ch)) % RabinKarp.Q
return next_hash
defcheck(self, txt: str, i: int):
for j in range(self.m):
if self.pat[j] != txt[i + j]:
returnFalsereturnTruedefsearch(self, txt: str):
n = len(txt)
if n < self.m:
return-1 txtHash = self.hash(txt[:self.m])
# Check for match at offset 0if self.patHash == txtHash and self.check(txt, 0):
return0# Check for hash match; if hash match, confirm with exact matchfor i in range(1, n - self.m +1):
txtHash = self.get_next_hash(txtHash, txt[i -1], txt[i + self.m -1])
if self.patHash == txtHash and self.check(txt, i):
return i
# No match foundreturn-1# Example usagerk = RabinKarp("needle")
print(rk.search("haystack with a needle inside")) # Output will be the starting index of "needle" in the haystack string
The average and best-case running time of the Rabin-Karp algorithm is O(n + m), but the worst-case time is O(n * m). This is because the algorithm could find a false positive for each of the n positions, and it requires up to m comparisons to verify the match.
With a reasonably good hash function, such false positives are unlikely to occur often. However, it is theoretically possible to construct a query where false positives cause worst-case performance.
Therefore, while Rabin-Karp has an expected time complexity of O(n), its worst-case time complexity remains O(n * m). Since m is typically less than or equal to n, O(n + m) is fundamentally the same as O(n).
There are several string matching algorithms with better performance than O(n*m); however, Rabin-Karp outperforms the naive brute force method in the worst-case scenario.
It can be as slow as brute force in practice and requires additional memory.