We have already discussed - Calculate Hash Code for String.
Definition
Rolling hash is used to avoid rehashing the entire string when calculating hash values of the substrings of a given string. In a rolling hash, the new hash value is rapidly calculated using only the old hash value. This allows two strings to be compared in constant time.
Example
Consider the string abcd
and we need to find the hash values of substrings of this string with length 3, i.e., abc
and bcd
.
Keeping Smallest Power at the End
So,the hash value of first substring,H(abc) :-
H(abc) => a*(5^2) + b*(5^1) + c*(5^0)
= 97*25 + 98*5 + 99*1 = 3014
And the hash value of the second substring,H(bcd) :-
H(bcd) => b*(5^2) + c*(5^1) + d*(5^0)
= 98*25 + 99*5 + 100*1 = 3045
So, we are doing the calculation for “bc” chars in bcd
again.
Rolling hash works on the priciple that while finding the hash value of the next substring, the window is moved forward only by dropping one character and adding another.In this case character ‘a’ is removed from substring abc and character ’d’ is added next to it to get the substring bcd.
So, we do not need to rehash the string again.Instead, we can substract the hash code corresponding to the first character from the first hash value,multiply the result by the considered prime number and add the hash code corresponding to the next character to it.
For instance in this case,
H(bcd)=(H(abc)-a*(5^2))*5 + d*(5^0)=(3014-97*25)*5 + 100*1 = 3045
Keeping Smallest Power at the Start
So,the hash value of first substring,H(abc) :-
H(abc) => a*(5^0) + b*(5^1) + c*(5^2)
= 97*1 + 98*5 + 99*25 = 3062
And the hash value of the second substring,H(bcd) :-
H(bcd) => b*(5^0) + c*(5^1) + d*(5^2)
= 98*1 + 99*5 + 100*25 = 3093
So, we are doing the calculation for “bc” chars in bcd
again. Using rolling hash:
H(bcd)=(H(abc)-a*(5^0)) / 5 + d*(5^2)=(3062-97*1) / 5 + 100*25 = 3093
Note: We have to divide by base here.
Decision
[!Decision] Keep Smaller Power on Base at end When we calculate rolling hash, we don’t have to worry about division, and we use multiplicaton everywhere.
We can use the variant where we keep the smaller power at the end, as then when
Formula
In general,the hash H can be defined as:-
H=( c1 * a ^ k-1 + c2 * a ^ k-2 + c3 * a ^ k-3. . . . + c_k * a0 ) % m
where a is a constant, c1,c2, … ck are the input characters and m is a large prime number, since the probability of two random strings colliding is about ≈ 1/m.
Then, the hash value of next substring,Hnxt using rolling hash can be defined as:-
H_nxt=( ( H - c1 * a ^ k-1 ) * a + c_k+1_ * a0 ) % m
Code
Java
public class Solution {
private final int p = 31;
private final int m = (1 << 31) - 1;
// Utility function to calculate a^b % m
public long power(long a, int b) {
long result = 1;
while (b > 0) {
if ((b & 1) != 0) {
result = (result * a) % m;
}
a = (a * a) % m;
b >>= 1;
}
return result;
}
// Function to calculate the initial hash of a substring
public int hashCode(String s) {
int code = 0;
for (int i = 0; i < s.length(); i++) {
code = (int) ((code + (s.charAt(i) * power(p, s.length() - i - 1)) % m) % m);
}
return code;
}
// Function to calculate the rolling hash
public int rollingHashCode(int prevHash, char left, char right, long windowPower) {
return (int) ((((prevHash - (left * windowPower) % m) + m) % m * p % m + right) % m);
}
public static void main(String[] args) {
Solution solution = new Solution();
String text = "abcd";
String pattern = "bcd";
int windowLength = pattern.length();
long windowPower = solution.power(solution.p, windowLength - 1);
int initialHash = solution.hashCode(text.substring(0, windowLength));
System.out.println("Initial Hash: " + initialHash); // Output: hash of "abc"
int rollingHash = solution.rollingHashCode(initialHash, text.charAt(0), text.charAt(windowLength), windowPower);
System.out.println("Rolling Hash: " + rollingHash); // Output: hash of "bcd"
}
}
Python
class Solution:
def __init__(self):
self.p = 31
self.m = (1 << 31) - 1
def power(self, a: int, b: int) -> int:
result = 1
while b > 0:
if b & 1:
result = (result * a) % self.m
a = (a * a) % self.m
b >>= 1
return result
def hash_code(self, s: str) -> int:
code = 0
for i in range(len(s)):
code = (code + (ord(s[i]) * self.power(self.p, len(s) - i - 1)) % self.m) % self.m
return code
def rolling_hash_code(self, prev_hash: int, left: str, right: str, window_power: int) -> int:
new_hash = (((prev_hash - ord(left) * window_power % self.m) + self.m) % self.m *
self.p % self.m + ord(right)) % self.m
return new_hash
# Example usage
sol = Solution()
text = "abcd"
pattern = "bcd"
window_length = len(pattern)
window_power = sol.power(sol.p, window_length - 1)
initial_hash = sol.hash_code(text[:window_length])
print("Initial Hash:", initial_hash) # Output: hash of "abc"
rolling_hash = sol.rolling_hash_code(initial_hash, text[0], text[window_length], window_power)
print("Rolling Hash:", rolling_hash) # Output: hash of "bcd"
Note: The prime number p
and the modulus m
are carefully chosen to minimize hash collisions and overflow, respectively.
Window power = (p ^ size of window -1)
Applications
Rabin fingerprint is an implementation of a rolling hash algorithm.: #Rabin Karp Algorithm with Rolling Hash