We have already discussed - Calculate Hash Code for String.

Definition

Rolling hash is used to avoid rehashing the entire string when calculating hash values of the substrings of a given string. In a rolling hash, the new hash value is rapidly calculated using only the old hash value. This allows two strings to be compared in constant time.

Example

Consider the string abcd and we need to find the hash values of substrings of this string with length 3, i.e., abc and bcd.

Keeping Smallest Power at the End

So,the hash value of first substring,H(abc) :-

H(abc) => a*(5^2) + b*(5^1) + c*(5^0) 
= 97*25 + 98*5 + 99*1 = 3014

And the hash value of the second substring,H(bcd) :-

H(bcd) => b*(5^2) + c*(5^1) + d*(5^0) 
= 98*25 + 99*5 + 100*1 = 3045

So, we are doing the calculation for “bc” chars in bcd again.

Rolling hash works on the priciple that while finding the hash value of the next substring, the window is moved forward only by dropping one character and adding another.In this case character ‘a’ is removed from substring abc and character ’d’ is added next to it to get the substring bcd.

So, we do not need to rehash the string again.Instead, we can substract the hash code corresponding to the first character from the first hash value,multiply the result by the considered prime number and add the hash code corresponding to the next character to it.

For instance in this case,

H(bcd)=(H(abc)-a*(5^2))*5 + d*(5^0)=(3014-97*25)*5 + 100*1 = 3045

Keeping Smallest Power at the Start

So,the hash value of first substring,H(abc) :-

H(abc) => a*(5^0) + b*(5^1) + c*(5^2) 
= 97*1 + 98*5 + 99*25 = 3062

And the hash value of the second substring,H(bcd) :-

H(bcd) => b*(5^0) + c*(5^1) + d*(5^2) 
= 98*1 + 99*5 + 100*25 = 3093

So, we are doing the calculation for “bc” chars in bcd again. Using rolling hash:

H(bcd)=(H(abc)-a*(5^0)) / 5 + d*(5^2)=(3062-97*1) / 5 + 100*25 = 3093

Note: We have to divide by base here.

Decision

[!Decision] Keep Smaller Power on Base at end When we calculate rolling hash, we don’t have to worry about division, and we use multiplicaton everywhere.

We can use the variant where we keep the smaller power at the end, as then when

Formula

In general,the hash H can be defined as:-

H=( c1 * a ^ k-1 + c2 * a ^ k-2 + c3 * a ^ k-3. . . . + c_k * a0 ) % m

where a is a constant, c1,c2, … ck are the input characters and m is a large prime number, since the probability of two random strings colliding is about ≈ 1/m.

Then, the hash value of next substring,Hnxt using rolling hash can be defined as:-

H_nxt=( ( H - c1 * a ^ k-1 ) * a + c_k+1_ * a0 ) % m

Code

Java
public class Solution {
    private final int p = 31;
    private final int m = (1 << 31) - 1;

    // Utility function to calculate a^b % m
    public long power(long a, int b) {
        long result = 1;
        while (b > 0) {
            if ((b & 1) != 0) {
                result = (result * a) % m;
            }
            a = (a * a) % m;
            b >>= 1;
        }
        return result;
    }

    // Function to calculate the initial hash of a substring
    public int hashCode(String s) {
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            code = (int) ((code + (s.charAt(i) * power(p, s.length() - i - 1)) % m) % m);
        }
        return code;
    }

    // Function to calculate the rolling hash
    public int rollingHashCode(int prevHash, char left, char right, long windowPower) {
        return (int) ((((prevHash - (left * windowPower) % m) + m) % m * p % m + right) % m);
    }

    public static void main(String[] args) {
        Solution solution = new Solution();
        String text = "abcd";
        String pattern = "bcd";

        int windowLength = pattern.length();
        long windowPower = solution.power(solution.p, windowLength - 1);

        int initialHash = solution.hashCode(text.substring(0, windowLength));
        System.out.println("Initial Hash: " + initialHash); // Output: hash of "abc"

        int rollingHash = solution.rollingHashCode(initialHash, text.charAt(0), text.charAt(windowLength), windowPower);
        System.out.println("Rolling Hash: " + rollingHash); // Output: hash of "bcd"
    }
}
Python
class Solution:
    def __init__(self):
        self.p = 31
        self.m = (1 << 31) - 1

    def power(self, a: int, b: int) -> int:
        result = 1
        while b > 0:
            if b & 1:
                result = (result * a) % self.m
            a = (a * a) % self.m
            b >>= 1
        return result

    def hash_code(self, s: str) -> int:
        code = 0
        for i in range(len(s)):
            code = (code + (ord(s[i]) * self.power(self.p, len(s) - i - 1)) % self.m) % self.m
        return code

    def rolling_hash_code(self, prev_hash: int, left: str, right: str, window_power: int) -> int:
        new_hash = (((prev_hash - ord(left) * window_power % self.m) + self.m) % self.m * 
                    self.p % self.m + ord(right)) % self.m
        return new_hash

# Example usage
sol = Solution()
text = "abcd"
pattern = "bcd"

window_length = len(pattern)
window_power = sol.power(sol.p, window_length - 1)

initial_hash = sol.hash_code(text[:window_length])
print("Initial Hash:", initial_hash)  # Output: hash of "abc"

rolling_hash = sol.rolling_hash_code(initial_hash, text[0], text[window_length], window_power)
print("Rolling Hash:", rolling_hash)  # Output: hash of "bcd"

Note: The prime number p and the modulus m are carefully chosen to minimize hash collisions and overflow, respectively.

Window power = (p ^ size of window -1)

Applications

Rabin fingerprint is an implementation of a rolling hash algorithm.: #Rabin Karp Algorithm with Rolling Hash