String Vector Denotation
We’ll denote our chars in string with integers:
a -> 1 g -> 7 m -> 13 s -> 19 y -> 25
b -> 2 h -> 8 n -> 14 t -> 20 z -> 26
c -> 3 i -> 9 o -> 15 u -> 21
d -> 4 j -> 10 p -> 16 v -> 22
e -> 5 k -> 11 q -> 17 w -> 23
f -> 6 l -> 12 r -> 18 x -> 24
HashCode Solution
Method 1 - Just Sum up Char Codes
One way to create hash is just take the chars in string and add them up. For eg. if we have chars in range [a-z]
, we can assign chars numbers, like a
having 1, b
having 2 and so on. So, bac
-> 2+1+3
= 6. But this will result in a lot fo collisions.
Method 2 - Use Prime Multiplication and Summation - smaller power on base in start
Let’s take prime, ( P = 11 ) for this example. Then we can generate the hashcode like this:
$$ \text{hash code} = (1st letter) \times P^0 + (2nd , letter) \times P^1 + (3rd , letter) \times P^2 + \ldots $$
Examples
Lets take prime as 11. The hash value of abc
will be:
$$ 1 \times 11^0 + 2 \times 11^1 + 3 \times 11^2 = 1 + 22 + 363 = 386 $$
Hashcode of bac
will be:
$$ 2 \times 11^0 + 1 \times 11^1 + 3 \times 11^2 = 2 + 11 + 363 = 376 $$
So, there is no collision.
Code
Java
int hashCode(String s) {
int code = 0;
int P = 11;//prime number
for (int i = 0; i<s.length; i++) {
code = code + s.charAt(i) * power(P, i);
}
return code;
}
Python
def char_to_value(ch: str) -> int:
""" Convert character to its corresponding integer value. """
return ord(ch) - ord('a') + 1
def power(x: int, y: int) -> int:
""" Calculate x raised to the power y. """
return x ** y
def hash_code(s: str) -> int:
code = 0
P = 11 # prime number
for i in range(len(s)):
code += char_to_value(s[i]) * power(P, i)
return code
Method 3 - Use Prime Multiplication and Summation - smaller power on base in end
We can also do prime multiplication from start. So, for example for abc and base 11:
$$ 1 \times 11^2 + 2 \times 11^1 + 3 \times 11^0 = 121 + 22 + 3 = 146 $$
Code
Java
int hashCode(String s) {
int code = 0;
int P = 11;//prime number
for (int i = s.length-1; i>=0; i--) {
code = code + s.charAt(i) * power(P, i);
}
return code;
}
Python
def hash_code_reverse(s: str) -> int:
code = 0
P = 11 # prime number
length = len(s)
for i in range(length):
code += char_to_value(s[i]) * power(P, length - i - 1)
return code
Handling Overflow
When adding numbers, there can be overflow for long strings or larger prime numbers. To overcome this, we can use the modulo operator.
MOD = 2^31 for integers in java
MOD = (1<<31) - 1;
Hash = Hash % MOD
But is it enough to stop the overflow. While adding all the numbers, there may occur overflow at the final step. So, we not take modulo while adding but also while multiplying.
Basically we can use following mod property:
(a+b) % m = (a%m + b%m) % m
(a*b) % m = (a%m * b%m) % m
(a/b) % m = (a * inverse of b, if exists) % m
So, we can do something like this:
(1st letter) X (prime)⁰ % MOD + (2nd letter) X (prime)¹ % MOD + (3rd letter) X (prime)² % MOD + ......
Code
Java
int hashCode(String s) {
int code = 0;
int P = 11;//prime number
int m = (1<<31) - 1;
for (int i = 0; i<s.length, i++) {
code = (code + s.charAt(i) * power(P, i) % m) % m;
}
return code;
}
Python
def hash_code_with_mod(s: str, mod: int = (1 << 31) - 1) -> int:
code = 0
P = 11 # prime number
for i in range(len(s)):
code = (code + char_to_value(s[i]) * power(P, i) % mod) % mod
return code