
What are Hash Functions and How to choose a good Hash Function?

Prerequisite: Hashing | Set 1 (Introduction) 

What is a Hash Function? 



A hash function converts a given large key, such as a phone number, into a small, practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a large number or string to a small integer that can be used as an index in the hash table.

What is meant by a Good Hash Function?



A good hash function should have the following properties: 

  1. Efficiently computable.
  2. Uniformly distributes the keys (each table position is equally likely for each key).

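For example, for phone numbers, a bad hash function is to take the first three digits, because many numbers share the same leading digits (such as an area code). A better function is to take the last three digits, which vary more from one number to the next. Note that this may still not be the best hash function; there may be better ways.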
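The phone number example can be tried out with a small sketch. The sample numbers below are made up for illustration (they all share the hypothetical prefix 555), and the function names are assumptions, not part of any library.

```python
from collections import Counter


def hash_first_three(phone: str) -> int:
    """Poor choice: leading digits are often shared by many numbers."""
    return int(phone[:3])


def hash_last_three(phone: str) -> int:
    """Better choice: trailing digits tend to vary between numbers."""
    return int(phone[-3:])


# Hypothetical sample: ten 10-digit numbers with the same made-up prefix.
phones = [f"555{n:07d}" for n in range(1234567, 1234577)]

# Count how many keys land in each slot under each hash function.
first_buckets = Counter(hash_first_three(p) for p in phones)
last_buckets = Counter(hash_last_three(p) for p in phones)

print("distinct slots (first three digits):", len(first_buckets))
print("distinct slots (last three digits): ", len(last_buckets))
```

With this sample, every number falls into the same slot under the first function, while the second spreads them across separate slots. Real phone data may behave differently, so the comparison is only indicative.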

Rules for choosing a good hash function:

1. The hash function should be simple to compute.

2. The number of collisions should be small when records are placed in the hash table. Ideally, no collision should occur. Such a function is called a perfect hash function.

3. The hash function should produce hash values (table indices) that are distributed uniformly over the array.

4. The hash function should depend on every bit of the key. Thus, a hash function that simply extracts a portion of the key is not suitable.

In practice, we can often employ heuristic techniques to create a hash function that performs well. Qualitative information about the distribution of the keys may be useful in this design process. In general, a hash function should depend on every single bit of the key, so that two keys that differ in only one bit or one group of bits (regardless of whether the group is at the beginning, end, or middle of the key, or spread throughout the key) hash to different values. Thus, a hash function that simply extracts a portion of a key is not suitable. Similarly, if two keys are simply digit or character permutations of each other (such as 139 and 319), they should also hash to different values.
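The point about permutations can be illustrated with a minimal sketch that compares an order-insensitive hash with an order-sensitive one. Both functions, the table size 101, and the base 31 are assumptions chosen for illustration only.

```python
def digit_sum_hash(key: int, table_size: int) -> int:
    """Order-insensitive: any permutation of the digits gives the same sum."""
    return sum(int(d) for d in str(key)) % table_size


def positional_hash(key: int, table_size: int, base: int = 31) -> int:
    """Order-sensitive: each digit is folded in with a position-dependent weight."""
    h = 0
    for d in str(key):
        h = (h * base + int(d)) % table_size
    return h


TABLE_SIZE = 101  # illustrative table size

for key in (139, 319):
    print(key, digit_sum_hash(key, TABLE_SIZE), positional_hash(key, TABLE_SIZE))
```

The digit-sum hash sends 139 and 319 to the same slot, while the positional hash separates them. No hash function into a small table can separate every pair of keys, so this shows a tendency rather than a guarantee.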

Two common heuristic methods are hashing by division and hashing by multiplication:

  1. The mod method: 
    • In this method, we map a key into one of the slots of the table by taking the remainder of the key divided by table_size. That is, the hash function is 
       
h(key) = key mod table_size 

i.e. key % table_size. For example, with table_size = 17:
37599 % 17 = 12
573 % 17 = 12
Both keys map to slot 12, so they collide.
  2. The multiplication method: 
    • In the multiplication method, we multiply the key k by a constant real number c in the range 0 < c < 1 and extract the fractional part of k * c.
    • Then we multiply this value by the table size m and take the floor of the result. It can be represented as
h(k) = floor (m * (k * c mod 1))
                     or
h(k) = floor (m * frac (k * c))
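In fixed-width integer arithmetic with w-bit words, the product can be written as

r1 * 2^w + r0

where r1 = high-order word of the product
      r0 = low-order word of the product

A commonly suggested value for the constant is
c ~ (sqrt (5) - 1) / 2 = 0.618033988 . . .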
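Both methods can be sketched in a few lines of Python. The function names are assumptions for illustration. The multiplication version uses floating-point arithmetic, so it loses precision for very large keys; it is a sketch of the formula, not a production implementation.

```python
import math


def hash_division(key: int, table_size: int) -> int:
    """Mod method: h(key) = key mod table_size."""
    return key % table_size


# Constant from the text: c ~ (sqrt(5) - 1) / 2 = 0.618033988...
C = (math.sqrt(5) - 1) / 2


def hash_multiplication(key: int, table_size: int, c: float = C) -> int:
    """Multiplication method: h(k) = floor(m * frac(k * c)), with 0 < c < 1."""
    frac = (key * c) % 1.0           # fractional part of k * c
    return math.floor(table_size * frac)


# The two keys from the mod-method example land in the same slot.
print(hash_division(37599, 17), hash_division(573, 17))

# Same keys under the multiplication method, for comparison.
print(hash_multiplication(37599, 17), hash_multiplication(573, 17))
```

In the mod method the result depends heavily on table_size, whereas in the multiplication method the table size m only scales the fractional part, so the choice of m is less critical there.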