# Hashing in Distributed Systems

Prerequisite – Hashing

A distributed system is a network that consists of autonomous computers that are connected using a distribution middleware. They help in sharing different resources and capabilities to provide users with a single and integrated coherent network.

One of the ways hashing can be implemented in a distributed system is by taking hash Modulo of a number of nodes.

The hash function can be defined as **node_number = hash(key)mod_N** where N is the number of Nodes.

To add/retrieve a key to/from the node, the client computes the hash value of that key and uses the result to contact the appropriate node by looking up its IP address. If the key is found, it is retrieved else it is added to the pool of the Node.

**For example,** a client wants to retrieve 1. Its hash value = 7739. It will go to Node number (7739%3). So, it will contact node 2. If the key is not found, it is added to the pool.

**Disadvantage: The Rehashing problem.**

Let’s suppose the number of nodes has changed. As the keys are distributed depending on the number of nodes, since the number of nodes has changed, the value from hashing function will change so the keys will be redistributed.

For example, let’s remove C

The allocated nodes for each key has changed.

For example, let’s add a new node D

The allocated nodes for each key has changed.

**So, the distribution changes whenever we change the number of Nodes. **

What happens is the more number of redistributions, the more number of misses, the more number of memory fetches, placing an additional load on the node and thus decreasing the performance.

**Consistent Hashing. **

The above issue can be solved by **Collision Hashing**.

This method operates **independently **of the number of nodes as the hash function is not dependent on the number of nodes. Here we assume a chain/ring is formed and we place the keys as well as the nodes on the ring and distribute them.

The hash value can be computed as **position_on_chain = hash(key)mod_360**

(360 is chosen as we are representing things in a circle. And a circle has 360 degrees.)

Steps for the arrangement –

1) Find Hash values of the keys and place it on the ring according to the hash value.

2) Find Hash values of the individual nodes and place it on the ring according to the hash value.

3) Now map each key with the node which is closest to it in the counter-clockwise direction.

4) If the position of a node and key is same, assign that key to the node.

So, now to locate a node, we will just traverse the ring from the point of the position of the key till we find a node in the counter-clockwise direction.

Operations –

Case 1) Adding a node –

Suppose we add a new node D in the ring by calculating the hash. Only those keys will be redistributed whose values lie between the D and C. Now they will not point towards A, they will point towards D.

Case 2) Removing a node –

Suppose we remove a node C in the ring. Only those keys will be redistributed whose values lies between C and the B. Now they will not point towards C, they will point towards A.

This is how consistent hashing solves the rehashing problem. The **number of keys which needs to be redistributed after rehashing is minimized.** So, less number of memory fetches. Hence, performance is optimised.

## Recommended Posts:

- Deadlock detection in Distributed systems
- Interprocess Communication in Distributed Systems
- Comparison - Centralized, Decentralized and Distributed Systems
- Introduction to Signals and Systems: Properties of systems
- Advantages of Distributed database
- Difference between Network OS and Distributed OS
- MPI - Distributed Computing made easy
- Design Issues of Distributed System
- Mutual exclusion in distributed system
- Algorithm for implementing Distributed Shared Memory
- Hierarchical Deadlock Detection in Distributed System
- Centralized vs Distributed Version Control: Which One Should We Choose?
- Maekawa’s Algorithm for Mutual Exclusion in Distributed System
- Lamport's Algorithm for Mutual Exclusion in Distributed System
- Suzuki–Kasami Algorithm for Mutual Exclusion in Distributed System

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.