What is Replication in Distributed Systems?

Last Updated : 17 Dec, 2021

In a distributed system, data is stored across different computers in a network. We therefore need to make sure that the data remains readily available to users. Availability is commonly achieved through data replication, which is the practice of keeping several copies of the same data in different places.

Why do we require replication?

Replication makes a system more robust, because other copies can take over when one node is lost. Keeping replicas of a node in a network is useful for the following reasons:

If a node stops working, its replicas can continue to serve requests, so the distributed system keeps running. This increases the fault tolerance of the system.
It helps with load sharing, because the load on one server can be spread across its replicas.
It improves the availability of the data. If replicas are placed close to the consumers, the data can be fetched more easily and quickly.

Types of Replication

Active Replication
Passive Replication

Active Replication:

The client's request is sent to all the replicas.
Every replica must receive the client requests in the same order; otherwise the replicas' states diverge and the system becomes inconsistent.
Once the order is fixed, the replicas need no further coordination while processing, because each copy applies the same requests in the same sequence. This assumes that request processing is deterministic.
All replicas respond to the client’s request.
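
The steps above can be sketched in a few lines of Python. This is a minimal, single-process illustration and not a real protocol: the `Replica` and `Sequencer` classes, the key-value store, and the sequence numbers are all assumptions made for this example. In particular, the `Sequencer` is a hypothetical stand-in for whatever ordering mechanism (for example, a total-order broadcast) a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a deterministic key-value store (hypothetical example)."""
    name: str
    store: dict = field(default_factory=dict)
    applied: int = 0  # sequence number of the last request applied

    def apply(self, seq: int, key: str, value: int) -> dict:
        # Refuse out-of-order delivery: applying requests in different
        # orders on different replicas would let their states diverge.
        if seq != self.applied + 1:
            raise ValueError(
                f"{self.name}: expected seq {self.applied + 1}, got {seq}"
            )
        self.store[key] = value
        self.applied = seq
        return dict(self.store)  # this replica's reply to the client


class Sequencer:
    """Hypothetical stand-in for an ordering layer in front of the replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.next_seq = 1

    def submit(self, key: str, value: int) -> dict:
        seq = self.next_seq
        self.next_seq += 1
        # Every replica receives and processes the same request.
        replies = [r.apply(seq, key, value) for r in self.replicas]
        # All replicas respond; with deterministic processing the
        # replies must be identical.
        if any(reply != replies[0] for reply in replies):
            raise RuntimeError("replicas diverged")
        return replies[0]


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
sequencer = Sequencer(replicas)
sequencer.submit("x", 1)
result = sequencer.submit("x", 2)
```

After the two writes, every replica holds the same state, so the client could accept the reply from any of them. Delivering a request with the wrong sequence number to a replica raises an error instead of silently producing an inconsistent copy.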

Advantages:

It is conceptually simple: every replica runs the same code.
It is transparent: the client does not need to know which replica served its request.
A node failure is handled easily, because the remaining replicas keep processing requests.

Disadvantages:

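It increases resource consumption. Every replica processes every request, so the greater the number of replicas, the greater the memory and processing capacity needed.
It increases the time taken per request. A change made on one replica must also be made on all the others, and requests must be delivered to every replica in the same order.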

Passive Replication:

The client request goes to the primary replica, also called the main replica.
The other replicas act as backups for the primary replica.
The primary replica informs all backup replicas of every modification it makes.
The response is returned to the client by the primary replica.
Periodically, the primary replica sends a signal to the backup replicas to let them know that it is still working.
In case of failure of a primary replica, a backup replica becomes the primary replica.
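
The primary-backup flow above can be sketched as follows. This is a minimal, single-process illustration under stated assumptions: the `Primary` and `Backup` classes, the `promote` helper, the timeout value, and the explicit `now` timestamps are all hypothetical and chosen for this example. A real system would also need to choose which backup is promoted, which is not covered here.

```python
class Backup:
    """Backup replica: stores updates and watches for the primary's signal."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.last_heartbeat = 0.0

    def on_update(self, key, value):
        # Apply a modification forwarded by the primary.
        self.store[key] = value

    def on_heartbeat(self, now):
        # Record when the primary last reported that it is alive.
        self.last_heartbeat = now

    def primary_suspected(self, now, timeout):
        # The primary is suspected to have failed if it has been silent
        # for longer than the timeout (an assumed, tunable value).
        return now - self.last_heartbeat > timeout


class Primary:
    """Primary replica: the only one that handles client requests."""

    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def handle_write(self, key, value):
        self.store[key] = value
        # Inform every backup about the modification.
        for backup in self.backups:
            backup.on_update(key, value)
        return "ok"  # the response comes from the primary only

    def send_heartbeat(self, now):
        for backup in self.backups:
            backup.on_heartbeat(now)


def promote(backup, remaining_backups):
    """Turn a backup into the new primary, keeping its replicated state."""
    new_primary = Primary(remaining_backups)
    new_primary.store = dict(backup.store)
    return new_primary


b1, b2 = Backup("b1"), Backup("b2")
primary = Primary([b1, b2])
primary.handle_write("x", 1)
primary.send_heartbeat(now=10.0)

healthy_check = b1.primary_suspected(now=11.0, timeout=3.0)
# Assume the primary crashes here and sends no further heartbeats.
suspected = b1.primary_suspected(now=15.0, timeout=3.0)
if suspected:
    primary = promote(b1, [b2])
primary.handle_write("y", 2)
```

Because the old primary forwarded its write before failing, the promoted backup already holds `x` and can continue serving requests. The gap between the last heartbeat and the promotion is the period during which clients see delayed responses.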

Advantages:

Resource consumption is lower, because only the primary processes client requests; a backup server takes over only when the primary server fails.
Processing overhead is also lower: unlike active replication, a request is not executed on every replica. The backups only receive the resulting modifications from the primary.

Disadvantages:

If the primary fails, the response time is delayed while the failure is detected and a backup takes over as the new primary.
