MongoDB – Replication and Sharding

Last Updated : 16 Apr, 2024

Replication and sharding are two important features for scalability and data availability in MongoDB. Replication enhances data availability by creating duplicate copies of the dataset, whereas sharding enables horizontal scaling by partitioning a large collection (dataset) into smaller, discrete parts called shards.

In this article, we will learn about Sharding and Replication in MongoDB. We will cover all important concepts related to them and look at their functioning with diagrams.

Replication in MongoDB

Replication is the process of copying data across multiple servers in MongoDB.

For example, suppose an application reads and writes data to a database, and server A stores a name and a balance. With replication, this data is copied to two other servers in two different locations.

Figure: Replication in MongoDB

Replication increases redundancy and data availability by keeping multiple copies of the data on different database servers. It can also improve read performance (read scaling), because read operations can be served by the copies as well as by the primary.
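By default, clients send reads to the primary; to spread reads across the copies, a client sets a read preference. The sketch below builds a replica set connection string in JavaScript. The host names and the set name rs0 are assumptions for illustration, while replicaSet and readPreference are standard MongoDB connection string options.

```javascript
// Build a MongoDB connection string for a replica set.
// Host names and the set name are illustrative assumptions.
const READ_PREFERENCES = ["primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"];

function replicaSetUri(hosts, replicaSetName, readPreference = "primary") {
  if (!READ_PREFERENCES.includes(readPreference)) {
    throw new Error(`unknown read preference: ${readPreference}`);
  }
  const params = new URLSearchParams({ replicaSet: replicaSetName, readPreference });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

const uri = replicaSetUri(
  ["db-a.example.net:27017", "db-b.example.net:27017", "db-c.example.net:27017"],
  "rs0",
  "secondaryPreferred" // read from a secondary when one is available
);
console.log(uri);
```

Because secondaries apply changes asynchronously, reads served by a secondary may return slightly stale data; this is the trade-off for the extra read capacity.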

The servers that maintain the same copy of the data are called replica set members; each one is a separate MongoDB instance.

Key Features of Replication:

A replica set is a cluster of N nodes that all maintain the same data set.
The primary server receives all write operations and records every change to its data in the operation log (oplog).
The secondary members then copy and apply these changes asynchronously.
Replica set members exchange heartbeat signals with one another. If the primary server goes down, an eligible secondary calls an election and becomes the new primary.
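The features above can be illustrated with a toy, in-memory JavaScript model. It is not MongoDB's implementation: the class names, the integer counter used as an "optime", and the rule of promoting the reachable member with the newest applied entry are simplifying assumptions (real elections also involve votes, terms, and member priorities).

```javascript
// Toy model of oplog-based replication; not MongoDB's actual implementation.
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // key -> latest document
    this.oplog = [];       // operations applied so far, in order
  }
  lastOptime() {
    // Optime of the newest applied operation (0 if none yet).
    return this.oplog.length === 0 ? 0 : this.oplog[this.oplog.length - 1].optime;
  }
  apply(op) {
    this.data.set(op.key, op.doc);
    this.oplog.push(op);
  }
}

class ToyReplicaSet {
  constructor(names) {
    this.members = names.map((n) => new Member(n));
    this.primary = this.members[0];
    this.nextOptime = 1;
    this.down = new Set();
  }
  // Every write goes to the primary, which records it in its oplog.
  write(key, doc) {
    const op = { optime: this.nextOptime++, key, doc };
    this.primary.apply(op);
    return op.optime;
  }
  // A secondary copies and applies entries newer than its last applied one.
  // In MongoDB this runs continuously in the background; here it is an explicit call.
  sync(secondary) {
    const since = secondary.lastOptime();
    for (const op of this.primary.oplog) {
      if (op.optime > since) secondary.apply(op);
    }
  }
  // Simplified failover: mark the primary as down and promote the reachable
  // member with the newest oplog entry.
  failover() {
    this.down.add(this.primary.name);
    const candidates = this.members.filter((m) => !this.down.has(m.name));
    if (candidates.length === 0) throw new Error("no member available");
    candidates.sort((a, b) => b.lastOptime() - a.lastOptime());
    this.primary = candidates[0];
    return this.primary.name;
  }
}

// Usage: server A is the primary; B and C are secondaries.
const replSet = new ToyReplicaSet(["A", "B", "C"]);
replSet.write("alice", { name: "Alice", balance: 100 });
replSet.sync(replSet.members[1]); // B catches up
replSet.write("alice", { name: "Alice", balance: 80 });
replSet.sync(replSet.members[1]); // B applies the update; C has not synced yet
console.log(replSet.failover()); // prints B, the member with the newest oplog entry
```

Member C shows why replication is described as asynchronous: until it syncs, it still lacks the name and balance written on the primary.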

Advantages of Replication

High availability of data and disaster recovery
No downtime for maintenance (such as backups, index rebuilds, and compaction)
Read scaling (extra copies to read from)

How to Perform Replication in MongoDB

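To perform replication in MongoDB, we first need to create a replica set by starting each member with the --replSet option. If this is done with a shell script, the script file needs execute permission (for example, by running chmod +x on it). The basic syntax of --replSet is: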

mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"
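The command above is run once per member, each with its own port and data directory but the same --replSet name. A hypothetical helper that generates these commands for a three-member set on one machine might look like this; the ports 27017 to 27019, the ./data/rsN directories, and the set name rs0 are assumptions, not values taken from the original scripts.

```javascript
// Generate one mongod startup command per replica set member.
// Ports, paths, and the set name are illustrative assumptions.
function mongodCommands(replSetName, basePort, count, dataRoot) {
  const commands = [];
  for (let i = 0; i < count; i++) {
    const port = basePort + i;          // each member needs its own port
    const dbpath = `${dataRoot}/rs${i}`; // and its own data directory
    commands.push(`mongod --port ${port} --dbpath "${dbpath}" --replSet "${replSetName}"`);
  }
  return commands;
}

for (const cmd of mongodCommands("rs0", 27017, 3, "./data")) {
  console.log(cmd);
}
```

Each data directory must exist before mongod starts (for example, mkdir -p ./data/rs0), which is the kind of setup a script such as create_replicaset.sh would take care of.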

Next, create a shell script, create_replicaset.sh, and a JavaScript file, init_mongoreplica.js.

Figure: Example scripts for creating a replica set in MongoDB

Then run the script:

./create_replicaset.sh

The script creates the data directories and starts the MongoDB instances.
Then, in the mongo shell, run rs.initiate() to initiate the new replica set.
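Called with no arguments, rs.initiate() creates a one-member set from the current node, and further members can be added with rs.add(). Alternatively, rs.initiate() accepts a configuration document that lists all members up front. The sketch below shows what a file like init_mongoreplica.js could build; since the original script is not shown, its contents here are an assumption, as are the localhost host names and ports.

```javascript
// Build the replica set configuration document accepted by rs.initiate(config).
// Host names and ports are illustrative assumptions.
function buildReplicaSetConfig(replSetName, hosts) {
  return {
    _id: replSetName, // must match the --replSet name given to mongod
    members: hosts.map((host, i) => ({ _id: i, host })),
  };
}

const config = buildReplicaSetConfig("rs0", [
  "localhost:27017",
  "localhost:27018",
  "localhost:27019",
]);
console.log(JSON.stringify(config));

// In the mongo shell, the same object would be passed as:
//   rs.initiate(config)
//   rs.status()   // check which member became PRIMARY
```

Listing every member in one document avoids a series of rs.add() calls and keeps the set definition in a single file that can be re-run on a fresh deployment.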

Figure: Performing replication in MongoDB

Sharding in MongoDB

Sharding is a method for distributing a large collection (dataset) across multiple servers. MongoDB uses sharding to support deployments with very large data sets and high-volume operations.

Sharding adds more machines to accommodate data growth and the demands of read and write operations.

Need for Sharding

Database systems with large data sets or high-throughput applications can exceed the capacity of a single server.

For example, high query rates can exhaust the server's CPU capacity, and large data sets can stress the I/O capacity of the disk drive.

How does Sharding work?

Sharding solves this problem through horizontal scaling. It splits the dataset and stores the parts across multiple servers, and new servers can be added to increase capacity as needed.

Figure: Replication and sharding in MongoDB

Now, instead of a single primary server, we have multiple servers called shards. Routing servers (mongos query routers) direct data and queries to the appropriate shard.

For example, say we have Data 1, Data 2, and Data 3. Each of them goes to the routing server, which sends it to a particular shard. Each shard holds part of the data.

The config server holds the cluster's metadata, which the routing server uses to send each piece of data to the right shard. However, the config server is itself a MongoDB instance, and if it goes down the cluster can no longer operate correctly. For this reason, the config server is also deployed as a replica set.
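To make the routing step concrete, here is a toy JavaScript model of a router consulting config-server-style metadata: a table of shard key ranges (chunks) and the shard that owns each range. MongoDB's range-based sharding is built on this idea, but the chunk boundaries, shard names, and the numeric customerId shard key below are assumptions for illustration, and a real mongos caches and refreshes this metadata rather than reading a static array.

```javascript
// Toy router: map a document's shard key to a shard using chunk metadata.
// Chunk boundaries, shard names, and the shard key are illustrative assumptions.
const chunks = [
  // Each chunk covers [min, max): min inclusive, max exclusive.
  // -Infinity / Infinity stand in for MongoDB's MinKey / MaxKey.
  { min: -Infinity, max: 1000, shard: "shard1" },
  { min: 1000, max: 2000, shard: "shard2" },
  { min: 2000, max: Infinity, shard: "shard3" },
];

function routeToShard(doc, shardKey, chunkTable) {
  const value = doc[shardKey];
  if (value === undefined) throw new Error(`document is missing shard key "${shardKey}"`);
  const chunk = chunkTable.find((c) => value >= c.min && value < c.max);
  return chunk.shard;
}

const docs = [
  { customerId: 42, item: "Data 1" },
  { customerId: 1500, item: "Data 2" },
  { customerId: 7300, item: "Data 3" },
];
for (const d of docs) {
  console.log(`${d.item} -> ${routeToShard(d, "customerId", chunks)}`);
}
```

Because the router only needs this small table, not the data itself, losing the metadata would leave it unable to route; that is why the config server is replicated.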

Advantages of Sharding

Sharding lets you add more servers to the cluster, and data loads are automatically balanced across them.
Each shard manages fewer operations.
It also increases write capacity by splitting the write load across multiple instances.
It provides high availability when the shards and config servers are deployed as replica sets.
Total capacity increases as more shards are added.
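The point that each shard manages a smaller share of the operations can be checked with a quick simulation. The sketch hashes a shard key to spread 9,000 hypothetical inserts over three shards, loosely in the spirit of MongoDB's hashed sharding; the FNV-1a hash used here is a stand-in chosen for brevity, not MongoDB's actual hash function, and the user keys are invented.

```javascript
// 32-bit FNV-1a hash: a stand-in for MongoDB's own hash function.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply modulo 2^32, keep unsigned
  }
  return h;
}

// Pick a shard index from the hashed shard key.
function shardFor(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Count how many inserts each shard would receive.
function distributeWrites(keys, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const key of keys) counts[shardFor(key, shardCount)]++;
  return counts;
}

const keys = Array.from({ length: 9000 }, (_, i) => `user${i}`);
const counts = distributeWrites(keys, 3);
console.log(counts); // per-shard insert counts; together they sum to 9000
```

Hashing the key tends to spread consecutive values across shards, so no single server has to absorb all 9,000 writes.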

To create a sharded cluster in MongoDB, we need to configure the shards, a config server, and a query router.
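A minimal local layout for these three components could be generated as below. The mongod flags --configsvr, --shardsvr, --replSet, and --dbpath, the mongos --configdb option, and the mongo shell helpers sh.addShard(), sh.enableSharding(), and sh.shardCollection() are standard MongoDB commands; the replica set names, ports, paths, and the shop.orders namespace are hypothetical. Each config server and shard replica set still has to be initiated with rs.initiate() before mongos can use it.

```javascript
// Generate startup and shell commands for a small local sharded cluster.
// Set names, ports, paths, and the namespace are illustrative assumptions.
function shardedClusterPlan({ host, configPort, shardPorts, routerPort, dataRoot }) {
  const startup = [
    // Config server (holds the cluster metadata), itself a replica set.
    `mongod --configsvr --replSet cfgRS --port ${configPort} --dbpath "${dataRoot}/cfg"`,
  ];
  shardPorts.forEach((port, i) => {
    // Each shard is its own replica set.
    startup.push(`mongod --shardsvr --replSet shard${i + 1}RS --port ${port} --dbpath "${dataRoot}/shard${i + 1}"`);
  });
  // Query router pointing at the config server replica set.
  startup.push(`mongos --configdb cfgRS/${host}:${configPort} --port ${routerPort}`);

  // Commands to run in the mongo shell after connecting to mongos.
  const shellCommands = shardPorts.map((port, i) => `sh.addShard("shard${i + 1}RS/${host}:${port}")`);
  shellCommands.push(`sh.enableSharding("shop")`);
  shellCommands.push(`sh.shardCollection("shop.orders", { customerId: "hashed" })`);
  return { startup, shellCommands };
}

const plan = shardedClusterPlan({
  host: "localhost",
  configPort: 27019,
  shardPorts: [27018, 27020],
  routerPort: 27017,
  dataRoot: "./data",
});
plan.startup.forEach((c) => console.log(c));
plan.shellCommands.forEach((c) => console.log(c));
```

Applications then connect to the mongos port rather than to any shard directly, so adding a shard later does not change the client's connection string.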

Conclusion

Replication and sharding both help scale a MongoDB database. Replication improves data availability, while sharding horizontally scales large datasets.

In this article, we learned about their uses, advantages, and implementation. Using both techniques together, users can achieve efficient and reliable database performance.
