Prerequisite – The CAP Theorem
If you have worked with distributed systems, you have probably heard of the CAP theorem. It states that a distributed data store cannot guarantee all three of the following properties at the same time.
The three properties are C = Consistency, A = Availability and P = Partition Tolerance. According to the theorem, a system can guarantee at most two of them at once.
A single database can manage 1000 requests/month, but 1 million requests/month will be more difficult. One way to scale is a Master-Slave setup: as in the diagram, we can have n different database instances. All write operations are performed on the Master database and all read operations on the Slave databases. Data written to the master has to be replicated to the slave databases, and this replication happens asynchronously.
When a user reads data right after writing it, before the write has been replicated to the slave, the read returns the old value. This is called inconsistency, and the user might think of it as a bug.
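The read-after-write problem described above can be sketched in a few lines. This is only an illustrative in-memory model, not a real database: the Master and Slave classes, the pending queue and the replicate function are all hypothetical names chosen for this example.

```python
from collections import deque


class Master:
    """Accepts writes and queues them for asynchronous replication."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes not yet shipped to the slave

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Slave:
    """Serves reads from whatever has been replicated so far."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(master, slave):
    """Apply all queued changes to the slave (happens 'later' in a real system)."""
    while master.pending:
        key, value = master.pending.popleft()
        slave.data[key] = value


master, slave = Master(), Slave()
master.write("name", "Alice")
stale = slave.read("name")   # read arrives before replication has run
replicate(master, slave)
fresh = slave.read("name")   # read arrives after replication has run
print(stale, fresh)          # None Alice
```

The first read misses the value because it reaches the slave before the queued change does, which is exactly the window in which the user sees inconsistent data.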
To overcome this disadvantage of inconsistency, we have another method known as Sharding.
Here, instead of a Master-Slave relation, all databases are Masters, i.e., all databases share equal responsibilities. For instance, in the following illustration we have 3 database instances. Points to be noted here:
Data is divided into n separate segments (here, 3).
With n databases, read and write capacity scales by roughly n times, provided the load is spread evenly across them.
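As a rough sketch of how data might be divided into n segments, the example below routes each key to one of three shards by hashing it. Hash-modulo routing is just one possible scheme and is an assumption of this example; the names shard_for, write and read are hypothetical, and each shard is modelled as a plain dictionary.

```python
import zlib

NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # one dict stands in for each database


def shard_for(key: str) -> int:
    # crc32 is stable across runs, unlike Python's built-in hash() for strings
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def write(key, value):
    shards[shard_for(key)][key] = value


def read(key):
    return shards[shard_for(key)].get(key)


for i in range(9):
    write(f"user:{i}", {"id": i})

print([len(s) for s in shards])  # how many keys landed on each shard
print(read("user:4"))
```

Every read and write touches exactly one shard, which is why capacity can grow with the number of shards as long as the keys are spread evenly.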
Disadvantages of this method:
If one particular instance, let's say DB-1, carries a heavier load than the others, it becomes difficult to scale.
Now, how to scale? We would have to split the overloaded instance into, say, two instances that share the load. That particular database would have to be taken down, divided again and then switched back on. This is a tedious process and it always needs to be monitored.
SQL joins would be required between shards, which is costly because the rows being joined live on different databases.
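To see why joins across shards are awkward, here is a small sketch in which users and orders are sharded by different keys, so the application has to do the join itself. The table layout, the modulo sharding and all function names are assumptions made for illustration.

```python
NUM_SHARDS = 3
user_shards = [dict() for _ in range(NUM_SHARDS)]   # user_id -> name
order_shards = [dict() for _ in range(NUM_SHARDS)]  # order_id -> (user_id, item)


def put_user(user_id, name):
    user_shards[user_id % NUM_SHARDS][user_id] = name


def put_order(order_id, user_id, item):
    order_shards[order_id % NUM_SHARDS][order_id] = (user_id, item)


def orders_with_user_names():
    """Application-level join: scan every order shard, then look up each user."""
    rows = []
    for shard in order_shards:
        for order_id, (user_id, item) in shard.items():
            # The matching user usually lives on a different shard than the order
            name = user_shards[user_id % NUM_SHARDS].get(user_id)
            rows.append((order_id, name, item))
    return sorted(rows)


put_user(1, "Alice")
put_user(2, "Bob")
put_order(10, 1, "book")
put_order(11, 2, "pen")
put_order(12, 1, "lamp")
print(orders_with_user_names())
# [(10, 'Alice', 'book'), (11, 'Bob', 'pen'), (12, 'Alice', 'lamp')]
```

A single database would answer this with one JOIN; here the query has to visit every order shard and make a separate lookup per row.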
Let's learn about each of the three properties by considering the following system, in which we have two instances of the data, both of which are master databases.
Consistency –
As discussed before, suppose data is updated on one database instance and a user queries the other instance before the update has been replicated to it. If the user gets the previous value, your system is inconsistent. If the user gets the updated value, the system is said to be consistent.
Availability –
Even if one or more of your machines go down, your system should still be available. That is, if one or more database servers fail, the system as a whole should still be able to perform read and write operations. Thus there must be no downtime.
Partition Tolerance –
Even if the connection between your database servers is lost, your system should still be working.
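The trade-off between the three properties can be sketched with two master nodes and a link that may be cut. This is a toy model under stated assumptions: the Cluster class, its "CP" and "AP" modes and the partitioned flag are hypothetical, and replication is treated as instantaneous while the link is up.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two master nodes that copy every write to each other."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a, self.b = Node("A"), Node("B")
        self.partitioned = False  # True when the A<->B link is down

    def write(self, node, key, value):
        peer = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: stay consistent, give up availability
                return False
            # AP: accept the write locally; the peer keeps the old value
            node.data[key] = value
            return True
        node.data[key] = value
        peer.data[key] = value
        return True

    def read(self, node, key):
        return node.data.get(key)


cp = Cluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
cp_ok = cp.write(cp.a, "x", 2)

ap = Cluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap_ok = ap.write(ap.a, "x", 2)

print(cp_ok, cp.read(cp.a, "x"), cp.read(cp.b, "x"))  # False 1 1
print(ap_ok, ap.read(ap.a, "x"), ap.read(ap.b, "x"))  # True 2 1
```

Once the link is down, the CP cluster rejects the write and both nodes still agree, while the AP cluster accepts it and the two nodes now return different values for the same key.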