Various properties of CAP Theorem

Last Updated : 23 May, 2023

If you work with distributed systems, you have probably heard of the CAP theorem. The CAP theorem states that a distributed data store cannot guarantee all three of the following properties at the same time: C = Consistency, A = Availability, and P = Partition Tolerance. According to the theorem, at most two of the three can be guaranteed at once.

A single database can comfortably handle around 1000 requests/month, but 1 million requests/month is harder to serve. In the setup shown in the diagram, there can be n database instances. All write operations are performed on the Master database and all read operations on the Slave databases. Data written to the master has to be replicated to the slave databases, and this replication happens asynchronously.

1. Inconsistency: If a user reads data right after writing it, before the write has been replicated to the slave, the read returns the old value. This is called inconsistency. The user may perceive it as a bug.
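The read-after-write problem can be reproduced in a few lines. The following is a minimal sketch in Python, assuming an in-memory dictionary stands in for each database and an explicit `replicate` call stands in for the asynchronous replication step. The names `Master`, `Slave`, and `replicate` are hypothetical and are not taken from any real database.

```python
class Master:
    """Accepts writes and queues them for later replication."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes not yet shipped to the slave

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class Slave:
    """Serves reads; only sees a write after replication has run."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(master, slave):
    """Apply all queued writes to the slave (stand-in for the async step)."""
    for key, value in master.pending:
        slave.data[key] = value
    master.pending.clear()


master, slave = Master(), Slave()
master.write("name", "Alice")
stale = slave.read("name")   # read arrives before replication
replicate(master, slave)
fresh = slave.read("name")   # read arrives after replication
print(stale, fresh)          # prints: None Alice
```

The first read misses the value that was just written, which is exactly the window in which the user sees inconsistent data.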

2. Sharding: To overcome this inconsistency, we have another method known as sharding. Instead of a Master-Slave relation, all databases are Masters, i.e., all databases share equal responsibilities. Because each piece of data lives on exactly one instance, it is read from the same database it was written to. For instance, in the following illustration, we have 3 database instances. Points to be noted here: Data is divided into n separate segments (here, 3). Read and write capacity scales by roughly n times (if there are n databases and the load is spread evenly).
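One common way to divide data into n segments is to hash each key and take the result modulo n. The sketch below assumes this hash-based scheme, which the text above does not specify, and uses in-memory dictionaries as stand-ins for the database instances. `shard_for` and `ShardedStore` are hypothetical names chosen for illustration.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


class ShardedStore:
    """Every shard is a master: it owns both reads and writes for its keys."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        index = shard_for(key, len(self.shards))
        self.shards[index][key] = value

    def get(self, key):
        index = shard_for(key, len(self.shards))
        return self.shards[index].get(key)


store = ShardedStore(3)
for user_id in ["u1", "u2", "u3", "u4", "u5", "u6"]:
    store.put(user_id, {"id": user_id})

# Each key is written to and read from the same shard.
print(store.get("u4"))
print([len(shard) for shard in store.shards])
```

Python's built-in `hash()` is randomized per process for strings, so the sketch uses `hashlib` to keep the key-to-shard mapping stable between runs.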

Disadvantages of this method: If one particular instance, say DB-1, carries a heavier load than the others, it becomes difficult to scale. To scale it, we have to split the overloaded instance into, say, two instances that share the load. That database has to be taken down, divided, and then switched back in. This is a tedious process that needs constant monitoring. In addition, queries that need data from more than one shard require SQL joins between shards.

Let's learn about each of the CAP properties by considering the following system, where we have two data instances, both of which are master databases.

3. Consistency – As seen before, if data is updated on one database instance and a user queries another instance before the update has been replicated to it, the user gets the previous data, which means the system is inconsistent. If the user gets the same updated value from every instance, the system is said to be consistent.

4. Availability – Even if one or more of your machines go down, the system as a whole should remain available. This means that if one or more database servers fail, the system should still be able to perform read and write operations. Thus there must be no downtime.
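As a rough illustration of availability, the sketch below routes each request to the first instance that is still reachable. It is a simplified model, assuming in-memory instances and an `up` flag that simulates a machine failure. `Instance`, `write_any`, and `read_any` are hypothetical names, not part of any real database client.

```python
class Instance:
    """A database instance that can be marked as down to simulate a failure."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data.get(key)


def write_any(instances, key, value):
    """Write to the first reachable instance and return its name."""
    for instance in instances:
        try:
            instance.write(key, value)
            return instance.name
        except ConnectionError:
            continue  # this instance is down, try the next one
    raise ConnectionError("no instance available")


def read_any(instances, key):
    """Read from the first reachable instance."""
    for instance in instances:
        try:
            return instance.read(key)
        except ConnectionError:
            continue
    raise ConnectionError("no instance available")


db1, db2 = Instance("DB-1"), Instance("DB-2")
db1.up = False                                   # DB-1 fails
served_by = write_any([db1, db2], "cart", ["book"])
print(served_by)                                 # prints: DB-2
print(read_any([db1, db2], "cart"))              # prints: ['book']
```

The system keeps serving reads and writes while DB-1 is down. Note that when DB-1 comes back it does not yet have the `cart` entry, so a read routed to it would return old data until the write is replicated, which is the consistency problem described in point 3.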