
Recovery from failures in Two Phase Commit Protocol (Distributed Transaction)

Prerequisite: Two-Phase Commit Protocol
In the two-phase commit protocol, either a site contributing to a distributed transaction or the coordinator that manages the transaction globally may fail or crash, and such a failure can cause the whole transaction to fail. Because unanimity is required to commit a distributed transaction, the whole transaction is aborted if any one of the sites fails before it has voted.

The following kinds of failures can be encountered in the two-phase commit protocol:
Failure of a contributing site: If the coordinator (C) detects that a site has crashed, it takes the following actions:

If the site (Si) failed before responding with a <ready T> message to the coordinator (C), the coordinator assumes that the site responded with an <abort T> message.
If the site failed after sending the <ready T> message to C, the coordinator ignores the site failure and executes the rest of the commit protocol in the usual manner.
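
The two coordinator rules above can be sketched in a few lines of Python. This is a minimal illustration and not a full protocol implementation: the names `Vote` and `coordinator_decision` are hypothetical, and the sketch assumes the coordinator has already collected whatever votes arrived before it detected the crash.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


def coordinator_decision(sites, votes):
    """Return "commit" or "abort" for transaction T.

    sites: names of all participating sites.
    votes: dict mapping a site name to the Vote it sent; a site that
           crashed before voting has no entry (an assumption of this sketch).
    """
    for site in sites:
        # A missing vote is treated exactly like an <abort T> message.
        if votes.get(site, Vote.ABORT) is not Vote.READY:
            return "abort"
    # Every site voted <ready T>. A site that crashes after voting does
    # not change the outcome, because its vote is already recorded.
    return "commit"
```

For example, `coordinator_decision(["S1", "S2"], {"S1": Vote.READY})` returns `"abort"` because S2 never voted, while the same call with a READY vote recorded for both sites returns `"commit"` even if S2 crashes afterwards.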
When the failed site (Si) recovers, it examines its log to determine the fate of transaction T, that is, whether T committed or aborted:

If the log contains no control record (abort, commit, ready) for transaction T, then Si failed before responding to the <prepare T> message from C. Since Si never sent its vote, the coordinator must have aborted T. Hence, Si must execute <undo T>.
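
The log check performed by a recovering site can be sketched as follows. Only the "no control record" case is spelled out in the text above; the commit, abort, and ready branches follow the usual textbook treatment and are included here as assumptions. The function name `recovery_action`, the log layout (a list of `(kind, transaction)` tuples), and the `query_outcome` callback are all hypothetical.

```python
def recovery_action(log, txn, query_outcome=None):
    """Decide what a recovering site does with transaction txn.

    log: list of (kind, transaction) tuples, kind in {"ready", "commit", "abort"}.
    query_outcome: optional callable that asks the coordinator or other
                   sites for the fate of txn; returns "commit", "abort", or None.
    Returns "redo", "undo", or "wait".
    """
    kinds = {kind for kind, t in log if t == txn}

    if "commit" in kinds:   # assumed textbook case: T committed
        return "redo"
    if "abort" in kinds:    # assumed textbook case: T aborted
        return "undo"
    if "ready" in kinds:
        # The site voted but does not know the outcome, so it must ask.
        outcome = query_outcome(txn) if query_outcome else None
        if outcome == "commit":
            return "redo"
        if outcome == "abort":
            return "undo"
        return "wait"       # nobody can answer yet
    # No control record: the site crashed before voting, so T was aborted.
    return "undo"
```

With an empty log, `recovery_action([], "T")` returns `"undo"`, which matches the case described above.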



Failure of the coordinator (C): If the coordinator fails in the midst of executing the two-phase commit protocol for transaction T, the participating sites must decide the fate of T. In certain cases, the participating sites cannot decide whether to commit or abort T, and they must wait for the failed coordinator to recover. This is known as the blocking problem.

The three-phase commit protocol is designed to avoid this blocking problem.
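
To make the blocking case concrete, the sketch below shows one way the surviving sites might try to decide the fate of T from their own logs. The decision rules are the usual textbook ones and are not listed in the text above, so treat them as assumptions. The function name `decide_without_coordinator` and the input layout are hypothetical.

```python
def decide_without_coordinator(active_site_logs):
    """Try to decide the fate of T while the coordinator is down.

    active_site_logs: dict mapping each reachable site to the set of
                      control record kinds ("ready", "commit", "abort")
                      that its log holds for T.
    Returns "commit", "abort", or "wait".
    """
    logs = list(active_site_logs.values())

    # Some site already learned the outcome: follow it.
    if any("commit" in kinds for kinds in logs):
        return "commit"
    if any("abort" in kinds for kinds in logs):
        return "abort"

    # Some site never voted <ready T>, so the coordinator cannot have
    # decided to commit; aborting is safe.
    if any("ready" not in kinds for kinds in logs):
        return "abort"

    # Every reachable site is ready, but nobody knows the decision.
    # The sites must wait for the coordinator: the blocking case.
    return "wait"
```

The last branch is the blocking situation: `decide_without_coordinator({"S1": {"ready"}, "S2": {"ready"}})` returns `"wait"`, and the sites keep holding their resources until the coordinator recovers.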

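Network partitioning: A network partition is a failure in which the network splits into two or more groups of sites (partitions) that cannot communicate with one another. When a network partition occurs, one of the following two cases may arise:
The coordinator and all of its participants remain in one partition. In this case, the failure has no effect on the commit protocol.
The coordinator and its participants end up in several partitions. Sites in a partition that does not contain the coordinator handle the situation as a coordinator failure, while the coordinator and the sites in its partition proceed as if the sites in the other partitions had failed.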

Reference: Henry F. Korth
