Exception Handling in Distributed System

Last Updated : 04 May, 2022

This article explains exception handling in distributed systems. An exception is an abnormal condition that occurs during the execution of a program, typically because some error has caused a failure. Exception handling, more broadly, is a framework for dealing with unusual situations or errors produced by hardware or software faults.

Key Challenges in Exception Handling in Distributed System:

One key challenge is that handling an exception may require involving several concurrent components at once, all of which are cooperating to solve a global problem. Another is that, in a distributed environment, several exceptions may be raised at the same time on different nodes.

Exception Handling Mechanism:

An exception handling mechanism is a programming-language control structure that lets programmers describe how normal program execution is replaced by exceptional execution when an exception is detected (that is, an inconsistency with the program specification, and therefore an interruption of the normal flow of control). For any given exception mechanism, exception contexts are regions of a program in which the same exceptions are handled in the same way; these contexts are typically blocks or procedure bodies. Each context should have a set of exception handlers, one of which is called when the corresponding exception is raised. If the context has no handler for the exception, or the handler cannot recover the program, the exception is propagated to the enclosing context.

Exception handling mechanisms of this kind are built into several programming languages, such as Java.
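The idea of nested exception contexts, handlers, and propagation can be illustrated with a minimal Python sketch. The exception classes and function names below are hypothetical and exist only for illustration; each function body plays the role of one exception context.

```python
class SensorError(Exception):
    """Hypothetical low-level exception."""


class TaskError(Exception):
    """Hypothetical higher-level exception."""


def read_sensor(value):
    # Innermost context: it has no handler, so the exception
    # propagates to the calling context.
    if value is None:
        raise SensorError("no reading available")
    return value


def run_task(value):
    # Middle context: it has a handler for SensorError but cannot
    # recover, so it propagates a higher-level exception instead.
    try:
        return read_sensor(value) * 2
    except SensorError as err:
        raise TaskError("task failed") from err


def main(value):
    # Outermost context: its handler recovers with a default result.
    try:
        return run_task(value)
    except TaskError:
        return 0
```

Calling main(21) follows the normal path, while main(None) raises an exception in the innermost context that travels outward until the outermost handler recovers from it.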

In distributed systems, exceptions generally arise in the following situations:

Case 1: An object is asked to do something it cannot do, or is asked in the wrong way.
Case 2: An object is faulty.
Case 3: An object causes an incorrect operation to be performed on another object.
Case 4: Communication fails.

Error handling is difficult to implement in distributed systems because several components cooperate to carry out a single task. Looking at the cases above, we can draw the following conclusions:

Case 1 can be handled by pre-conditions on operations, which prevent the object from being corrupted by problems elsewhere.
Case 2 requires a higher-level exception to be raised (and will require maintenance).
Case 3 necessitates a retry mechanism that allows us to select a different option if one is available.
Case 4 can be solved by switching to a different option.
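The conclusions above can be sketched in Python as follows. This is only an illustration under stated assumptions: the exception classes, the withdraw operation, and the idea of modelling each alternative as a callable replica are all hypothetical and are not taken from any particular library.

```python
class PreconditionError(Exception):
    """Raised when a request is invalid (Case 1)."""


class CommunicationError(Exception):
    """Raised when a replica cannot be reached (Case 4)."""


class ServiceUnavailable(Exception):
    """Higher-level exception raised when no option is left."""


def withdraw(balance, amount):
    # Case 1: the pre-condition rejects a bad request before any
    # state is changed, so the object cannot be corrupted by it.
    if amount <= 0 or amount > balance:
        raise PreconditionError(f"cannot withdraw {amount} from {balance}")
    return balance - amount


def call_with_alternatives(replicas, request):
    # Cases 3 and 4: try each available option in turn and switch
    # to the next one when communication with the current one fails.
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except CommunicationError as err:
            last_error = err
    # No option worked: raise a higher-level exception, as Case 2
    # suggests for faults that cannot be handled locally.
    raise ServiceUnavailable("all alternatives failed") from last_error
```

Here a replica is any function that either returns a result or raises CommunicationError, so the caller never needs to know which alternative eventually answered.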

Concept of Conversations with Coordinated Atomic (CA) Actions, and Concurrent Exception Handling:

Conversations were originally proposed to provide coordinated backward error recovery for concurrent processes that cooperate by exchanging information. On entering a conversation, each participating process must save its state. While inside a conversation, a process can communicate only with other processes in the same conversation. If any process fails its acceptance test, all participating processes roll back to their saved states and may try an alternative algorithm. Processes can enter a conversation asynchronously, but they must all exit at the same time, once the acceptance test has been passed.
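The conversation scheme can be sketched in a few lines of Python. This is a simplified, single-threaded simulation rather than a real concurrent implementation: the function name run_conversation, the representation of process states as dictionaries, and the way algorithms and the acceptance test are passed in are all assumptions made for illustration.

```python
import copy


def run_conversation(states, algorithms, acceptance_test):
    """Run alternative algorithms until every participant is accepted.

    states: dict mapping a process name to its mutable state dict.
    algorithms: list of functions that update `states` in place.
    acceptance_test: function applied to each participant's state.
    """
    # On entry, every participant saves its state.
    checkpoint = copy.deepcopy(states)
    for algorithm in algorithms:
        algorithm(states)
        if all(acceptance_test(state) for state in states.values()):
            # Every participant passed, so all leave together.
            return True
        # One participant failed: all of them roll back.
        states.clear()
        states.update(copy.deepcopy(checkpoint))
    # No algorithm was accepted; states are back at the checkpoint.
    return False
```

If the primary algorithm leaves even one participant in an unacceptable state, every participant is restored to the checkpoint before the alternative algorithm is tried.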

Concurrent systems can therefore handle exceptions by extending conversations and using the well-known atomic action model. Each action is associated with a set of exceptions, and each process that takes part in the action has a set of handlers for these predefined exceptions. When any of these exceptions is raised in one process, the corresponding handlers are activated in all participants of the action.

Exception resolution is crucial because several independent exceptions can be raised at the same time, or several detected errors may be symptoms of a different, more serious fault. Exception trees are generally considered more suitable than exception priorities for resolving concurrent exceptions. The tree imposes a partial order on exceptions: a higher-level exception has a handler that can also deal with the lower-level exceptions beneath it.
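One common way to read the exception tree is that concurrently raised exceptions are resolved to their lowest common ancestor, whose handler covers all of them. The Python sketch below illustrates that reading; the tree contents and the names PARENT, ancestors, and resolve are hypothetical examples, not part of any standard API.

```python
# Hypothetical exception tree, stored as child -> parent.
PARENT = {
    "UniversalException": None,
    "EngineFailure": "UniversalException",
    "LeftEngineFire": "EngineFailure",
    "RightEngineFire": "EngineFailure",
    "FuelLeak": "UniversalException",
}


def ancestors(exc):
    # Return the path from the exception up to the root, inclusive.
    chain = []
    while exc is not None:
        chain.append(exc)
        exc = PARENT[exc]
    return chain


def resolve(raised):
    # Resolve concurrently raised exceptions to the lowest exception
    # in the tree that covers every one of them.
    raised = list(raised)
    if not raised:
        raise ValueError("no exceptions to resolve")
    common = ancestors(raised[0])
    for exc in raised[1:]:
        covering = set(ancestors(exc))
        common = [name for name in common if name in covering]
    return common[0]
```

With this tree, two simultaneous engine fires resolve to EngineFailure, while an engine fire together with a fuel leak resolves to the root exception.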

A further question is what to do when an exception is raised in a containing action while some of its participants are still inside a nested action. Because the execution of a nested action is, in principle, invisible and indivisible to the containing action, the natural first option is to wait until the nested action has finished. The alternative is to implement an abortion handler in each process involved in the nested action and to raise an abortion exception in all of the nested action's participants. After a resolution algorithm has run, either the handlers for the same resolved exception are started in all of the action's participants, or a failure exception is signalled if no suitable handlers are found. The second strategy appears to be more useful. For one thing, a process that has detected an error may be one that was supposed to enter the nested action but never does, in which case waiting for the nested action to finish could block indefinitely.

Exception Model for Coordinated Atomic (CA) Actions:

The exceptions that can be raised within a CA action are declared together with the action. Handlers should be attached to the participants of a CA action so that, when a participant enters the action, it also enters the corresponding exception context. A subset of the participating objects can then take part in a nested CA action, which has all the properties of a nested transaction with respect to atomic objects. Note that nesting CA actions results in nested exception contexts, so each participating object of a nested action must be associated with the right set of handlers. In practice, this association can be made statically or dynamically. Once it is established, explicit exception propagation semantics are straightforward to implement: exceptions propagate through the nested exception contexts, which correspond to a chain of nested CA actions.

How handlers are associated with an exception context depends on the way objects enter a CA action and on the properties of the object-oriented system. If an object enters a CA action through an operation call, operation-level exceptions are the better choice; object-level or class-level exceptions can be used when the exception context cannot be changed dynamically. In an exceptional situation, the handlers take over the duties of the participating objects of the CA action, and the action terminates either by completing successfully or by signalling failure.
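Propagation through nested exception contexts can be sketched as follows. This is a deliberately simplified model, assuming that each action holds one handler per declared exception and applies it to every participant; the CAAction class, its fields, and the ActionFailure exception are hypothetical names chosen for this example.

```python
class ActionFailure(Exception):
    """Signalled when no enclosing action can handle an exception."""


class CAAction:
    def __init__(self, name, participants, handlers, parent=None):
        self.name = name
        self.participants = participants
        # Exceptions declared with the action: name -> handler.
        self.handlers = handlers
        # The containing action, or None for a top-level action.
        self.parent = parent

    def raise_exception(self, exc_name):
        # Search this exception context first, then each enclosing one.
        action = self
        while action is not None:
            handler = action.handlers.get(exc_name)
            if handler is not None:
                # The handler runs in every participant of the
                # action whose context handles the exception.
                results = [handler(p) for p in action.participants]
                return action.name, results
            action = action.parent
        # No context has a handler: the action ends with failure.
        raise ActionFailure(exc_name)
```

An exception that the nested action declares is handled by the nested action's own participants, while one it does not declare propagates outward and involves all participants of the containing action.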

Because a CA action supports two types of concurrency (cooperation among its participants and competition for shared external objects), external atomic objects must be dealt with explicitly when forward error recovery is desired. We do not impose strict requirements on how atomic objects are used during forward recovery, but we do require that they be left in a consistent state after recovery. In particular, an exception within a CA action does not always cause all atomic objects to be restored to their previous states.
