
Scheduling and Load Balancing in Distributed System

Last Updated : 30 Apr, 2022

In this article, we will go through the concept of scheduling and load balancing in distributed systems in detail.

Scheduling in Distributed Systems:

The techniques that are used for scheduling the processes in distributed systems are as follows:

  1. Task Assignment Approach: In the Task Assignment Approach, the user-submitted process is composed of multiple related tasks which are scheduled to appropriate nodes in a system to improve the performance of a system as a whole.
  2. Load Balancing Approach: In the Load Balancing Approach, as the name implies, the workload is balanced among the nodes of the system.
  3. Load Sharing Approach: In the Load Sharing Approach, it is ensured that no node sits idle while processes wait to be processed.
Note: The Task Assignment Approach finds limited practical applicability because it assumes that characteristics of processes, such as inter-process communication costs, are known in advance.
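As an illustration of the Task Assignment Approach, the sketch below greedily assigns tasks with known execution costs to the currently least-loaded node, using a longest-processing-time heuristic. This is one of many possible assignment strategies; the function name and inputs are illustrative assumptions, not part of any specific system.

```python
import heapq

def assign_tasks(task_costs, num_nodes):
    """Greedily map each task index to a node index.

    Tasks are placed longest-first onto the node with the least
    accumulated load (the classic LPT heuristic), which keeps the
    per-node loads roughly balanced.
    """
    # Min-heap of (accumulated_load, node_index).
    heap = [(0, n) for n in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {}
    for task in sorted(range(len(task_costs)), key=lambda t: -task_costs[t]):
        load, node = heapq.heappop(heap)
        assignment[task] = node
        heapq.heappush(heap, (load + task_costs[task], node))
    return assignment
```

Note that this sketch ignores inter-process communication costs, which is exactly the information the Task Assignment Approach assumes is available in advance; a full implementation would weigh those costs when co-locating related tasks.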

Characteristics of a Good Scheduling Algorithm:

The following are the required characteristics of a Good Scheduling Algorithm:

  • The scheduling algorithms that require prior knowledge about the properties and resource requirements of a process submitted by a user put a burden on the user. Hence, a good scheduling algorithm does not require prior specification regarding the user-submitted process.
  • A good scheduling algorithm must exhibit the dynamic scheduling of processes as the initial allocation of the process to a system might need to be changed with time to balance the load of the system.
  • The algorithm must be flexible enough to make process migration decisions when the system load changes.
  • The algorithm must possess stability so that processors are utilized optimally. This is possible only when thrashing overhead is minimized and no time is wasted in repeatedly migrating processes.
  • An algorithm that makes quick decisions is preferable. For example, heuristic methods that do less computational work give near-optimal results in less time, whereas an exhaustive search provides an optimal solution but takes much longer.
  • A good scheduling algorithm gives balanced system performance by maintaining minimum global state information (such as per-node CPU load), since overhead is directly proportional to the amount of global state maintained: as more global state information is collected, the overhead also increases.
  • The algorithm should not be affected by the failure of one or more nodes of the system. Furthermore, even if the link fails and nodes of a group get separated into two or more groups then also it should not break down. So, the algorithm must possess decentralized decision-making capability in which consideration is given only to the available nodes for taking a decision and thus, providing fault tolerance.
  • A good scheduling algorithm has the property of being scalable: it should continue to work well as the number of nodes in the system increases. A strategy in which the algorithm inquires about the workload of all nodes and then selects the one with the least load is not a good approach, because it scales poorly to systems with many nodes. The reason is that the inquirer receives many replies almost simultaneously, and as the number of nodes (N) grows, the time spent processing reply messages before a node can be selected becomes too long. A straightforward remedy is to examine only m of the N nodes.
  • A good scheduling algorithm must provide fairness of service, because in an attempt to balance the workload across all nodes of the system, nodes with more load may benefit while nodes with less load suffer poorer response times than they would as stand-alone systems. Hence, the solution lies in the concept of load sharing, in which a node shares some of its resources only as long as its own users are not adversely affected.
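The "examine only m of N nodes" idea from the scalability point above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `node_load` mapping from node id to current load; probing a small random sample keeps the cost constant as N grows.

```python
import random

def pick_node(node_load, m=2, rng=random):
    """Probe m randomly chosen nodes and return the least-loaded of them.

    node_load: dict mapping node id -> current load.
    Only m nodes are queried, so the selection cost does not grow
    with the total number of nodes N.
    """
    probed = rng.sample(list(node_load), m)
    return min(probed, key=node_load.get)
```

Even with small m (e.g. m = 2), sampling-based selection is known to keep loads close to balanced while avoiding the flood of reply messages that querying all N nodes would cause.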

Load Balancing in Distributed Systems:

The Load Balancing approach refers to the division of load among the processing elements of a distributed system. The excess load of one processing element is distributed to other processing elements that have less load, according to defined limits. In other words, the load at each processing element is maintained in such a manner that it neither gets overloaded nor sits idle during the execution of a program, thereby maximizing system throughput, which is the ultimate goal of distributed systems. This approach keeps all processing elements equally busy, speeding up the whole task and letting all processors finish at approximately the same time.
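The idea of shifting excess load to less-loaded processing elements can be sketched as below. This is an illustrative toy (the `threshold` parameter and list-of-loads representation are assumptions for the example, not a specific system's API): any load above the threshold is moved to elements with spare capacity.

```python
def rebalance(loads, threshold):
    """Return a new load list after moving surplus load above
    `threshold` from overloaded elements to underloaded ones."""
    loads = list(loads)
    surplus = [(i, loads[i] - threshold)
               for i in range(len(loads)) if loads[i] > threshold]
    for i, extra in surplus:
        for j in range(len(loads)):
            if extra <= 0:
                break
            room = threshold - loads[j]  # spare capacity on element j
            if room > 0:
                moved = min(room, extra)
                loads[j] += moved
                loads[i] -= moved
                extra -= moved
    return loads
```

A real system would move whole processes rather than arbitrary load units, and would account for migration cost, but the sketch captures the invariant the paragraph describes: no element stays overloaded while another has room.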

Taxonomy of Load Balancing Algorithms

 

Types of Load Balancing Algorithms:

  • Static Load Balancing Algorithm: In the Static Load Balancing Algorithm, while distributing load the current state of the system is not taken into account. These algorithms are simpler in comparison to dynamic load balancing algorithms. Types of Static Load Balancing Algorithms are as follows:
    • Deterministic: In Deterministic Algorithms, the properties of nodes and processes are taken into account when allocating processes to nodes. Because of its deterministic character, this class of algorithm is difficult to optimize for better results and also costs more to implement.
    • Probabilistic: In Probabilistic Algorithms, statistical attributes of the system, such as the number of nodes and the network topology, are taken into account to formulate process placement rules. Since the current state is ignored, these algorithms generally give poorer performance.
  • Dynamic Load Balancing Algorithm: Dynamic Load Balancing Algorithm takes into account the current load of each node or computing unit in the system, allowing for faster processing by dynamically redistributing workloads away from overloaded nodes and toward underloaded nodes. Dynamic algorithms are significantly more difficult to design, but they can give superior results, especially when execution durations for distinct jobs vary greatly. Furthermore, because dedicated nodes for task distribution are not required, a dynamic load balancing architecture is frequently more modular. Types of Dynamic Load Balancing Algorithms are as follows:
    • Centralized: In Centralized Load Balancing Algorithms, the task of handling requests for process scheduling is carried out by a centralized server node. The benefit of this approach is efficiency, as all the information is held at a single node, but it suffers from poor reliability because of lower fault tolerance, and the central node can become a bottleneck as the number of requests increases.
    • Distributed: In Distributed Load Balancing Algorithms, the task of making process-assignment decisions is physically distributed among the individual nodes of the system. Unlike Centralized Load Balancing Algorithms, no single node has to hold the global state information, so decisions are made faster.
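The static/dynamic distinction above can be made concrete with a minimal sketch (illustrative names only): a static rule such as round-robin fixes the placement cycle in advance and never consults system state, while a dynamic rule consults the current per-node load before every placement.

```python
import itertools

def static_round_robin(num_nodes):
    """Static: a fixed cyclic placement rule, blind to current state."""
    return itertools.cycle(range(num_nodes))

def dynamic_least_loaded(node_load):
    """Dynamic: consults current per-node load before each placement."""
    return min(node_load, key=node_load.get)
```

The static rule costs nothing per decision but can pile work onto a node that is already busy; the dynamic rule pays the cost of tracking `node_load` in exchange for reacting to the actual state, matching the trade-off described above.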

Types of Distributed Load Balancing Algorithms:

  • Cooperative: In Cooperative Load Balancing Algorithms, as the name implies, scheduling decisions are taken with the cooperation of the entities in the system. The benefit lies in the stability of this approach; the drawback is the complexity involved, which leads to more overhead than in Non-cooperative algorithms.
  • Non-cooperative: In Non-cooperative Load Balancing Algorithms, scheduling decisions are taken by the individual entities of the system as they act as autonomous entities. The benefit is that minor overheads are involved due to the basic nature of non-cooperation. The drawback is that these algorithms might be less stable than Cooperative algorithms.

Issues in Designing Load-balancing Algorithms:

Many issues need to be taken into account while designing Load-balancing Algorithms:

  • Load Estimation Policy: Determines how the workload of a node in the distributed system is estimated.
  • Process Transfer Policy: Decides whether a process should be executed locally or transferred to a remote node.
  • State Information Exchange Policy: Determines the strategy for exchanging system load information among the nodes of the distributed system.
  • Location Policy: Determines the destination node to which a process should be migrated.
  • Priority Assignment Policy: Determines whether a local or a remote process gets priority for execution on a node.
  • Migration Limiting Policy: Determines the maximum number of times a process may be migrated.
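As a concrete example of one of these design issues, a common form of Process Transfer Policy uses two thresholds. The sketch below is a hypothetical illustration (the threshold values and return labels are assumptions for the example): below a low-water mark the node offers to accept remote work, above a high-water mark it tries to migrate work away, and in between it simply runs processes locally.

```python
def transfer_decision(local_load, low, high):
    """Double-threshold transfer policy for a single node."""
    if local_load < low:
        return "accept-remote"   # underloaded: offer to run remote processes
    if local_load > high:
        return "send-remote"     # overloaded: try to migrate work away
    return "run-local"           # balanced: execute locally, no migration
```

Keeping a gap between the two thresholds avoids the thrashing mentioned earlier: a node near the boundary does not flip between sending and accepting work on every small load change.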

