Prerequisite – Multi-threaded Architectures
The execution of a thread in the multithreaded model is divided into several stages, each of which performs a distinct function.
The execution stages of each thread, and the relationships between threads, are as follows:
1. Continuation Stage:
- (i) Once a thread is initiated by its predecessor thread, it begins executing its continuation stage. The main function of this stage is to compute recurrence variables, that is, variables whose value in one thread depends on their value in the previous thread.
For example, the loop index variables needed to start the next thread. The values of these variables are forwarded to the next thread processing element just before the next thread is activated.
- (ii) In the case of a DO loop, index variables such as x=x+1 or y=y->next are computed and then forwarded to the next thread processing element. The continuation stage of a thread ends with a fork instruction, which causes the next thread to be initiated.
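The continuation stage can be pictured with a small sequential sketch. This is only an illustration under stated assumptions, not how any real hardware works: `ThreadContext`, `continuation_stage`, and `run_loop` are hypothetical names, and the "fork" is modelled as simply returning the context for the next thread.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ThreadContext:
    """Recurrence variables handed from one thread to its successor (hypothetical)."""
    index: int


def continuation_stage(ctx: ThreadContext,
                       step: Callable[[int], int],
                       limit: int) -> Optional[ThreadContext]:
    """Compute the next value of the recurrence variable and 'fork'.

    Returns the context forwarded to the next thread, or None when the
    loop is finished and no successor should be started.
    """
    next_index = step(ctx.index)            # e.g. x = x + 1
    if next_index >= limit:
        return None                         # no fork: the loop has ended
    return ThreadContext(index=next_index)  # forwarded just before the fork


def run_loop(start: int, limit: int, step: Callable[[int], int]) -> List[int]:
    """Start one 'thread' per iteration and record the index each one receives."""
    started: List[int] = []
    ctx = ThreadContext(start) if start < limit else None
    while ctx is not None:
        started.append(ctx.index)
        ctx = continuation_stage(ctx, step, limit)
    return started
```

Calling `run_loop(0, 5, lambda x: x + 1)` starts one thread per iteration, each receiving the index computed by its predecessor's continuation stage.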
2. Target-Store-Address-Generation Stage:
- (i) A thread can perform store operations on which later concurrent threads may be data-dependent. These store operations are referred to as target stores (TS).
- (ii) To reduce hardware complexity, most implementations of the multithreaded model do not allow speculation on data dependences. To make run-time data dependence checking easier, the addresses of these target stores need to be calculated as early as possible.
The TSAG (target-store-address-generation) stage performs the address computation for these target stores. The addresses are stored in the memory buffer of the thread's own processing element and are also forwarded to the memory buffers of all succeeding concurrent threads.
- (iii) Once a thread completes the TSAG stage, all of its target store addresses have been forwarded, and it sends a tsag_done flag to its successor thread. This flag tells the successor that it can start computation that may depend on its predecessor threads. Before receiving the tsag_done flag, a thread can only perform computation that does not depend on any target store of its active predecessor threads. To increase the overlap between threads, the TSAG stage can be further divided into two parts.
The first part generates the target store addresses that have no data dependences on earlier threads; these can be computed quickly and forwarded to the next thread. The second part computes unsafe target store addresses that may be data-dependent on an earlier thread. These computations must wait for the tsag_done flag from the predecessor thread before beginning.
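The forwarding of target store addresses and the tsag_done flag can be sketched as follows. This is a simplified model under assumptions of my own: `ThreadUnit` is a hypothetical class, a memory buffer is a dictionary from address to data, and `None` marks an address whose data has not been produced yet.

```python
from typing import Dict, List, Optional


class ThreadUnit:
    """Toy model of one thread processing element (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # address -> data; None means "address known, data not yet available"
        self.memory_buffer: Dict[int, Optional[int]] = {}
        self.tsag_done_received = False
        # Succeeding concurrent threads, nearest successor first
        self.successors: List["ThreadUnit"] = []

    def tsag_stage(self, target_store_addresses: List[int]) -> None:
        """Record each target store address locally and forward it downstream."""
        for addr in target_store_addresses:
            self.memory_buffer[addr] = None
            for succ in self.successors:
                # Keep any data already present; only reserve the entry
                succ.memory_buffer.setdefault(addr, None)
        # All addresses are forwarded: signal the immediate successor
        if self.successors:
            self.successors[0].tsag_done_received = True

    def can_start_dependent_computation(self) -> bool:
        """Dependent computation must wait for the predecessor's tsag_done flag."""
        return self.tsag_done_received
```

For example, with threads `t0`, `t1`, and `t2` where `t0.successors = [t1, t2]`, calling `t0.tsag_stage([0x100, 0x104])` reserves both addresses in all three buffers but raises the flag only in `t1`, the immediate successor.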
3. Computation Stage:
- (i) The computation stage performs the main computation of a thread. If, during this stage, the address of a load operation matches that of a target store entry in the thread's memory buffer, the thread either reads the data from the entry if it is available or waits until the data is forwarded from an earlier concurrent thread.
Conversely, if the value of a target store is computed during this stage, the thread must forward the address and the data to the memory buffers of all its concurrent successor threads.
- (ii) The computation stage of a thread ends with a stop instruction.
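The load and target store behaviour of the computation stage can be sketched as below. The names `WAIT`, `load`, and `target_store` are hypothetical, memory buffers are plain dictionaries, and "waiting" is modelled by returning a marker rather than actually blocking.

```python
from typing import Dict, List, Optional

# Hypothetical marker: the load must wait for data from an earlier thread
WAIT = object()


def load(addr: int,
         memory_buffer: Dict[int, Optional[int]],
         main_memory: Dict[int, int]):
    """Return the loaded value, or WAIT if a matching target store has no data yet."""
    if addr in memory_buffer:
        data = memory_buffer[addr]
        # Address matches a buffer entry: use the data only if it has arrived
        return WAIT if data is None else data
    # No matching entry: the load is served from memory (0 if never written)
    return main_memory.get(addr, 0)


def target_store(addr: int,
                 data: int,
                 own_buffer: Dict[int, Optional[int]],
                 successor_buffers: List[Dict[int, Optional[int]]]) -> None:
    """Record a target store locally and forward address and data to successors."""
    own_buffer[addr] = data
    for buf in successor_buffers:
        buf[addr] = data
```

A successor that loads from an address whose entry is still empty gets `WAIT`; once the predecessor executes `target_store` for that address, the same load returns the forwarded data.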
4. Write-Back Stage:
- (i) If all control dependences are cleared after the computation stage, the thread completes its execution by writing all the data from the store operations held in its memory buffer to memory. This includes data from both target stores and regular stores.
- (ii) Data from store operations must remain in the memory buffer until the write-back stage. This prevents the memory state from being altered by a speculative thread that is later aborted by an earlier concurrent thread because of an incorrect control speculation.
- (iii) To maintain the correct memory state, concurrent threads must perform their write-back stages in their original order. That is, a thread must wait for a wb_done flag from its predecessor thread before it can perform its write-back stage, and it must forward a wb_done flag to the next thread after it completes its own write-back stage. Because all stored data are committed thread by thread, write-after-read and write-after-write hazards cannot occur at run time.
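The in-order write-back can be sketched with ordinary software threads, using one `threading.Event` per wb_done flag. This is an analogy only, not a description of the hardware mechanism; `write_back_in_order` is a hypothetical helper, and the workers are deliberately started in reverse order to show that the flags, not the start order, determine the commit order.

```python
import threading
from typing import Dict, List, Tuple


def write_back_in_order(buffers: List[Dict[int, int]]) -> Tuple[Dict[int, int], List[int]]:
    """Commit each thread's memory buffer to memory in original thread order."""
    memory: Dict[int, int] = {}
    commit_order: List[int] = []
    # wb_done[i] is the flag thread i waits on; wb_done[i + 1] is the one it sends
    wb_done = [threading.Event() for _ in range(len(buffers) + 1)]
    wb_done[0].set()  # the first thread has no predecessor to wait for

    def write_back(i: int) -> None:
        wb_done[i].wait()            # wait for the predecessor's wb_done flag
        memory.update(buffers[i])    # commit target and regular stores
        commit_order.append(i)
        wb_done[i + 1].set()         # forward wb_done to the successor

    workers = [threading.Thread(target=write_back, args=(i,))
               for i in range(len(buffers))]
    for w in reversed(workers):      # start out of order on purpose
        w.start()
    for w in workers:
        w.join()
    return memory, commit_order
```

When several buffers hold a store to the same address, the value from the latest thread in program order is the one left in memory, because each commit happens only after its predecessor's commit has finished.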