Introduction to Multi-threaded Architectures and Systems in OS
According to me, multithreaded architecture is actually a trend and reliable and easy to applicable solution for upcoming microprocessor design, so I studied four research papers on this topic just to familiarize myself with the technology involved in this subject.
In today’s world, there is a rapid progress of Very Large Scale Integrated circuit(VLSI) technology that allows processors’ designers to assimilate more functions on a single chip, in future microprocessors will soon be able to issue more than 12 instructions per machine cycle.
Existing superscalar and Very Large Instruction Word (VLIW) microprocessors utilize multiple functional units in order to exploit instruction-level parallelism from a single thread of control flow. These kinds of microprocessors are depending on the compiler or the hardware of the system in order to excerpt instruction-level parallelism from programs and to then afterward schedule independent instructions on multiple functional units on each machine cycle. And when the issue rate of future microprocessors increases, however, the compiler or the hardware will have to extract some more instruction-level parallelism from programs by analyzing a larger instruction buffer (for example -Reservation Station Entry).
It is very tough to extract enough parallelism with a single thread of control even for a small number of functional units.
Limitation of Single-threaded sequencing mechanism are as follow:
- In order to exploit instruction-level parallelism, independent instructions from different-different basic building-blocks in a single instruction stream needed to be examined and issued together.
- As the issue rate increases, a larger instruction buffer (for example -Reservation Station Entry) is needed to contain several basic building-blocks, which will be control dependency on different branch conditions, but their examination must be hit together.
- And Instruction-level analytical execution with branch prediction is also needed to move independent instructions across basic block boundaries. This problem is especially serious when a compiler attempts to pipeline a loop with many conditional branches.
- Here, the accuracy of systems is very low.
- Here, the speed of the system is quite low as compare to multithreaded architectures.
Features of Multithreaded Architecture are as follow:
- In its regular form, a multithreaded processor is made up of many numbers of thread processing elements that are connected to each other with a unidirectional ring which helps in performing the system in a better way.
- All the multiple thread processing elements have their own private level-one instruction cache, but they’ll share the level-one data cache and the level-two cache.
- In the multithreaded systems also some shared register files present that used for maintaining some global registers and a lock register.
- During run-time, the multiple thread processing elements, each has its own program counter and instruction execution path, which can then easily fetch and execute instructions from multiple program locations simultaneously.
- Each of the present thread processing element always has a private memory buffer to cache speculative stores and which is also used to support run-time data dependency checking.
- Fortunately, the compiler statically divides the control flow graph of a program into threads that coincide with a portion of the control flow graph. A thread performs a round of computation on a set of data that has no, or only a few, dependencies with other concurrent threads. The compiler will later determine the granularity (it’ll directly affect the performance) of the threads, which are hardly one or several iterations of a loop.
- The real execution of a program initiates from its entry thread. This entry thread can then transfers the systems’ processing to a successor thread on some another thread processing element. The successor thread can further transfer the systems’ processing to its own successor thread. This process will continue until all thread processing elements are busy with some appropriate tasks.
- When the multiple threads are executed on the multithreaded processor, the primeval thread in the sequential order is referred to as the head thread. And then all of the other threads which are derived from it are called successor threads. Once the head thread completes its computation, then further it will retire and release the thread processing element and then its (just) successor thread then becomes the new head thread. This completion and retirement of the threads must have to follow the original sequential execution order.
- In the multithreaded system (or model), a thread can transfer the systems’ processing one of its successor threads with or without control speculation. When the transfer of the systems’ processing from head to its successor thread without control speculation, the thread performing the fork operation must ensure that all of the control dependencies of the newly generated successor thread have been satisfied.
- If the transfer of the systems’ processing from the head to its successor thread with control speculation, however, it must be later on verifying all of the speculated control dependencies. If any of the speculated control dependencies are not true, the thread must have to issue a command to kill the successor thread and all of its subsequent threads.
- In general, the multithreaded architecture will use a thread pipelining execution model in order to enforce data dependencies between concurrent threads. Unlike the instruction pipelining mechanism in a superscalar processor, where instruction sequencing, data dependencies checking, and forwarding are performed by processor hardware automatically, the multithreaded architecture performs thread initiation and data forwarding through explicit thread management and communication instructions.