A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO), and Write Operand (WO) stages. The IF, ID, OF, and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for the MUL instruction, and 6 clock cycles for the DIV instruction. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?
Instruction            Meaning of instruction
I0: MUL R2, R0, R1     R2 ← R0 * R1
I1: DIV R5, R3, R4     R5 ← R3 / R4
I2: ADD R2, R5, R2     R2 ← R5 + R2
I3: SUB R5, R2, R6     R5 ← R2 - R6
Operand forwarding: In this technique, the value of an operand is passed directly to the stage of the dependent instruction that needs it, before the value has been stored.
In the above question, I2 depends on I0 (for R2) and on I1 (for R5), and I3 depends on I2 (for R2).
Let's work through this question with a space-time diagram.
The space-time diagram represents the pipeline in which the instructions are executed.
Instruction 0 is a MUL operation, which takes 3 clock cycles in the PO stage and only 1 cycle in every other stage.
Instruction 1 is a DIV operation, which takes 6 clock cycles in the PO stage and only 1 cycle in every other stage.
Notice that although the OF stage was free in the 4th clock cycle, Instruction 1 was not given to it. This is a design decision: operands should be fetched only if they are going to be operated on in the next cycle; otherwise there is a possibility of data corruption. Because the PO stage was not free in the next cycle, OF for Instruction 1 was delayed and performed only in the cycle just before the instruction entered the PO stage.
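The OF-delay rule described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the original question: the helper name `po_and_of_cycles` is hypothetical, it looks only at the occupancy of the PO stage (data dependences are ignored here), and it assumes instruction i is fetched in cycle i + 1.

```python
# PO latencies taken from the question statement.
PO_LATENCY = {"ADD": 1, "SUB": 1, "MUL": 3, "DIV": 6}


def po_and_of_cycles(ops):
    """Return (OF cycle, first PO cycle, last PO cycle) for each instruction.

    Hypothetical helper: models only the structural hazard on the PO stage
    and holds OF until the cycle just before PO. Data dependences are ignored.
    """
    schedule = []
    po_free_at = 1  # first cycle in which the PO stage is free
    for i, op in enumerate(ops):
        # Instruction i (0-based) has IF in cycle i + 1 and ID in cycle i + 2,
        # so OF cannot happen before cycle i + 3 and PO before cycle i + 4.
        earliest_po = i + 4
        po_start = max(earliest_po, po_free_at)
        po_end = po_start + PO_LATENCY[op] - 1
        of_cycle = po_start - 1  # OF is delayed to the cycle just before PO
        schedule.append((of_cycle, po_start, po_end))
        po_free_at = po_end + 1
    return schedule


first_two = po_and_of_cycles(["MUL", "DIV"])
```

For I0 and I1, this model places the MUL in PO during cycles 4 to 6, so the DIV performs OF in cycle 6 rather than cycle 4 and occupies PO during cycles 7 to 12.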
Instruction 2 is an ADD operation, which takes 1 clock cycle in every stage. But it is a dependent operation: it needs the operands that are produced by Instructions 0 and 1.
Instruction 2 needs R5 and R2 to add. It gets R2 on time, because by the time Instruction 2 reaches its PO stage, R2 has already been written back by Instruction 0. R5 is also needed, but Instruction 2's PO and Instruction 1's WO happen in parallel. That means Instruction 2 can't read the value of R5 before it is stored by Instruction 1. This is where operand forwarding comes in. Before Instruction 1 stores its result (R5), it first forwards the value to Instruction 2's Fetch-Execute buffer, so that Instruction 2 can use it in parallel with Instruction 1's WO stage. This saves the extra clock cycle that would be required if operand forwarding were not used and R5 had to be read after being stored.
In Instruction 3, the same operand forwarding concept is applied to the value of R2, which is computed by Instruction 2.
Hence operand forwarding saves 2 clock cycles here (1 cycle in Instruction 2 and 1 cycle in Instruction 3).
So the total number of cycles is 15, which can be seen from the diagram: each instance of a stage represents 1 clock cycle.
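The count can be cross-checked with a small Python sketch of the timing rules used in this explanation. The function name `total_cycles` and the `forwarding` flag are hypothetical. The model assumes in-order issue, a single PO unit, and that with forwarding a dependent instruction may start PO in the same cycle as the producer's WO, while without forwarding it must wait until the cycle after that WO. The no-forwarding rule is an assumption chosen to follow the accounting in the explanation above.

```python
# PO latencies taken from the question statement.
PO_LATENCY = {"ADD": 1, "SUB": 1, "MUL": 3, "DIV": 6}

# (opcode, destination register, source registers) for I0..I3.
PROGRAM = [
    ("MUL", "R2", ("R0", "R1")),
    ("DIV", "R5", ("R3", "R4")),
    ("ADD", "R2", ("R5", "R2")),
    ("SUB", "R5", ("R2", "R6")),
]


def total_cycles(program, forwarding=True):
    """Return the cycle in which the last WO completes (hypothetical model)."""
    po_free_at = 1   # first cycle in which the PO stage is free
    ready_at = {}    # register -> first cycle a consumer's PO may use it
    last_wo = 0
    for i, (op, dest, srcs) in enumerate(program):
        # IF in cycle i + 1, ID in i + 2, OF no earlier than i + 3,
        # so PO can start no earlier than cycle i + 4.
        po_start = max(i + 4, po_free_at)
        # Wait for every source register produced by an earlier instruction.
        for reg in srcs:
            po_start = max(po_start, ready_at.get(reg, 0))
        po_end = po_start + PO_LATENCY[op] - 1
        wo = po_end + 1
        # With forwarding the result is usable during the producer's WO cycle;
        # without it, only in the cycle after WO (assumed rule).
        ready_at[dest] = wo if forwarding else wo + 1
        po_free_at = po_end + 1
        last_wo = max(last_wo, wo)
    return last_wo


with_forwarding = total_cycles(PROGRAM, forwarding=True)
without_forwarding = total_cycles(PROGRAM, forwarding=False)
```

Under these assumptions the sketch gives 15 cycles with forwarding and 17 without, which agrees with the 2 cycles saved that the explanation attributes to Instructions 2 and 3.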