Pipelining and Addressing Modes

Question 1

Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, …, I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

Cross

132

Tick

165

Cross

176

Cross

328



Question 1-Explanation: 
The pipeline has to be stalled until the EI stage of I4 completes,
because only at the end of EI is it known whether the branch is taken.

After that, I4(WO) and I9(FI) can proceed in parallel, followed by the
remaining instructions.
Cycle time = max stage delay + buffer delay = 10 + 1 = 11 ns.
So, until I4(EI) completes: 7 cycles * 11 ns = 77 ns
From I4(WO)/I9(FI) to I12(WO): 8 cycles * 11 ns = 88 ns
Total = 77 + 88 = 165 ns
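
The same count, written out as a minimal sketch (plain Python, not from the original source):

    # Minimal sketch: count the cycles for 12 instructions in a 5-stage
    # pipeline when the branch at I4 is resolved in its EI stage and the
    # target I9 is fetched in the following cycle.
    stage_delays = [5, 7, 10, 8, 6]          # FI, DI, FO, EI, WO in ns
    cycle_time = max(stage_delays) + 1       # 10 ns slowest stage + 1 ns buffer

    cycles_till_I4_EI = 4 + 4 - 1            # I4 finishes EI (stage 4) in cycle 7
    cycles_I9_to_I12 = 5 + (12 - 9)          # I9..I12 flow through 5 stages: 8 cycles
    total_cycles = cycles_till_I4_EI + cycles_I9_to_I12   # 15 cycles

    print(total_cycles * cycle_time)         # 15 * 11 = 165 ns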
Question 2

Register renaming is done in pipelined processors

Cross

as an alternative to register allocation at compile time

Cross

for efficient access to function parameters and local variables

Tick

to handle certain kinds of hazards

Cross

as part of address translation



Question 2-Explanation: 

Register renaming is done to eliminate WAR (Write After Read) and WAW (Write After Write) dependences between instructions, which would otherwise cause pipeline stalls. Hence, (C) is the answer.

Example:

I1: Read A into B
I2: Write C to A

Here I2 writes A after I1 reads it, so there is a WAR dependency and the pipeline would need stalls. To avoid it, register renaming is applied and

Write C to A
becomes
Write C to A'

where A' is a fresh (renamed) register, so I2 no longer conflicts with I1's read of A.

A WAR dependency is also called an anti-dependency: there is no real data dependency, only the fact that both instructions use the same storage location. Register renaming removes it, and it removes WAW dependencies in the same way.
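
A tiny sketch of the idea (plain Python with hypothetical physical-register names P1..P5, not from the original source): the destination of the later write is mapped to a fresh physical register, so the earlier read and the later write no longer touch the same location:

    # Minimal sketch (hypothetical register names): the WAR hazard
    #   I1: B <- A   (reads architectural register A)
    #   I2: A <- C   (writes architectural register A)
    # disappears once I2's destination A is renamed to a fresh physical register.
    rename_map = {"A": "P1", "B": "P2", "C": "P3"}
    free_list = ["P4", "P5"]

    # I1 reads the current mapping of A, so it uses P1.
    i1_src = rename_map["A"]                 # "P1"

    # I2 writes A, so A is remapped to a fresh physical register.
    rename_map["A"] = free_list.pop(0)       # A now maps to "P4"
    i2_dst = rename_map["A"]                 # "P4"

    # I1 reads P1 while I2 writes P4: the WAR hazard on A is gone.
    print(i1_src, i2_dst)                    # P1 P4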
 

Question 3

Consider a hypothetical processor with an instruction of type LW R1, 20(R2), which during execution reads a 32-bit word from memory and stores it in a 32-bit register R1. The effective address of the memory location is obtained by adding the constant 20 to the contents of register R2. Which of the following best reflects the addressing mode implemented by this instruction for the operand in memory?

Cross

Immediate Addressing

Cross

Register Addressing

Cross

Register Indirect Scaled Addressing

Tick

Base Indexed Addressing
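
A minimal sketch of the base/displacement calculation (hypothetical register and memory contents, not from the original source):

    # Minimal sketch of base/displacement addressing for LW R1, 20(R2)
    # (hypothetical register and memory contents).
    registers = {"R1": 0, "R2": 0x1000}
    memory = {0x1014: 42}                    # word stored at 0x1000 + 20

    effective_address = 20 + registers["R2"] # displacement + base register
    registers["R1"] = memory[effective_address]
    print(hex(effective_address), registers["R1"])   # 0x1014 42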



Question 4
Consider evaluating the following expression tree on a machine with a load-store architecture, in which memory can be accessed only through load and store instructions. The variables a, b, c, d and e are initially stored in memory. The binary operators used in this expression tree can be evaluated by the machine only when the operands are in registers, and the instructions produce results only in a register. If no intermediate results can be stored in memory, what is the minimum number of registers needed to evaluate this expression? [expression tree figure from GATE 2011, Q26, omitted]
Cross
2
Cross
9
Cross
5
Tick
3


Question 4-Explanation: 

R1←c,  R2←d,  R2←R1+R2,  R1←e,  R2←R1-R2
Now, to evaluate the rest of the expression, we must load a and b into registers, but the
content of R2 is still needed later, so we must use a third register:
R1←a,  R3←b,  R1←R1-R3,  R1←R1+R2

Source: http://clweb.csa.iisc.ernet.in/rahulsharma/gate2011key.html
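
The same register trace, replayed as a minimal sketch (plain Python with hypothetical operand values, not from the original source):

    # Minimal sketch: replay the register trace from the explanation and
    # check that only R1, R2, R3 are needed.
    a, b, c, d, e = 1, 2, 3, 4, 5            # hypothetical values loaded from memory

    R1 = c                                   # R1 <- c
    R2 = d                                   # R2 <- d
    R2 = R1 + R2                             # R2 <- c + d
    R1 = e                                   # R1 <- e
    R2 = R1 - R2                             # R2 <- e - (c + d)
    R1 = a                                   # R1 <- a   (R2 still holds a live value)
    R3 = b                                   # R3 <- b   (third register needed here)
    R1 = R1 - R3                             # R1 <- a - b
    R1 = R1 + R2                             # R1 <- (a - b) + (e - (c + d))

    print(R1 == (a - b) + (e - (c + d)))     # True, using only 3 registers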

Question 5
Consider an instruction pipeline with four stages (S1, S2, S3 and S4), each with combinational circuit only. Pipeline registers are required between each pair of stages and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure [figure from GATE CS 2011, Q41, omitted: stage delays of 5 ns, 6 ns, 11 ns and 8 ns, and a pipeline register delay of 1 ns]. What is the approximate speed-up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipelined implementation?
Cross
4.0
Tick
2.5
Cross
1.1
Cross
3.0


Question 5-Explanation: 
Pipeline registers overhead is not counted in normal 
time execution

So the total count will be

5+6+11+8= 30 [without pipeline]

Now, for pipeline, each stage will be of 11 n-sec (+ 1 n-sec for overhead).
and, in steady state output is produced after every pipeline cycle. Here,
in this case 11 n-sec. After adding 1n-sec overhead, We will get 12 n-sec
of constant output producing cycle.

dividing 30/12 we get 2.5 
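
The same arithmetic as a minimal sketch (plain Python, not from the original source):

    # Minimal sketch of the speed-up calculation (stage delays taken from
    # the figure as quoted in the explanation above).
    stage_delays = [5, 6, 11, 8]             # ns
    register_delay = 1                       # ns per pipeline register

    non_pipelined = sum(stage_delays)        # 30 ns, no register overhead counted
    pipelined_cycle = max(stage_delays) + register_delay   # 12 ns per result in steady state

    print(non_pipelined / pipelined_cycle)   # 2.5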
Question 6
A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for the MUL instruction, and 6 clock cycles for the DIV instruction. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?
     Instruction              Meaning of instruction
  I0: MUL R2, R0, R1          R2 ← R0 * R1
  I1: DIV R5, R3, R4          R5 ← R3 / R4
  I2: ADD R2, R5, R2          R2 ← R5 + R2
  I3: SUB R5, R2, R6          R5 ← R2 - R6
Cross
13
Tick
15
Cross
17
Cross
19


Question 6-Explanation: 
Operand forwarding: in this technique the value of an operand is given to the concerned stage of a dependent instruction before it is written back. In the question above, I2 depends on I0 and I1, and I3 depends on I2. The (omitted) space-time diagram shows how the instructions move through the pipeline; each instance of a stage in it represents one clock cycle.

Instruction 0 is a MUL operation, which takes 3 clock cycles in the PO stage and 1 cycle in every other stage. Instruction 1 is a DIV operation, which takes 6 clock cycles in the PO stage and 1 cycle in every other stage. Note that even though the OF stage was free in the 4th clock cycle, instruction 1 was not given to it. This is a design decision: operands should be fetched only if they will be operated on in the next cycle, otherwise there is a possibility of data corruption. Since the PO stage was not free in the next cycle, OF for instruction 1 was delayed and performed only one cycle before it enters PO.

Instruction 2 is an ADD operation, which takes 1 clock cycle in every stage, but it is a dependent operation: it needs the operands produced by instructions 0 and 1, namely R2 and R5. It gets R2 on time, because by the time instruction 2 reaches its PO stage, R2 has already been written back. R5, however, is produced by instruction 1, whose WO stage runs in parallel with instruction 2's PO stage, so instruction 2 cannot read R5 from the register file in time. Here operand forwarding is used: before instruction 1 stores its result R5, it forwards the value to instruction 2's fetch-execute buffer, so instruction 2 can use it in parallel with instruction 1's WO stage. This saves the extra clock cycles that would otherwise be needed to read R5 after it is written back.

In instruction 3, the same operand-forwarding idea is applied for the value of R2 computed by instruction 2. Hence operand forwarding saves 2 clock cycles overall (1 cycle for instruction 2 and 1 cycle for instruction 3), and the total number of clock cycles is 15.
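
A minimal sketch (plain Python, not the source's diagram) that reproduces the 15-cycle count, assuming the PO stage is the serializing resource and forwarding lets each dependent instruction enter PO immediately after its producer leaves PO:

    # Minimal sketch: with operand forwarding, the PO stages of I0..I3 run
    # back to back, so the total is the 3 cycles I0 spends in IF/ID/OF,
    # plus all PO cycles, plus the final WO cycle of I3.
    po_cycles = [3, 6, 1, 1]                 # MUL, DIV, ADD, SUB in the PO stage

    fill = 3                                 # I0: IF, ID, OF (1 cycle each)
    drain = 1                                # I3: WO (all other overlaps already counted)
    total = fill + sum(po_cycles) + drain

    print(total)                             # 15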
Question 7

The program below uses six temporary variables a, b, c, d, e, f. 
 

 
    a = 1
    b = 10
    c = 20
    d = a+b
    e = c+d
    f = c+e
    b = c+e
    e = b+f
    d = 5+e
    return d+f


Assuming that all operations take their operands from registers, what is the minimum number of registers needed to execute this program without spilling?
 

Cross

2
 

Tick

3
 

Cross

4
 

Cross

6
 



Question 7-Explanation: 

Each of the given expressions uses at most 3 variables, so we never need more than 3 registers.

See  http://en.wikipedia.org/wiki/Register_allocation 

It requires a minimum of 3 registers.

Principle of register allocation: if a variable needs to be allocated to a register, the system checks for a free register; if it finds one, it allocates it. If there is no free register, it checks for a register that contains a dead variable (a variable whose value is not going to be used in the future), and if it finds one, it allocates that register. Otherwise, it resorts to spilling: it picks the register whose value is needed furthest in the future, saves that value to memory, and uses the register for the current allocation; later, when the old value is needed again, it is loaded back from memory into whichever register is then available.

But here we should not apply spilling as directed in the question. 

Let's allocate the registers for the variables. 

a = 1 ( let's say register R1 is allocated for variable 'a' ) 

  

b = 10 ( R2 for 'b' , because value of 'a' is going to be used in the future, hence can not replace variable of 'a' by that of 'b' in R1) 

  

c = 20 ( R3 for 'c', because values of 'a' and 'b' are going to be used in the future, hence can not replace variable 'a' or 'b' by 'c' in R1 or R2 respectively) 

  

d = a+b ( now 'd' can be assigned to R1, because R1 holds the dead variable 'a'; it is dead because no subsequent expression uses the value of 'a' )

  

e = c+d ( 'e' can be assigned to R1, because currently R1 contains value of variable 'd' which is not going to be used in the subsequent expression.) 

Note: an already calculated value of a variable is used only by READ operation ( not WRITE), hence we have to see only on the RHS side of the subsequent expressions whether the variable is going to be used or not. 

  

f = c+e ( ' f ' can be assigned to R2, because value of 'b' in register R2 is not going to be used in subsequent expressions, hence R2 can be used to allocate for ' f ' replacing 'b' ) 

  

b = c+e ( ' b ' can be assigned to R3, because value of 'c' in R3 is not being used later ) 

  

e = b+f ( here 'e' is already in R1, so no allocation here, direct assignment ) 

  

d = 5+e ( 'd' can be assigned to either R1 or R3, because values in both are not used further, let's assign in R1 ) 

  

return d+f ( no allocation here, simply contents of registers R1 and R2 are added and returned) 

  

hence we need only 3 registers, R1 R2 and R3. 
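
The claim can be double-checked with a small backward liveness scan (a minimal sketch in plain Python, not part of the original solution): for straight-line code without spilling, the minimum number of registers equals the maximum number of variables live at the same time.

    # Minimal sketch: backward liveness analysis over the straight-line
    # program; the answer is the maximum number of simultaneously live variables.
    program = [                              # (defined variable, used variables)
        ("a", []), ("b", []), ("c", []),
        ("d", ["a", "b"]), ("e", ["c", "d"]),
        ("f", ["c", "e"]), ("b", ["c", "e"]),
        ("e", ["b", "f"]), ("d", ["e"]),
        (None, ["d", "f"]),                  # return d + f
    ]

    live, max_live = set(), 0
    for dest, uses in reversed(program):
        live.discard(dest)                   # the definition kills the old value
        live.update(uses)                    # operands must be live before the statement
        max_live = max(max_live, len(live))

    print(max_live)                          # 3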

 
 

Question 8
Consider a 4-stage pipelined processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

        S1   S2   S3   S4
  I1     2    1    1    1
  I2     1    3    2    2
  I3     2    1    1    3
  I4     1    2    2    2

What is the number of cycles needed to execute the following loop?
For (i = 1 to 2) { I1; I2; I3; I4; }
Cross
16
Tick
23
Cross
28
Cross
30


Question 8-Explanation: 
This question differs from other pipeline questions in the number of cycles taken by each instruction in each stage: an instruction may take a different number of cycles in different stages, and two instructions may take different numbers of cycles in the same stage.

Therefore two conditions must hold before instruction i can be processed by stage j: (1) eligibility, i.e. instruction i must have completed stage j-1, and (2) availability, i.e. stage j must have finished processing instruction i-1. By scheduling each instruction into each stage as soon as both conditions are satisfied, we can determine the total number of cycles taken by these instructions over the loop's 2 iterations.

Note: an instruction is eligible for processing in iteration 2 only after it has completed its processing in iteration 1. Working through the (omitted) space-time diagrams for the two iterations gives a total of 23 cycles.
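
A minimal sketch (plain Python, not from the original source) that applies the eligibility/availability rule above to both loop iterations and reproduces the 23-cycle answer:

    # Minimal sketch: instruction i may enter stage j only after it has
    # finished stage j-1 and stage j has finished instruction i-1.
    cycles = [                               # cycles per stage S1..S4
        [2, 1, 1, 1],                        # I1
        [1, 3, 2, 2],                        # I2
        [2, 1, 1, 3],                        # I3
        [1, 2, 2, 2],                        # I4
    ]

    sequence = cycles * 2                    # two loop iterations: I1..I4, I1..I4
    stage_free = [0, 0, 0, 0]                # time at which each stage becomes free

    finish = 0
    for instr in sequence:
        ready = 0                            # time the instruction leaves the previous stage
        for j, c in enumerate(instr):
            start = max(ready, stage_free[j])
            ready = start + c
            stage_free[j] = ready
        finish = ready

    print(finish)                            # 23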
Question 9
Which of the following is/are true of the auto-increment addressing mode?
I.   It is useful in creating self-relocating code.
II.  If it is included in an Instruction Set Architecture,
     then an additional ALU is required for effective address
     calculation.
III. The amount of increment depends on the size of the data
     item accessed.
Cross
I only
Cross
II only
Tick
III Only
Cross
II and III only


Question 9-Explanation: 
In auto-increment addressing mode, the register that supplies the operand address is incremented automatically after the access, and the amount of the increment depends on the size of the data item accessed, so statement III is true. Auto-increment addressing has nothing to do with making code self-relocating, so statement I is false, and no additional ALU is required for the effective address calculation, so statement II is also false. Hence option (C) is correct.
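
As a small illustration of statement III (a minimal sketch with hypothetical addresses, not from the original source): after the operand is accessed through the register, the register is incremented by the size of the data item, not by a fixed constant:

    # Minimal sketch of auto-increment addressing: the operand address comes
    # from a register, and the register is then incremented by the size of
    # the accessed data item (4 bytes for a 32-bit word here).
    memory = {0x2000: 7, 0x2004: 8}          # hypothetical memory words
    r2 = 0x2000
    item_size = 4                            # bytes per 32-bit word

    value = memory[r2]                       # use the address in R2
    r2 += item_size                          # auto-increment by the item size

    print(value, hex(r2))                    # 7 0x2004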
Question 10
Which of the following must be true for the RFE (Return from Exception) instruction on a general purpose processor?
I.   It must be a trap instruction
II.  It must be a privileged instruction
III. An exception cannot be allowed to occur during 
     execution of an RFE instruction 
Cross
I only
Cross
II only
Cross
I and II only
Tick
I, II and III only


Question 10-Explanation: 
RFE (Return From Exception) is a privileged trap instruction executed when returning from an exception handler, and an exception cannot be allowed to occur during its execution.

In computer architecture, for a general-purpose processor, an exception can be defined as an abrupt transfer of control to the operating system. Exceptions are broadly classified into 3 main categories:
a. Interrupt: mainly caused by an I/O device.
b. Trap: caused by the program making a syscall.
c. Fault: caused accidentally by the program under execution (such as a divide by zero or a null-pointer exception).

The processor's fetch unit polls for interrupts. If it finds something unusual happening in the machine's operation, it inserts an interrupt pseudo-instruction into the pipeline in place of the normal instruction; as this passes through the pipeline, interrupt handling begins. The operating system then makes an explicit transition from kernel mode back to user mode, generally at the end of an interrupt handler or kernel call, by using the privileged RFE (Return From Exception) instruction.

This solution is contributed by Namita Singh.