Pipelining and Addressing modes


Question 1
Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, …, I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is
A
132
B
165
C
176
D
328
GATE CS 2013    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 1 Explanation: 
The pipeline has to be stalled until the EI stage of I4 completes,
because only the EI stage decides whether the branch is taken.

After that, I4(WO) and I9(FI) can proceed in parallel, followed by the
remaining instructions I10–I12. The clock period is the slowest stage
delay plus the buffer delay, i.e. 10 + 1 = 11 ns.
Till I4(EI) completes : 7 cycles * 11 ns = 77 ns
From I4(WO)/I9(FI) to I12(WO) : 8 cycles * 11 ns = 88 ns
Total = 77 + 88 = 165 ns
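The same count can be checked with a small calculation. A minimal sketch in Python, assuming the branch target stream I9–I12 starts fetching in the cycle after I4's EI resolves (variable names are illustrative only):

    stage_delays = [5, 7, 10, 8, 6]                 # FI, DI, FO, EI, WO delays in ns
    buffer_delay = 1
    cycle_time = max(stage_delays) + buffer_delay   # 11 ns per pipeline cycle

    cycles_until_branch_resolves = 4 + 3            # I4 enters FI in cycle 4, finishes EI in cycle 7
    cycles_for_target_stream = 5 + (4 - 1)          # I9..I12 flow through 5 stages: 8 more cycles
    total_cycles = cycles_until_branch_resolves + cycles_for_target_stream   # 15
    print(total_cycles * cycle_time)                # 165 ns
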
Question 2
Register renaming is done in pipelined processors
A
as an alternative to register allocation at compile time
B
for efficient access to function parameters and local variables
C
to handle certain kinds of hazards
D
as part of address translation
GATE CS 2012    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 2 Explanation: 
Register renaming is done to handle certain kinds of data hazards, in particular the name-dependence hazards WAR (write after read) and WAW (write after write), which arise only because a register name is reused and not because of a true data dependence.
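A minimal sketch of the idea, in Python for illustration only (real processors rename in hardware against a physical register file): every write is given a fresh physical register, so instructions that merely reuse a register name no longer conflict.

    def rename(instructions):
        table = {}                 # architectural register -> current physical register
        next_phys = 0
        renamed = []
        for dst, srcs in instructions:
            srcs = [table.get(s, s) for s in srcs]   # sources read the current mapping
            table[dst] = "P%d" % next_phys           # destination gets a fresh physical register
            next_phys += 1
            renamed.append((table[dst], srcs))
        return renamed

    # R1 <- R2 + R3 ; R2 <- R4 + R5  (the WAR hazard on R2 disappears after renaming)
    print(rename([("R1", ["R2", "R3"]), ("R2", ["R4", "R5"])]))
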
Question 3
Consider a hypothetical processor with an instruction of type LW R1, 20(R2), which during execution reads a 32-bit word from memory and stores it in a 32-bit register R1. The effective address of the memory location is obtained by the addition of a constant 20 and the contents of register R2. Which of the following best reflects the addressing mode implemented by this instruction for operand in memory?
A
Immediate Addressing
B
Register Addressing
C
Register Indirect Scaled Addressing
D
Base Indexed Addressing
GATE CS 2011    Computer Organization and Architecture    Pipelining and Addressing modes    
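
No official explanation is attached here, but the question itself spells out the operand access: the effective address is the constant 20 added to the contents of R2. A minimal sketch of that computation (toy register value; illustrative only):

    regs = {"R2": 0x1000}
    displacement = 20
    effective_address = displacement + regs["R2"]   # base register contents + constant displacement
    print(hex(effective_address))                   # 0x1014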


Question 4
Consider evaluating the following expression tree on a machine with load-store architecture in which memory can be accessed only through load and store instructions. The variables a, b, c, d and e are initially stored in memory. The binary operators used in this expression tree can be evaluated by the machine only when the operands are in registers. The instructions produce results only in a register. If no intermediate results can be stored in memory, what is the minimum number of registers needed to evaluate this expression?   [figure gate2011Q26: expression tree]
A
2
B
9
C
5
D
3
GATE CS 2011    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 4 Explanation: 

R1←c,  R2←d,  R2←R1+R2,  R1←e,  R2←R1-R2
Now, to evaluate the rest of the expression, a and b must be loaded into registers,
but the content of R2 is still needed later, so a third register is required.
R1←a, R3←b, R1←R1-R3, R1←R1+R2

Source: http://clweb.csa.iisc.ernet.in/rahulsharma/gate2011key.html
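The same minimum can be derived with Sethi–Ullman labelling, which computes the register need of each subtree bottom-up. A minimal sketch follows; the tree shape is an assumption reconstructed from the evaluation order above, since the original figure is not reproduced here.

    def need(node):
        if isinstance(node, str):              # leaf: a variable loaded from memory
            return 1
        _, left, right = node
        l, r = need(left), need(right)
        return max(l, r) if l != r else l + 1  # equal needs force one extra register

    tree = ("+", ("-", "a", "b"), ("-", "e", ("+", "c", "d")))   # assumed shape
    print(need(tree))                          # 3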

Question 5
Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure. [figure GATECS2011Q41: stage delays 5 ns, 6 ns, 11 ns, 8 ns; pipeline register delay 1 ns] What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?
A
4.0
B
2.5
C
1.1
D
3.0
GATE CS 2011    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 5 Explanation: 
Pipeline registers overhead is not counted in normal 
time execution

So the total count will be

5+6+11+8= 30 [without pipeline]

Now, for pipeline, each stage will be of 11 n-sec (+ 1 n-sec for overhead).
and, in steady state output is produced after every pipeline cycle. Here,
in this case 11 n-sec. After adding 1n-sec overhead, We will get 12 n-sec
of constant output producing cycle.

dividing 30/12 we get 2.5 
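
A minimal sketch of the same calculation (stage and register delays assumed from the figure as quoted above):

    stage_delays = [5, 6, 11, 8]                            # ns, from the figure
    register_delay = 1                                      # ns per pipeline register
    non_pipelined = sum(stage_delays)                       # 30 ns per instruction
    pipelined_cycle = max(stage_delays) + register_delay    # 12 ns per result in steady state
    print(non_pipelined / pipelined_cycle)                  # 2.5
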
Question 6
A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for the MUL instruction, and 6 clock cycles for the DIV instruction. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?
     Instruction           Meaning of instruction
  I0 : MUL R2, R0, R1         R2 ← R0 * R1
  I1 : DIV R5, R3, R4         R5 ← R3 / R4
  I2 : ADD R2, R5, R2         R2 ← R5 + R2
  I3 : SUB R5, R2, R6         R5 ← R2 - R6
A
13
B
15
C
17
D
19
GATE CS 2010    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 6 Explanation: 
Operand Forwarding: in this technique the value of an operand is handed to the stage of the dependent instruction that needs it, before the value is actually written back. In the question above, I2 depends on I0 and I1, and I3 depends on I2. Let's see this question with a space-time diagram.

[figure: space-time diagram of the pipeline]

Instruction I0 is a MUL operation, which takes 3 clock cycles in the PO stage and only 1 cycle in every other stage.

Instruction I1 is a DIV operation, which takes 6 clock cycles in the PO stage and only 1 cycle in every other stage. Notice that even though the OF stage was free in the 4th clock cycle, instruction I1 was not given to it. This is a design decision: operands should be fetched only if they are going to be operated on in the next cycle, otherwise there is a risk of data corruption. Since the PO stage was not free in the next cycle, I1's OF was delayed and performed only one cycle before I1 could enter PO.

Instruction I2 is an ADD operation, which takes 1 clock cycle in every stage, but it is a dependent instruction: it needs the operands produced by instructions I0 and I1. I2 needs R5 and R2 to add. It gets R2 in time, because by the time I2 reaches its PO stage, I0 has already written R2 back. R5 is also needed, but I2's PO and I1's WO run in parallel, which means I2 cannot read R5 before I1 has stored it. Here the concept of operand forwarding comes in: before instruction I1 stores its result R5, it forwards the value to I2's fetch-execute buffer, so that I2 can use it in parallel with I1's WO stage. This saves the extra clock cycle that would otherwise be spent waiting for R5 to be stored and read back.

For instruction I3, the same operand forwarding is applied to the value of R2 computed by instruction I2.

Hence operand forwarding saves 2 extra clock cycles here (1 cycle in I2 and 1 cycle in I3). The total number of cycles is 15, which can be read off the diagram, where each stage instance represents 1 clock cycle.
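
The total can be cross-checked from the completion cycles implied by the description above. This is a minimal sketch; the individual cycle numbers are a reconstruction (an assumption), since the original diagram is not reproduced here.

    # WO completion cycle of each instruction, reconstructed from the text:
    #   I0 MUL: PO occupies cycles 4-6, WO in cycle 7
    #   I1 DIV: PO occupies cycles 7-12, WO in cycle 13
    #   I2 ADD: R5 forwarded from I1's PO, so PO in cycle 13, WO in cycle 14
    #   I3 SUB: R2 forwarded from I2's PO, so PO in cycle 14, WO in cycle 15
    wo_completion = {"I0": 7, "I1": 13, "I2": 14, "I3": 15}
    print(max(wo_completion.values()))   # 15 clock cycles in total
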
Question 7
The program below uses six temporary variables a, b, c, d, e, f.
 
    a = 1
    b = 10
    c = 20
    d = a+b
    e = c+d
    f = c+e
    b = c+e
    e = b+f
    d = 5+e
    return d+f
Assuming that all operations take their operands from registers, what is the minimum number of registers needed to execute this program without spilling?
A
2
B
3
C
4
D
6
GATE CS 2010    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 7 Explanation: 
All of the given expressions use at most 3 variables, so we never need more than 3 registers (see http://en.wikipedia.org/wiki/Register_allocation); and the program does require a minimum of 3 registers.

Principle of register allocation: if a variable needs to be allocated to a register, the system first checks for a free register and, if it finds one, allocates it. If there is no free register, it checks for a register holding a dead variable (a variable whose value is not going to be used in the future) and, if it finds one, reuses it. Otherwise it resorts to spilling (it picks the register whose value is needed after the longest time, saves that value to memory, and uses the register for the current allocation; later, when the old value is needed, it is fetched back from memory into whatever register is available). Here we must not apply spilling, as directed in the question.

Let's allocate the registers for the variables.

a = 1       (say register R1 is allocated to variable 'a')
b = 10      (R2 for 'b': the value of 'a' is used later, so 'b' cannot replace 'a' in R1)
c = 20      (R3 for 'c': the values of 'a' and 'b' are used later, so 'c' cannot replace them in R1 or R2)
d = a+b     ('d' can be assigned to R1, because R1 now holds a dead variable: no subsequent expression reads 'a')
e = c+d     ('e' can be assigned to R1, because the value of 'd' in R1 is not read by any subsequent expression)

Note: an already computed value is used only by READ operations (not WRITE), so to check liveness we only have to look at the right-hand sides of the subsequent expressions.

f = c+e     ('f' can be assigned to R2, because the value of 'b' in R2 is not used in any subsequent expression)
b = c+e     ('b' can be assigned to R3, because the value of 'c' in R3 is not used later)
e = b+f     ('e' is already in R1, so this is a direct reassignment, no new allocation)
d = 5+e     ('d' can be assigned to either R1 or R3, since neither value is used further; say R1)
return d+f  (no allocation here: the contents of registers R1 and R2 are added and returned)

Hence we need only 3 registers: R1, R2 and R3.
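
The same bound falls out of a liveness computation: with no spilling, the minimum number of registers equals the maximum number of simultaneously live variables. A minimal sketch of that computation over the straight-line program (illustrative only):

    program = [
        ("a", []), ("b", []), ("c", []),
        ("d", ["a", "b"]), ("e", ["c", "d"]), ("f", ["c", "e"]),
        ("b", ["c", "e"]), ("e", ["b", "f"]), ("d", ["e"]),
        (None, ["d", "f"]),              # the return statement reads d and f
    ]
    live, max_live = set(), 0
    for dst, srcs in reversed(program):  # liveness is computed backwards
        live.discard(dst)                # a value is dead just before its (re)definition
        live.update(srcs)
        max_live = max(max_live, len(live))
    print(max_live)                      # 3
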
Question 8
Consider a 4-stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:
        S1   S2   S3   S4
   I1    2    1    1    1
   I2    1    3    2    2
   I3    1    1    1    3
   I4    1    2    2    2
What is the number of cycles needed to execute the following loop? For (i=1 to 2) {I1; I2; I3; I4;}
A
16
B
23
C
28
D
30
GATE CS 2009    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 8 Explanation: 
This question differs from other pipeline questions in the number of cycles taken by each instruction in each stage: an instruction may take a different number of cycles in different stages, and two instructions may take different numbers of cycles in the same stage.

Therefore we have to consider two things: (1) eligibility and (2) availability. That is, an instruction i should be eligible to be given to stage j, and stage j should be available (free) to process instruction i.

An instruction i is eligible to be given to stage j if and only if it has completed stage j-1. Similarly, stage j is available for instruction i if and only if it has completed instruction i-1.

By following these two criteria we determine the total number of cycles taken by the instructions over a loop of 2 iterations. Note: an instruction is eligible for processing in iteration 2 only if it has completed its processing in iteration 1.

[figures: space-time diagrams for the 1st and 2nd iterations]
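
The two rules translate directly into a recurrence: instruction i finishes stage j at max(its own finish time in stage j-1, the previous instruction's finish time in stage j) plus its cycle count for stage j. A minimal simulation sketch (illustrative only):

    cycles = [[2, 1, 1, 1],   # I1 in S1..S4
              [1, 3, 2, 2],   # I2
              [1, 1, 1, 3],   # I3
              [1, 2, 2, 2]]   # I4
    sequence = cycles * 2     # two iterations of the loop
    finish = [[0] * 4 for _ in sequence]
    for i, t in enumerate(sequence):
        for j in range(4):
            ready = finish[i][j - 1] if j > 0 else 0   # eligibility: previous stage done
            free = finish[i - 1][j] if i > 0 else 0    # availability: stage free of previous instruction
            finish[i][j] = max(ready, free) + t[j]
    print(finish[-1][-1])     # 23
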
Question 9
Which of the following is/are true of the auto-increment addressing mode?
I.  It is useful in creating self-relocating code.
II. If it is included in an Instruction Set Architecture, 
    then an additional ALU is required for effective address 
    calculation.
III. The amount of increment depends on the size of the data
     item accessed.
A
I only
B
II only
C
III Only
D
II and III only
GATE CS 2008    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 9 Explanation: 
In auto-increment addressing mode, the register that holds the operand address is automatically incremented after the operand is accessed, and the amount of the increment is the size of the data item accessed, so statement III is true. The mode has nothing to do with producing self-relocating code, so statement I is false. The increment can be performed by the existing address-calculation hardware, so no additional ALU is required and statement II is false. Hence option (C) is the correct option.
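
A minimal sketch of the behaviour behind statement III (toy memory and register values; illustrative only): each access uses the address register and then bumps it by the size of the item accessed, here 4-byte words.

    memory = {0x100: 11, 0x104: 22, 0x108: 33}   # toy word-addressed memory
    reg = 0x100                                  # address register
    word_size = 4
    values = []
    for _ in range(3):
        values.append(memory[reg])               # use the operand at (reg)
        reg += word_size                         # auto-increment by the operand size
    print(values)                                # [11, 22, 33]
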
Question 10
Which of the following must be true for the RFE (Return from Exception) instruction on a general purpose processor?
I.   It must be a trap instruction
II.  It must be a privileged instruction
III. An exception cannot be allowed to occur during 
     execution of an RFE instruction 
A
I only
B
II only
C
I and II only
D
I, II and III only
GATE CS 2008    Computer Organization and Architecture    Pipelining and Addressing modes    


Question 10 Explanation: 
RFE (Return From Exception) is a privileged trap instruction that is executed when returning from an exception handler, and an exception cannot be allowed to occur during its execution.

In computer architecture, for a general purpose processor, an exception can be defined as an abrupt transfer of control to the operating system. Exceptions are broadly classified into 3 main categories:

a. Interrupt: mainly caused by an I/O device.
b. Trap: caused by the program making a syscall.
c. Fault: caused accidentally by the program under execution (a divide by zero, a null pointer dereference, etc.).

The processor's instruction fetch unit polls for interrupts. If it finds something unusual happening in the machine's operation, it inserts an interrupt pseudo-instruction into the pipeline in place of the normal instruction, and the interrupt is then handled as it moves through the pipeline. The operating system explicitly makes the transition from kernel mode back to user mode, generally at the end of an interrupt handler or kernel call, by executing the privileged RFE (Return From Exception) instruction.

This solution is contributed by Namita Singh.