.globl _start
.set noat
.set noreorder

.text

_start:
main:
    addi $2, $0, 10

    add  $11, $0, $2        // A : $11<-$2
    add  $12, $0, $2        // B : $12<-$2
    add  $13, $0, $2        // C : $13<-$2

    // la $5, varx          // $5 = (byte*) &varx;
    // The macro-instruction la is compiled as two following instructions:
    lui  $5, %hi(varx)      // load the upper part of address
    ori  $5, $5, %lo(varx)  // append the lower part of address

    lw   $1, 0($5)          // $1 = *((int*)$5);

    add  $15, $0, $1        // D : $15<-$1
    add  $16, $0, $1        // E : $16<-$1
    add  $17, $0, $1        // F : $17<-$1

loop:
    break
    beq  $0, $0, loop
    nop

.data

varx:
    .word 1
Trace the program step by step:
Remark. The data and instruction caches are not important for this exercise; both can be disabled.
Observe and analyze not only the results stored in the registers but also any stall states and control signals when the hazard unit is activated.
The number of cycles required can be read in the bottom right corner of the CPU window.
Question to analyze: If QtMips requires more cycles to execute the program with the pipeline enabled than without it, does that mean that a pipelined processor is generally slower?
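One way to reason about this question is to keep the cycle count and the clock period apart. The sketch below is a simplified model, not a description of QtMips internals: it assumes an idealized pipeline in which every instruction costs one cycle, filling the pipeline costs `stages - 1` extra cycles, and every stall costs one more. The function names `pipelined_cycles` and `exec_time_ns`, and any clock periods passed to them, are hypothetical.

```c
/* Cycles for n_instr instructions on an idealized pipeline:
   one cycle per instruction, (stages - 1) cycles to fill the pipeline,
   plus one cycle per stall. stages must be at least 1. */
unsigned pipelined_cycles(unsigned n_instr, unsigned stages, unsigned stalls)
{
    if (n_instr == 0)
        return 0;
    return n_instr + (stages - 1) + stalls;
}

/* Wall-clock time is the cycle count multiplied by the clock period. */
double exec_time_ns(unsigned cycles, double period_ns)
{
    return cycles * period_ns;
}
```

In this model a hypothetical run of 13 instructions with 8 stalls on a 5-stage pipeline takes 25 cycles, compared with 13 cycles on a single-cycle machine. Whether that is slower in time depends on how much shorter the pipelined clock period is, which the cycle counter alone does not show.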
Design enhancement: Try to modify the program to make better use of pipelined execution. Is it possible to decrease the number of stalls, or even to reach a state where the program executes with the expected results when the hazard unit is switched off?
Write code that calculates the N-th Fibonacci number (for N > 2). The Fibonacci sequence is defined as follows:
F(n) = F(n-1) + F(n-2), for n > 2, and F(1) = 0, F(2) = 1.
Here are the first few numbers of the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144,…
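The definition can be checked against a short C reference before writing the assembly version. This is a minimal sketch that follows the 1-based indexing given above (F(1) = 0, F(2) = 1); the function name `fib_ref` is hypothetical. Note that a 0-based convention (F(0) = 0, F(1) = 1) shifts every index by one, so it matters which convention a given piece of code uses.

```c
/* N-th Fibonacci number with 1-based indexing:
   F(1) = 0, F(2) = 1, F(n) = F(n-1) + F(n-2) for n > 2.
   Returns 0 for n == 0, which the definition does not cover.
   The result wraps around once it no longer fits in an unsigned int. */
unsigned fib_ref(unsigned n)
{
    unsigned prev = 0;  /* F(1) */
    unsigned curr = 1;  /* F(2) */

    if (n <= 1)
        return 0;

    /* Each iteration advances the pair (F(i-2), F(i-1)) to (F(i-1), F(i)). */
    for (unsigned i = 3; i <= n; i++) {
        unsigned next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

With this indexing, `fib_ref(13)` gives 144, the thirteenth number in the list above.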
Add 15 to the calculated Fibonacci number (for instructional purposes). In your program you may use the following instructions:
Possible solution in C:
t0 = 5;   // Set value of N
s0 = 0;   // F(0)
s1 = 1;   // F(1)

for (t1 = 2; t1 <= t0; t1++) {
    t2 = s0 + s1;
    s0 = s1;
    s1 = t2;
}

s1 += 15;

while (1)
    ;     // Endless loop
Template:
#define t0 $8
#define t1 $9
#define t2 $10
#define s0 $16
#define s1 $17
#define s2 $18

.globl start
.set noat
.set noreorder
.ent start

start:
    // Here, there is the place for your code

    nop
.end start
Debug your code in the Mips simulator and then make it work in the MipsPipeS simulator.
Note how the delay slots are handled. They are filled automatically by the compiler, which inserts the following instruction:
This behavior of the compiler can be turned off with the following pseudoinstruction:
.set noreorder
Compile your code with this pseudoinstruction, execute it in the QtMips simulator without the pipeline, and observe the differences. Then modify your code for the pipelined version of the processor with the hazard unit disabled, so that it produces the same value as on the processor without a pipeline.
Try to work out rules that the compiler could follow to produce a program without data and control hazards, that is, a program that gives the same results as in the Mips simulator (without pipeline).
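One candidate rule for data hazards can be explored with a small model. This is a sketch under stated assumptions, not the behavior of any real compiler or of QtMips. It assumes that an instruction reading a register must be issued at least `min_distance` instructions after the instruction that last wrote it. The right value depends on the pipeline: for a classic 5-stage pipeline without forwarding, where the register file is written before it is read within the same cycle, it is commonly taken to be 3. The names `struct instr` and `count_nops` are hypothetical, and control hazards are not modeled.

```c
#include <stddef.h>

#define NUM_REGS 32

/* One instruction reduced to its register usage.
   Register numbers must be in 0..31; 0 means "none", because $0 is
   hard-wired to zero and never creates a dependency. */
struct instr {
    int dst;   /* register written, 0 if none */
    int src1;  /* first register read, 0 if none */
    int src2;  /* second register read, 0 if none */
};

/* Count the nops that must be inserted so that every instruction sits at
   least min_distance positions after the last writer of each register
   it reads. Positions are counted in the padded instruction stream. */
size_t count_nops(const struct instr *prog, size_t n, size_t min_distance)
{
    size_t last_write[NUM_REGS] = {0};  /* position of the last writer */
    int has_writer[NUM_REGS] = {0};     /* 1 once the register was written */
    size_t pos = 0;                     /* next free position */
    size_t nops = 0;

    for (size_t i = 0; i < n; i++) {
        int srcs[2] = { prog[i].src1, prog[i].src2 };
        size_t earliest = pos;

        /* Find the earliest position that satisfies every source operand. */
        for (int k = 0; k < 2; k++) {
            int r = srcs[k];
            if (r != 0 && has_writer[r]) {
                size_t need = last_write[r] + min_distance;
                if (need > earliest)
                    earliest = need;
            }
        }

        nops += earliest - pos;  /* pad with nops up to that position */
        pos = earliest;

        if (prog[i].dst != 0) {
            last_write[prog[i].dst] = pos;
            has_writer[prog[i].dst] = 1;
        }
        pos++;
    }
    return nops;
}
```

Under this model, reordering independent instructions between a producer and its consumer reduces the count, which is the same effect the design enhancement task asks you to exploit by hand.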
Modify your code to write the result (F(N) + 15) to memory at address 0x02 (using the sw instruction) and then read the value back into a register (using the lw instruction). Execute your program in the MipsPipeS and MipsPipeXL simulators. Observe the execution closely, in particular the sw and lw instructions.