====== 6. Pipeline and Hazards ======

  * for lecturer: [[..:..:..:internal:tutorials:06:start|tutorial 6]]

===== Class outline =====

  - Fibonacci sequence
  - Transcription of C code to assembler
  - Simulation and debugging for a processor without pipeline (Mips simulator).
  - Simulation and debugging for a processor with pipeline (MipsPipeS simulator).

===== What should I know before the class =====

  - Understand the lecture about pipeline and hazards.

===== Program to demonstrate pitfalls of pipeline execution =====

  .globl _start
  .option norelax
  
  .text
  _start:
  main:
      addi x2, zero, 10
      add  x11, zero, x2        // A : x11<-x2
      add  x12, zero, x2        // B : x12<-x2
      add  x13, zero, x2        // C : x13<-x2
  
      // la x5, varx            // x5 = (byte*) &varx;
      // The macro-instruction la is compiled as two following instructions:
      lui  x5, %hi(varx)        // load the upper part of address
      ori  x5, x5, %lo(varx)    // append the lower part of address
      lw   x1, 0(x5)            // x1 = *((int*)x5);
      add  x15, zero, x1        // D : x15<-x1
      add  x16, zero, x1        // E : x16<-x1
      add  x17, zero, x1        // F : x17<-x1
  
  loop:
      ebreak
      beq  zero, zero, loop
      nop
  
  .data
  varx:
      .word 1

Trace the program step by step:

  - first, on a CPU with the pipeline disabled,
  - then activate the pipeline but leave the hazard unit switched off. Propose rules that make the program execute as expected.
  - Execute the program on a CPU with the hazard unit, with and without forwarding.

//Remark. Data and instruction cache are not important, both can be disabled.//

Observe and analyze not only the results stored in registers but also possible stall states and control signals when the hazard unit is activated.

  * When are the results of instructions A, B, C, D, E and F computed and stored into registers, and when are the results correct, i.e. consistent with the program order of the instructions?
  * How many cycles are required to execute the whole program? __Number of required cycles__ can be read in the bottom right corner of the CPU window.
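As a rough cross-check of the cycle counter, the textbook estimate for an ideal pipeline with k stages is n + (k - 1) cycles for n instructions, plus one cycle for every stall or flushed slot. The sketch below is only a model under that assumption; the function names are hypothetical, and the number shown by the simulator may differ depending on its configuration and on how the final ''ebreak'' is counted.

```c
#include <assert.h>

/* Single-cycle (non-pipelined) model: one clock cycle per instruction. */
unsigned single_cycle_count(unsigned instructions)
{
    return instructions;
}

/*
 * Textbook estimate for a pipelined core (assumption, not simulator code):
 * the first instruction needs `stages` cycles to pass through the pipeline,
 * every further instruction completes one cycle later, and each stall or
 * flushed slot adds one more cycle.
 */
unsigned pipelined_cycle_count(unsigned instructions,
                               unsigned stages,
                               unsigned stall_cycles)
{
    assert(stages > 0);
    if (instructions == 0)
        return 0;
    return instructions + (stages - 1) + stall_cycles;
}
```

For instance, ''pipelined_cycle_count(10, 5, 0)'' evaluates to 14 while ''single_cycle_count(10)'' evaluates to 10. The cycle count alone does not settle which processor is faster, because the length of one clock cycle also matters.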

**Question to analyze**: If QtRVSim requires more cycles to execute the program with the pipeline enabled than without it, does it mean that a pipelined processor is generally slower?

**Design enhancement**: Try to modify the program to better utilize pipelined execution. Is it possible to decrease the number of stalls, or even to reach a state where the program produces the expected results with the hazard unit switched off?

===== What shall we do today? =====

Write code that calculates the N-th Fibonacci number (for N > 2). The Fibonacci sequence is defined as follows: F(n) = F(n-1) + F(n-2) for n >= 2, with F(0) = 0 and F(1) = 1. Here are the first few numbers in the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144,...

In your program you may use following instructions:
\\
Possible solution in C:

  t0 = 5;   // Set value of N
  s0 = 0;   // F(0)
  s1 = 1;   // F(1)
  
  for(t1 = 2; t1 <= t0; t1++) {
      t2 = s0 + s1;
      s0 = s1;
      s1 = t2;
  }
  while(1)
      ;     // Endless loop

\\
Template:

  .globl start
  .option norelax
  
  start:
  
  // Place your code here
  
  nop
  .end start

\\
Debug your code in the Mips simulator and then make it work in the MipsPipeS simulator.
\\
\\
Compile your code with this pseudoinstruction, try to execute your code in the QtRVSim simulator without pipeline and observe the differences. Modify your code for the pipelined version of the processor with the hazard unit disabled so that it produces the same value as on the processor without pipeline. Try to formulate rules that a compiler would have to follow to produce a program free of data and control hazards, i.e. a program that gives the same results as in the Mips simulator (without pipeline).

===== For those with spare time =====

Modify your code to write the result (F(N) + 15) to memory at address 0x02 (using the ''sw'' instruction) and then read the value back into a register (using the ''lw'' instruction). Execute your program in the MipsPipeS and MipsPipeXL simulators.
Observe the execution closely, namely the ''sw'' and ''lw'' instructions.

=== Questions: ===

  * Find out how the **add** instruction is executed.
  * Find out how the **addi** instruction is executed.
  * Find out how the **lw** instruction is executed.
  * Find out how the **sw** instruction is executed.
  * How many clock cycles does it take to determine the branch target address, and how can I find that out? (instructions beq and bne)
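
To check the value that the simulator leaves in a register or in memory, the loop from the possible solution can be run as plain C on a host machine. This is a minimal sketch; the names ''fib_ref'' and ''spare_time_value'' are hypothetical helpers, not part of the course materials, and 32-bit unsigned arithmetic is assumed to match the register width.

```c
#include <assert.h>
#include <stdint.h>

/* Reference value of F(n), using the same s0/s1/t2 update as the tutorial loop. */
uint32_t fib_ref(uint32_t n)
{
    assert(n <= 47);           /* F(48) no longer fits into 32 bits */

    uint32_t s0 = 0;           /* F(0) */
    uint32_t s1 = 1;           /* F(1) */

    if (n == 0)
        return s0;

    for (uint32_t t1 = 2; t1 <= n; t1++) {
        uint32_t t2 = s0 + s1; /* F(t1) = F(t1-1) + F(t1-2) */
        s0 = s1;
        s1 = t2;
    }
    return s1;
}

/* Value the spare-time exercise stores with sw and reads back with lw. */
uint32_t spare_time_value(uint32_t n)
{
    return fib_ref(n) + 15u;
}
```

With N = 5 as in the possible solution, ''fib_ref(5)'' is 5 and ''spare_time_value(5)'' is 20, which is the value to look for after the ''lw'' instruction completes.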