A. How long (in ms) does it take to execute a program of 1000 instructions on a 1 GHz processor? Assume that each instruction takes one clock cycle.
B. How long does the program take on the same processor when 20% of the instructions are memory accesses? For simplicity, assume that one memory access takes 70 ns.
C. What is the speedup of the program (with 1000 instructions) when the previous processor is replaced with a 2 GHz version? The memory access time remains the same.
D. How long does the program take on the original 1 GHz processor when a cache is added? The cache has an 80% hit rate, and searching for a value in the cache takes 5 ns.
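The timing model behind questions A to D can be sketched in a few lines of Python. This is only one possible reading of the assignment: it assumes that the memory access time is added on top of the instruction's own clock cycle, and that a cache miss costs the cache lookup plus the full memory access. The function name `execution_time_ns` and its parameters are made up for this illustration.

```python
def execution_time_ns(instructions, clock_hz, mem_fraction=0.0,
                      mem_access_ns=0.0, hit_rate=0.0, cache_lookup_ns=0.0):
    """Estimate the run time in ns under the simplified model (assumptions above)."""
    cycle_ns = 1e9 / clock_hz                  # one instruction = one clock cycle
    compute_ns = instructions * cycle_ns
    mem_accesses = instructions * mem_fraction
    # Assumed average cost of one access: lookup time + miss rate * memory time.
    per_access_ns = cache_lookup_ns + (1.0 - hit_rate) * mem_access_ns
    return compute_ns + mem_accesses * per_access_ns


t_a = execution_time_ns(1000, 1e9)
t_b = execution_time_ns(1000, 1e9, mem_fraction=0.2, mem_access_ns=70)
t_c = execution_time_ns(1000, 2e9, mem_fraction=0.2, mem_access_ns=70)
t_d = execution_time_ns(1000, 1e9, mem_fraction=0.2, mem_access_ns=70,
                        hit_rate=0.8, cache_lookup_ns=5)
speedup_c = t_b / t_c

for label, t in (("A", t_a), ("B", t_b), ("C", t_c), ("D", t_d)):
    print(f"{label}: {t:.0f} ns = {t / 1e6:.6f} ms")
print(f"C speedup over B: {speedup_c:.3f}")
```

If your course uses a different convention (for example, a memory access replacing the instruction cycle instead of extending it), adjust `per_access_ns` accordingly and compare the results with your hand calculation.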
In the following part we will work with 12-bit addresses and 8-bit data.
Run the QtRVSim simulator and select the simple processor without cache in the basic setup (No pipeline no cache).
The next step is to compile the following program (it is available in the /opt/apo/selection-sort directory). The program implements a simple sorting algorithm (selection sort) and runs it on 15 integers. Load the program into the QtRVSim simulator, open the windows with memory access statistics (Window → program cache and Window → Data cache), and note the number of memory reads and writes as well as the number of additional cycles required to access memory.
As a next step, configure the simulator to use direct-mapped data and program caches of four words each.
Cache size 4 words = number of sets 4 × number of words in a block 1 × degree of associativity 1.
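Since the cache size is the product of the number of sets, the block size, and the degree of associativity, the candidate configurations for a fixed size can be listed mechanically. The following is a minimal sketch; the helper `cache_configurations` is a hypothetical name, and the simulator may not accept every combination it lists.

```python
def cache_configurations(size_words):
    """Return all (sets, block_words, associativity) triples whose product is size_words."""
    configs = []
    for sets in range(1, size_words + 1):
        if size_words % sets != 0:
            continue
        rest = size_words // sets
        for block_words in range(1, rest + 1):
            if rest % block_words == 0:
                configs.append((sets, block_words, rest // block_words))
    return configs


for sets, block_words, ways in cache_configurations(4):
    print(f"sets={sets} block={block_words} associativity={ways}")
```

The list can serve as a checklist while you record the measured statistics for each setting.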
Measure the results for as many combinations of cache parameters as possible while keeping the cache size at 4 words (the product of the number of sets, the block size, and the degree of associativity). Observe how the cache content changes and answer the following questions:
// Simple sorting algorithm - selection sort

// Directives to make interesting windows visible
#pragma qtrvsim show registers
#pragma qtrvsim show memory

.option norelax

.globl array
.globl _start

.text

_start:
    la   a0, array
    addi s0, zero, 0   // Minimum value from the rest of the array will be placed here. (Offset in the array, increasing by 4 bytes).
    addi s1, zero, 60  // Maximal index/offset value. Used for cycle termination = number of values in array * 4.
    add  s2, zero, s0  // Working position (offset)
    // s3 - offset of the smallest value found so far in given run
    // s4 - value of the smallest value found so far in given run
    // s5 - temporary

main_cycle:
    beq  s0, s1, main_cycle_end
    add  t0, a0, s0
    lw   s4, 0(t0)     // lw s4, array(s0)
    add  s3, s0, zero
    add  s2, s0, zero

inner_cycle:
    beq  s2, s1, inner_cycle_end
    add  t0, a0, s2
    lw   s5, 0(t0)     // lw s5, array(s2)

    // expand bgt s5, s4, not_minimum
    slt  t0, s4, s5
    bne  t0, zero, not_minimum

    addi s3, s2, 0
    addi s4, s5, 0

not_minimum:
    addi s2, s2, 4
    j    inner_cycle

inner_cycle_end:
    add  t0, a0, s0
    lw   s5, 0(t0)     // lw s5, array(s0)
    sw   s4, 0(t0)     // sw s4, array(s0)
    add  t0, a0, s3
    sw   s5, 0(t0)     // sw s5, array(s3)

    addi s0, s0, 4
    j    main_cycle

main_cycle_end:

// Final infinite loop
end_loop:
    fence              // flush cache memory
    ebreak             // stop the simulator
    j    end_loop

.org 0x400

.data
// .align 2 // not supported by QtRVSsim

array:
    .word 5, 3, 4, 1, 15, 8, 9, 2, 10, 6, 11, 1, 6, 9, 12

// Specify location to show in memory window
#pragma qtrvsim focus memory array
In QtRVSim, set the cache parameters as shown in the figure below. Size 4 words = number of sets 1 × block size 1 × degree of associativity 4.
Run the program again with the new cache parameters and check the cache performance.
In QtRVSim, set the cache parameters as shown in the figure below. Size 4 words = number of sets 2 × block size 1 × degree of associativity 2.
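To build intuition for why the three 4-word configurations (4 × 1 × 1, 1 × 1 × 4, and 2 × 1 × 2) behave differently, a tiny cache model can be run on a short address trace. The sketch below assumes LRU replacement and word addresses, and it counts only hits and misses. The class `CacheModel`, the function `run_trace`, and the example trace are illustrative inventions, not QtRVSim internals, and the policy configured in the simulator may differ.

```python
from collections import OrderedDict


class CacheModel:
    """Set-associative cache model with LRU replacement (assumed policy)."""

    def __init__(self, sets, block_words, ways):
        self.sets = sets
        self.block_words = block_words
        self.ways = ways
        # One ordered dict of tags per set; the oldest entry is the LRU one.
        self.lines = [OrderedDict() for _ in range(sets)]
        self.hits = 0
        self.misses = 0

    def access(self, word_address):
        block = word_address // self.block_words
        index = block % self.sets
        tag = block // self.sets
        lines = self.lines[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)   # evict the least recently used tag
        lines[tag] = True
        self.misses += 1
        return False


def run_trace(trace, sets, block_words, ways):
    cache = CacheModel(sets, block_words, ways)
    for word_address in trace:
        cache.access(word_address)
    return cache.hits, cache.misses


# Hypothetical trace: two passes over five consecutive words.
trace = [0, 1, 2, 3, 4] * 2
for config in [(4, 1, 1), (1, 1, 4), (2, 1, 2)]:
    hits, misses = run_trace(trace, *config)
    print(f"sets x block x ways = {config}: hits={hits}, misses={misses}")
```

A trace that cycles over one more word than the cache can hold is a useful stress case, because it shows how the replacement policy and the mapping of addresses to sets interact. Try replacing the trace with the data addresses touched by the sorting program and compare the outcome with the simulator.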