Aim of today's class is to deepen knowledge about cache, to clarify and to show example of time and spatial locality principle. Try to optimize cache parameters.
Consider following example:
#define s0 $16 #define s1 $17 #define s2 $18 .globl array .data .align 2 array: .word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 .text .globl start .ent start start: la s0, array // load address of the first value in array to register s0 addi s1, $0, 3 // Number of cycle repetitions (number of iterations) nop // Why nop here? Try to remove it. Will it have any influence on hit rate? loop: beq s1,$0, end lw s2, 0(s0) // Read 0th value in array lw s2, 4(s0) // Read 1st value in array (4/4 = 1) lw s2, 36(s0) // Read 9th value (36/4=9) lw s2, 40(s0) // Read 10th value (40/4=10) addi s1, s1, -1 j loop end: nop end_loop: //Final infinite loop j end_loop
Now consider instruction and data cache with size of 8 words (1 word = 4B) each. The cache is word-aligned – it means that address of the first byte in word is divisible by 4. Replacement policy is LRU (lease recently used). Fill-in following table for instruction and data cache. Study in details execution of the above program and study accesses into memory and cache.
Data cache:
Directly mapped | N-way set associative | Fully asociative | |||||
---|---|---|---|---|---|---|---|
Size/Block size/Blocks in sets: | 8/1/1 | 8/2/1 | 8/8/1 | 8/2/2 | 8/4/2 | 8/2/4 | 8/1/8 |
Hit Count: | |||||||
Miss Count: | |||||||
Hit Rate: |
Instruction cache:
Directly mapped | N-way set associative | Fully asociative | ||||||
---|---|---|---|---|---|---|---|---|
Size/Block size/Blocks in sets: | 8/1/1 | 8/2/1 | 8/8/1 | 8/2/2 | 8/4/2 | 8/2/4 | 8/1/8 | |
Hit Count: | ||||||||
Miss Count: | ||||||||
Hit Rate: |
Take the program from the last class (insertion sort) and cache with size of 8 words. Is there any replacement policy which you expect to perform better that others? Consider both data and instruction cache. Verify your conclusions with simulation.