Aim of today's class is to deepen knowledge about cache, to clarify and to show example of time and spatial locality principle. Try to optimize cache parameters.
Consider following example:
#define s0 $16
#define s1 $17
#define s2 $18
.globl array
.data
.align 2
array:
.word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
.text
.globl start
.ent start
start:
la s0, array // load address of the first value in array to register s0
addi s1, $0, 3 // Number of cycle repetitions (number of iterations)
nop // Why nop here? Try to remove it. Will it have any influence on hit rate?
loop:
beq s1,$0, end
lw s2, 0(s0) // Read 0th value in array
lw s2, 4(s0) // Read 1st value in array (4/4 = 1)
lw s2, 36(s0) // Read 9th value (36/4=9)
lw s2, 40(s0) // Read 10th value (40/4=10)
addi s1, s1, -1
j loop
end:
nop
end_loop: //Final infinite loop
j end_loop
Now consider instruction and data cache with size of 8 words (1 word = 4B) each. The cache is word-aligned – it means that address of the first byte in word is divisible by 4. Replacement policy is LRU (lease recently used). Fill-in following table for instruction and data cache. Study in details execution of the above program and study accesses into memory and cache.
Data cache:
| Directly mapped | N-way set associative | Fully asociative | |||||
|---|---|---|---|---|---|---|---|
| Size/Block size/Blocks in sets: | 8/1/1 | 8/2/1 | 8/8/1 | 8/2/2 | 8/4/2 | 8/2/4 | 8/1/8 |
| Hit Count: | |||||||
| Miss Count: | |||||||
| Hit Rate: | |||||||
Instruction cache:
| Directly mapped | N-way set associative | Fully asociative | ||||||
|---|---|---|---|---|---|---|---|---|
| Size/Block size/Blocks in sets: | 8/1/1 | 8/2/1 | 8/8/1 | 8/2/2 | 8/4/2 | 8/2/4 | 8/1/8 | |
| Hit Count: | ||||||||
| Miss Count: | ||||||||
| Hit Rate: | ||||||||
Take the program from the last class (insertion sort) and cache with size of 8 words. Is there any replacement policy which you expect to perform better that others? Consider both data and instruction cache. Verify your conclusions with simulation.