====== 4. Hierarchical memory model II - cache, additional class ====== * for lecturer: [[..:..:..:internal:tutorials:04:start|tutorial 4]] ===== Content of the class ===== Aim of today's class is to deepen knowledge about cache, to clarify and to show example of time and spatial locality principle. Try to optimize cache parameters. ===== Example A ===== Consider following example: #define s0 $16 #define s1 $17 #define s2 $18 .globl array .data .align 2 array: .word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 .text .globl start .ent start start: la s0, array // load address of the first value in array to register s0 addi s1, $0, 3 // Number of cycle repetitions (number of iterations) nop // Why nop here? Try to remove it. Will it have any influence on hit rate? loop: beq s1,$0, end lw s2, 0(s0) // Read 0th value in array lw s2, 4(s0) // Read 1st value in array (4/4 = 1) lw s2, 36(s0) // Read 9th value (36/4=9) lw s2, 40(s0) // Read 10th value (40/4=10) addi s1, s1, -1 j loop end: nop end_loop: //Final infinite loop j end_loop Now consider instruction and data cache with **size of 8 words** (1 word = 4B) **each**. The cache is word-aligned -- it means that address of the first byte in word is divisible by 4. Replacement policy is LRU (lease recently used). Fill-in following table for instruction and data cache. **Study in details execution of the above program and study accesses into memory and cache**. ** Data cache: ** ^ ^ Directly mapped ^^^ N-way set associative ^ Fully asociative ^^^ ^ Size/Block size/Blocks in sets: ^ 8/1/1 ^ 8/2/1 ^ 8/8/1 ^ 8/2/2 ^ 8/4/2 ^ 8/2/4 ^ 8/1/8 ^ | Hit Count: | | | | | | | | | Miss Count: | | | | | | | | | Hit Rate: | | | | | | | | \\ ** Instruction cache: ** ^ ^ Directly mapped ^^^ N-way set associative ^ Fully asociative ^^^^ ^ Size/Block size/Blocks in sets: ^ 8/1/1 ^ 8/2/1 ^ 8/8/1 ^ 8/2/2 ^ 8/4/2 ^ 8/2/4 ^ 8/1/8 ^ | Hit Count: | | | | | | | | | Miss Count: | | | | | | | | | Hit Rate: | | | | | | | | \\ ===== Questions ===== * What are your conclusions? Which cache has the best hit rate? * Is the fully associative cache always the best? * Does the hit rate depend on executed program or is it parameter of the cache? * Is it possible to have data in cache that will never be accessed? (Data we are not working with, instructions which are never executed,, etc...) * Is there optimal value for **Block size** parameter? Is there any rule for the value of **Block size**? * Find all information stored in the cache. * Find type of instruction and data cache use your computer? Find parameters - size, block size, blocks in set. Directly mapped cache have one block in set. Fully associative cache has as many ways as it has blocks. ===== Example B ===== Take the program from the last class (insertion sort) and cache with size of 8 words. Is there any replacement policy which you expect to perform better that others? Consider both data and instruction cache. Verify your conclusions with simulation. ===== Questions ===== * Try the best parameters from Example A. Do these the parameters achieve the best hit ratio here as well? * Do you know another replacement policies different from those in MIPS Simulator? Try to find any on the internet or in the lectures. * Which write policy you expect to have the lowest impact on memory bus. (Transmits the lowest size of data).