====== 4. Hierarchical memory model, cache ====== * for lecturer: [[..:..:..:internal:tutorials:04:start|tutorial 4]] ===== Exercise outline ===== - Introductory example - Basic cache design and principle ===== What should I repeat before the tutorial ===== - Repeat how the cache works - Repeat meaning of: - fully associative cache - N-way set associative cache - direct-mapped cache - Repeat what write-through and write-back means. - Finish the bubble sort from the last exercise. ===== Content of the class ===== === Introductory example === How long (in ms) it will take to execute a program with 1000 instructions on a 1GHz processor? Assume that one instruction lasts one clock. How long it will take to execute the program on the same processor when 20% of instructions are memory accesses. For simplicity assume that one memory access lasts 70ns. What is the speedup of the program (with 1000 instructions) when we replace the previous processor with a 2GHz version. The memory access time remains the same. How long it will take to execute the program with old 1GHz processor and we will add a cache. This cache has 80% hit rate and search for value in cache take 5ns. ===== Cache ===== In the following part we will work with 12 bit address, data are 8bits wide. ==== Directly mapped cache ==== - Plot directly mapped cache with 4 records. - Explain how to find correct record. How many bits do we need for tag and for record address. == Example == Run the Mips simulator. Diplay the data cache. And select from menu //Edit// > //Cache/Mem Config// and select //Data cache// tab. Set parameters of the cache as shown on the figure bellow. {{courses:B35APO:tutorials:04:data_cache-direct_mapped.png|}} Check you solution of the cache with the figure generated by the Mips. Now open the MipsIt development environment and compile following program (for sorting numbers using the insertion sort). Load the compiled program into the Mips simulator. Examine content of the cache and memory after each step. Answer following questions: - Find Hit and Miss count. - How many memory accesses the program executes? - Change the cache to have 8 records. Check how this change affects the hit/miss count. #define s0 $16 #define s1 $17 #define s2 $18 #define s3 $19 #define s4 $20 #define s5 $21 .globl array .data .align 2 array: .word 5, 3, 4, 1, 15, 8, 14, 2, 10, 6, 11, 13, 7, 9, 12 .text .globl start .ent start start: addi s0, $0, 0 //Minimum value from the rest of the array will be placed here. (Offset in the array, increasing by 4 bytes). addi s1, $0, 60 //Maximal index/offset value. Used for cycle termination = number of values in array * 4. add s2, $0, s0 //Working position (offset) // s3 - offset of the smallest value found so far in given run // s4 - value of the smallest value found so far in given run // s5 - temporary main_cycle: beq s0, s1, main_cycle_end lw s4, array(s0) add s3, s0, $0 add s2, s0, $0 inner_cycle: beq s2, s1, inner_cycle_end lw s5, array(s2) bgt s5, s4, not_minimum addi s3, s2, 0 addi s4, s5, 0 not_minimum: addi s2, s2, 4 j inner_cycle inner_cycle_end: lw s5, array(s0) sw s4, array(s0) sw s5, array(s3) addi s0, s0, 4 j main_cycle main_cycle_end: //Final infinite loop end_loop: j end_loop nop .end start Hint: If you need to execute long program and you are interested only in the final results, you can use //Run// button to execute instructions automatically. To stop the automatic execution, use the //Stop// button. If you want to be sure that the processor will not execute irrelevant instructions at the end of the program, you have to add infinite loop at the end. Please remember that in real you do not have any assurance that the memory is set to zeros (NOP instructions). More likely it will contain some rubbish that can be interpreted as instructions and executed. ==== Fully associative cahce ==== - Explain differences between fully associative cache and directly mapped cache. - Draw a fully associative cache for 4 records (12 bit address and 8 bit data). - Explain how to find a place for the given address. == Example == In Mips set parameters of the cache as shown on the figure bellow. (Size 4, Block size 1 a Blocks in sets 4). {{courses:B35APO:tutorials:04:data_cache-fully_associative.png|}} Run the program again with new cache parameters and check cache performance. - Find Hit and Miss count. - How many memory accesses the program executes? - Change the cache to have 8 records. Check how this change affects the hit/miss count. ==== N-way set associative cache ==== - Explain differences between this and previous caches. Are there any advantages over above versions of cache? - Draw a 2-way set associative cache. There are 4 records in cache (12 bit address and 8 bit data). - Explain how to find a place for the given address. == Example == In Mips set parameters of the cache as shown on the figure bellow. (Size 4, Block size 1 a Blocks in sets 2). {{courses:B35APO:tutorials:04:data_cache-reduced_associativity.png|}} Run the program again with new cache parameters and check cache performance. - Find Hit and Miss count. - How many memory accesses the program executes? - Change the cache to have 8 records. Check how this change affects the hit/miss count.