====== 4. Hierarchical memory model II - cache, additional class ======
  * for lecturer: [[..:..:..:internal:tutorials:04:start|tutorial 4]]
===== Content of the class =====

Aim of today's class is to deepen knowledge about cache, to clarify and to show example of time and spatial locality principle. Try to optimize cache parameters.

===== Example A =====

Consider following example:

<code>
#define s0 $16
#define s1 $17
#define s2 $18

.globl    array
.data
.align    2

array:
.word    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
.text
.globl start
.ent start

start:

la   s0, array      // load address of the first value in array to register s0
addi s1, $0, 3      // Number of cycle repetitions (number of iterations)
nop                 // Why nop here? Try to remove it. Will it have any influence on hit rate?

loop:
    beq s1,$0, end
    lw s2, 0(s0)      // Read 0th value in array
    lw s2, 4(s0)      // Read 1st value in array (4/4 = 1)

    lw s2, 36(s0)     // Read 9th value (36/4=9)
    lw s2, 40(s0)     // Read 10th value (40/4=10)
   
    addi s1, s1, -1
    j loop

end:
nop  

end_loop:           //Final infinite loop
    j end_loop

</code>

Now consider instruction and data cache with **size of 8 words** (1 word = 4B) **each**. The cache is word-aligned -- it means that address of the first byte in word is divisible by 4. Replacement policy is LRU (lease recently used). Fill-in following table for instruction and data cache. **Study in details execution of the above program and study accesses into memory and cache**.

** Data cache: **
^ ^  Directly mapped ^^^  N-way set associative  ^  Fully asociative  ^^^
^  Size/Block size/Blocks in sets: ^  8/1/1  ^  8/2/1  ^  8/8/1  ^  8/2/2  ^  8/4/2  ^  8/2/4  ^  8/1/8  ^
| Hit Count:  | | | | | | | |
| Miss Count:  | | | | | | | |
| Hit Rate:  | | | | | | | |
\\

** Instruction cache: **
^ ^  Directly mapped ^^^  N-way set associative  ^  Fully asociative  ^^^^
^  Size/Block size/Blocks in sets: ^  8/1/1  ^  8/2/1  ^  8/8/1  ^  8/2/2  ^  8/4/2  ^  8/2/4  ^  8/1/8  ^
| Hit Count:  | | | | | | | |
| Miss Count:  | | | | | | | |
| Hit Rate:  | | | | | | | |
\\

===== Questions =====
  * What are your conclusions? Which cache has the best hit rate?
  * Is the fully associative cache always the best?
  * Does the hit rate depend on executed program or is it parameter of the cache?
  * Is it possible to have data in cache that will never be accessed? (Data we are not working with, instructions which are never executed,, etc...)
  * Is there optimal value for **Block size** parameter? Is there any rule for the value of **Block size**?
  * Find all information stored in the cache.
  * Find type of instruction and data cache use your computer? Find parameters - size, block size, blocks in set.
<note>
Directly mapped cache have one block in set. Fully associative cache has as many ways as it has blocks.
</note>

===== Example B =====

Take the program from the last class (insertion sort) and cache with size of 8 words. Is there any replacement policy which you expect to perform better that others? Consider both data and instruction cache. Verify your conclusions with simulation.

===== Questions =====
  * Try the best parameters from Example A. Do these the parameters achieve the best hit ratio here as well?
  * Do you know another replacement policies different from those in MIPS Simulator? Try to find any on the internet or in the lectures.
  * Which write policy you expect to have the lowest impact on memory bus. (Transmits the lowest size of data).