Lectures

1. Lecture – Introduction to modern computer architectures

Control-flow and data-flow computers (data-driven, demand-driven). Classification of computer architectures by Flynn's taxonomy. Parallel processing: multi-core, multiprocessor, and multi-computer systems; the concept of parallel processing. Amdahl's and Gustafson's laws. Performance metrics.
English: PDF: 01_introduction_b4m35pap-en.pdf ODP: 01_introduction_b4m35pap-en.odp
Czech: PDF: 01_introduction_b4m35pap.pdf ODP: 01_introduction_b4m35pap.odp
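
As a quick illustration of the Amdahl's and Gustafson's laws named in this lecture's topics, here is a minimal sketch. The function and parameter names are illustrative assumptions, not taken from the slides; `parallel_fraction` is the share of the work that can run in parallel.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup for a fixed problem size (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction: float, processors: int) -> float:
    """Scaled speedup when the problem size grows with the machine (Gustafson's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


if __name__ == "__main__":
    # Hypothetical workload: 90 % of the work is parallelizable.
    for n in (2, 10, 100):
        print(n, round(amdahl_speedup(0.9, n), 2), round(gustafson_speedup(0.9, n), 2))
```

With a fixed problem size the serial part bounds the speedup (here it can never exceed 10), while the scaled-workload view of Gustafson's law keeps growing with the processor count.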

2. Lecture – From scalar to superscalar processors (basic organization of a superscalar processor)

Superscalar processors with static, dynamic, and hybrid scheduling of instruction execution.
English: PDF: 02_superskalar_organization_introduction_b4m35pap-en.pdf ODP: 02_superskalar_organization_introduction_b4m35pap-en.odp
Czech: PDF: 02_superskalarni_organizace_uvod_a4m36pap.pdf
You should already know: pipelining_a4m36pap.pdf

3. Lecture – Superscalar techniques I – Data flow inside the processor (register data flow)

Register renaming (Tomasulo's algorithm) and data speculation. Precise exception support.
English: PDF: 03_superskalar_technics_data_flow_inside_processor_b4m35pap-en.pdf ODP: 03_superskalar_technics_data_flow_inside_processor_b4m35pap-en.odp
Czech: PDF: 03_superskalarni_techniky_tok_dat_uvnitr_procesoru_a4m36pap.pdf

4. Lecture – Superscalar techniques II – Instruction flow, speculative execution (control speculation)

Prediction, predictors, and instruction prefetching. Static and dynamic prediction: Smith's predictor, two-level predictors with local and global history, bi-mode, adaptive branch prediction techniques, and more. Branch misprediction recovery.
English: PDF: 04_superskalar_technics_instruction_prefetching_b4m35pap-en.pdf ODP: 04_superskalar_technics_instruction_prefetching_b4m35pap-en.odp
Czech: PDF: 04_superskalarni_techniky_spekulace_a_predikce_vetveni_a4m36pap.pdf
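
To make the Smith's predictor from this lecture's topics concrete, the following is a minimal sketch of a table of saturating counters indexed by the branch address. The class name, the table size, the 4-byte instruction alignment, and the initial "weakly not taken" state are assumptions chosen for illustration, not details from the slides.

```python
class SmithPredictor:
    """Table of n-bit saturating counters indexed by low bits of the branch address."""

    def __init__(self, table_bits: int = 4, counter_bits: int = 2):
        self.mask = (1 << table_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        # Assumption: every counter starts in the "weakly not taken" state.
        self.counters = [self.threshold - 1] * (1 << table_bits)

    def _index(self, pc: int) -> int:
        # Assumption: 4-byte instructions, so the two lowest address bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        """Return True when the branch is predicted taken."""
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc: int, taken: bool) -> None:
        """Move the counter one step toward the actual outcome, saturating at both ends."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


def count_mispredictions(predictor: SmithPredictor, pc: int, outcomes) -> int:
    """Replay a sequence of branch outcomes and count the wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # Hypothetical loop branch: taken 9 times, then falls through, executed twice.
    loop = ([True] * 9 + [False]) * 2
    print(count_mispredictions(SmithPredictor(), 0x400, loop))
```

The two-bit counter tolerates the single not-taken outcome at each loop exit, so after warming up it mispredicts only once per loop execution.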

5. Lecture – Superscalar techniques III – Memory data flow, VLIW and EPIC processors

Data flow from and to memory. Load bypassing and load forwarding. Speculative loads. Other ways to reduce memory latency. VLIW and EPIC processors. Use of data parallelism, SIMD, and vector instructions in the ISA. Loop unrolling and software pipelining: execution on VLIW and superscalar processors.
English: PDF: 05_superskalar_technics_memory_data_flow_vliw_and_epic_b4m35pap-en.pdf ODP: 05_superskalar_technics_memory_data_flow_vliw_and_epic_b4m35pap-en.odp
Czech: PDF: 05_superskalarni_techniky_memory_data_flow_a_vliw_a_epic_a4m36pap.pdf

6. Lecture – Memory subsystem

Non-blocking cache, victim cache, virtual memory and cache.
English: PDF: 06_memory_b4m35pap-en.pdf ODP: 06_memory_b4m35pap-en.odp
Czech: PDF: 06-pamet_uvod-pap.pdf

7. Lecture – Multiprocessor systems and the memory coherence problem.

Multiprocessor computer architectures. Distributed and shared memory systems (DMS, SMS). Symmetric multiprocessor computer architectures. Methods to ensure coherence in SMP.
English: PDF: 07-memory_coherence_b4m35pap-en.pdf ODP: 07-memory_coherence_b4m35pap-en.odp
Czech: PDF: 07-pamet_koherence-pap.pdf

Older lecture version: 07_pamet_cast_2_koherence_a4m36pap.pdf

8. Lecture – Multiprocessor systems and memory consistency problems.

Rules for performing memory operations, ensuring sequential consistency, memory consistency models.
English: PDF: 08-memory_consistency-b4m35pap-en.pdf ODP: 08-memory_consistency-b4m35pap-en.odp
Czech: PDF: 08-pamet_konzistence-pap.pdf

Older lecture version: 08_pamet_cast_3_konzistence_a4m36pap.pdf
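
As a small illustration of what sequential consistency permits, the sketch below enumerates every interleaving of the classic "store buffering" litmus test, where each of two threads stores to one variable and then loads the other. The thread programs, variable names, and helper functions are assumptions made up for this example, not material from the slides.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's program order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Each operation is (kind, address, value-to-store or destination register).
THREAD_0 = [("store", "x", 1), ("load", "y", "r0")]
THREAD_1 = [("store", "y", 1), ("load", "x", "r1")]


def run(schedule):
    """Execute one global order of operations on a single shared memory."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, address, arg in schedule:
        if kind == "store":
            memory[address] = arg
        else:
            registers[arg] = memory[address]
    return registers["r0"], registers["r1"]


def sc_outcomes():
    """All (r0, r1) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(THREAD_0, THREAD_1)}


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
```

The result (0, 0) never appears in the enumerated set, because in any single global order at least one store precedes the other thread's load. Weaker consistency models, for example those that let a load bypass an earlier buffered store, can make that outcome observable.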

9. Lecture – Parallel Systems Programming I.

Introduction.
English: PDF: 09_parallelism_b4m35pap-en.pdf ODP: 09_parallelism_b4m35pap-en.odp
Czech: PDF: 09_paralelizmus_pap.pdf

10. Lecture – Parallel Systems Programming II.

Parallel systems programming concepts, using Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) to create parallel programs.
English: PDF: 10_parallelism_programming_b4m35pap-en.pdf ODP: 10_parallelism_programming_b4m35pap-en.odp
Czech: PDF: 10_paralelni_programovani_a4m36pap.pdf

Synchronization. Code optimization. Cache maintenance, consequences of coherence protocols. Included if time allows.

11. Lecture – Interconnection networks

Static and dynamic interconnection networks.
English: PDF: 11_interconnection_networks_b4m35pap-en.pdf ODP: 11_interconnection_networks_b4m35pap-en.odp
Czech: PDF: 11_propojovaci_site_pap.pdf

12. Lecture – Use of graphics accelerators (GPU) and general-purpose computing on graphics processing units (GPGPU)

English: PDF: 13_gpu_and_gpgpu_b4m35pap.pdf ODP: 13_gpu_and_gpgpu_b4m35pap.odp
Czech: PDF: 13_gpu_a_gpgpu_pap.pdf

13. Lecture – Time and space parallelization in practice.

A sample of selected topics on the Intel Nehalem, AMD Opteron, IBM Power4, ARM, AArch64, and RISC-V processors.

Intel Nehalem, Haswell, AMD Opteron, IBM Power4: 12_nehalem_a4m36pap.pdf

ARM, AArch64, RISC-V:
English: PDF: 12-risc-arch-b4m35pap-en.pdf ODP: 12-risc-arch-b4m35pap-en.odp
Czech: PDF: 12-risc-arch.pdf

14. Lecture – Perspectives and limitations of future development and history

English: PDF: 14_history_and_future_b4m35pap.pdf ODP: 14_history_and_future_b4m35pap.odp
Czech: PDF: 14_historie_a_vyhledy_pap.pdf

Materials to refresh knowledge of the I/O subsystem (PCIe, HyperTransport, QuickPath Interconnect): 10_io_podsystem.pdf