1. Introduction to the labs, compiler, simulator and data representations

Exercise outline

  1. requirements and evaluations
  2. laboratory and computer system introduction
  3. the tour from Python to C language and assembler
  4. introduction of QtRvSim
  5. basic data types (integer number) storage in computer memory, program in C
  6. integer representation, addition, subtraction, multiplication, division

What should I review before the first exercise

  1. check that your main CTU password works; it is required for GNU/Linux login in KN:E-23 (more)
  2. binary representation and hexadecimal representation of integer numbers
  3. terms little and big endian
  4. two's complement code
  5. addition, subtraction, multiplication, division
  6. logic operations (and, or, invert, rotation, …)
  7. review chapter 3 of APOLOS - Integers expressed in binary system
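
As a quick refresher for items 2 and 4 of the list above, here is a minimal Python sketch of the two's complement encoding and of printing a value bit by bit. The helper names (to_twos_complement, from_twos_complement, to_binary) are hypothetical and only serve this illustration.

```python
def to_twos_complement(value, bits=32):
    """Return the unsigned bit pattern that encodes a signed value in `bits` bits."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError("value does not fit into %d bits" % bits)
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern, bits=32):
    """Interpret an unsigned bit pattern as a signed two's complement value."""
    if pattern & (1 << (bits - 1)):      # the top bit set means a negative value
        return pattern - (1 << bits)
    return pattern


def to_binary(value, bits=32):
    """Return the bits of value as text, most significant bit first."""
    pattern = to_twos_complement(value, bits)
    return "".join("1" if pattern & (1 << i) else "0"
                   for i in range(bits - 1, -1, -1))


print(to_binary(5, 8))                   # 00000101
print(to_binary(-6, 8))                  # 11111010
print(hex(to_twos_complement(-1, 32)))   # 0xffffffff
```

The same bit pattern 11111010 reads as 250 when treated as unsigned and as -6 when treated as signed; only the interpretation differs.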

What shall we do on the first exercise

The seminar starts with an introduction to the classroom. Then the basic concepts of data representation in a computer are refreshed, together with their use and their connection to languages at different levels of abstraction, down to machine code. Keep in mind that this is only an overview of what the course covers in the following seminars and, for students of the KyR program, of what is covered in parallel by the course B3B36PRG - C Programming. For the first half of the semester, the C programming language is used only as a general notation for algorithms (most constructions can be read as the equivalent of the same algorithm in Python, for example). You should gain enough knowledge of C in the B3B36PRG course for real programming work in the second half of the semester.

From convenient higher level programming to machine code

Python

Python is a dynamically typed higher-level language. To run a program, an interpreter is needed (it can also be a runtime environment with partial translation at runtime); the interpreter is most often written in C.

Test the following simple Python example. To create and edit the source file sum2vars.py, use one of the installed editors (geany, vim, emacs, qtcreator, clion, …). For those without a personal preference, it is advisable to start with geany.

#!/usr/bin/python3
 
var_a = 0x1234
var_b = 0x2222
 
var_c = var_a + var_b
 
print('sum %d + %d -> %d'%(var_a, var_b, var_c))
 
print('sum 0x%x + 0x%x -> 0x%x'%(var_a, var_b, var_c))

The program can be passed as a parameter to be run in the interpreter

python3 sum2vars.py

If you mark the file as executable

chmod +x sum2vars.py

it can be run “directly”, even though neither the processor nor the operating system can execute such a file as machine code. The first line (shebang) names the interpreter to start: instead of executing the file itself, the system (on GNU/Linux, the kernel's program loader) runs the interpreter specified in the first line and passes the file to it as a parameter.
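
As a rough illustration of how a shebang line is read, the following sketch splits the first line of a script into the interpreter path and an optional single argument and then builds the command that is effectively started. The helper parse_shebang is hypothetical and is a simplified model, not the actual loader code.

```python
def parse_shebang(first_line):
    """Return [interpreter] or [interpreter, argument] for a shebang line, else None."""
    if not first_line.startswith("#!"):
        return None
    # Everything after the interpreter path is kept together as one argument.
    parts = first_line[2:].strip().split(None, 1)
    return parts or None


# The script path is appended as the last parameter of the interpreter.
command = parse_shebang("#!/usr/bin/python3\n") + ["./sum2vars.py"]
print(command)   # ['/usr/bin/python3', './sum2vars.py']
```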

A minor pitfall, and at the same time a critical habit that increases safety against attacks by other users (try to figure out why), is that a command given without a path is searched for only in the directories listed in the PATH environment variable, and on reasonably designed and managed systems this list does not contain the current directory. Therefore, we have to specify the path to start the program. In this case it is the current directory, represented by the dot '.'.

./sum2vars.py
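
The lookup rule can be sketched in Python as follows. This is a simplified model under stated assumptions: find_command is a hypothetical helper, and two temporary directories stand in for a directory listed in PATH and for the current working directory.

```python
import os
import tempfile


def find_command(name, path_dirs):
    """Return the file a shell-like lookup would start, or None (simplified model)."""
    def is_executable(path):
        # A regular file with at least one execute permission bit set.
        return os.path.isfile(path) and bool(os.stat(path).st_mode & 0o111)

    if os.sep in name:
        # A name that contains a path separator is used as given; PATH is ignored.
        return name if is_executable(name) else None
    for directory in path_dirs:
        candidate = os.path.join(directory, name)
        if is_executable(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as bin_dir, tempfile.TemporaryDirectory() as work_dir:
    script = os.path.join(work_dir, "sum2vars.py")
    with open(script, "w") as f:
        f.write("#!/usr/bin/python3\nprint('hello')\n")
    os.chmod(script, 0o755)

    path_dirs = [bin_dir]                       # hypothetical PATH without the working directory
    by_name = find_command("sum2vars.py", path_dirs)
    by_path = find_command(script, path_dirs)   # plays the role of ./sum2vars.py

print(by_name)   # None, because the working directory is not in the searched list
print(by_path)
```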

C language

C is a language in which the programmer declares data types explicitly. The binding of a type name to an actual data representation may differ between architectures. For example, the signed integer type (int) is an integer encoding whose range is at least −32,767 to +32,767 and which best suits the target processor. Most of today's processor architectures choose the range −2,147,483,648 to +2,147,483,647 ($-2^{31}$ to $2^{31}-1$) for int, that is, the value is stored in 32 bits.
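
The ranges quoted above follow directly from the number of bits. A minimal Python sketch, assuming two's complement encoding (the helper twos_complement_range is hypothetical), computes them and asks the ctypes module for the size of int used by the C compiler that built the local Python interpreter:

```python
import ctypes


def twos_complement_range(bits):
    """Return (lowest, highest) value of a signed two's complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


print(twos_complement_range(16))   # (-32768, 32767)
print(twos_complement_range(32))   # (-2147483648, 2147483647)

# Size of the C type int on the machine running this script.
native_int_bits = 8 * ctypes.sizeof(ctypes.c_int)
print("native C int:", native_int_bits, "bits")
```

Note that a 16-bit two's complement type reaches −32,768, one value below the minimum range guaranteed for int.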

It is usually necessary to compile the program before execution (there are alternatives, for example CERN ROOT Cling) into a binary form that the operating system can load into memory, where the processor carries out the operations given by the translated machine instructions.

hello-apo.c

The “Hello world” program demonstrates how a C program is compiled for different processor architectures. Store the program source in a file named hello-apo.c.

#include <stdio.h>
 
int main(int argc, char *argv[])
{
  printf("Hello APO\n");
 
  return 0;
}

The program can be compiled and run on the current computer

gcc -Wall -o hello-apo hello-apo.c
./hello-apo

The example, including its build recipe (Makefile), can be found in the directory /opt/apo/hello-apo on the computers in the computer laboratory. The Makefile provides rules for building the program for multiple processor architectures. Outside of the laboratory, the examples used during the seminars are accessible through the web interface of the FEL instance of the GitLab version control system (directory seminaries/hello-apo of repository https://gitlab.fel.cvut.cz/b35apo/stud-support/).

Commands to build (compile) and run the example on the computers in the laboratory

mkdir -p ~/apo
cd ~/apo
cp -r -L /opt/apo/hello-apo .
cd hello-apo
make ARCH=native
make ARCH=native run

The example is prepared in such a way that it can be built and run for different processor architectures. The supported choices for the ARCH parameter are native, x86, riscv, riscv64, mips, arm and aarch64.

sum2vars.c

Rewrite the sum2vars.py Python program listed above in C and store it in the file sum2vars.c. A definition of the main() function has to be introduced, because this name is reserved for the user program entry point by the C language standard (ISO/IEC 9899 - latest free). To access the function printf(), it is necessary to include the header file stdio.h.

#include <stdio.h>
 
int var_a = 0x1234;
int var_b = 0x2222;
 
int var_c = 0x3333;
 
int main()
{
  var_c = var_a + var_b;
 
  printf("sum %d + %d -> %d\n", var_a, var_b, var_c);
 
  printf("sum 0x%x + 0x%x -> 0x%x\n", var_a, var_b, var_c);
 
  return 0;
}

The program can be compiled into binary form by the GNU C compiler (manual).

gcc -Wall sum2vars.c

The default name of the output executable file is a.out. The program can be run by its full or relative path name, ./a.out. The desired binary file name can be specified on the command line with the -o switch, and debug information is inserted with -ggdb. You can also ask the compiler to optimize the program for minimal size with the -Os switch.

gcc -Wall -Os -ggdb -o sum2vars sum2vars.c

The content of the executable file in the ELF file format can be examined with the objdump tool.

objdump -S sum2vars

Translation of the algorithm to individual machine instructions can be explored online as well

Godbolt Compiler Explorer

Assembler - symbolic machine code

The algorithm described by the C source is translated into a sequence of machine instructions operating on precisely specified data types, such that the external effect/result of the entire program execution (or, more precisely, of execution between sequence points) is equivalent to the written algorithm. The compiler output is stored in the form of the selected machine instructions, but without deciding on their final location in memory. A native translation (translation for the system on which the compiler is currently running) of the source is obtained by the command

gcc -Wall -Os -S -o sum2vars.s sum2vars.c

Full understanding of how to write assembler programs for the x86_64 architecture running under the GNU/Linux operating system is demanding. The MIPS architecture, due to its academic origin and its use in well-known textbooks, was the first choice for teaching in the past (reasons). We have now switched to the more open and promising RISC-V architecture, and we will use it in its most limited and simplified form in the initial seminars. To observe the internal state and visualize the principle of processor operation, we will not use processor boards with real chips that implement the architecture, but a graphical simulator created specifically for the needs and goals of our course - QtRvSim (see Education from Assembly to Pipeline, Cache Performance, and C Level Programming functionality overview presented at FOSDEM 2023). The main contributor behind the switch of the simulator from MIPS to RISC-V is Jakub Dupak (see his bachelor thesis describing the internals of the simulator).

The following examples are presented mainly to give an overview of what the subject and the semester work will focus on. We will return to writing algorithms in machine instructions, in depth and with a full description, in the third lecture and seminar.

As a first approximation, we will compile the code with a RISC-V cross compiler.

riscv64-linux-gnu-gcc -ggdb -Os -Wall -static -fno-lto -o sum2vars-riscv sum2vars.c

The program can be run in our laboratory on GNU/Linux systems with x86_64 architecture, because binary files for MIPS, RISC-V, ARM and some other architectures are automatically interpreted by the QEMU emulator in user-space emulation mode.

The program can be compiled even for “bare metal” mode which does not need operating system services (printf function calling has to be commented out in this case)

//#include <stdio.h>
 
int var_a = 0x1234;
int var_b = 0x2222;
 
int var_c = 0x3333;
 
int main()
{
  var_c = var_a + var_b;
 
  //printf("sum %d + %d -> %d\n", var_a, var_b, var_c);
 
  //printf("sum 0x%x + 0x%x -> 0x%x\n", var_a, var_b, var_c);
 
  return 0;
}

We can translate the program

riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -c sum2vars.c -o sum2vars.o

Because the object file provides only the main() function, it is not enough to start it directly in the simulator. It is necessary to set up some initial processor state. The file start.S can be used for this setup and for calling the main function

.globl   _start
.text
.option norelax
 
_start:
     la x2, _end+0x4000
     la x3, __global_pointer$
     jal  main
     ebreak

The module start.S is compiled to start.o

riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -c start.S -o start.o

The final executable is obtained by linking these two intermediate object files

riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib sum2vars.o start.o -o sum2vars-riscv -lgcc

The program can now be loaded into the simulator and run step by step.

The program can be simplified even further. Manual rewrite to RISC-V assembler (file sum2vars-riscv.S) is shown below (it is without calling printf() function for simplicity)

.globl _start
 
.text
_start:
	lw   x4, var_a(x0)
	lw   x5, var_b(x0)
	add  x6, x4, x5
	sw   x6, var_c(x0)
 
	addi x2, x0, 0
	jr   ra
 
.data
.org 0x400
var_a:	.word 0x1234
var_b:	.word 0x2222
var_c:	.word 0x3333
 
#pragma qtrvsim show registers
#pragma qtrvsim show memory
#pragma qtrvsim focus memory var_a
#pragma qtrvsim tab core

This simplified code can be compiled directly by the assembler included in the QtRvSim simulator. To prevent the simulator from loading an external program, it should be started in the mode without loading an ELF file

Select File → New source

The precise/fixed location of variables can be specified this time (directive .org 0x400).

Directives for quick location of variables in the data memory view are also added for variable var_a.

Compile by choosing Machine→Compile source

and we can step through the code

Additionally, 0x1234 can be replaced by 0x12345678 to observe in the dump how the value is stored in individual bytes or 16-bit half-words. Use the Word, Half-word and Byte chooser in the data memory view and try changing the view/dump width.

An example of how to visualize the representation of numbers from Python, especially for those who do not yet know even the basics of C
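
What the memory dump should show can be predicted with a small Python model of the data memory. This is only a sketch under stated assumptions: it models a little-endian byte-addressed memory (the default RISC-V byte order), places the variables at 0x400 as the .org directive does, and uses hypothetical helpers load_word and store_word in place of the lw and sw instructions. It is not a description of the simulator internals.

```python
MEM_SIZE = 0x1000
memory = bytearray(MEM_SIZE)            # byte-addressed data memory, all zeros


def store_word(addr, value):
    """Model of sw: write a 32-bit value as four little-endian bytes."""
    memory[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


def load_word(addr):
    """Model of lw: read four little-endian bytes as one 32-bit value."""
    return int.from_bytes(memory[addr:addr + 4], "little")


VAR_A, VAR_B, VAR_C = 0x400, 0x404, 0x408   # consecutive .word items after .org 0x400

store_word(VAR_A, 0x12345678)
store_word(VAR_B, 0x2222)
store_word(VAR_C, 0x3333)

# The instruction sequence lw, lw, add, sw
x4 = load_word(VAR_A)
x5 = load_word(VAR_B)
x6 = (x4 + x5) & 0xFFFFFFFF             # registers are 32 bits wide
store_word(VAR_C, x6)

# Byte and half-word views of var_a, lowest address first
byte_view = ["%02x" % b for b in memory[VAR_A:VAR_A + 4]]
half_view = ["%04x" % int.from_bytes(memory[a:a + 2], "little")
             for a in (VAR_A, VAR_A + 2)]

print(byte_view)                 # ['78', '56', '34', '12']
print(half_view)                 # ['5678', '1234']
print(hex(load_word(VAR_C)))     # 0x1234789a
```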

#!/usr/bin/python3
 
import struct
 
a = 0x1234567
b = -12345678
c = a + b
 
buf = struct.pack('<IiI', a, b, c)
 
print (["{0:02x}".format(b) for b in buf])
 
(u32_a, u32_b, u8_c0, u8_c1, u8_c2, u8_c3) = struct.unpack('<IIBBBB', buf)
print(u32_a, u32_b, u8_c0, u8_c1, u8_c2, u8_c3)
 
(s32_a, s32_b, s8_c0, s8_c1, s8_c2, s8_c3) = struct.unpack('<iibbbb', buf)
print(s32_a, s32_b, s8_c0, s8_c1, s8_c2, s8_c3)

Learn more about storing variables in a specified data format representation in the documentation of the Python module struct — Interpret bytes as packed binary data.
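
A short sketch of the byte order prefixes of the struct module may help here. It packs one value in little-endian ('<'), big-endian ('>') and native ('=') order, and then reinterprets the bytes of a negative number as unsigned. The variable names are chosen only for this illustration.

```python
import struct
import sys

value = 0x12345678

little = struct.pack("<I", value)    # least significant byte first
big = struct.pack(">I", value)       # most significant byte first
native = struct.pack("=I", value)    # byte order of the machine running the script

print(little.hex())                  # 78563412
print(big.hex())                     # 12345678
print(sys.byteorder, native.hex())

# The same four bytes read as signed and as unsigned 32-bit integers
raw = struct.pack("<i", -12345678)
(as_unsigned,) = struct.unpack("<I", raw)
print(as_unsigned)                   # 4282621618, that is 2**32 - 12345678
```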

Tasks

  1. With the tutor, review the above overview of what you will be doing during the semester
    • ensure that you can use the QtRvSim simulator on your lab computers
    • if you are not sure how to install the simulator at home, ask your instructors
  2. Students of OI and the more advanced ones from the KyR program should try to modify the program to sum the two numbers so
    • that it displays the result in binary form, from bit 31 down to bit 0
    • that the input values can be specified on the command line (argc, argv, atoi)
    • that it uses smaller-size data types (short int, unsigned short int, unsigned char, signed char)
    • try operations with positive and negative numbers, and also focus on values that overflow after the operation
  3. KyR students without knowledge of C language
    • use the struct module from the above example to analyze data representation.
    • try changing pack to native byte order and big-endian order
    • implement an algorithm for displaying integer variables in binary form
    • try the binary formatting provided by Python: print("{0: 08b}".format(a))
  4. Integer addition and subtraction in two's complement representation
    • add and subtract two integer numbers. For example 5+(-6) and 5-(-6).
    • repeat the operations with different numbers and check your results with the computer program from the first exercise.
    • When can underflow and overflow happen? How can we detect that it has occurred?
  5. Integer multiplication
    • multiply two integers, for example 7*6.
    • is there any difference when multiplying negative numbers? (e.g. -7*6, (-7)*(-6), etc.)
    • show how to speed up the multiplier (use many adders instead of repeatedly using one).
  6. Integer division
    • divide integers 42/7, 43/7
    • does the algorithm change when we use negative numbers?
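
For checking hand calculations in tasks 4 to 6, the following Python sketch models the three operations on a bit level. It is one possible approach under stated assumptions, not the reference solution: the helpers add_signed, shift_add_multiply and shift_subtract_divide are hypothetical, add_signed expects operands that already fit into the chosen width, the multiplier works on magnitudes and fixes the sign afterwards, and the divider handles unsigned operands only.

```python
def add_signed(a, b, bits=8):
    """Add in `bits`-bit two's complement; return (result, overflow flag)."""
    mask = (1 << bits) - 1
    raw = (a + b) & mask                          # keep only the low bits
    result = raw - (1 << bits) if raw & (1 << (bits - 1)) else raw
    # Overflow: both operands have the same sign but the result sign differs.
    overflow = ((a < 0) == (b < 0)) and ((result < 0) != (a < 0))
    return result, overflow


def shift_add_multiply(a, b):
    """Multiply by adding the shifted multiplicand for every 1 bit of the multiplier."""
    negative = (a < 0) != (b < 0)
    multiplicand, multiplier = abs(a), abs(b)
    product = 0
    while multiplier:
        if multiplier & 1:
            product += multiplicand
        multiplicand <<= 1
        multiplier >>= 1
    return -product if negative else product


def shift_subtract_divide(dividend, divisor, bits=8):
    """Unsigned long division, one quotient bit per step; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = remainder = 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)   # bring down the next bit
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(add_signed(5, -6))             # (-1, False)
print(add_signed(100, 100))          # (-56, True), the sum 200 does not fit into 8 bits
print(shift_add_multiply(-7, 6))     # -42
print(shift_subtract_divide(43, 7))  # (6, 1)
```

A hardware multiplier built from many adders performs the additions of the while loop in parallel instead of one per step.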

Homework

  • homework assignments 1 to 3 will be given and submitted via the submission system: https://dcenet.fel.cvut.cz/apo/
  • You have to log in to the submission system; the “Assignments” page lists the open assignments.
  • If you want to go through the submission procedure, you can use the 1st training homework to try it.
  • If there are any problems, please contact your tutor or the author and administrator of the system Richard Šusta.

Computer network in lab KN:E-23

Classroom KN:E-23 is equipped with computers with network installation of Debian GNU/Linux Bullseye

When the computer is turned on, it uses the BIOS/UEFI network boot option to load the GRUB boot loader and its configuration. It allows you to select

  • MS Win 10 localboot (we will not use this option)
  • Linux Bookworm option to start the network version of the Debian Bookworm operating system (our choice)

Choosing the second option loads the GNU/Linux kernel image and the initial RAM disk from the network using the TFTP protocol. When the kernel starts, it mounts the root filesystem from the NFS server. However, it is mounted in read-only mode. To keep local changes while the computer runs, a file system that temporarily stores local changes in memory and in a swap file is layered over the basic directory structure. This is done by the overlayfs module (AUFS in the past). The Kerberos system is used to verify credentials, and the password is verified against the main CTU identity server. After a successful login, the volume with the user account, to which the user has read and write rights, is attached to the station's directory structure via NFS.

More information about the solution can be found on the Wiki page How to create a diskless machine running GNU/Linux. There are also the DiskLess Debian/GNU Linux slides from the presentation of our solution at the Install Fest conference/event.

Logins and passwords

The computer access and logins are authenticated against CTU central Kerberos server. The main CTU password is used for computers access.

Remote access to your data

The data in your home directories are available in rooms KN:E-2, KN:E-23, KN:E-24, KN:E-s109 and are also accessible on the server postel.felk.cvut.cz via the SSH protocol. You can use the SCP/SFTP protocols to access the data. On Linux you can mount your home directory even from your home computer using the sshfs utility (replace jmeno with your user name):

sshfs jmeno@postel.felk.cvut.cz: /mnt/tmp

Use the following command to unmount/disconnect the account:

fusermount -u /mnt/tmp

The server also offers a remote connection to graphical applications running on it:

ssh -X jmeno@postel.felk.cvut.cz

Remark: The name was chosen not only for convenient access from the comfort of home, but primarily as a reminder of one of the key people of the Internet - Jon Postel.

Troubleshooting, data recovery

In case of problems with the computers or the network, please contact Ing. Ales Kapica or other people from the IT 13135 group.

courses/b35apo/en/tutorials/01/start.txt · Last modified: 2024/02/02 18:41 (external edit)