The seminar starts with an introduction to the classroom. It then refreshes the basic concepts of data representation in computers and shows how they are used in, and connected to, languages at different levels of abstraction, down to machine code. Keep in mind that this is only an overview of what the course covers in the following seminars and, for students of the KyR program, in parallel in the course B3B36PRG - C Programming. In the first half of the semester, the C programming language is used only as a general notation for algorithms (most constructs are understandable as equivalents of the same algorithm written in Python, for example). You should gain enough knowledge of C in the B3B36PRG course for real programming work in the second half of the semester.
Python is a dynamically typed higher-level language. To run a program, an interpreter is required (it can also be a runtime environment with partial translation at run time), and the interpreter is most often written in C.
Test the following simple Python example. To create and edit the source file sum2vars.py, use one of the installed editors (geany, vim, emacs, qtcreator, clion, …). For those without a personal preference, it is advisable to start with geany.
sum2vars.py
#!/usr/bin/python3

var_a = 0x1234
var_b = 0x2222
var_c = var_a + var_b

print('sum %d + %d -> %d'%(var_a, var_b, var_c))
print('sum 0x%x + 0x%x -> 0x%x'%(var_a, var_b, var_c))
The program can be passed as a parameter to be run in the interpreter
python3 sum2vars.py
If you mark the file as executable
chmod +x sum2vars.py
it can be run “directly”, even though neither the processor nor the operating system can execute such a file as machine code. The first line (the shebang) ensures that the file is not handed to the processor as a binary program; instead, the interpreter named on that line is started and receives the file as its parameter.
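The shebang handling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code that any real system uses; the function names parse_shebang and command_for are hypothetical.

```python
def parse_shebang(first_line):
    """Return (interpreter, optional_argument), or None if the line is not a shebang."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


def command_for(script_path, first_line):
    """Build the command that is really started when the script is 'executed'."""
    parsed = parse_shebang(first_line)
    if parsed is None:
        return [script_path]  # no shebang: the file would have to be a real binary
    interpreter, argument = parsed
    command = [interpreter]
    if argument is not None:
        command.append(argument)
    command.append(script_path)  # the script itself is passed as a parameter
    return command


print(command_for("./sum2vars.py", "#!/usr/bin/python3\n"))
# ['/usr/bin/python3', './sum2vars.py']
```

In other words, running ./sum2vars.py ends up starting /usr/bin/python3 with the script path as its argument, which is the same as the explicit python3 sum2vars.py invocation.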
A minor pitfall, and at the same time a critical habit that increases safety against attacks by other users (try to figure out why), is that a command given without a path is searched for only in the directories listed in the PATH environment variable, and on reasonably designed and managed systems this list does not contain the current directory. Therefore, we have to specify the path to start the program. In this case it is the current directory, represented by the dot '.':
./sum2vars.py
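The lookup rule can be tried out with a small Python model. It is a sketch under simplifying assumptions: the function name find_command is hypothetical, the directories are temporary stand-ins created only for the demonstration, and empty PATH entries are ignored here (real shells treat them as the current directory).

```python
import os
import tempfile


def find_command(name, path_value):
    """Return the first executable `name` found via a PATH-style string, else None."""
    if os.sep in name:
        # A name that contains a path (such as ./sum2vars.py) is used as given.
        return name if os.access(name, os.X_OK) else None
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # simplification: empty entries are ignored in this sketch
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


with tempfile.TemporaryDirectory() as workdir:
    # The "current directory" with our executable script.
    script = os.path.join(workdir, "sum2vars.py")
    with open(script, "w") as f:
        f.write("#!/usr/bin/python3\n")
    os.chmod(script, 0o755)

    # A stand-in for the system directories listed in PATH.
    system_dirs = os.path.join(workdir, "bin")
    os.mkdir(system_dirs)

    without_path = find_command("sum2vars.py", system_dirs)
    with_path = find_command(script, system_dirs)

print(without_path)           # None: the bare name is not found through PATH
print(with_path is not None)  # True: an explicit path bypasses the search
```

The script in the working directory is found only when its path is given explicitly, which is why the command has to be written as ./sum2vars.py.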
C is a language with data types strictly defined by the programmer. The actual binding of type identifiers to the data representation selected for each type may differ between architectures. For example, the signed integer type (int) is an integer encoding whose range is at least −32,767 to +32,767 and that best suits the target processor. On most of today's processor architectures, int ranges from −2,147,483,648 to +2,147,483,647 ($-2^{31}$ to $2^{31}-1$), that is, the value is stored in 32 bits.
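The ranges quoted above follow directly from the number of bits. The sketch below assumes two's complement encoding, which is what the 32-bit range in the text corresponds to; the helper names signed_range and wrap_signed are hypothetical. Note that wrap_signed models how a fixed-width register wraps around, not what the C standard guarantees (signed overflow is undefined behavior in C).

```python
def signed_range(bits):
    """Smallest and largest value of a two's complement integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def wrap_signed(value, bits=32):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


print(signed_range(16))             # (-32768, 32767)
print(signed_range(32))             # (-2147483648, 2147483647)
print(wrap_signed(2147483647 + 1))  # -2147483648
```

A 16-bit two's complement int covers the minimum range required by the standard with one extra negative value, and Python's unbounded integers make it easy to see what happens when a result no longer fits into 32 bits.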
It is usually necessary to compile the program before execution (there are alternatives, for example CERN ROOT Cling) into a binary form which the operating system can load into memory and in which the processor carries out operations according to the translated binary machine instructions.
The “Hello world” program demonstrates how a C program is compiled for different processor architectures. Store the program source in a file named hello-apo.c.
hello-apo.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  printf("Hello APO\n");

  return 0;
}
The program can be compiled and run on the current computer
gcc -Wall -o hello-apo hello-apo.c
./hello-apo
The example, including its build recipe (Makefile), can be found in the directory /opt/apo/hello-apo on the computers in the computer laboratory. The Makefile provides rules for building the program for multiple processor architectures. Outside the laboratory, the examples used during the seminars are accessible through the web interface of the FEL instance of the GitLab version control system (directory seminaries/hello-apo of the repository https://gitlab.fel.cvut.cz/b35apo/stud-support/).
Commands to build (compile) and run the example on the computers in the laboratory
mkdir -p ~/apo
cd ~/apo
cp -r -L /opt/apo/hello-apo .
cd hello-apo
make ARCH=native
make ARCH=native run
The example is prepared in such a way that it can be built and run for different processor architectures. The following values of the ARCH parameter are supported: native, x86, riscv, riscv64, mips, arm and aarch64.
Rewrite the sum2vars.py Python program listed above into the C language and store it in the file sum2vars.c. A definition of the main() function has to be introduced, because the C language standard (ISO/IEC 9899 - latest free) reserves this name for the entry point of a user program. To access the printf() function, the header file stdio.h has to be included.
sum2vars.c
#include <stdio.h>

int var_a = 0x1234;
int var_b = 0x2222;
int var_c = 0x3333;

int main()
{
  var_c = var_a + var_b;
  printf("sum %d + %d -> %d\n", var_a, var_b, var_c);
  printf("sum 0x%x + 0x%x -> 0x%x\n", var_a, var_b, var_c);

  return 0;
}
The program can be compiled into binary form by the GNU C compiler (manual).
gcc -Wall sum2vars.c
The default name of the output executable file is a.out. The program can be run by invoking it with its full or relative path name, ./a.out. The desired name of the binary file can be specified on the command line with the -o switch, and the insertion of debug information is selected by -ggdb. You can also ask the compiler to optimize the program for minimal size by using the -Os switch.
gcc -Wall -Os -ggdb -o sum2vars sum2vars.c
The content of the executable file in the ELF format can be examined with the objdump tool.
objdump -S sum2vars
The translation of the algorithm to individual machine instructions can be explored online as well, for example in the Godbolt Compiler Explorer.
The algorithm described by the C source is translated into a sequence of machine instructions operating on precisely specified data types, such that the externally observable effect of the entire program execution (or, more precisely, of the execution between sequence points) is equivalent to the written algorithm. With the -S switch, the compiler output is stored in the form of the selected machine instructions (assembly language), but without a decision on their final location in memory. A native translation of the source (a translation for the system on which the compiler is currently running) is obtained by the command
gcc -Wall -Os -S -o sum2vars.s sum2vars.c
Fully understanding how to write assembly programs for the x86_64 architecture running under the GNU/Linux operating system is demanding. Due to its academic origin and its use in well-known textbooks, the MIPS architecture was the first choice for teaching in the past (reasons). We have now switched to the more open and promising RISC-V architecture, and we will use it in its most limited and simplified form in the initial seminars. To observe the internal state and to visualize the principle of processor operation, we will not use processor boards with real chips that implement the architecture, but a graphical simulator created specifically for the needs and goals of our course, QtRvSim (see Education from Assembly to Pipeline, Cache Performance, and C Level Programming, a functionality overview presented at FOSDEM 2023). The main contributor behind the switch of the simulator from MIPS to RISC-V is Jakub Dupak (see his bachelor thesis describing the internals of the simulator).
The following examples are presented mainly to give an overview of what the subject and the semester work will focus on. We will return to writing algorithms in machine instructions, in depth and with a full description, in the third lecture and seminar.
For the first approximation, we will compile the code with a RISC-V cross compiler.
riscv64-linux-gnu-gcc -ggdb -Os -Wall -static -fno-lto -o sum2vars-riscv sum2vars.c
The program can be run in our laboratory on GNU/Linux systems with x86_64 architecture, because binary files for MIPS, RISC-V, ARM and some other architectures are automatically interpreted by the QEMU emulator in user-space emulation mode.
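Which architecture a binary was built for is recorded at the start of the ELF file, so it can be read without running the program. The sketch below decodes only the first 20 bytes of an ELF header; the names describe_elf and EM_NAMES are hypothetical, the machine codes are the ones assigned by the ELF specification, and the header used in the demonstration is synthetic rather than taken from a real file.

```python
import struct

# e_machine codes from the ELF specification for the architectures named above
EM_NAMES = {3: "x86", 8: "mips", 40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}


def describe_elf(header):
    """Return (architecture, word size in bits) from the first 20 header bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None  # not an ELF file
    word_size = {1: 32, 2: 64}.get(header[4])    # EI_CLASS
    byte_order = {1: "<", 2: ">"}.get(header[5])  # EI_DATA: little or big endian
    if word_size is None or byte_order is None:
        return None
    # e_type and e_machine are two 16-bit fields that follow the 16-byte e_ident
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return EM_NAMES.get(e_machine, "unknown(%d)" % e_machine), word_size


# Synthetic header: 64-bit, little endian, executable (e_type 2), RISC-V (243)
fake_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 243)
print(describe_elf(fake_header))  # ('riscv', 64)
```

To inspect a real file, the same function can be given the first 20 bytes read from it in binary mode, for example from the sum2vars-riscv binary built above.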
The program can even be compiled for “bare metal” mode, which does not need operating system services (the printf calls have to be commented out in this case)
//#include <stdio.h>

int var_a = 0x1234;
int var_b = 0x2222;
int var_c = 0x3333;

int main()
{
  var_c = var_a + var_b;
  //printf("sum %d + %d -> %d\n", var_a, var_b, var_c);
  //printf("sum 0x%x + 0x%x -> 0x%x\n", var_a, var_b, var_c);

  return 0;
}
We can translate the program
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -c sum2vars.c -o sum2vars.o
Because the object file provides only the main() function, it is not enough to start it directly in the simulator. Some initial processor state has to be set up first. The file start.S can be used for this setup and for calling the main function
.globl _start
.text
.option norelax

_start:
        la x2, _end+0x4000
        la x3, __global_pointer$
        jal main
        ebreak
The module start.S is compiled to start.o
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib -c start.S -o start.o
The final executable is obtained by combining/linking of these two intermediate object files
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib sum2vars.o start.o -o sum2vars-riscv -lgcc
The program can now be loaded into the simulator and run step by step.
The program can be simplified even further. Manual rewrite to RISC-V assembler (file sum2vars-riscv.S) is shown below (it is without calling printf() function for simplicity)
sum2vars-riscv.S
.globl _start
.text

_start:
        lw   x4, var_a(x0)
        lw   x5, var_b(x0)
        add  x6, x4, x5
        sw   x6, var_c(x0)
        addi x2, x0, 0
        jr   ra

.data
.org 0x400

var_a:  .word 0x1234
var_b:  .word 0x2222
var_c:  .word 0x3333

#pragma qtrvsim show registers
#pragma qtrvsim show memory
#pragma qtrvsim focus memory var_a
#pragma qtrvsim tab core
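What the four data-processing instructions of this program do can be mimicked in Python before stepping through them in the simulator. The model below is only a rough sketch: it covers just lw, add and sw, assumes little-endian 32-bit words, and places the variables at the addresses that follow from the .org 0x400 directive. The helper names are hypothetical and have nothing to do with the simulator's internals.

```python
import struct

memory = bytearray(0x1000)  # small data memory, all zeros
regs = [0] * 32             # registers x0..x31

# Data section placed by ".org 0x400": three consecutive 32-bit words
VAR_A, VAR_B, VAR_C = 0x400, 0x404, 0x408
struct.pack_into("<III", memory, VAR_A, 0x1234, 0x2222, 0x3333)


def lw(rd, offset, base):
    """Load a 32-bit word from memory[regs[base] + offset] into register rd."""
    if rd != 0:  # x0 is hard-wired to zero
        regs[rd] = struct.unpack_from("<I", memory, regs[base] + offset)[0]


def add(rd, rs1, rs2):
    """Add two registers, keeping only the low 32 bits of the result."""
    if rd != 0:
        regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF


def sw(rs, offset, base):
    """Store register rs as a 32-bit word to memory[regs[base] + offset]."""
    struct.pack_into("<I", memory, regs[base] + offset, regs[rs])


lw(4, VAR_A, 0)   # lw  x4, var_a(x0)
lw(5, VAR_B, 0)   # lw  x5, var_b(x0)
add(6, 4, 5)      # add x6, x4, x5
sw(6, VAR_C, 0)   # sw  x6, var_c(x0)

print(hex(struct.unpack_from("<I", memory, VAR_C)[0]))  # 0x3456
```

After the store, the word at var_c no longer holds its initial value 0x3333 but the sum 0x3456, which is the change to look for in the simulator's memory view.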
This simplified code can be compiled directly by the assembler built into the QtRvSim simulator. To prevent the simulator from loading an external program, start it in the mode without loading an ELF file and select File → New source.
The precise/fixed location of variables can be specified this time (directive .org 0x400).
Directives for quick location of variables in the data memory view are also added for variable var_a.
Compile by choosing Machine → Compile source; we can then step through the code.
Additionally, 0x1234 can be replaced by 0x12345678 to observe in the dump how the value is stored in individual bytes or 16-bit half-words. Use the Word, Half-word and Byte chooser in the data memory view and try changing the view/dump width.
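The expected byte and half-word views of 0x12345678 can be predicted in Python before checking them in the simulator. This is a small sketch that assumes little-endian storage, with the least significant byte at the lowest address; the function name memory_views is hypothetical.

```python
import struct


def memory_views(value):
    """Show one 32-bit word as bytes and as half-words, lowest address first."""
    raw = struct.pack("<I", value)  # little endian, unsigned 32-bit
    as_bytes = [hex(b) for b in raw]
    as_half_words = [hex(h) for h in struct.unpack("<HH", raw)]
    return as_bytes, as_half_words


view_bytes, view_halves = memory_views(0x12345678)
print(view_bytes)   # ['0x78', '0x56', '0x34', '0x12']
print(view_halves)  # ['0x5678', '0x1234']
```

The same word therefore reads as 78 56 34 12 in the byte view and as 5678 1234 in the half-word view when the addresses increase from left to right.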
The following example shows how to visualize number representations from Python, especially for those who do not yet know even the basics of the C language
#!/usr/bin/python3

import struct

a = 0x1234567
b = -12345678
c = a + b

buf = struct.pack('<IiI', a, b, c)

print (["{0:02x}".format(b) for b in buf])

(u32_a, u32_b, u8_c0, u8_c1, u8_c2, u8_c3) = struct.unpack('<IIBBBB', buf)

print(u32_a, u32_b, u8_c0, u8_c1, u8_c2, u8_c3)

(s32_a, s32_b, s8_c0, s8_c1, s8_c2, s8_c3) = struct.unpack('<iibbbb', buf)

print(s32_a, s32_b, s8_c0, s8_c1, s8_c2, s8_c3)
Learn more about storing variables in a specified data format representation in the documentation of the Python module struct — Interpret bytes as packed binary data.
Classroom KN:E-23 is equipped with computers with network installation of Debian GNU/Linux Bullseye
When the computer is turned on, it uses the network boot option of the BIOS/UEFI to load the GRUB boot loader and its configuration. GRUB allows you to select
Linux Bookworm
Choosing the second option loads the GNU/Linux kernel image and the initial RAM disk from the network using the TFTP protocol. When the kernel starts, it mounts the root filesystem from the NFS server. However, this filesystem is mounted in read-only mode. To keep local changes while the computer runs, a file system that temporarily stores them in memory and in a swap file is layered over the basic directory structure. This is done by the overlayfs module (AUFS in the past). The Kerberos system is used to verify credentials, and the password is verified against the main CTU identity server. After a successful login, the volume with the user account, to which the user has read and write rights, is attached to the station's directory structure via NFS.
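The layering of a writable file system over the read-only root can be pictured with a toy model in Python. It only mirrors the lookup idea (writes go to the upper layer, reads fall through to the lower one) and says nothing about how overlayfs is implemented; the file names and contents are made up for the illustration.

```python
from collections import ChainMap

# Lower layer: the read-only root shared from the server (hypothetical content)
lower = {"/etc/hostname": "lab-image", "/etc/motd": "welcome"}
# Upper layer: local changes kept only while the computer runs
upper = {}

root = ChainMap(upper, lower)  # lookups try `upper` first, then `lower`

root["/etc/hostname"] = "station-07"  # a write lands in the upper layer only

print(root["/etc/hostname"])   # station-07  (changed copy from the upper layer)
print(root["/etc/motd"])       # welcome     (unchanged file from the lower layer)
print(lower["/etc/hostname"])  # lab-image   (the shared image is untouched)
```

Discarding the upper dictionary corresponds to a reboot of the station: all local changes disappear and the shared image is seen in its original state again.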
More information about the solution can be found on the wiki page How to create a diskless machine running GNU/Linux. There are also the DiskLess Debian/GNU Linux slides from the presentation of our solution at the Install Fest conference.
The computer access and logins are authenticated against CTU central Kerberos server. The main CTU password is used for computers access.
The data in your home directories is available in rooms KN:E-2, KN:E-23, KN:E-24 and KN:E-s109 and is also accessible on the server postel.felk.cvut.cz via the SSH protocol. You can use the SCP/SFTP protocols to access the data. On Linux, you can mount your home directory even from your home computer using the sshfs utility (for example sshfs jmeno@postel.felk.cvut.cz: /mnt/tmp, where jmeno stands for your user name).
sshfs jmeno@postel.felk.cvut.cz: /mnt/tmp
fusermount -u /mnt/tmp
ssh -X jmeno@postel.felk.cvut.cz
Remark: The name was chosen not only for convenient access from the comfort of home, but primarily as a reminder of one of the key people of the Internet, Jon Postel.
In case of problems with the computers or the network, please contact Ing. Ales Kapica or other people from the IT 13135 group.