1. Introduction to the labs, compiler, simulator and data representations

Exercise outline

  1. requirements and evaluations
  2. laboratory and computer system introduction
  3. the tour from Python to C language and assembler
  4. introduction of QtMips
  5. basic data types (integer number) storage in computer memory, program in C
  6. integer representation, addition, subtraction, multiplication, division

What should I review before the first exercise

  1. check that your main CTU password works; it is required for GNU/Linux login in KN:E-2 (more)
  2. binary representation and hexadecimal representation of integer numbers
  3. terms little and big endian
  4. two's complement code
  5. logic operations (and, or, invert, rotation, …)
  6. read/review chapter 3 of APOLOS - Integers expressed in binary system
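
The review items above can be tried out directly in the Python interpreter. The following is a minimal sketch; the helper name twos_complement and the 8-bit width are illustrative assumptions, not part of the course material.

```python
def twos_complement(value, bits=8):
    """Return the unsigned bit pattern that encodes value in two's complement."""
    return value & ((1 << bits) - 1)

x = 0x2A                            # 42 written in hexadecimal
print(bin(x), hex(x))               # binary and hexadecimal text forms

# -6 as an 8-bit two's complement pattern (invert the bits of 6 and add 1)
pattern = twos_complement(-6)
print("{0:08b}".format(pattern))

# basic logic operations on 8-bit values
a, b = 0b11001010, 0b10100110
print("{0:08b}".format(a & b))                          # and
print("{0:08b}".format(a | b))                          # or
print("{0:08b}".format(~a & 0xFF))                      # invert, masked to 8 bits
print("{0:08b}".format(((a << 1) | (a >> 7)) & 0xFF))   # rotate left by one bit
```

Python integers have unlimited width, so the mask (0xFF here) is what models a fixed-size register.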

What shall we do on the first exercise

The seminar starts with an introduction to the classroom. Then we refresh the basic concepts of data representation in a computer, their use, and their connection to languages at different levels of abstraction, down to machine code. Keep in mind that this is only an overview of what the course covers in the following seminars and, for students of the KyR program, in the parallel course B3B36PRG - C Programming. In the first half of the semester, the C programming language is used only as a general notation for algorithms (most constructs are understandable as equivalents of the same algorithm in Python, for example). You should gain enough knowledge of C in the B3B36PRG course for real programming work in the second half of the semester.

From convenient higher level programming to machine code

Python

Python is a dynamically typed, higher-level language. Running a program requires an interpreter (or a runtime environment that translates parts of the program at run time), which is most often written in C.

Test the following simple Python example. To create and edit the source file sum2vars.py, use one of the installed editors (geany, vim, emacs, qtcreator, clion, …). For those without a personal preference, it is advisable to start with geany.

#!/usr/bin/python3

var_a = 0x1234
var_b = 0x2222

var_c = var_a + var_b

print('sum %d + %d -> %d'%(var_a, var_b, var_c))

print('sum 0x%x + 0x%x -> 0x%x'%(var_a, var_b, var_c))

The program can be passed as a parameter to be run in the interpreter

python3 sum2vars.py

If you mark the file as executable

chmod +x sum2vars.py

it can be run “directly”, even though neither the processor nor the operating system can execute such a file as machine code. The first line (shebang) causes the interpreter named on that line to be started with the file passed to it as a parameter.

A minor pitfall, and at the same time a critical habit that increases safety against attacks by other users (try to figure out why), is that a command given without a path is searched for only in the directories listed in the PATH environment variable, and that on reasonably designed and managed systems this list does not contain the current directory. Therefore, we have to specify the path to start the program. In this case it is the current directory, represented by the dot '.'.

./sum2vars.py
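
The lookup described above can be imitated in a few lines of Python. This is only a sketch of the idea, not the code of any real shell; the function name find_command is a hypothetical helper introduced for illustration.

```python
import os

def find_command(name, path_dirs):
    """Return the first executable file called name found in path_dirs, else None."""
    for directory in path_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

path_dirs = os.environ.get("PATH", "").split(os.pathsep)

# '.' is normally absent from PATH, so a script in the current directory
# is found only when its path is given explicitly, as in ./sum2vars.py
print("." in path_dirs)
print(find_command("python3", path_dirs))
```

Only the directories in the list are searched, in order, which is why a program of the same name planted in the current directory is not picked up by accident.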

C language

C is a language in which the programmer declares data types explicitly. The binding of a type name to the actual data representation may differ between architectures. For example, the signed integer type (int) is required to cover at least the range −32,767 to +32,767 and is chosen to suit the target processor best. On most of today's processor architectures, int is stored in 32 bits and ranges from −2,147,483,648 to +2,147,483,647 ($-2^{31}$ to $2^{31}-1$).
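
The width of int on the machine you are sitting at can be checked from Python through the standard ctypes module. The sketch below assumes two's complement encoding; the helper name signed_range is hypothetical.

```python
import ctypes

def signed_range(bits):
    """Smallest and largest value of a two's complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

int_bits = 8 * ctypes.sizeof(ctypes.c_int)   # width of the C type int on this machine
print("int has", int_bits, "bits, range", signed_range(int_bits))
print("16-bit range", signed_range(16))
```

A 16-bit two's complement integer reaches down to −32,768, one value beyond the minimum range that the C standard guarantees.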

A C program usually has to be compiled before execution (there are alternatives, such as CERN ROOT Cling) into a binary form that the operating system can load into memory, where the processor performs operations according to the translated binary machine instructions.

Rewrite the Python program listed above in C and store it in the file sum2vars.c. The function main() has to be defined because the C language standard (ISO/IEC 9899 - latest free) reserves this name for the entry point of a user program. To access the function printf(), it is necessary to include the header file stdio.h.

#include <stdio.h>

int var_a = 0x1234;
int var_b = 0x2222;

int var_c = 0x3333;

int main()
{
  var_c = var_a + var_b;

  printf("sum %d + %d -> %d\n", var_a, var_b, var_c);

  printf("sum 0x%x + 0x%x -> 0x%x\n", var_a, var_b, var_c);

  return 0;
}

The program can be compiled into binary form by the GNU C compiler (manual).

gcc -Wall sum2vars.c

The default name of the output executable file is a.out. The program can be run by its full or relative path name, for example ./a.out. The desired name of the binary file can be specified on the command line with the -o switch, and the -ggdb switch adds debug information. You can also ask the compiler to optimize the program for minimal size with the -Os switch.

gcc -Wall -Os -ggdb -o sum2vars sum2vars.c

The content of the executable file in the ELF format can be examined with the objdump tool.

objdump -S sum2vars

Translation of the algorithm to individual machine instructions can be explored online as well

Godbolt Compiler Explorer

Assembler - symbolic machine code

The algorithm described by the C source is translated into a sequence of machine instructions operating on precisely specified data types, such that the external effect of the entire program execution (or, more precisely, of the execution between sequence points) is equivalent to the written algorithm. The compiler output is stored in the form of the selected machine instructions, but without deciding on their final location in memory. A native translation of the source (a translation for the system on which the compiler is currently running) is obtained by the command

gcc -Wall -Os -S -o sum2vars.s sum2vars.c

Fully understanding how to write assembler programs for the x86_64 architecture running under the GNU/Linux operating system is demanding, so we chose the MIPS architecture to teach processor architectures (reasons), and we will use it in its most limited and simplified form in the initial seminars. To observe the internal state and visualize the principle of processor operation, we will not use processor boards with real chips that implement the architecture, but a graphical simulator created specifically for the needs and goals of our course - QtMips (materials and video (in Czech only for now) from its introduction at LinuxDays 2019).

The following examples are presented mainly to give an overview of what the subject and the semester work will focus on. We will return to writing algorithms in machine instructions in depth, with a full description, in the third lecture and seminar.

As a first approximation, we will compile the code with a MIPS cross compiler.

mips-linux-gnu-gcc -ggdb -static -Os -o sum2vars-mips sum2vars.c

The program can be run in our laboratory on GNU/Linux systems with x86_64 architecture, because binary files for MIPS are automatically interpreted by the QEMU emulator in user-space emulation mode.

Manual translation to assembler (file sum2vars.S) may look like this (for simplicity without calling the print function)

.globl main
.globl _start

.text
_start:
main:
	lw   $4, var_a($0)
	lw   $5, var_b($0)
	add  $6, $4, $5
	sw   $6, var_c($0)
	
	addi $2, $0, 0
	jr   $ra
	nop

.data

var_a:	.word 0x1234;
var_b:	.word 0x2222;
var_c:	.word 0x3333;
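
To see what the four data instructions in the listing above do, their effect can be modelled in Python. This is a deliberately simplified sketch and not how QtMips works internally: memory is a plain dictionary, the addresses 0x2000 to 0x2008 are assumed for illustration, and the overflow exception of the real add instruction is ignored.

```python
def run_sum(memory, addr_a, addr_b, addr_c):
    """Model of the lw/lw/add/sw sequence on a hypothetical register file."""
    regs = [0] * 32                              # $0 .. $31, $0 stays zero
    regs[4] = memory[addr_a]                     # lw   $4, var_a($0)
    regs[5] = memory[addr_b]                     # lw   $5, var_b($0)
    regs[6] = (regs[4] + regs[5]) & 0xFFFFFFFF   # add  $6, $4, $5 (kept to 32 bits)
    memory[addr_c] = regs[6]                     # sw   $6, var_c($0)
    regs[2] = 0                                  # addi $2, $0, 0 (return value)
    return regs

# assumed placement of var_a, var_b and var_c, one 32-bit word each
memory = {0x2000: 0x1234, 0x2004: 0x2222, 0x2008: 0x3333}
regs = run_sum(memory, 0x2000, 0x2004, 0x2008)
print(hex(memory[0x2008]))
```

The initial value 0x3333 of var_c is overwritten by the sum, which is the change to watch for in the data memory view of the simulator.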

We can translate the program

mips-elf-gcc -o sum2vars sum2vars.S

load into simulator

and run step by step.

The program can also be compiled directly in the simulator. To prevent the simulator from loading an external program, it should be started in the mode without loading an ELF file

Select File → New source

The precise/fixed location of the variables can be specified this time (directive .org 0x2000).

_start:
main:
	lw   $4, var_a($0)
	lw   $5, var_b($0)
	add  $6, $4, $5
	sw   $6, var_c($0)
	
	addi $2, $0, 0
	jr   $ra
	nop

.data

.org	0x2000

var_a:	.word 0x1234;
var_b:	.word 0x2222;
var_c:	.word 0x3333;

#pragma qtmips show registers
#pragma qtmips show memory
#pragma qtmips focus memory var_a
#pragma qtmips tab core

Directives that quickly locate the variable var_a in the data memory view are also added.

Compile by choosing Machine→Compile source

and we can step through the code

Additionally, 0x1234 can be replaced by 0x12345678 to observe in the dump how the value is stored in individual bytes or 16-bit half-words. Use the Word, Half-word and Byte chooser in the data memory view and try changing the view/dump width.
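
What to expect in the byte and half-word views can be previewed in Python. The sketch below prints both byte orders, because which one you see depends on the endianness of the simulated machine; it uses only the built-in int.to_bytes and int.from_bytes.

```python
value = 0x12345678

little = value.to_bytes(4, "little")   # least significant byte at the lowest address
big = value.to_bytes(4, "big")         # most significant byte at the lowest address

print([hex(x) for x in little])
print([hex(x) for x in big])

# the same four bytes in big-endian order, read back as two 16-bit half-words
halfwords = [int.from_bytes(big[i:i + 2], "big") for i in (0, 2)]
print([hex(h) for h in halfwords])
```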

An example of how to visualize number representations from Python, especially for those who do not know even the basics of the C language

#!/usr/bin/python3

import struct

a = 0x1234567
b = -12345678
c = a + b

buf = struct.pack('<IiI', a, b, c)

print(["{0:02x}".format(byte) for byte in buf])

(u32_a, u32_b, u8_c0, u8_c1, u8_c2, u8_c3) = struct.unpack('<IIBBBB', buf)
print(u32_a, u32_b, u8_c0, u8_c1, u8_c2, u8_c3)

(s32_a, s32_b, s8_c0, s8_c1, s8_c2, s8_c3) = struct.unpack('<iibbbb', buf)
print(s32_a, s32_b, s8_c0, s8_c1, s8_c2, s8_c3)

Learn more about storing variables in a specified data format representation in the documentation of the Python module struct — Interpret bytes as packed binary data.

Tasks

  1. With the tutor, review the above overview of what you will be doing during the semester
    • ensure that you can use the QtMips simulator on your lab computers
    • if you are not sure how to install the simulator at home, ask your instructors
  2. Students of OI and the more advanced students of the KyR program should try to modify the program that sums the two numbers so that it
    • displays the result in binary form, from bit 31 to bit 0
    • takes the input values from the command line (argc, argv, atoi)
    • uses smaller data types (short int, unsigned short int, unsigned char, signed char)
    • is tried with positive and negative numbers, with a focus on values that overflow after the operation
  3. KyR students without knowledge of C language
    • use the struct module from the above example to analyze data representation.
    • try changing pack to native byte order and big-endian order
    • implement an algorithm for displaying integer variables in binary form
    • try the binary formatting provided by Python, print("{0:08b}".format(a))
  4. Integer addition and subtraction in two's complement representation
    • add and subtract two integers, for example 5+(-6) and 5-(-6).
    • repeat the operations with different numbers and check your results with the computer program from the first exercise.
    • When can underflow and overflow happen? How can we detect that it has occurred?
  5. Integer multiplication
    • multiply two integers, for example 7*6.
    • is there any difference when multiplying negative numbers? (e.g. -7*6, (-7)*(-6), etc.)
    • show how to speed up the multiplier (use many adders instead of repeatedly using one).
  6. Integer division
    • divide integers 42/7, 43/7
    • does the algorithm change when we use negative numbers?
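
The arithmetic tasks above can be checked against a small Python model. This is a minimal sketch under stated assumptions: 32-bit two's complement operands, non-negative inputs for multiplication and division, and hypothetical function names chosen for illustration. It is one possible formulation, not the reference solution of the course.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(pattern):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern

def to_binary32(value):
    """Bits 31 down to 0 of value as text."""
    return "{0:032b}".format(value & MASK32)

def add32(a, b):
    """Add two signed 32-bit values; return (result, overflow flag)."""
    result = to_signed32((a + b) & MASK32)
    # signed overflow: both operands share a sign that the result lacks
    overflow = ((a < 0) == (b < 0)) and ((result < 0) != (a < 0))
    return result, overflow

def multiply_shift_add(a, b):
    """Multiply non-negative integers by adding shifted copies of a."""
    product = 0
    while b:
        if b & 1:            # current multiplier bit is set
            product += a
        a <<= 1              # next bit of b weighs twice as much
        b >>= 1
    return product

def divide_shift_subtract(dividend, divisor, bits=32):
    """Unsigned shift-and-subtract division; return (quotient, remainder)."""
    remainder, quotient = 0, 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

print(add32(5, -6), add32(2**31 - 1, 1))
print(to_binary32(-6))
print(multiply_shift_add(7, 6))
print(divide_shift_subtract(42, 7), divide_shift_subtract(43, 7))
```

Subtraction can be tried as add32(a, -b). Negative operands in multiplication and division are left out on purpose, since handling their signs is exactly what the tasks ask you to think about.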

Homeworks

  • Homeworks 1 to 3 will be assigned and submitted via the submission system https://dcenet.felk.cvut.cz/apo/
  • You have to log in to the submission system; the “Assignments” page lists the open assignments.
  • If you want to try the submission procedure, you can use the 1st training homework.
  • If there are any problems, please contact your tutor or Richard Šusta, the author and administrator of the system.

Computer network in lab KN:E-307

Classroom KN:E-2 is equipped with computers running a network installation of Debian GNU/Linux Bullseye.

When the computer is turned on, the BIOS PXE network boot option loads the PXElinux boot loader and its configuration. The boot loader offers two choices:

  • boot from the computer local disk (we will not use this option)
  • boot DCE Linux Bullseye (Debian) option to start the network version of the Debian Bullseye operating system (our choice)

Choosing the second option loads the GNU/Linux kernel image and the initial RAM disk from the network using the TFTP protocol. When the kernel starts, it mounts the root filesystem from the NFS server, but only in read-only mode. To keep local changes while the computer runs, a file system that stores them temporarily in memory and in a swap file is layered over the basic directory structure. This is done by the OverlayFS module (AUFS in the past). The Kerberos system is used to verify the credentials, and the password is checked against the main CTU identity server. After a successful login, the volume with the user account, to which the user has read and write rights, is attached to the directory structure of the station via NFS.

More information about the solution can be found on the Wiki page How to create a diskless machine running GNU/Linux. There are also the DiskLess Debian/GNU Linux slides from the presentation of our solution at the Install Fest conference.

Logins and passwords

Computer access and logins are authenticated against the central CTU Kerberos server. The main CTU password is used to access the computers.

Remote access to your data

The data in your home directories are available in rooms KN:E-2, KN:E-23, KN:E-24 and KN:E-s109 and are also accessible on the server postel.felk.cvut.cz via the SSH protocol. You can use the SCP/SFTP protocols to access the data. On Linux, you can mount your home directory even from your home computer using the sshfs utility (in the following command, jmeno stands for your user name).

sshfs jmeno@postel.felk.cvut.cz: /mnt/tmp

Use the next command to unmount/disconnect the account:

fusermount -u /mnt/tmp

The server also offers a remote connection to graphical applications running on it

ssh -X jmeno@postel.felk.cvut.cz

Remark: The name was chosen not only for convenient access from the comfort of home, but primarily as a reminder of one of the key persons of the Internet - Jon Postel.

Troubleshooting, data recovery

In case of problems with the computers or the network, please contact Ing. Ales Kapica or other people from the IT 13135 group.

courses/b35apo/en/tutorials/01/start.txt · Last modified: 2021/01/23 09:54 (external edit)