Gain an understanding of how machine instructions are executed by a simple processor.
The simplicity, regularity, and orthogonality of its instruction encoding are the main reasons why most textbooks choose MIPS as a model architecture. The processor reads instructions encoded in binary form. Personal computers cannot execute MIPS machine code directly, because their processors implement the x86 architecture, but many simulators of the MIPS architecture exist. The QtMips simulator was developed to support the practical exercises of this course.
Detailed description:
A more detailed description of the instructions can be found on Wikipedia: https://en.wikipedia.org/wiki/MIPS_architecture.
Authoritative description of the architecture (https://www.mips.com/):
MIPS32 Instruction Set Quick Reference v1.01
The MIPS32 Instruction Set v6.06
Any plain text editor can be used to write assembler source code. Several text editors and development environments are installed in the laboratory, for example vim and Emacs. We suggest Geany for those who have no personal preference.
Open a text editor and prepare a simple assembler source file named simple-lw-sw.S. The capital suffix “.S” is crucial. Standard development tools assign this suffix to assembly language source files, and the compiler decides how to process a file according to its suffix. Other recognized suffixes are .c for C language source files, .cc or .cpp for C++, and .o for object files (files that already contain the source code translated into native instructions of the target machine, but whose code fragments have not yet been assigned final addresses). The suffix .a (archive) is used for libraries of functions, which are included in the final executable as they are required by other files. A final executable file (program) is stored without an extension on Unix-class systems.
.globl _start
.set noat
.set noreorder

.text

_start:
    // load the word from absolute address
    lw     $2, 0x2000($0)
    // store the word to absolute address
    sw     $2, 0x2004($0)
loop:
    // stop execution wait for debugger/user
    break
    // ensure that continuation does not
    // interpret random data
    beq    $0, $0, loop
    nop

.data

src_val:
    .word  0x12345678
dst_val:
Assembly code source file consists of
lw
sw
add
sub
.
The following directives are used in the provided sample code
_start
_ _start
la
lui
ori
.text
.data
All directives are described in the complete manual for the GNU assembler.
The compilation is performed by the cross-compiler mips-elf-gcc (a cross-compiler runs on one system, the host, a PC in our case, and generates code for a different target system, here one with the MIPS architecture).
mips-elf-gcc -Wl,-Ttext,0x1000 -Wl,-Tdata,0x2000 -nostdlib -nodefaultlibs -nostartfiles -o simple-lw-sw simple-lw-sw.S
The invocation requires multiple parameters, because standard programs in the C language require linking of initialization sequences and library functions. Our goal, however, is to first describe the lowest level without this automation. That makes it possible to understand how it is later extended by automatic initialization sequences and by constructs that allow comfortable writing of programs at a higher level and in higher-level languages. Parameters starting with -Wl, are passed to the linker and specify the address at which the .text section with program instructions is placed and the address at which the .data section with initialized variables starts. The following parameters disable the use of the standard libraries (-nostdlib, -nodefaultlibs) and the automatic addition of the startup and initialization sequence (-nostartfiles). The file name after the switch -o specifies the name of the final executable file (output). The names of one or more source files follow.
/opt/apo/qtmips-semstart
On Windows, a complete MSYS environment with the make utility can be installed, or the plain compiler mips-elf-gcc-i686-mingw32 can be called from the following batch file.
PATH=%PATH%;c:\path\to\mips-elf-gcc-i686-mingw32\bin
mips-elf-gcc -Wl,-Ttext,0x1000 -Wl,-Tdata,0x2000 -nostdlib -nodefaultlibs -nostartfiles -o simple-lw-sw test.S
The QtMips simulator is used to execute the program. Select the simplest variant of the simulated processor, “No pipeline no cache”. Use the “Browse” button to select the executable file (simple-lw-sw without a suffix in our case) for the field “Elf executable”.
Next, select the tab Core and disable the checkbox Delay slot. This configuration makes the simulator diverge from the real MIPS architecture, but it is more straightforward for initial experiments. A program written in the .set noreorder mode is translated 1:1 into the instruction sequence, and with this setup the execution of branch instructions is more intuitive because they take effect immediately. We return to this topic later and give the full reasons for the behavior of the real processor.
The diagram of the processor opens. A double-click on the program counter register (PC) opens the program listing with the actual instructions. A double-click on the registers block opens a view with the list of architectural registers. A double-click on the data memory shows the memory content. The window layout shown in the next picture is an appropriate and intuitive starting point.
Select the “Follow fetch” option in the “Program” listing window; it highlights the instruction currently fetched by the processor for execution. Set the start of the listing in the “Memory” window to the address 0x2000, where the data value 0x12345678 was placed. The program can be stepped through with the “Machine” → “Step” menu entry or with the corresponding button on the toolbar. The value is first loaded into the register and then stored to the following memory cell.
Change the program to process the load-store sequence in a loop. The instruction break has to be removed from the loop because its purpose is to stop program execution when it is reached. Test that a value edited at address 0x2000 is always copied to address 0x2004. Then modify the program to add two input values at addresses 0x2000 and 0x2004 and store the result at address 0x2008.
The next step is to modify the program to add two vectors, each four words long. Use the assembler macroinstruction la (load address), for example with the label vect_a, to set registers to point to the start of the vectors.
...
.data

vect_a:
    .word 0x12345678
    .word 0x12345678
    .word 0x12345678
    .word 0x12345678
vect_b:
    .word 0x12345678
...
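A host-side reference model can help when checking the memory content after the assembler version has run. The following is a minimal sketch in C, not the assembler solution itself; the function name vect_add and the output vector are assumptions made for illustration. Unsigned 32-bit arithmetic is used so that an overflowing sum wraps around like the low 32 bits of the MIPS result.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference model (hypothetical helper): c[i] = a[i] + b[i] for n words.
 * uint32_t arithmetic wraps modulo 2^32, like the stored 32-bit result. */
void vect_add(const uint32_t *a, const uint32_t *b, uint32_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With the sample data above, every element of the result should be 0x12345678 + 0x12345678 = 0x2468acf0, which is the value to look for in the “Memory” window.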
Continue by implementing a program that computes the average of eight numbers.
The compiler invocation should at least be documented and preferably automated. One way is to use a script with the sequence of commands required for compilation. Such a script can be written directly for the shell, the command line interpreter (usually BASH or DASH on GNU/Linux). But it is not practical to recompile all compilation units of a larger project when a change affects only one or a small subset of source files. Many systems have been developed to automate exactly these tasks, for example Make, Ant, qmake, CMake, and Meson.
Make is a tool that automates the compilation of source code; the compilation process is described in a Makefile.
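As a reference for the expected result, the computation can be sketched in C. This is only an illustration under stated assumptions: the name average8 is hypothetical, the inputs are treated as unsigned words, and their sum is assumed to fit into 32 bits. Because the count is a power of two, the division can be done by a right shift by three bits, which maps to a single shift instruction.

```c
#include <stdint.h>

/* Average of eight unsigned words (hypothetical helper).
 * Assumes the sum does not overflow 32 bits. */
uint32_t average8(const uint32_t v[8])
{
    uint32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += v[i];
    return sum >> 3;   /* divide by 8; the remainder is discarded */
}
```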
Makefile template for the compilation of source files written in assembly language or C language for MIPS simulator environment:
ARCH=mips-elf

CC=$(ARCH)-gcc
AS=$(ARCH)-as
LD=$(ARCH)-ld
OBJCOPY=$(ARCH)-objcopy

CFLAGS  += -ggdb -O1
AFLAGS  += -ggdb
LDFLAGS += -ggdb
LDFLAGS += -nostdlib -nodefaultlibs -nostartfiles
LDFLAGS += -Wl,-Ttext,0x1000 -Wl,-Tdata,0x2000

all:default

.PHONY:clean

%.srec:%
	$(OBJCOPY) -O srec $< $@

%.o:%.S
	$(CC) -D__ASSEMBLY__ $(AFLAGS) -c $< -o $@

%.s:%.c
	$(CC) $(CFLAGS) $(CPPFLAGS) -S $< -o $@

%.o:%.c
	$(CC) $(CFLAGS) $(CPPFLAGS) -c $< -o $@

# default output
default:change_me.srec

# executable file:object file.o
change_me:change_me.o
	$(CC) $(LDFLAGS) $^ -o $@

# all generated that would be cleaned
clean:
	rm -f change_me change_me.o change_me.out change_me.srec
A Makefile consists of definitions (assignments of values to variables) and rules. A rule starts with a line that names the rule target(s), followed by a colon and the list of dependencies. The dependencies are names of files or of abstract targets that have to be made available before the commands following the first rule line can create the required results. File names can be given in full, or their base part can be replaced by the character “%”, which allows one rule to describe a whole class of transformations from one compilation stage to another. An even more complete template for the compilation of assembler and C source files for the MIPS target platform, with automatic building of dependencies on header files, can be found in the directory /opt/apo/qtmips_template on the computers in the laboratory.
The compilation is invoked by the command make (it has to be run in the directory where the Makefile and the program source code are located). Make generates multiple output files. The file without an extension is used for execution in the QtMips environment. Compilation translates each compilation unit (in the simple case, one source file together with the header files it includes) into an object file (.o) in relocatable form. The object files are then collected by the linker, which resolves address references between compilation units and places the code at its final addresses. For this to work, the actual sequences of machine instructions have to be wrapped in an envelope that specifies where references are to be filled in when the .o files are linked and placed at the specified addresses. Even the instructions and data in the final executable usually need some information that tells the operating system where they should be loaded or mapped in memory or in the process address space. The ELF (Executable and Linkable Format) is used to store this metadata in our case, and generally on most modern systems.
The following command can be used to find the final addresses of variables and data entries after linking
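To make the idea of ELF metadata more concrete, the first bytes of every ELF file form an identification block. The sketch below checks a byte buffer for the ELF magic number, the 32-bit class, and big-endian data encoding, which corresponds to the elf32-bigmips format reported by objdump later in this text. The function name is hypothetical, and reading the file into the buffer is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf starts like a 32-bit big-endian ELF file, 0 otherwise
 * (hypothetical helper; only the first six identification bytes are checked). */
int looks_like_elf32_be(const uint8_t *buf, size_t len)
{
    if (len < 6)
        return 0;
    return buf[0] == 0x7f && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F'
        && buf[4] == 1    /* EI_CLASS: ELFCLASS32 */
        && buf[5] == 2;   /* EI_DATA: ELFDATA2MSB, big-endian */
}
```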
mips-elf-nm program
The sections and their locations can be listed by
mips-elf-objdump --headers program
The final machine code after translation can be listed together with the corresponding source lines by
mips-elf-objdump --source program
nop
vec_a
vec_b
vec_c
vec_c[0] = vec_a[0] + vec_b[0]
gcc -E assembler.S -o preprocessed-pro-mips.s can be used for preprocessing.
Many practical applications use a median filter. The median filter removes noise (obvious outliers or dead pixels) from a signal or an image. It takes a neighborhood of a sample (10 samples before and 10 after), finds the median value, and replaces the sample value with this median. The mean filter is very similar; it replaces the sample value with the average of the nearby samples. The median is usually calculated by sorting the samples by value and picking the sample in the middle, so a sorting algorithm is the cornerstone of a median filter implementation.
Let's assume we have 21 integers stored in an array in memory. The array begins at some given address (e.g. 0x00). One integer occupies one word in memory. The task is to sort the integers in ascending order. To do this we will implement the bubble sort algorithm. In this algorithm two adjacent values are compared, and if they are in the wrong order, they are swapped. These comparisons pass repeatedly through the array until no swaps are made.
The code for bubble sort is below:
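The filter described above can be sketched in C as follows. This is an illustration only, under several assumptions: the function names are hypothetical, the standard library qsort stands in for the sorting step (the exercise itself uses bubble sort), and samples too close to the border to have a full neighborhood are copied unchanged.

```c
#include <stdlib.h>
#include <string.h>

#define RADIUS 10                  /* samples before and after the current one */
#define WINDOW (2 * RADIUS + 1)    /* 21 samples in total */

/* Comparison callback for qsort, ascending order. */
static int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* Median of WINDOW samples: sort a copy and pick the middle element. */
int window_median(const int *win)
{
    int tmp[WINDOW];
    memcpy(tmp, win, sizeof tmp);
    qsort(tmp, WINDOW, sizeof tmp[0], cmp_int);
    return tmp[RADIUS];
}

/* Filters in[0..n-1] into out[0..n-1]. Border samples without a full
 * neighborhood are copied unchanged (an assumption of this sketch). */
void median_filter(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i < RADIUS || i + RADIUS >= n)
            out[i] = in[i];
        else
            out[i] = window_median(&in[i - RADIUS]);
    }
}
```

A single outlier inside an otherwise constant signal is replaced by the constant value, because it ends up at the end of the sorted window rather than in the middle.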
int pole[5]={5,3,4,1,2};

int main()
{
    int N = 5,i,j,tmp;

    for(i=0; i<N; i++)
        for(j=0; j<N-1-i; j++)
            if(pole[j+1]<pole[j])
            {
                tmp = pole[j+1];
                pole[j+1] = pole[j];
                pole[j] = tmp;
            }

    return 0;
}
Transcribe the C code above into MIPS assembler. Verify the correctness of your implementation in the QtMips simulator. We will use this program in the next class, so finish it at home if you do not finish it during the class.
Here is a template you can use:
.globl array
.data
.align 2

array:
.word 5, 3, 4, 1, 2

.text
.globl start
.ent start

start:
// TODO: Write your code here

nop
.end start
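The verbal description of bubble sort stops as soon as a pass makes no swap, while the C listing always runs N passes. A minimal sketch of the early-exit variant follows; the function name and the returned pass count are assumptions added for illustration, and either variant is a valid basis for the assembler transcription.

```c
/* Bubble sort that stops after the first pass without a swap.
 * Returns the number of passes made (hypothetical helper). */
int bubble_sort(int *a, int n)
{
    int passes = 0;
    int swapped;

    do {
        swapped = 0;
        /* After each pass the largest remaining value is in place,
         * so the inner range shrinks by one element. */
        for (int j = 0; j < n - 1 - passes; j++) {
            if (a[j + 1] < a[j]) {
                int tmp = a[j + 1];
                a[j + 1] = a[j];
                a[j] = tmp;
                swapped = 1;
            }
        }
        passes++;
    } while (swapped);

    return passes;
}
```

For an already sorted array the function returns after a single pass, which is the saving the swap flag buys at the cost of one extra register in the assembler version.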
if (i == j)
    f = g + h;
f = f - i;

// s0=f, s1=g, s2=h, s3=i, s4=j
    bne  $s3, $s4, L1      // If i!=j, go to label L1
    add  $s0, $s1, $s2     // if block: f=g+h
L1:
    sub  $s0, $s0, $s3     // f = f-i
if (i == j)
    f = g + h;
else
    f = f - i;

// s0=f, s1=g, s2=h, s3=i, s4=j
    bne  $s3, $s4, else    // If i!=j, go to the else label
    add  $s0, $s1, $s2     // if block: f=g+h
    j    L2                // jump behind the else block
else:
    sub  $s0, $s0, $s3     // else block: f = f-i
L2:
int pow = 1;
int x = 0;
while(pow != 128)
{
    pow = pow*2;
    x = x + 1;
}

// s0=pow, s1=x
    addi $s0, $0, 1        // pow = 1
    addi $s1, $0, 0        // x = 0
    addi $t0, $0, 128      // t0 = 128 to compare (always have to compare two registers)
while:
    beq  $s0, $t0, done    // If pow==128, end the cycle. Go to done label.
    sll  $s0, $s0, 1       // pow = pow*2
    addi $s1, $s1, 1       // x = x+1
    j    while
done:
int sum = 0;
for(int i=0; i!=10; i++)
{
    sum = sum + i;
}

// Is equivalent to the following while cycle:
int sum = 0;
int i = 0;
while(i!=10){
    sum = sum + i;
    i++;
}
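One further step towards assembler is to write the same loop with explicit labels and jumps, so that each C line corresponds to one branch or arithmetic instruction. The sketch below is only an illustration; the function names and the parameter n (assumed to be non-negative) are not part of the original example, and the instruction mnemonics in the comments indicate the kind of instruction, not a finished translation.

```c
/* The for loop from the example, with the bound as a parameter. */
int sum_for(int n)
{
    int sum = 0;
    for (int i = 0; i != n; i++)
        sum = sum + i;
    return sum;
}

/* The same loop with explicit labels and jumps, mirroring the
 * beq / j structure used for the while loop in assembler. */
int sum_with_gotos(int n)
{
    int sum = 0;
    int i = 0;
loop:
    if (i == n)
        goto done;      /* beq: leave the loop when i == n */
    sum = sum + i;      /* add */
    i = i + 1;          /* addi */
    goto loop;          /* j: back to the condition */
done:
    return sum;
}
```

Both functions return 45 for n = 10, the sum of the numbers 0 to 9.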
// Just as an example...
int a, *pa = (int *)0x80020040;
int b, *pb = (int *)0x80020044;
int c, *pc = (int *)0x00001234;

a = *pa;
b = *pb;
c = *pc;

// s0=pa (Base address), s1=a, s2=b, s3=c
    lui  $s0, 0x8002       // pa = 0x80020000;
    lw   $s1, 0x40($s0)    // a = *pa;
    lw   $s2, 0x44($s0)    // b = *pb;
    addi $s0, $0, 0x1234   // pc = 0x00001234;
    lw   $s3, 0x0($s0)     // c = *pc;
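The address arithmetic behind this sequence can be modeled in C: lui places its 16-bit immediate into the upper half of the register, and lw/sw add a sign-extended 16-bit offset to the base register. The helper names below are hypothetical and serve only to check the addresses by hand.

```c
#include <stdint.h>

/* lui rt, imm: imm goes to the upper 16 bits, the lower 16 bits are zero. */
uint32_t lui_value(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* Effective address of lw/sw: base + sign-extended 16-bit offset. */
uint32_t effective_address(uint32_t base, uint16_t offset)
{
    int32_t signed_offset = (offset & 0x8000u)
        ? (int32_t)offset - 0x10000   /* offsets 0x8000..0xffff are negative */
        : (int32_t)offset;
    return base + (uint32_t)signed_offset;
}
```

For example, effective_address(lui_value(0x8002), 0x40) gives 0x80020040, and effective_address(0, 0x1234) gives 0x00001234, matching the three loads above.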
int array[4] = { 7, 2, 3, 5 };

int main()
{
    int i,tmp;

    for(i=0; i<4; i++)
    {
        tmp = array[i];
        tmp += 1;
        array[i] = tmp;
    }
    return 0;
}

.globl array           // label "array" is declared as global. It is visible from all files in the project.
.data                  // directive indicating start of the data segment
.align 2               // set data alignment to 4 bytes

array:                 // label - name of the memory block
.word 7, 2, 3, 5       // values in the array to increment...

.text                  // beginning of the text segment (or code segment)
.globl start
.ent start

start:
la   $s0, array        // store address of the "array" to the register s0
addi $s1, $0, 0        // initialization instruction of for cycle: i=0, where i=s1
addi $s2, $0, 4        // set the upper bound for cycle

for:
beq  $s1, $s2, done    // if s1 == s2, go to label done and break the cycle
lw   $s3, 0x0($s0)     // load value from the array to s3
addi $s3, $s3, 0x1     // increment the s3 register
sw   $s3, 0x0($s0)     // replace (store) value from s3 register
addi $s0, $s0, 0x4     // increment offset and move to the other value in the array
addi $s1, $s1, 0x1     // increment number of passes through the cycle (i++).
j    for               // jump to for label

done:
nop
.end start
The QtMips simulator includes a few simple peripherals which are mapped into the memory address space.
The first is a simple serial port (UART) connected to a terminal window. The register locations and bit fields are the same as in the simulators SPIM and MARS. These map the serial port from address 0xffff0000. QtMips maps the UART peripheral to this address as well, but also offers an alternative mapping at address 0xffffc000, which can be encoded as an absolute address in LW and SW instructions with the zero base register.
The next peripheral emulates interaction with the simple control elements of a real device. Its register map matches a subset of the registers of the dial knobs and LED indicators peripheral, which is available for input and output on the MicroZed APO development kits used for your semester work.
A simple program reads the positions of the simulator knob dials and converts the values to the RGB LED color and to text/terminal output. The program is available in the directory /opt/apo/qtmips_binrep on the laboratory computers. It is also available for download as the archive qtmips_binrep.tar.gz.
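The reason why 0xffffc000 can be used with the zero base register while 0xffff0000 cannot follows from the 16-bit signed offset of LW and SW: with base $0 only the lowest 32 KiB and the highest 32 KiB of the address space are reachable. The check can be sketched in C; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 when addr can be written as offset($0) in lw/sw, i.e. when it
 * equals a sign-extended 16-bit value (hypothetical helper). */
int reachable_from_zero(uint32_t addr)
{
    return addr <= 0x00007fffu || addr >= 0xffff8000u;
}

/* The 16-bit offset field that encodes such an address. */
uint16_t offset_field(uint32_t addr)
{
    return (uint16_t)(addr & 0xffffu);
}
```

With these helpers, reachable_from_zero(0xffffc000) is 1 and reachable_from_zero(0xffff0000) is 0, so the SPIM/MARS address needs a base register loaded beforehand (for example by lui), whereas the alternative mapping does not.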
The C source code was compiled by the following sequence of commands
mips-elf-gcc -D__ASSEMBLY__ -ggdb -fno-lto -c crt0local.S -o crt0local.o
mips-elf-gcc -ggdb -Os -Wall -fno-lto -c qtmips_binrep.c -o qtmips_binrep.o
mips-elf-gcc -ggdb -nostartfiles -static -fno-lto crt0local.o qtmips_binrep.o -o qtmips_binrep
The content of the program compiled into the ELF executable format can be examined with the objdump command
mips-elf-objdump --source -M no-aliases,reg-names=numeric qtmips_binrep
The output follows, with detailed commentary added.
    qtmips_binrep:     file format elf32-bigmips

    Disassembly of section .text:

    00400018 <main>:
    /*
     * The main entry into example program
     */
    int main(int argc, char *argv[])
    {
      400018:  27bdffe8  addiu  $29,$29,-24

Allocate space on the stack for the main() function stack frame.

      40001c:  afbf0014  sw     $31,20($29)

Save the previous value of the return address register to the stack.

      while (1) {
         uint32_t rgb_knobs_value;
         unsigned int uint_val;

         rgb_knobs_value = *(volatile uint32_t*)(mem_base + SPILED_REG_KNOBS_8BIT_o);
      400020:  8c04c124  lw     $4,-16092($0)

Read the value from the address formed as the sum of "SPILED_REG_BASE" and the "SPILED_REG_KNOBS_8BIT_o" peripheral register offset. LW is the instruction to load a word. The address is the sum of register $0 (fixed zero) and -16092, which is 0xffffc124 in hexadecimal, i.e., the sum of 0xffffc100 and 0x24. The value read is stored in register $4.

      400024:  00000000  sll    $0,$0,0x0

One NOP instruction to ensure that the load finishes before the value is used.

      400028:  00041027  nor    $2,$0,$4

Compute the bit complement "~" of the value in register $4 and store it in register $2.

         *(volatile uint32_t*)(mem_base + SPILED_REG_LED_LINE_o) = rgb_knobs_value;
      40002c:  ac04c104  sw     $4,-16124($0)

Store the RGB knobs value from register $4 to the "LED" line register, which is shown in binary, decimal and hexadecimal on the QtMips target. Address 0xffffc104.

         *(volatile uint32_t*)(mem_base + SPILED_REG_LED_RGB1_o) = rgb_knobs_value;
      400030:  ac04c110  sw     $4,-16112($0)

Store the RGB knobs value to the corresponding components controlling the color/brightness of RGB LED 1. Address 0xffffc110.

         *(volatile uint32_t*)(mem_base + SPILED_REG_LED_RGB2_o) = ~rgb_knobs_value;
      400034:  ac02c114  sw     $2,-16108($0)

Store the complement of the RGB knobs value to the corresponding components controlling the color/brightness of RGB LED 2. Address 0xffffc114.

         /* Assign value read from knobs to the basic signed and unsigned types */
         uint_val = rgb_knobs_value;

The value read resides in register 4, which corresponds to the first argument register a0.

         /* Print values */
         serp_send_hex(uint_val);
      400038:  0c100028  jal    4000a0 <serp_send_hex>
      40003c:  00000000  sll    $0,$0,0x0

Call the function that sends a hexadecimal value to the serial port. The one instruction after JAL is executed in its delay slot. The PC pointing after this instruction (0x400040) is stored in register 31, the return address register.

         serp_tx_byte('\n');
      400040:  0c100020  jal    400080 <serp_tx_byte>
      400044:  2404000a  addiu  $4,$0,10

Call the routine that sends a new line character to the serial port. The ASCII value corresponding to '\n' is set in the argument register a0 in the delay slot of JAL. JAL is decoded and, in parallel, the instruction addiu $4,$0,10 is executed. Then the PC pointing to the address 0x400048 after the delay slot is stored in the return address register, and the next instruction is fetched from the JAL target address, the start of the function serp_tx_byte.

      400048:  1000fff5  beqz   $0,400020 <main+0x8>
      40004c:  00000000  sll    $0,$0,0x0

Branch back to the start of the loop that reads the value from the knobs.

    00400050 <_start>:
            la      $gp, _gp
      400050:  3c1c0041  lui    $28,0x41
      400054:  279c90e0  addiu  $28,$28,-28448

Load the global data base pointer into the global data base register 28 (gp). The symbol _gp is provided by the linker.

            addi    $a0, $zero, 0
      400058:  20040000  addi   $4,$0,0

Set register a0 (the first argument of the main function) to zero; argc is equal to zero.

            addi    $a1, $zero, 0
      40005c:  20050000  addi   $5,$0,0

Set register a1 (the second argument of the main function) to zero; argv is equal to NULL.

            jal     main
      400060:  0c100006  jal    400018 <main>
            nop
      400064:  00000000  sll    $0,$0,0x0

Call the main function. The return address is stored in the ra ($31) register.

    00400068 <quit>:
    quit:
            addi    $a0, $zero, 0
      400068:  20040000  addi   $4,$0,0

If the main function returns, set the exit value to 0.

            addi    $v0, $zero, 4001  /* SYS_exit */
      40006c:  20020fa1  addi   $2,$0,4001

Set the system call number to the code representing the exit() syscall.

      400070:  0000000c  syscall

Call the system.

    00400074 <loop>:
    loop:   break
      400074:  0000000d  break

If there is no system, try to stop the execution by invoking a debugging exception.

            beq     $zero, $zero, loop
      400078:  1000fffe  beqz   $0,400074 <loop>
            nop
      40007c:  00000000  sll    $0,$0,0x0

If even this does not stop the execution, command the CPU to spin in a busy loop.

    void serp_tx_byte(int data)
    {
    00400080 <serp_tx_byte>:
      while (!(serp_read_reg(SERIAL_PORT_BASE, SERP_TX_ST_REG_o) &
                    SERP_TX_ST_REG_READY_m));
      400080:  8c02c008  lw     $2,-16376($0)
      400084:  00000000  sll    $0,$0,0x0

Read the serial port transmit status register, address 0xffffc008.

      while (!(serp_read_reg(SERIAL_PORT_BASE, SERP_TX_ST_REG_o) &
      400088:  30420001  andi   $2,$2,0x1
      40008c:  1040fffc  beqz   $2,400080 <serp_tx_byte>
      400090:  00000000  sll    $0,$0,0x0

Repeat the read until the UART is ready to accept a character (bit 0 is not zero). NOP in the delay slot.

      *(volatile uint32_t *)(base + reg) = val;
      400094:  ac04c00c  sw     $4,-16372($0)

Write the value from register 4 (the first argument a0) to the address 0xffffc00c (SERP_TX_DATA_REG_o), the serial port tx data register.

    }
      400098:  03e00008  jr     $31
      40009c:  00000000  sll    $0,$0,0x0

Jump/return back to continue in the caller. The address of the next fetched instruction is read from the return address register 31 (ra).

    void serp_send_hex(unsigned int val)
    {
    004000a0 <serp_send_hex>:
      4000a0:  27bdffe8  addiu  $29,$29,-24

Allocate space on the stack for the routine stack frame.

      4000a4:  00802825  or     $5,$4,$0

Copy the value of the first argument register 4 (a0) to register 5.

      for (i = 8; i > 0; i--) {
      4000a8:  24030008  addiu  $3,$0,8

Set the value of register 3 to 8.

      4000ac:  afbf0014  sw     $31,20($29)

Save the previous value of the return address register to the stack.

        char c = (val >> 28) & 0xf;
      4000b0:  00051702  srl    $2,$5,0x1c

Shift the value in register 5 right by 28 bits and store the result in register 2.

      4000b4:  304600ff  andi   $6,$2,0xff

Redundant operation that limits the value range to the character type; the result is stored in register 6.

        if (c < 10 )
      4000b8:  2c42000a  sltiu  $2,$2,10

Set register 2 to one if the value is smaller than 10.

          c += 'A' - 10;
      4000bc:  10400002  beqz   $2,4000c8 <serp_send_hex+0x28>
      4000c0:  24c40037  addiu  $4,$6,55

If the value is larger or equal (register 2 is 0/false), add the value 55 ('A' - 10, i.e. 0x41 - 0xa = 0x37 = 55) to register 6 and store the result in register 4. This operation is executed even when the branch arm before else is executed, but the result is then immediately overwritten by the next instruction.

          c += '0';
      4000c4:  24c40030  addiu  $4,$6,48

Add the value 0x30 = 48 = '0' to the value in register 6 and store the result in register 4, the first argument a0.

        serp_tx_byte(c);
      4000c8:  0c100020  jal    400080 <serp_tx_byte>
      4000cc:  2463ffff  addiu  $3,$3,-1

Call the subroutine that sends a byte to the serial port. The loop control variable (i) is decremented in the delay slot.

      for (i = 8; i > 0; i--) {
      4000d0:  1460fff7  bnez   $3,4000b0 <serp_send_hex+0x10>
      4000d4:  00052900  sll    $5,$5,0x4

The final condition of the for loop, converted to a do {} while() loop. If not all 8 characters have been sent, loop again. Shift the value in register 5 left by 4 bit positions. The compiler does not store the values of local variables to the stack and does not even move them to callee-saved registers (which would require saving their previous values to the function stack frame). The compiler can use this optimization because it knows the register usage of the called function serp_tx_byte().

      }
      4000d8:  8fbf0014  lw     $31,20($29)
      4000dc:  00000000  sll    $0,$0,0x0

Restore the return address register to the value found at the function start.

      4000e0:  03e00008  jr     $31
      4000e4:  27bd0018  addiu  $29,$29,24

Return to the caller function. The instruction in the jump register delay slot is used to restore the stack pointer, i.e. to free the function frame.
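The jump and branch targets printed by objdump can be recomputed from the raw instruction words in the listing. The following C sketch shows the two address calculations; the function names are hypothetical. A J/JAL instruction carries a 26-bit word index that is combined with the upper four bits of the delay-slot address, while a conditional branch carries a signed 16-bit word offset relative to the delay-slot address.

```c
#include <stdint.h>

/* Target of j/jal: 26-bit instruction index shifted left by 2,
 * upper 4 bits taken from the address of the delay slot (pc + 4). */
uint32_t jump_target(uint32_t pc, uint32_t instr)
{
    return ((pc + 4) & 0xf0000000u) | ((instr & 0x03ffffffu) << 2);
}

/* Target of beq/bne: address of the delay slot plus the
 * sign-extended 16-bit offset multiplied by 4. */
uint32_t branch_target(uint32_t pc, uint32_t instr)
{
    uint32_t imm = instr & 0xffffu;
    int32_t offset = (imm & 0x8000u) ? (int32_t)imm - 0x10000 : (int32_t)imm;
    return pc + 4 + (uint32_t)(offset * 4);
}
```

For instance, the word 0x0c100028 at address 0x400038 decodes to the jump target 0x4000a0 (serp_send_hex), and the word 0x1000fff5 at address 0x400048 decodes to the branch target 0x400020, the start of the loop in main.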
-nostdlib -nodefaultlibs -nostartfiles -Wl,-Ttext,0x80020000
-lm -lgcc -lc
gcc -E assembler.S -o preprocessed_assembler.s