3. Processor instruction set and translation of algorithm

Exercise outline

  1. Basic instructions of the processor and their description
  2. Introduction to the QtRVSim architecture simulator
  3. Test program for simple addition
  4. Follow-up tasks extending the program
  5. Translation of C source code to assembler (RISC-V instruction set)
  6. Peripheral access

What shall we do

Gain an understanding of how machine instructions are executed by a simple processor.

The simplicity, straightforwardness, and orthogonality of its instruction encoding are the main reasons why most textbooks chose the MIPS processor as a model architecture. However, its current licensing conditions are not clear, and the ARM AArch64 and RISC-V architectures are more up to date. That is why most of the leading universities have switched to the RISC-V architecture. A processor directly fetches and executes instructions in binary form. PC-class personal computers (x86 architecture) cannot execute RISC-V machine code directly on their processor, but many simulators of this architecture exist. The QtRVSim simulator was developed to support practical experience in this course.

Part 1 - Basic instructions - description and use

Detailed description:

Instruction | Syntax | Operation | Description
add | add rd, rs1, rs2 | rd ← rs1 + rs2 | Add: Adds the values in two registers (rs1 + rs2) and stores the result in register rd.
addi | addi rd, rs1, imm12 | rd ← rs1 + imm12 | Add immediate: Adds the value in rs1 and a signed constant (12-bit immediate value) and stores the result in rd.
sub | sub rd, rs1, rs2 | rd ← rs1 - rs2 | Subtract: Subtracts the value in register rs2 from the value in rs1 and stores the result in rd.
bne | bne rs1, rs2, Label | if rs1 != rs2 go to PC+2*imm12; else go to PC+4 | Branch on not equal: (conditional) jump if the value in rs1 is not equal to the value in rs2 (imm12 = (Label - PC)/2)
beq | beq rs1, rs2, Label | if rs1 == rs2 go to PC+2*imm12; else go to PC+4 | Branch on equal: (conditional) jump if the value in rs1 is equal to the value in rs2 (imm12 = (Label - PC)/2)
slt | slt rd, rs1, rs2 | rd ← (rs1 < rs2) | Set on less than: sets register rd to one if the condition rs1 < rs2 is true, otherwise to zero.
slli | slli rd, rs1, imm5 | rd ← rs1 << imm5 | Shift left logical: Shifts the value in the register to the left by imm5 bits (equivalent to multiplication by the constant 2^imm5).
j | j Label | PC ← PC + 2*imm20 | Jump: unconditional jump to the Label (imm20 = (Label - PC)/2)
lw | lw rd, imm12(rs1) | [rd] ← Mem[[rs1] + imm12] | Load word: Loads a word from the given address in memory and stores it in register rd.
sw | sw rs2, imm12(rs1) | Mem[[rs1] + imm12] ← [rs2] | Store word: Stores the value in register rs2 to the given address in memory.
lui | lui rd, imm20 | [rd] ← imm20<<12 | Load upper immediate: Stores the given immediate value (constant) imm20 to the upper part of register rd. The register is 32 bits long and imm20 is 20 bits.
li | li rd, Immediate | lui rd, (Immediate+0x800)[31:12]; addi rd, rd, Immediate[11:0] | Load immediate: Sets register rd to a 32-bit constant. It is a pseudo-instruction; it is translated into a sequence of actual instructions according to the value range.
la | la rd, LabelAddr | auipc rd, (LabelAddr - pc + 0x800)[31:12]; addi rd, rd, (LabelAddr - pc)[11:0] | Load address: Sets register rd to a 32-bit value computed relative to the actual PC/instruction address. It is a pseudo-instruction; it is translated into a sequence of actual instructions according to the value range.
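
The +0x800 correction in the li expansion compensates for the sign extension of the 12-bit immediate of addi. The following is a minimal C sketch of that arithmetic, written only for illustration; the function names li_hi20, li_lo12 and li_combine are hypothetical and are not part of any toolchain.

```c
#include <stdint.h>

/* Upper 20 bits for lui. Adding 0x800 first rounds the upper part up
 * whenever the low 12 bits will be treated as a negative number by addi. */
uint32_t li_hi20(uint32_t imm)
{
    return ((imm + 0x800u) >> 12) & 0xfffffu;
}

/* Low 12 bits interpreted as a signed value, the way addi sign-extends them. */
int32_t li_lo12(uint32_t imm)
{
    int32_t lo = (int32_t)(imm & 0xfffu);
    return (lo >= 0x800) ? lo - 0x1000 : lo;
}

/* Recombine both parts the way lui followed by addi would (modulo 2^32). */
uint32_t li_combine(uint32_t hi20, int32_t lo12)
{
    return (hi20 << 12) + (uint32_t)lo12;
}
```

For the constant 0x12345fff the low part is read as -1, so the upper part has to be 0x12346 rather than 0x12345 for the sum to come out right.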

A compact description of the RISC-V instructions: riscvcard.pdf

Description of the RISC-V architecture on Wikipedia: https://en.wikipedia.org/wiki/RISC-V

Authoritative architecture description: https://riscv.org/technical/specifications

The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA, Version 20191213

Translation of assembler source code to machine code

Any plain text editor can be used for writing assembler source code. Several text editors and development environments are installed in the laboratory, for example vim, Emacs, etc. We suggest Geany for those who have no personal preference.

Open a text editor and prepare a simple assembler source file named simple-lw-sw.S. The capital “.S” suffix is crucial. Standard development tools assign this suffix to source files in assembly language, and the compiler decides how to process a file according to its suffix. Other recognized suffixes are .c for C language source files, .cc or .cpp for C++, and .o for object files (files that already contain the source code translated into native instructions of the target machine, but for which the final addresses of the code fragments are not yet defined). The suffix .a (archive) is used for libraries of functions, which are included in the final executable when other files require them. On Unix-class systems, the final executable file (program) is stored without an extension.

.globl _start
.option norelax

.text

_start:
loop:
        // load the word from absolute address
        lw     x2, 0x400(x0)
        // store the word to absolute address
        sw     x2, 0x404(x0)
        // stop execution wait for debugger/user
        // ebreak

        // ensure that continuation does not
        // interpret random data
        beq    x0, x0, loop
        nop
        nop
        ebreak

.data
.org 0x400

src_val:
        .word  0x12345678
dst_val:
        .word  0

An assembly source file consists of

  • instruction mnemonics, which are abbreviations of operations (lw - load word, sw - store word, add, sub - subtract). Operands can follow the mnemonic; they are usually register specifications or immediate numeric constants. Operands are separated by commas.
  • labels (followed by a colon). Labels can be referenced from instruction operands and from directives that store raw words or other data into memory. The address of the memory location following the label (instruction or data) is stored in the instruction field or memory location from which the label is referenced.
  • control directives (pseudo-instructions) for the assembler, which usually start with a dot.
  • comments. Source files designated by the capital .S suffix are preprocessed before the actual translation to machine code/object files. The preprocessing rules are the same as for the C compiler, so the same comment format can be used.

The following directives are used in the provided sample code and templates

  • .globl - the symbol identifiers following the directive are visible/accessible even outside the actual compilation unit. By convention, the symbol _start or __start specifies the program entry point.
  • .option norelax - disables link-time optimization (relaxation) of instruction sequences (for example, both instructions are generated for li x2, 35 instead of a single addi x2, x0, 35)
  • .set noat - (MIPS assembler directive, listed for reference) disallows use of the temporary assembler register, which is used for the realization of some higher-level instructions translated by the assembler to a sequence of machine instructions. An example on MIPS is la (load address), a macro instruction translated by the assembler to a sequence of lui and ori instructions.
  • .set noreorder - (MIPS assembler directive, listed for reference) forbids instruction reordering by the assembler. The assembler is able to optimize the ordering of some instructions to fill delay slots (these are not used by the initial simulator setup for simplicity); reordering is not desirable for our code.
  • .ent name - marks the start of a function
  • .end name - marks the end of a function
  • .text - switches to filling the .text section, which is used to store the actual program machine instructions
  • .data - switches to filling the .data section, which holds initialized data/variables/arrays
  • .word - stores the value(s) specified after the directive into the section currently being filled

The complete manual for the GNU assembler and all its directives can be found in the GNU assembler documentation.

The compilation is performed by the cross-compiler:

riscv64-unknown-elf-gcc -Wl,-Ttext,0x200 -Wl,-Tdata,0x400 -march=rv32i -mabi=ilp32 -nostdlib simple-lw-sw.S -o simple-lw-sw

The invocation requires multiple parameters because standard programs in the C language require linking of initialization sequences and library functions. Our goal here, however, is to first describe the lowest level without this automation, which makes it easier to understand how it is later extended by automatic initialization sequences and constructs that allow comfortable writing of programs at a higher level and in higher-level languages. Parameters starting with -Wl, are passed to the linker and specify the address at which the .text section with program instructions is located and the address where the .data section with initialized variables starts. The parameters -march=rv32i and -mabi=ilp32 select the target instruction set and ABI. The parameter -nostdlib disables the automatic addition of the startup and initialization sequence and the use of standard libraries (-nostartfiles, used in the Makefile template later, disables only the startup sequence). The file name after the -o switch specifies the name of the final executable file (output). The remaining arguments are the names of one or more source files.

The simple-lw-sw project with an appropriate Makefile can be found in the /opt/apo/qtrvsim/qtrvsim_template directory.

File buble_sort.S

Use the buble_sort.S template for the internal QtRVSim assembler:

// Directives to make interesting windows visible
#pragma qtrvsim show registers
#pragma qtrvsim show memory

.globl  _start
.globl  array_size
.globl  array_start
.option norelax

.text

_start:
        la   a0, array_start
        la   a1, array_size
        lw   a1, 0(a1) // number of elements in the array

//Insert your code here

//Final infinite loop
end_loop:
        ebreak           // stop the simulator
        j end_loop
        nop

.data
// .align    2 // not supported by QtRVSim yet

array_size:
.word   15
array_start:
.word   5, 3, 4, 1, 15, 8, 9, 2, 10, 6, 11, 1, 6, 9, 12

// Specify location to show in memory window
#pragma qtrvsim focus memory array_start

For manual translation of C code to assembly language, try the file init_array.c:

int pole[10];
int main() {
  int N = 10,i;
  for(i=0; i<N; i++) {
    pole[i]=N-i;
  }
  return 0;
}

together with the file start.S, which calls the function main from init_array.c:

.globl   _start
.text
.option norelax

_start:
     la x2, _end+0x4000
     la x3, __global_pointer$
     jal  main
     ebreak

The translation to RISC-V assembly and the display of the translated instructions are done with the following commands.

riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -g -c init_array.c -o init_array.o
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -g -c start.S -o start.o
riscv64-unknown-elf-gcc -Wl,-Ttext,0x200 -Wl,-Tdata,0x400 -march=rv32i -mabi=ilp32 -nostdlib  init_array.o start.o -o init_array
riscv64-unknown-elf-objdump --source init_array

On Windows, the complete MSYS environment with the make utility can be installed, or the plain compiler riscv64-unknown-elf-gcc can be called from the following batch file.

PATH=%PATH%;c:\path\to\riscv64-unknown-elf-gcc_mingw32\bin
riscv64-unknown-elf-gcc -Wl,-Ttext,0x200 -Wl,-Tdata,0x400 -march=rv32i -mabi=ilp32 -nostdlib -g init_array.c -o init_array

The QtRVSim simulator is used to execute the program. Select the simplest variant of the simulated processor, “No pipeline no cache”. The “Browse” button is used to select the executable file name (simple-lw-sw without a suffix in our case) for the field “Elf executable”.

The diagram of the processor is opened. Double-clicking on the program counter register (PC) opens the program listing with the actual instructions. Double-clicking on the registers block opens a view with the list of architectural registers. Double-clicking on the data memory shows the memory content. The window layout shown in the next picture is an appropriate and intuitive starting point.

Select the “Follow fetch” option in the “Program” listing window; it highlights the instruction/line currently fetched by the processor for execution. The start of the listing in the “Memory” window should be set to the address 0x400, where the data value 0x12345678 was placed. The program can be stepped through by the “Machine” → “Step” menu entry or by the corresponding button on the toolbar. The value is first loaded to the register, and then it is stored to the following memory cell.

Change the program to process the load-store sequence in a loop. The instruction ebreak has to be removed from the loop because its purpose is to stop program execution when reached. Test that an edited value at address 0x400 is always copied to the address 0x404. Modify the program to add two input values at addresses 0x400 and 0x404 and store the result at address 0x408.

The next step is to modify the program to add two vectors with a length of four words. Use the assembler macro instruction la (load address), for example with the label vect_a, to set registers to point to the start of the vectors.

        ...
.data

vect_a:
        .word  0x12345678
        .word  0x12345678
        .word  0x12345678
        .word  0x12345678
vect_b:
        .word  0x12345678
        ...

Continue with implementation of the program to compute an average value from eight numbers.
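
Before writing the assembly, it can help to fix the intended behaviour in C. The sketch below is only a reference model under stated assumptions: the names vect_add and average8 are hypothetical, and the average is computed with a right shift because the base RV32I instruction set has no divide instruction (so the result is truncated, and the behaviour for negative sums depends on the shift being arithmetic).

```c
#include <stdint.h>

#define VECT_LEN 4   /* vector length used in the exercise */

/* c[i] = a[i] + b[i]; in assembly each element needs lw, lw, add, sw. */
void vect_add(const int32_t *a, const int32_t *b, int32_t *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Average of eight values: sum them and divide by 8 with a shift by 3. */
int32_t average8(const int32_t *v)
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += v[i];
    return sum >> 3;
}
```

Calling vect_add(vect_a, vect_b, vect_c, VECT_LEN) corresponds to the vector task, and average8 to the averaging task; the results can be compared with the memory window of the simulator.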

Automate program compilation by use of Makefile

It is desirable to at least document the compiler invocation, and better still to automate it. One way is to use a script with the sequence of commands required for compilation. Such a script can be written directly for the shell, the command-line interpreter (usually BASH or DASH on GNU/Linux). But it is not practical to recompile all compilation units of a larger project when a small change modifies only one or a small subset of source files. Several systems have been developed to automate exactly these tasks, for example Make, Ant, qmake, CMake, and Meson.

Makefile

Make is a tool that automates the compilation of source code; the compilation process is described in a Makefile.

Makefile template for the compilation of source files written in assembly language or C for the RISC-V simulator environment:

ARCH=riscv64-unknown-elf

SOURCES = change_me.S
TARGET_EXE = change_me

CC=$(ARCH)-gcc
CXX=$(ARCH)-g++
AS=$(ARCH)-as
LD=$(ARCH)-ld
OBJCOPY=$(ARCH)-objcopy

ARCHFLAGS += -mabi=ilp32
ARCHFLAGS += -march=rv32i
ARCHFLAGS += -fno-lto

CFLAGS  += -ggdb -Os -Wall
CXXFLAGS+= -ggdb -Os -Wall
AFLAGS  += -ggdb
LDFLAGS += -ggdb
LDFLAGS += -nostartfiles
LDFLAGS += -nostdlib
LDFLAGS += -static
#LDFLAGS += -specs=/opt/musl/riscv64-linux-gnu/lib/musl-gcc.specs

CFLAGS  += $(ARCHFLAGS)
CXXFLAGS+= $(ARCHFLAGS)
AFLAGS  += $(ARCHFLAGS)
LDFLAGS += $(ARCHFLAGS)

OBJECTS += $(filter %.o,$(SOURCES:%.S=%.o))
OBJECTS += $(filter %.o,$(SOURCES:%.c=%.o))
OBJECTS += $(filter %.o,$(SOURCES:%.cpp=%.o))

all : default

.PHONY : default clean dep all

%.o:%.S
        $(CC) -D__ASSEMBLY__ $(AFLAGS) -c $< -o $@

%.o:%.c
        $(CC) $(CFLAGS) $(CPPFLAGS) -c $< -o $@

%.o:%.cpp
        $(CXX) $(CXXFLAGS) $(CPPFLAGS) -c $<

%.s:%.c
        $(CC) $(CFLAGS) $(CPPFLAGS) -S $< -o $@

default : $(TARGET_EXE)

$(TARGET_EXE) : $(OBJECTS)
        $(CC) $(LDFLAGS) $^ -o $@

dep: depend

depend: $(SOURCES) $(wildcard *.h)
        echo '# autogenerated dependencies' > depend
ifneq ($(filter %.S,$(SOURCES)),)
        $(CC)  -D__ASSEMBLY__ $(AFLAGS) -w -E -M $(filter %.S,$(SOURCES)) \
          >> depend
endif
ifneq ($(filter %.c,$(SOURCES)),)
        $(CC) $(CFLAGS) $(CPPFLAGS) -w -E -M $(filter %.c,$(SOURCES)) \
          >> depend
endif
ifneq ($(filter %.cpp,$(SOURCES)),)
        $(CXX) $(CXXFLAGS) $(CPPFLAGS) -w -E -M $(filter %.cpp,$(SOURCES)) \
          >> depend
endif

clean:
        rm -f *.o *.a $(OBJECTS) $(TARGET_EXE) depend

#riscv64-unknown-elf-objdump --source qtrvsim_binrep

-include depend

A tab character has to be used to indent commands in the Makefile. Make does not recognize indentation by spaces.

A Makefile consists of definitions (assignments of values to variables) and rules. A rule starts with a line that defines the dependency of the rule target(s) on the prerequisites listed after the colon. The prerequisites are names of files or abstract targets which have to be made available before the commands following the first rule line can create the required results. File names can be complete names, or their base part can be substituted by the character “%”, which allows specifying a rule for a whole class of transformations from one compilation stage to another. An even more complete template for the compilation of assembler and C source files for the RISC-V target platform, with automatic building of dependencies on header files, can be found in the directory /opt/apo/qtrvsim_template on the computers in the laboratory.

Compilation

Compilation is invoked by the command make (it has to be invoked in the directory where the Makefile and the program source code are located). Make generates multiple output files. The file without an extension is used for execution in the QtRVSim environment. The compilation process translates each compilation unit (in the simple case, one source file together with its included header files) into an object file (.o) in relocatable form. The object files are then collected by the linker, which resolves address references between compilation units and places the code at its final addresses. For this purpose, the actual sequences of machine instructions have to be wrapped in an envelope which specifies where references are to be filled in when the .o files are linked and placed at the specified addresses. Even instructions and data in the final executable form usually require some information for the operating system about where they should be loaded/mapped in memory or in the process address space. The ELF (Executable and Linkable Format) is used to store these metadata in our case, and generally on most modern systems.

Determine variable addresses and correspondence of source code and final ELF executable

The following command can be used to find the final addresses of variables and data entries after linking

riscv64-unknown-elf-nm program

The list of sections and their locations can be listed by

riscv64-unknown-elf-objdump --headers program

List final machine code after translation with corresponding source lines

riscv64-unknown-elf-objdump --source program

Tasks

  1. learn how to use the compiler and the simulator
  2. change the program to run the memory read and write in a loop (try to modify the program directly in the simulator, change ebreak to nop)
  3. computation with vectors - return to the source and define vectors vec_a, vec_b, vec_c with four word elements each, and use the lw, sw and add instructions to compute vec_c[0] = vec_a[0] + vec_b[0]
  4. extend the program to compute the sum for each vec_c element in a newly introduced inner loop
  5. implement a program to compute the n-th member of the Fibonacci series
  6. start work on the bubble-sort algorithm

The command gcc -E assembler.S -o preprocessed-pro-mips.s can be used to run the preprocessor only.
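
For task 5, a C reference model makes it easier to check the register values seen in the simulator. This is a sketch only: the function name fib is hypothetical, and the indexing fib(0) = 0, fib(1) = 1 is an assumption (the task does not say where the series starts). Results fit into a 32-bit register only up to n = 47.

```c
#include <stdint.h>

/* Iterative Fibonacci: a and b hold two consecutive members of the series.
 * Each pass of the loop maps directly to one add and two register moves. */
uint32_t fib(unsigned n)
{
    uint32_t a = 0;   /* fib(0) */
    uint32_t b = 1;   /* fib(1) */
    while (n != 0) {
        uint32_t t = a + b;
        a = b;
        b = t;
        n--;
    }
    return a;
}
```

The loop condition compares n with zero, which in assembly can be done against register x0 without loading a constant.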

Part 2 - Transcribe a program from C to Assembler

Many practical applications need a median filter. The median filter removes noise (obvious outliers/dead pixels) from a signal or an image. It takes the neighborhood of a sample (10 samples before and 10 after), finds the median value, and replaces the sample value with this median. A very similar filter is the mean filter, which replaces the sample value with the average value of the nearby samples. The median value is usually calculated by sorting the samples by value and picking the sample in the middle, so a sorting algorithm is the cornerstone of a median filter implementation.

Let's assume we have 21 integers stored in an array in memory. The array begins at some given address (e.g. 0x00). One integer occupies one word in memory. The task is to sort the integers in ascending order. To do this, we will implement the bubble sort algorithm. In this algorithm, two adjacent values are compared, and if they are in the wrong order, they are swapped. These comparisons pass repeatedly through the array until no swaps are made.

The code for bubble sort is below:
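
To show how sorting and the median fit together, here is a minimal C sketch for one 21-sample window. It is an illustration under assumptions, not the course implementation: the names bubble_sort and median_of_window are hypothetical, and the sort uses the "repeat until no swap" formulation from the paragraph above.

```c
#include <stdint.h>

#define WINDOW 21   /* 10 samples before, the sample itself, 10 after */

/* Bubble sort that stops as soon as a full pass makes no swap. */
void bubble_sort(int32_t *v, int n)
{
    int swapped;
    do {
        swapped = 0;
        for (int j = 0; j + 1 < n; j++) {
            if (v[j + 1] < v[j]) {
                int32_t tmp = v[j + 1];
                v[j + 1] = v[j];
                v[j] = tmp;
                swapped = 1;
            }
        }
    } while (swapped);
}

/* Median of one window: sort a copy and pick the middle element. */
int32_t median_of_window(const int32_t *window)
{
    int32_t tmp[WINDOW];
    for (int i = 0; i < WINDOW; i++)
        tmp[i] = window[i];
    bubble_sort(tmp, WINDOW);
    return tmp[WINDOW / 2];
}
```

Sorting a copy keeps the input samples intact, which matters when the window slides over the signal and neighbouring windows share most of their samples.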

int pole[5]={5,3,4,1,2};
int main()
{
        int N = 5,i,j,tmp;
        for(i=0; i<N; i++)
                for(j=0; j<N-1-i; j++)
                        if(pole[j+1]<pole[j])
                        {
                                tmp = pole[j+1];
                                pole[j+1] = pole[j];
                                pole[j] = tmp;
                        }
        return 0;
}

An example of sorting 5 numbers is below:
5, 3, 4, 1, 2 –> initial state
3, 4, 1, 2, 5 –> after the first outer cycle finished
3, 1, 2, 4, 5 –> after the second outer cycle finished
1, 2, 3, 4, 5
1, 2, 3, 4, 5
1, 2, 3, 4, 5 –> after the last outer cycle finished - sorted

Transcribe the C code above to RISC-V assembler. Verify the correctness of your implementation in the RISC-V simulator. We will use this program in the next class, so finish it at home if you do not finish it during the class.

Here is a template you can use:

.globl    array
.data
.align    2

array:
.word    5, 3, 4, 1, 2

.text
.globl start
.ent start

start:
// TODO: Write your code here
nop
.end start

How to transcribe short fragments of C code into assembler

if statement

if (i == j)
  f = g + h;

f = f - i;

//   s0=f, s1=g, s2=h, s3=i, s4=j

  bne s3, s4, L1   // if i != j, go to label L1
  add s0, s1, s2   // if block: f = g + h
L1:
  sub s0, s0, s3   // f = f - i

if-else statement

if (i == j)
  f = g + h;
else
  f = f - i;

//   s0=f, s1=g, s2=h, s3=i, s4=j

  bne s3, s4, else   // if i != j, go to the else label
  add s0, s1, s2     // if block: f = g + h
  j L2               // jump behind the else block
else:
  sub s0, s0, s3     // else block: f = f - i
L2:

while loop

int pow = 1;
int x = 0;

while(pow != 128)
{
  pow = pow*2;
  x = x + 1;
}

// s0=pow, s1=x

  addi s0, zero, 1     // pow = 1
  addi s1, zero, 0     // x = 0
  addi t0, zero, 128   // t0 = 128 for the comparison (two registers always have to be compared)

while:
  beq  s0, t0, done  // if pow == 128, end the loop and go to the done label
  slli s0, s0, 1     // pow = pow*2
  addi s1, s1, 1     // x = x+1
  j    while
done:

for loop

int sum = 0;

for(int i=0; i!=10; i++)
{
  sum = sum + i;
}

// Is equivalent to the following while loop:
int sum = 0;

int i = 0;
while(i!=10){
  sum = sum + i;
  i++;
}

// But even here a do-while loop is faster,
// because the initial values of sum and i are constants
// and execution of the first iteration is guaranteed.
// If they are not fixed for the first iteration,
// an additional if can be used.
int sum = 0;
int i = 0;
do
{ sum = sum + i;
  i++;
} while(i!=10);

Read values from the data memory

// Just as an example...
int a, *pa=(int*)0x80020040;
int b, *pb=(int*)0x80020044;
int c, *pc=(int*)0x00000124;

a = *pa;
b = *pb;
c = *pc;

// s0=pa (base address), s1=a, s2=b, s3=c

lui  s0, 0x80020      // pa = 0x80020000;
lw   s1, 0x40(s0)     // a = *pa;
lw   s2, 0x44(s0)     // b = *pb;

addi s0, zero, 0x124  // pc = 0x00000124;
lw   s3, 0x0(s0)      // c = *pc;

Increment values in an array

int array[4] = { 7, 2, 3, 5 };

int main()
{
   int i,tmp;
   for(i=0; i<4; i++)
   {
      tmp = array[i];
      tmp += 1;
      array[i] = tmp;
   }
   return 0;
}

Complete code for the QtRVSim simulator:

.globl    array        // label "array" is declared as global. It is visible from all files in the project.
.data                  // directive indicating start of the data segment
.align    2            // set data alignment to 4 bytes

array:                 // label - name of the memory block
.word    7, 2, 3, 5    // values in the array to increment...

.text                  // beginning of the text segment (or code segment)
.globl start

start:
la   s0, array          // store the address of the "array" to the register s0
addi s1, zero, 0        // initialization of the for loop: i=0, where i=s1
addi s2, zero, 4        // set the upper bound of the loop

for:
  beq  s1, s2, done   // if s1 == s2, go to label done and leave the loop
  lw   s3, 0x0(s0)    // load a value from the array to s3
  addi s3, s3, 0x1    // increment the s3 register
  sw   s3, 0x0(s0)    // replace (store) the value from the s3 register
  addi s0, s0, 0x4    // advance the address to the next value in the array
  addi s1, s1, 0x1    // increment the number of passes through the loop (i++)
  j    for            // jump to the for label
done:
nop
.end start


Peripherals mapped into memory address space

The QtRVSim simulator includes a few simple peripherals which are mapped into the memory address space.

The first is a simple serial port (UART) connected to the terminal window. The register locations and bit fields are the same as in the SPIM and MARS simulators, which map the serial port from address 0xffff0000. QtRVSim maps the UART peripheral to this address as well, but also offers an alternative mapping at address 0xffffc000, which can be encoded as an absolute address in LW and SW instructions with the zero base register.

Address | Register name | Bit | Description
0xffffc000 | SERP_RX_ST_REG | | Serial port receiver status register
 | | 0 | Flag set to one when there is a newly received character in the SERP_RX_DATA_REG register
 | | 1 | When set to one, enables the interrupt on reception
0xffffc004 | SERP_RX_DATA_REG | 7 .. 0 | ASCII code of the received character
0xffffc008 | SERP_TX_ST_REG | | Status register of the transmitter writing to the terminal
 | | 0 | When one is read, the transmitter is ready to accept a character
 | | 1 | When set to one, enables the transmitter interrupt
0xffffc00c | SERP_TX_DATA_REG | 7 .. 0 | ASCII code of the character to transmit
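
Sending one character means polling the ready flag of the transmitter status register and then writing the character to the data register. The C sketch below illustrates this under assumptions: the transmitter registers are taken to sit at offsets 0x08 (status) and 0x0c (data) from the UART base, as in the SPIM/MARS layout, and the function name serp_tx_byte_at and the macro names are hypothetical. The base pointer is a parameter so the routine can also be exercised on an ordinary array.

```c
#include <stdint.h>

#define SERIAL_PORT_BASE        0xffffc000u  /* alternative UART mapping */
#define SERP_TX_ST_REG_o        0x08u        /* assumed status register offset */
#define SERP_TX_DATA_REG_o      0x0cu        /* assumed data register offset */
#define SERP_TX_ST_REG_READY_m  0x1u         /* bit 0: transmitter ready */

/* Busy-wait until the transmitter is ready, then write one character. */
void serp_tx_byte_at(volatile uint32_t *base, int data)
{
    while (!(base[SERP_TX_ST_REG_o / 4] & SERP_TX_ST_REG_READY_m))
        ;   /* poll the ready flag */
    base[SERP_TX_DATA_REG_o / 4] = (uint32_t)(data & 0xff);
}
```

On the simulated target the call would be serp_tx_byte_at((volatile uint32_t *)SERIAL_PORT_BASE, 'A'); the volatile qualifier keeps the compiler from removing the repeated reads of the status register.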

The next peripheral emulates interaction with simple control elements of a real device. Its register map matches a subset of the registers of the dial knobs and LED indicators peripheral, which is available for input and output on the MicroZed APO development kits used for your semester work.

Address | Register name | Bit | Description
0xffffc104 | SPILED_REG_LED_LINE | 31 .. 0 | The word shown in binary, decimal and hexadecimal
0xffffc110 | SPILED_REG_LED_RGB1 | 23 .. 0 | PWM duty cycle specification for the RGB LED 1 components
 | | 23 .. 16 | Red component R
 | | 15 .. 8 | Green component G
 | | 7 .. 0 | Blue component B
0xffffc114 | SPILED_REG_LED_RGB2 | 23 .. 0 | PWM duty cycle specification for the RGB LED 2 components
 | | 23 .. 16 | Red component R
 | | 15 .. 8 | Green component G
 | | 7 .. 0 | Blue component B
0xffffc124 | SPILED_REG_KNOBS_8BIT | 31 .. 0 | Filtered values of the dial knobs as 8-bit numbers
 | | 7 .. 0 | Blue dial value B
 | | 15 .. 8 | Green dial value G
 | | 23 .. 16 | Red dial value R
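
The knob and RGB LED registers share the same byte layout, so individual colour components can be extracted and recombined with shifts and masks. A small C sketch of that bit manipulation follows; the helper names knob_red, knob_green, knob_blue and rgb_pack are hypothetical and only illustrate the register layout listed above.

```c
#include <stdint.h>

/* Extract single dial values from the SPILED_REG_KNOBS_8BIT word. */
uint8_t knob_red(uint32_t knobs)   { return (uint8_t)((knobs >> 16) & 0xff); }
uint8_t knob_green(uint32_t knobs) { return (uint8_t)((knobs >> 8) & 0xff); }
uint8_t knob_blue(uint32_t knobs)  { return (uint8_t)(knobs & 0xff); }

/* Build a value for SPILED_REG_LED_RGB1/RGB2 from the three components. */
uint32_t rgb_pack(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}
```

Because the layouts match, a knob word can also be written to an RGB LED register unchanged; the helpers are useful when only one component should be modified.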

Analysis of Compiled Code

A simple program reads the positions of the simulator knob dials and converts the read values to the RGB LED color and to text/terminal output. The program is available in the directory /opt/apo/qtrvsim_binrep on the laboratory computers. The archive qtrvsim_binrep.tar.gz is available for download as well.

The C source code has been compiled by the following sequence of commands

riscv64-unknown-elf-gcc -D__ASSEMBLY__ -ggdb -mabi=ilp32 -march=rv32i -fno-lto -c crt0local.S -o crt0local.o
riscv64-unknown-elf-gcc -ggdb -Os -Wall -mabi=ilp32 -march=rv32i -fno-lto  -c qtrvsim_binrep.c -o qtrvsim_binrep.o
riscv64-unknown-elf-gcc -ggdb -nostartfiles -nostdlib -static -mabi=ilp32 -march=rv32i -fno-lto crt0local.o qtrvsim_binrep.o -lgcc -o qtrvsim_binrep

Alternative compilation for RISC-V when the picolibc library is used:

riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 --specs=/opt/picolibc/lib/riscv64-unknown-elf/specs/picolibc.specs /opt/apo/binrep/qtrvsim_binrep/qtrvsim_binrep.c -o qtrvsim_binrep

The content of the program compiled into the ELF executable format is examined by the objdump command

riscv64-unknown-elf-objdump --source qtrvsim_binrep

The same task for the MIPS architecture.

mips-elf-gcc -D__ASSEMBLY__ -ggdb -fno-lto -c crt0local.S -o crt0local.o
mips-elf-gcc -ggdb -Os -Wall -fno-lto  -c qtmips_binrep.c -o qtmips_binrep.o
mips-elf-gcc -ggdb -nostartfiles -static -fno-lto crt0local.o qtmips_binrep.o -o qtmips_binrep

The output with detailed commentary follows.

qtmips_binrep:     file format elf32-bigmips


Disassembly of section .text:

00400018 <main>:

/*
 * The main entry into example program
 */
int main(int argc, char *argv[])
{
  400018:       27bdffe8        addiu   $29,$29,-24
                           allocate space on the stack for main() function
                           stack frame
  
  40001c:       afbf0014        sw      $31,20($29)
                           save previous value of the return address register
                           to the stack.

 while (1) {
     uint32_t rgb_knobs_value;
     unsigned int uint_val;

      rgb_knobs_value = *(volatile uint32_t*)(mem_base + SPILED_REG_KNOBS_8BIT_o);
  400020:       8c04c124        lw      $4,-16092($0)
                           Read value from the address corresponding to the
                           sum of "SPILED_REG_BASE" and "SPILED_REG_KNOBS_8BIT_o"
                           peripheral register offset
                           LW is instruction to load the word. Address is formed
                           from the sum of register $0 (fixed zero) and -16092,
                           which is represented in hexadecimal as 0xffffc124
                           i.e., sum of 0xffffc100 and 0x24. The read value is
                           stored in register $4.

  400024:       00000000        sll     $0,$0,0x0
                           one NOP instruction to ensure that load finishes before
                           the further value use.

  400028:       00041027        nor     $2,$0,$4
                           Compute bit complement "~" of the value in the register
                           $4 and store it into register $2

     *(volatile uint32_t*)(mem_base + SPILED_REG_LED_LINE_o) = rgb_knobs_value;
  40002c:       ac04c104        sw      $4,-16124($0)
                           Store RGB knobs values from register $4 to the "LED"
                           line register which is shown in binary, decimal
                           and hexadecimal on the QtMips target.
                           Address 0xffffc104

     *(volatile uint32_t*)(mem_base + SPILED_REG_LED_RGB1_o) = rgb_knobs_value;
  400030:       ac04c110        sw      $4,-16112($0)
                           Store RGB knobs values to the corresponding components
                           controlling a color/brightness of the RGB LED 1
                           Address 0xffffc110


     *(volatile uint32_t*)(mem_base + SPILED_REG_LED_RGB2_o) = ~rgb_knobs_value;
  400034:       ac02c114        sw      $2,-16108($0)
                           Store complement of RGB knobs values to the corresponding
                           components controlling a color/brightness of the RGB LED 2
                           Address 0xffffc114

     /* Assign value read from knobs to the basic signed and unsigned types */
     uint_val = rgb_knobs_value;
                           the read value resides in register $4, which
                           corresponds to the first argument register a0

     /* Print values */
     serp_send_hex(uint_val);
  400038:       0c100028        jal     4000a0 <serp_send_hex>
  40003c:       00000000        sll     $0,$0,0x0
                           call the function to send hexadecimal value to
                           the serial port, one instruction after JAL
                           is executed in its delay-slot, PC pointing
                           after this instruction (0x400040) is stored
                           to the register 31, return address register

     serp_tx_byte('\n');
  400040:       0c100020        jal     400080 <serp_tx_byte>
  400044:       2404000a        addiu   $4,$0,10
                           call routine to send new line character to the
                           serial port. The ASCII value corresponding to
                           '\n' is set to argument a0 register in delay slot
                           of JAL. JAL is decoded and in parallel instruction
                           addiu $4,$0,10 is executed then PC pointing to the address
                           0x400048 after delay slot is stored to return address
                           register and next instruction is fetched from the JAL
                           instruction target address, start of the function
                           serp_tx_byte

  400048:       1000fff5        beqz    $0,400020 <main+0x8>
  40004c:       00000000        sll     $0,$0,0x0
                           branch back to the start of the loop reading value from
                           the knobs
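
The jump and branch targets printed by the disassembler can be recomputed from the instruction words. The following C sketch assumes the standard MIPS32 encodings (26-bit index for JAL, 16-bit signed instruction offset for branches); the function names are hypothetical.

```c
#include <stdint.h>

/* Target of J/JAL: upper 4 bits of the delay-slot address combined
   with the 26-bit index field shifted left by 2. */
uint32_t jal_target(uint32_t pc, uint32_t insn)
{
    return ((pc + 4) & 0xf0000000u) | ((insn & 0x03ffffffu) << 2);
}

/* Value JAL writes to $31: the instruction after the delay slot. */
uint32_t jal_return_address(uint32_t pc)
{
    return pc + 8;
}

/* Target of a conditional branch: delay-slot address plus the
   sign-extended 16-bit offset, counted in 4-byte instructions. */
uint32_t branch_target(uint32_t pc, uint32_t insn)
{
    int32_t offset = (int32_t)(insn & 0xffffu);
    if (offset & 0x8000)
        offset -= 0x10000;          /* manual sign extension */
    return pc + 4 + (uint32_t)offset * 4u;
}
```

For the word 0x0c100028 at address 0x400038 this gives the target 0x4000a0 and the return address 0x400040; for the word 0x1000fff5 at address 0x400048 it gives 0x400020, matching the listing.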


00400050 <_start>:
        la      $gp, _gp
  400050:       3c1c0041        lui     $28,0x41
  400054:       279c90e0        addiu   $28,$28,-28448
                           Load global data base pointer to the global data
                           base register 28 - gp.
                           Symbol _gp is provided by linker.
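
The `la` pseudo-instruction expands to a `lui`/`addiu` pair because a 32-bit constant does not fit into one instruction. Since `addiu` sign-extends its immediate, the upper half has to be rounded up whenever the lower half is 0x8000 or more. A small C sketch of both directions follows; the helper names are hypothetical.

```c
#include <stdint.h>

/* Value produced by "lui reg, hi" followed by "addiu reg, reg, lo". */
uint32_t lui_addiu_value(uint16_t hi, int16_t lo)
{
    return ((uint32_t)hi << 16) + (uint32_t)(int32_t)lo;
}

/* Split an address into the two immediates. The upper half is
   rounded up when the lower half will be interpreted as negative. */
void split_address(uint32_t addr, uint16_t *hi, int16_t *lo)
{
    int32_t low = (int32_t)(addr & 0xffffu);
    if (low >= 0x8000)
        low -= 0x10000;             /* manual sign extension */
    *hi = (uint16_t)((addr + 0x8000u) >> 16);
    *lo = (int16_t)low;
}
```

With the immediates from the listing, 0x41 and -28448, the resulting value of gp is 0x004090e0.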

        addi    $a0, $zero, 0
  400058:       20040000        addi    $4,$0,0
                           Set register a0 (the first main function argument)
                           to zero, argc is equal to zero.

        addi    $a1, $zero, 0
  40005c:       20050000        addi    $5,$0,0
                           Set register a1 (the second main function argument)
                           to zero, argv is equal to NULL.

        jal     main
  400060:       0c100006        jal     400018 <main>
        nop
  400064:       00000000        sll     $0,$0,0x0
                           Call the main function. Return address is stored
                           in the ra ($31) register.


00400068 <quit>:
quit:
        addi    $a0, $zero, 0
  400068:       20040000        addi    $4,$0,0
                           If the main function returns, set exit value to 0

        addi    $v0, $zero, 4001  /* SYS_exit */
  40006c:       20020fa1        addi    $2,$0,4001
                           Set system call number to code representing exit()

        syscall
  400070:       0000000c        syscall
                           Call the system.

00400074 <loop>:

loop:   break
  400074:       0000000d        break
                           If there is no operating system, try to stop the execution
                           by invoking a debugging exception

        beq     $zero, $zero, loop
  400078:       1000fffe        beqz    $0,400074 <loop>
        nop
  40007c:       00000000        sll     $0,$0,0x0
                           If even this does not stop execution, make the CPU
                           spin in a busy loop.

void serp_tx_byte(int data)
{
00400080 <serp_tx_byte>:
  while (!(serp_read_reg(SERIAL_PORT_BASE, SERP_TX_ST_REG_o) &
                SERP_TX_ST_REG_READY_m));
  400080:       8c02c008        lw      $2,-16376($0)
  400084:       00000000        sll     $0,$0,0x0
                           Read serial port transmit status register,
                           address 0xffffc008

  while (!(serp_read_reg(SERIAL_PORT_BASE, SERP_TX_ST_REG_o) &
  400088:       30420001        andi    $2,$2,0x1
  40008c:       1040fffc        beqz    $2,400080 <serp_tx_byte>
  400090:       00000000        sll     $0,$0,0x0
                           Keep waiting until the UART is ready to accept
                           a character, that is, until bit 0 is not zero.
                           NOP in the delay slot.

  *(volatile uint32_t *)(base + reg) = val;
  400094:       ac04c00c        sw      $4,-16372($0)
                           write value from register 4 (the first argument a0)
                           to the address 0xffffc00c (SERP_TX_DATA_REG_o)
                           serial port tx data register.
}
  400098:       03e00008        jr      $31
  40009c:       00000000        sll     $0,$0,0x0
                           Jump/return back to continue in the caller.
                           The address of the next instruction to fetch is read
                           from the return address register 31 (ra).
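
The polling pattern above can be tried on an ordinary computer by replacing the fixed peripheral address with a pointer parameter. The sketch below is a hypothetical variant, not the course's source: the name `serp_tx_byte_at` is invented, and the offsets 0x08 and 0x0c and the mask 0x1 are inferred from the addresses 0xffffc008 and 0xffffc00c and from the `andi` instruction in the listing, assuming a register block based at 0xffffc000.

```c
#include <stdint.h>

/* Assumed values, inferred from the listing above. */
#define SERP_TX_ST_REG_o        0x08u
#define SERP_TX_DATA_REG_o      0x0cu
#define SERP_TX_ST_REG_READY_m  0x1u

/* Hypothetical variant: the register block is passed as a pointer,
   so a plain array can stand in for the peripheral on a host. */
void serp_tx_byte_at(volatile uint32_t *regs, int data)
{
    /* Busy-wait until bit 0 of the transmit status register is set. */
    while (!(regs[SERP_TX_ST_REG_o / 4] & SERP_TX_ST_REG_READY_m))
        ;
    /* Write the character to the transmit data register. */
    regs[SERP_TX_DATA_REG_o / 4] = (uint32_t)data;
}
```

The `volatile` qualifier matters here: without it the compiler could read the status register once and turn the wait into an endless loop or remove it.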
                           
void serp_send_hex(unsigned int val)
{
004000a0 <serp_send_hex>:
  4000a0:       27bdffe8        addiu   $29,$29,-24
                           allocate space on the stack for the routine stack frame
                                                      
  4000a4:       00802825        or      $5,$4,$0
                           copy the value of the first argument register 4 (a0)
                           to register 5

  for (i = 8; i > 0; i--) {
  4000a8:       24030008        addiu   $3,$0,8
                           set register 3 (loop variable i) to 8

  4000ac:       afbf0014        sw      $31,20($29)
                           save previous value of the return address register
                           to the stack.

    char c = (val >> 28) & 0xf;
  4000b0:       00051702        srl     $2,$5,0x1c
                           shift value in register 5 right by 28 bits and store
                           result in the register 2

  4000b4:       304600ff        andi    $6,$2,0xff
                           redundant operation limiting the value range to that of
                           the char type; the result is stored in register 6
    if (c < 10 )
  4000b8:       2c42000a        sltiu   $2,$2,10
                           set register 2 to one if the value is smaller than 10

      c += 'A' - 10;
  4000bc:       10400002        beqz    $2,4000c8 <serp_send_hex+0x28>
  4000c0:       24c40037        addiu   $4,$6,55
                           If the value is greater than or equal to 10 (register 2
                           is 0/false), add 55 ('A' - 10 = 0x41 - 0xa = 0x37) to
                           register 6 and store the result in register 4. Because
                           this addition sits in the branch delay slot, it is also
                           executed when the branch is not taken (c < 10), but its
                           result is then immediately overwritten by the next instruction.
      c += '0';
  4000c4:       24c40030        addiu   $4,$6,48
                           add the value 0x30 = 48 = '0' to the value in register 6
                           and store the result in register 4 - the first argument a0
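
The effect of the branch delay slot on this if/else can be modelled in C by executing the delay-slot addition unconditionally and the fall-through addition only when the branch is not taken. This is an illustrative sketch; the function name and the variables `r2` and `r4`, which stand for registers 2 and 4, are hypothetical.

```c
/* Models the sequence sltiu / beqz / addiu (delay slot) / addiu. */
unsigned hex_digit_as_compiled(unsigned nibble)
{
    unsigned r2 = nibble < 10;   /* sltiu $2,$2,10 */
    unsigned r4 = nibble + 55;   /* addiu $4,$6,55: delay slot, always executed */
    if (r2 != 0)                 /* beqz $2 not taken: fall through */
        r4 = nibble + 48;        /* addiu $4,$6,48 overwrites the result */
    return r4;
}
```

For a nibble of 3 the function returns '3' (0x33) and for 10 it returns 'A' (0x41), so the extra addition costs nothing in correctness and saves a NOP in the delay slot.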
  
    serp_tx_byte(c);
  4000c8:       0c100020        jal     400080 <serp_tx_byte>
  4000cc:       2463ffff        addiu   $3,$3,-1
                           call subroutine to send byte to the serial port
                           decrement loop control variable (i) in delay-slot

  for (i = 8; i > 0; i--) {
  4000d0:       1460fff7        bnez    $3,4000b0 <serp_send_hex+0x10>
  4000d4:       00052900        sll     $5,$5,0x4
                           The terminating condition of the for loop, converted to a
                           do {} while() loop. If not all 8 characters have been sent,
                           loop again. The delay slot shifts the value in register 5
                           left by 4 bit positions.
                           The compiler does not store the local variables on the
                           stack, and does not even move them to callee-saved registers
                           (which would require saving their previous values in the
                           function stack frame). The compiler can use this optimization
                           because it knows the register usage of the called function
                           serp_tx_byte().
  }
  4000d8:       8fbf0014        lw      $31,20($29)
  4000dc:       00000000        sll     $0,$0,0x0
                           restore return address register value to that found at function
                           start
                           
  4000e0:       03e00008        jr      $31
  4000e4:       27bd0018        addiu   $29,$29,24
                           return to the caller function. Instruction in jump register
                           delay-slot is used to restore stack pointer/free function frame.
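
The loop structure the compiler produced for serp_send_hex can be written back in C as a do {} while() loop that counts down from 8 and shifts the value left by 4 bits in every pass. The sketch below is a hypothetical host-side variant: the name `format_hex32` is invented, and it writes the digits into a buffer instead of the serial port so that it can be tried without the simulator.

```c
#include <stdint.h>

/* Hypothetical variant of serp_send_hex: stores 8 hexadecimal digits
   and a terminating zero in out instead of sending them. */
void format_hex32(uint32_t val, char out[9])
{
    int i = 8;
    char *p = out;

    do {
        unsigned c = (val >> 28) & 0xf;   /* top nibble first */
        *p++ = (char)(c < 10 ? c + '0' : c + 'A' - 10);
        val <<= 4;                        /* move the next nibble to the top */
    } while (--i != 0);                   /* test at the end, as in bnez $3 */
    *p = '\0';
}
```

Calling `format_hex32(0xdeadbeef, buf)` with a 9-byte buffer leaves the string "DEADBEEF" in `buf`. Testing the condition only at the end is valid because the initial value 8 is known to be greater than zero.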

  • GNU Cross Compiler for MIPS-ELF architecture - GNU Compatible Compiler for MIPS architecture for Debian Linux OS (x86_64/i586).
  • gcc-binutils-newlib-mips-elf_4.4.4-1_mingw32.zip - GNU Compatible compiler for MIPS architecture for MS Windows with MinGW32. To compile programs for the MipsIT simulator, specify the following parameters: -nostdlib -nodefaultlibs -nostartfiles -Wl,-Ttext,0x80020000. For more complex programs you will probably also have to specify the -lm -lgcc -lc parameters.
  • MIPS Architecture at Wikipedia - description of MIPS processor and complete instruction set.
    • Missouri State University - Alternative MIPS simulator in JAVA
    • The source code for this simulator must not contain macro definitions. If you have source code from the MipsIT simulator, you have to preprocess it first. To do this, you can use e.g. the GCC compiler in the following way:

gcc -E assembler.S -o preprocessed_assembler.s

courses/b35apo/en/tutorials/03/start.txt · Last modified: 2024/02/02 18:41 (external edit)