
# 3. Processor organization, instruction set

## Exercise outline

1. Basic instructions and their description
2. Translating C source code to assembler (MIPS instruction set)
3. Peripheral access

## What shall we do

### Part 1 - Basic instructions - description and use

Detailed description:

| Instruction | Instruction Syntax | Operation | Description |
|---|---|---|---|
| add | add \$d, \$s, \$t | \$d = \$s + \$t | Add: adds the values in two registers (\$s + \$t) and stores the result in register \$d. |
| addi | addi \$t, \$s, C | \$t = \$s + C | Add immediate: adds the value in \$s and a signed constant (immediate value) and stores the result in \$t. |
| sub | sub \$d, \$s, \$t | \$d = \$s - \$t | Subtract: subtracts the value in register \$t from the value in \$s and stores the result in \$d. |
| bne | bne \$s, \$t, offset | if \$s != \$t go to PC+4+4*offset; else go to PC+4 | Branch on not equal: (conditional) jump if the value in \$s is not equal to the value in \$t. |
| beq | beq \$s, \$t, offset | if \$s == \$t go to PC+4+4*offset; else go to PC+4 | Branch on equal: (conditional) jump if the value in \$s is equal to the value in \$t. |
| j (jump) | j C | PC = (PC ∧ 0xf0000000) ∨ 4*C | Jump: unconditional jump to label C. |
| lw | lw \$t, C(\$s) | \$t = Memory[\$s + C] | Load word: loads a word from the given address in memory and stores it in register \$t. |
| sw | sw \$t, C(\$s) | Memory[\$s + C] = \$t | Store word: stores the value in register \$t to the given address in memory. |
| lui | lui \$t, C | \$t = C << 16 | Load upper immediate: stores the given immediate value (constant) C in the upper half of register \$t. The register is 32 bits long and C is 16 bits. |
| la | la \$at, LabelAddr | lui \$at, LabelAddr[31:16]; ori \$at, \$at, LabelAddr[15:0] | Load address: loads the 32-bit address of a label into register \$at. This is a pseudo-instruction; it is translated into a sequence of actual instructions. |
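The address arithmetic in the Operation column can be checked on a host machine. The following is a minimal sketch in C, not part of the exercise; the function names `branch_target`, `jump_target` and `la_value` are made up for illustration, and the formulas are taken directly from the table above.

```c
#include <stdint.h>

/* Target of beq/bne when the branch is taken. The offset is the signed
 * 16-bit immediate and counts instructions, not bytes. */
uint32_t branch_target(uint32_t pc, int16_t offset)
{
    return pc + 4 + 4 * (uint32_t)(int32_t)offset;
}

/* Target of j: the upper 4 bits of PC are kept and the 26-bit
 * field C is multiplied by 4. */
uint32_t jump_target(uint32_t pc, uint32_t c)
{
    return (pc & 0xf0000000u) | ((c & 0x03ffffffu) << 2);
}

/* Value left in the register by the lui/ori pair that la expands to. */
uint32_t la_value(uint32_t label_addr)
{
    uint32_t reg = (label_addr >> 16) << 16;  /* lui: C << 16          */
    reg |= label_addr & 0xffffu;              /* ori: fill in low half */
    return reg;
}
```

For example, a branch located at 0x400048 with offset -11 lands on 0x400020, because the offset is applied relative to the instruction that follows the branch.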

### Part 2 - Transcribe a program from C to Assembler

Many practical applications use a median filter. The median filter removes noise (obvious outliers or dead pixels) from a signal or an image. It takes the neighborhood of a sample (10 samples before and 10 after), finds the median value, and replaces the sample value with this median. The closely related mean filter replaces the sample value with the average of the nearby samples instead.

The median is usually calculated by sorting the samples by value and picking the sample in the middle, so a sorting algorithm is the cornerstone of a median filter implementation. Let us assume that we have 21 integers stored in an array in memory. The array begins at some given address (e.g. 0x00), and each integer occupies one word in memory. The task is to sort the integers in ascending order. To do this we will implement the bubble sort algorithm. In this algorithm, two adjacent values are compared and, if they are in the wrong order, they are swapped. These comparisons are repeated over the array until no more swaps are made.
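To make the role of sorting in the filter concrete, here is a minimal sketch in C of computing the median of one window. It is an illustration only: the names `WINDOW`, `cmp_int` and `median_of_window` are assumptions, and the standard library `qsort` stands in for the sort purely for brevity (the exercise itself uses bubble sort).

```c
#include <stdlib.h>

#define WINDOW 21  /* assumed window: 10 samples before + the sample + 10 after */

/* Comparison callback for qsort: ascending order of int values. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns the median of WINDOW samples without modifying the input. */
int median_of_window(const int *window)
{
    int sorted[WINDOW];

    for (int k = 0; k < WINDOW; k++)
        sorted[k] = window[k];

    qsort(sorted, WINDOW, sizeof sorted[0], cmp_int);
    return sorted[WINDOW / 2];  /* middle element of the sorted copy */
}
```

A single outlier in the window (for example one sample of 1000 among values 0 to 20) ends up at the end of the sorted copy and does not affect the middle element, which is why the filter suppresses such noise.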

The code for bubble sort is below:

    int pole[5]={5,3,4,1,2};
    int main()
    {
        int N = 5,i,j,tmp;
        for(i=0; i<N; i++)
            for(j=0; j<N-1-i; j++)
                if(pole[j+1]<pole[j])
                {
                    tmp = pole[j+1];
                    pole[j+1] = pole[j];
                    pole[j] = tmp;
                }
        return 0;
    }

An example of sorting 5 numbers is below:

    5, 3, 4, 1, 2 --> initial state
    3, 4, 1, 2, 5 --> after the first pass of the outer loop
    3, 1, 2, 4, 5 --> after the second pass of the outer loop
    1, 2, 3, 4, 5
    1, 2, 3, 4, 5
    1, 2, 3, 4, 5 --> after the last pass of the outer loop - sorted
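The listing above always runs N outer passes, while the description of the algorithm says the comparisons repeat until no swaps are made. A possible variant with that early exit is sketched below; the function name `bubble_sort_early_exit` and the returned pass count are additions for illustration and are not required by the exercise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sorts v[0..n-1] in ascending order. Returns the number of passes
 * performed, including a final pass that finds nothing to swap. */
size_t bubble_sort_early_exit(int *v, size_t n)
{
    size_t passes = 0;
    bool swapped = true;

    while (swapped && passes + 1 < n) {
        swapped = false;
        /* After each pass the largest remaining value is in place,
         * so the inner loop can stop one element earlier. */
        for (size_t j = 0; j + 1 < n - passes; j++) {
            if (v[j + 1] < v[j]) {
                int tmp = v[j + 1];
                v[j + 1] = v[j];
                v[j] = tmp;
                swapped = true;
            }
        }
        passes++;
    }
    return passes;
}
```

For the input 5, 3, 4, 1, 2 this variant stops after four passes, and for an already sorted array it stops after one.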

Transcribe the C code above to MIPS assembler. Verify the correctness of your implementation in the MIPS simulator. We will use this program in the next class, so finish it at home if you do not finish it during the class.

Here is a template you can use:

    #define t0 $8
    #define t1 $9
    #define t2 $10
    #define t3 $11
    #define t4 $12
    #define s0 $16
    #define s1 $17
    #define s2 $18
    #define s3 $19

    .globl array
    .data
    .align 2

    array:
    .word 5, 3, 4, 1, 2

    .text
    .globl start
    .ent start

    start:
    // TODO: Write your code here

    nop
    .end start

### How to transcribe short fragments of C code into assembler

#### if statement

    if (i == j)
        f = g + h;

    f = f - i;

    // s0=f, s1=g, s2=h, s3=i, s4=j
    bne s3, s4, L1   // if i != j, go to label L1
    add s0, s1, s2   // if block: f = g + h
    L1:
    sub s0, s0, s3   // f = f - i

#### if-else statement

    if (i == j)
        f = g + h;
    else
        f = f - i;

    // s0=f, s1=g, s2=h, s3=i, s4=j
    bne s3, s4, else // if i != j, go to the else label
    add s0, s1, s2   // if block: f = g + h
    j   L2           // jump past the else block
    else:
    sub s0, s0, s3   // else block: f = f - i
    L2:

#### while loop

    int pow = 1;
    int x = 0;

    while (pow != 128)
    {
        pow = pow*2;
        x = x + 1;
    }

    // s0=pow, s1=x
    addi s0, $0, 1     // pow = 1
    addi s1, $0, 0     // x = 0
    addi t0, $0, 128   // t0 = 128 for the comparison (beq always compares two registers)

    while:
    beq  s0, t0, done  // if pow == 128, leave the loop: go to the done label
    sll  s0, s0, 1     // pow = pow*2
    addi s1, s1, 1     // x = x+1
    j    while
    done:
#### for loop

    int sum = 0;

    for (int i = 0; i != 10; i++)
    {
        sum = sum + i;
    }

    // is equivalent to the following while loop:
    int sum = 0;

    int i = 0;
    while (i != 10) {
        sum = sum + i;
        i++;
    }
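One way to prepare such a loop for transcription is to rewrite it first in C using only `if` and `goto`, so that each statement maps to roughly one instruction. The sketch below is a suggested intermediate step, not a required one; the function name `sum_below_ten` and the label names are made up for illustration.

```c
/* The for loop above, rewritten so that the control flow uses only
 * a conditional jump and an unconditional jump. */
int sum_below_ten(void)
{
    int sum = 0;
    int i = 0;              /* loop initialization              */
    const int limit = 10;   /* the bound kept in its own value,
                               as beq compares two registers    */

for_test:
    if (i == limit)         /* exit test: leave when i == 10    */
        goto done;
    sum = sum + i;          /* loop body                        */
    i = i + 1;              /* i++                              */
    goto for_test;          /* jump back to the exit test       */
done:
    return sum;
}
```

The structure is the same as in the while loop example: an exit test at the top, the body, the increment, and an unconditional jump back to the test.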
#### Reading values from data memory

    // Just as an example...
    int a, *pa = (int *)0x80020040;
    int b, *pb = (int *)0x80020044;
    int c, *pc = (int *)0x00001234;

    a = *pa;
    b = *pb;
    c = *pc;

    // s0=pa (base address), s1=a, s2=b, s3=c

    lui  s0, 0x8002       // pa = 0x80020000;
    lw   s1, 0x40(s0)     // a = *pa;
    lw   s2, 0x44(s0)     // b = *pb;

    addi s0, $0, 0x1234   // pc = 0x00001234;
    lw   s3, 0x0(s0)      // c = *pc;

#### Incrementing values in an array

    int array[4] = { 7, 2, 3, 5 };

    int main()
    {
        int i,tmp;
        for(i=0; i<4; i++)
        {
            tmp = array[i];
            tmp += 1;
            array[i] = tmp;
        }
        return 0;
    }

Complete code for the MipsIt simulator:

    #define s0 $16
    #define s1 $17
    #define s2 $18
    #define s3 $19

    .globl array         // label "array" is declared as global; it is visible from all files in the project
    .data                // directive indicating the start of the data segment
    .align 2             // set data alignment to 4 bytes

    array:               // label - name of the memory block
    .word 7, 2, 3, 5     // values in the array to increment

    .text                // beginning of the text segment (or code segment)
    .globl start
    .ent start

    start:
    la   s0, array       // store the address of "array" in register s0
    addi s1, $0, 0       // loop initialization: i = 0, where i = s1
    addi s2, $0, 4       // set the upper bound of the loop

    for:
    beq  s1, s2, done    // if s1 == s2, go to label done and leave the loop
    lw   s3, 0x0(s0)     // load a value from the array into s3
    addi s3, s3, 0x1     // increment the s3 register
    sw   s3, 0x0(s0)     // store the value from s3 back into the array
    addi s0, s0, 0x4     // advance the address to the next element of the array
    addi s1, s1, 0x1     // increment the number of passes through the loop (i++)
    j    for             // jump to the for label
    done:
    nop
    .end start

### Peripherals mapped into memory address space

The QtMips simulator includes a few simple peripherals that are mapped into the memory address space. The first is a simple serial port (UART) connected to a terminal window. The register locations and bit fields are the same as in the SPIM and MARS simulators, which map the serial port starting at address 0xffff0000. QtMips maps the UART peripheral to this address as well, but it also offers an alternative mapping at address 0xffffc000, which can be encoded as an absolute address in LW and SW instructions with the zero register as the base.

| Address | Register name | Bit | Description |
|---|---|---|---|
| 0xffffc000 | SERP_RX_ST_REG | | Serial port receiver status register |
| | | 0 | Flag set to one when there is a new received character in the SERP_RX_DATA_REG register |
| | | 1 | When set to one, enables the interrupt on reception |
| 0xffffc004 | SERP_RX_DATA_REG | 7 .. 0 | ASCII code of the received character |
| 0xffffc008 | SERP_TX_ST_REG | | Status register of the transmitter writing to the terminal |
| | | 0 | When one is read, the transmitter is ready to accept a character |
| | | 1 | When set to one, enables the transmitter interrupt |
| 0xffffc00c | SERP_TX_DATA_REG | 7 .. 0 | ASCII code of the character to transmit |

The next peripheral emulates interaction with the simple control elements of a real device. Its register map matches a subset of the registers of the dial knobs and LED indicators peripheral that is available for input and output on the MicroZed APO development kits used for your semester work.

| Address | Register name | Bit | Description |
|---|---|---|---|
| 0xffffc104 | SPILED_REG_LED_LINE | 31 .. 0 | The word shown in binary, decimal and hexadecimal |
| 0xffffc110 | SPILED_REG_LED_RGB1 | 23 .. 0 | PWM duty cycle specification for the RGB LED 1 components |
| | | 23 .. 16 | Red component R |
| | | 15 .. 8 | Green component G |
| | | 7 .. 0 | Blue component B |
| 0xffffc114 | SPILED_REG_LED_RGB2 | 23 .. 0 | PWM duty cycle specification for the RGB LED 2 components |
| | | 23 .. 16 | Red component R |
| | | 15 .. 8 | Green component G |
| | | 7 .. 0 | Blue component B |
| 0xffffc124 | SPILED_REG_KNOBS_8BIT | 31 .. 0 | Filtered values of the dial knobs as 8-bit numbers |
| | | 7 .. 0 | Blue dial value B |
| | | 15 .. 8 | Green dial value G |
| | | 23 .. 16 | Red dial value R |

### Analysis of Compiled Code

A simple program reads the positions of the simulator's dial knobs and converts the values read into the RGB LED color and text/terminal output. The program is available in the directory /opt/apo/qtmips_binrep on the laboratory computers. An archive, qtmips_binrep.tar.gz, is available for download as well.

The C source code has been compiled by the following sequence of commands:

    mips-elf-gcc -D__ASSEMBLY__ -ggdb -fno-lto -c crt0local.S -o crt0local.o
    mips-elf-gcc -ggdb -Os -Wall -fno-lto -c qtmips_binrep.c -o qtmips_binrep.o
    mips-elf-gcc -ggdb -nostartfiles -static -fno-lto crt0local.o qtmips_binrep.o -o qtmips_binrep

The content of the program compiled into the ELF executable format is examined with the objdump command:

    mips-elf-objdump --source -M no-aliases,reg-names=numeric qtmips_binrep

The output follows, with detailed commentary included.

    qtmips_binrep:     file format elf32-bigmips

    Disassembly of section .text:

    00400018 <main>:
    /*
     * The main entry into example program
     */
    int main(int argc, char *argv[])
    {
      400018:   27bdffe8    addiu   $29,$29,-24

Allocate space on the stack for the main() function stack frame.

      40001c:   afbf0014    sw      $31,20($29)

Save the previous value of the return address register to the stack.
        while (1) {
            uint32_t rgb_knobs_value;
            unsigned int uint_val;

            rgb_knobs_value = *(volatile uint32_t*)(mem_base + SPILED_REG_KNOBS_8BIT_o);
      400020:   8c04c124    lw      $4,-16092($0)

Read the value from the address corresponding to the sum of "SPILED_REG_BASE" and the "SPILED_REG_KNOBS_8BIT_o" peripheral register offset. LW is the instruction that loads a word. The address is formed from the sum of register $0 (fixed zero) and -16092, which is represented in hexadecimal as 0xffffc124, i.e., the sum of 0xffffc100 and 0x24. The value read is stored in register $4.

      400024:   00000000    sll     $0,$0,0x0

One NOP instruction to ensure that the load finishes before the value is used further.

      400028:   00041027    nor     $2,$0,$4

Compute the bitwise complement "~" of the value in register $4 and store it in register $2.

            *(volatile uint32_t*)(mem_base + SPILED_REG_LED_LINE_o) = rgb_knobs_value;
      40002c:   ac04c104    sw      $4,-16124($0)

Store the RGB knobs values from register $4 to the "LED" line register, which is shown in binary, decimal and hexadecimal on the QtMips target. Address 0xffffc104.

            *(volatile uint32_t*)(mem_base + SPILED_REG_LED_RGB1_o) = rgb_knobs_value;
      400030:   ac04c110    sw      $4,-16112($0)

Store the RGB knobs values to the corresponding components controlling the color/brightness of RGB LED 1. Address 0xffffc110.

            *(volatile uint32_t*)(mem_base + SPILED_REG_LED_RGB2_o) = ~rgb_knobs_value;
      400034:   ac02c114    sw      $2,-16108($0)

Store the complement of the RGB knobs values to the corresponding components controlling the color/brightness of RGB LED 2. Address 0xffffc114.

            /* Assign value read from knobs to the basic signed and unsigned types */
            uint_val = rgb_knobs_value;

The value read resides in register 4, which corresponds to the first argument register a0.

            /* Print values */
            serp_send_hex(uint_val);
      400038:   0c100028    jal     4000a0 <serp_send_hex>
      40003c:   00000000    sll     $0,$0,0x0

Call the function that sends a hexadecimal value to the serial port. The instruction after JAL is executed in its delay slot; the PC pointing after this instruction (0x400040) is stored in register 31, the return address register.

            serp_tx_byte('\n');
      400040:   0c100020    jal     400080 <serp_tx_byte>
      400044:   2404000a    addiu   $4,$0,10

Call the routine that sends a newline character to the serial port. The ASCII value corresponding to '\n' is set in the argument register a0 in the delay slot of JAL. JAL is decoded and, in parallel, the instruction addiu $4,$0,10 is executed; then the PC pointing to address 0x400048, after the delay slot, is stored in the return address register, and the next instruction is fetched from the JAL target address, the start of the function serp_tx_byte.

      400048:   1000fff5    beqz    $0,400020 <main+0x8>
      40004c:   00000000    sll     $0,$0,0x0

Branch back to the start of the loop that reads the value from the knobs.
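The commentary on `lw $4,-16092($0)` relies on the 16-bit immediate being sign-extended before it is added to the base register. A small host-side sketch in C of that calculation follows; the function names `immediate_field` and `effective_address` are made up for illustration, and the instruction words used in the example come from the listing above.

```c
#include <stdint.h>

/* Low 16 bits of an instruction word, interpreted as a signed immediate.
 * The sign extension is done by hand to avoid implementation-defined casts. */
int16_t immediate_field(uint32_t instruction)
{
    uint16_t low = (uint16_t)(instruction & 0xffffu);
    return (int16_t)(low >= 0x8000u ? (int32_t)low - 0x10000 : (int32_t)low);
}

/* Effective address of lw/sw: base register plus the sign-extended
 * immediate, computed modulo 2^32. */
uint32_t effective_address(uint32_t base, int16_t imm)
{
    return base + (uint32_t)(int32_t)imm;
}
```

With the instruction word 0x8c04c124, the immediate field is -16092, and adding it to a zero base gives 0xffffc124. This is why peripherals placed in the top 32 KiB of the address space can be reached with register $0 as the base.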

    00400050 <_start>:
            la      $gp, _gp
      400050:   3c1c0041    lui     $28,0x41
      400054:   279c90e0    addiu   $28,$28,-28448

Load the global data base pointer into the global data base register 28 (gp). The symbol _gp is provided by the linker.

            addi    $a0, $zero, 0
      400058:   20040000    addi    $4,$0,0

Set register a0 (the first argument of the main function) to zero; argc is equal to zero.

            addi    $a1, $zero, 0
      40005c:   20050000    addi    $5,$0,0

Set register a1 (the second argument of the main function) to zero; argv is equal to NULL.

            jal     main
      400060:   0c100006    jal     400018 <main>
            nop
      400064:   00000000    sll     $0,$0,0x0

Call the main function. The return address is stored in the ra ($31) register.

    00400068 <quit>:
    quit:
            addi    $a0, $zero, 0
      400068:   20040000    addi    $4,$0,0

If the main function returns, set the exit value to 0.

            addi    $v0, $zero, 4001  /* SYS_exit */
      40006c:   20020fa1    addi    $2,$0,4001

Set the system call number to the code representing the exit() syscall.

      400070:   0000000c    syscall

Call the system.

    00400074 <loop>:
    loop:
            break
      400074:   0000000d    break

If there is no system, try to stop the execution by invoking a debugging exception.

            beq     $zero, $zero, loop
      400078:   1000fffe    beqz    $0,400074 <loop>
            nop
      40007c:   00000000    sll     $0,$0,0x0

If even this does not stop the execution, command the CPU to spin in a busy loop.

    void serp_tx_byte(int data)
    {
    00400080 <serp_tx_byte>:
      400080:   8c02c008    lw      $2,-16376($0)
      400084:   00000000    sll     $0,$0,0x0

Read the serial port transmit status register.

      400088:   30420001    andi    $2,$2,0x1
      40008c:   1040fffc    beqz    $2,400080 <serp_tx_byte>
      400090:   00000000    sll     $0,$0,0x0

Wait until the UART is ready to accept a character, i.e. until bit 0 is not zero. There is a NOP in the delay slot.

            *(volatile uint32_t *)(base + reg) = val;
      400094:   ac04c00c    sw      $4,-16372($0)

Write the value from register 4 (the first argument, a0) to address 0xffffc00c (SERP_TX_DATA_REG_o), the serial port transmit data register.

    }
      400098:   03e00008    jr      $31
      40009c:   00000000    sll     $0,$0,0x0

Jump (return) back to continue in the caller, using the address in the return address register 31 (ra).
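The busy-wait pattern seen in this disassembly can be tried on a host machine by replacing the peripheral with an ordinary array. The sketch below is an illustration under assumptions, not the source of qtmips_binrep: the array `fake_uart`, the helpers `reg_read` and `reg_write`, the function `tx_byte_polled`, and the names `SERP_TX_ST_REG_o` and `SERP_TX_ST_REG_READY_m` are hypothetical, while the offsets 0x08 and 0x0c follow from the addresses used by the `lw` and `sw` instructions above.

```c
#include <stdint.h>

/* Offsets relative to the UART base 0xffffc000 (names partly assumed). */
#define SERP_TX_ST_REG_o        0x08u
#define SERP_TX_DATA_REG_o      0x0cu
#define SERP_TX_ST_REG_READY_m  0x1u   /* bit 0: transmitter ready */

/* Hypothetical stand-in for the peripheral: four 32-bit registers.
 * On the simulator the base would be the fixed address 0xffffc000. */
static volatile uint32_t fake_uart[4];

static uint32_t reg_read(uint32_t offset)
{
    return fake_uart[offset / 4];
}

static void reg_write(uint32_t offset, uint32_t value)
{
    fake_uart[offset / 4] = value;
}

/* Polls the status register and writes the character once bit 0 is set. */
void tx_byte_polled(int data)
{
    while ((reg_read(SERP_TX_ST_REG_o) & SERP_TX_ST_REG_READY_m) == 0)
        ;   /* busy-wait, like the beqz loop back to the lw */
    reg_write(SERP_TX_DATA_REG_o, (uint32_t)data);
}
```

Because nothing sets the ready bit in this stand-in, a caller has to set `fake_uart[2]` to 1 before calling `tx_byte_polled`; otherwise the loop never ends. The `volatile` qualifier matters on real hardware because it forces the status register to be re-read on every iteration.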

    void serp_send_hex(unsigned int val)
    {
    004000a0 <serp_send_hex>:
      4000a0:   27bdffe8    addiu   $29,$29,-24

Allocate space on the stack for the routine's stack frame.

      4000a4:   00802825    or      $5,$4,$0

Copy the value of the first argument register 4 (a0) to register 5.

        for (i = 8; i > 0; i--) {
      4000a8:   24030008    addiu   $3,$0,8

Set register 3 to the value 8.

      4000ac:   afbf0014    sw      $31,20($29)

Save the previous value of the return address register to the stack.

            char c = (val >> 28) & 0xf;
      4000b0:   00051702    srl     $2,$5,0x1c

Shift the value in register 5 right by 28 bits and store the result in register 2.

      4000b4:   304600ff    andi    $6,$2,0xff

A redundant operation that limits the value range to that of the char type and stores the result in register 6.

            if (c < 10 )
      4000b8:   2c42000a    sltiu   $2,$2,10

Set register 2 to one if the value is smaller than 10.

                c += 'A' - 10;
      4000bc:   10400002    beqz    $2,4000c8 <serp_send_hex+0x28>
      4000c0:   24c40037    addiu   $4,$6,55

If the value is greater than or equal to 10 (register 2 is 0/false), add 55 ('A' - 10 = 0x41 - 0xa = 0x37) to register 6 and store the result in register 4. This instruction is in the branch delay slot, so it is executed even when the branch is not taken and the arm before else runs, but its result is then immediately overwritten by the next instruction.

                c += '0';
      4000c4:   24c40030    addiu   $4,$6,48

Add the value 0x30 = 48 = '0' to the value in register 6 and store the result in register 4, the first argument register a0.

            serp_tx_byte(c);
      4000c8:   0c100020    jal     400080 <serp_tx_byte>
      4000cc:   2463ffff    addiu   $3,$3,-1

Call the subroutine that sends a byte to the serial port. The loop control variable (i) is decremented in the delay slot.

        for (i = 8; i > 0; i--) {
      4000d0:   1460fff7    bnez    $3,4000b0 <serp_send_hex+0x10>
      4000d4:   00052900    sll     $5,$5,0x4

The final condition of the for loop, which has been converted to a do {} while() loop. If not all 8 characters have been sent, loop again. The value in register 5 is shifted left by 4 bit positions. The compiler does not store the values of local variables on the stack; it does not even save the caller-saved registers that hold them across the call, which would normally require space in the function's stack frame. The compiler can use this optimization because it knows the register usage of the called function serp_tx_byte().

        }
      4000d8:   8fbf0014    lw      $31,20($29)
      4000dc:   00000000    sll     $0,$0,0x0

Restore the return address register to the value it had at the start of the function.

      4000e0:   03e00008    jr      $31
      4000e4:   27bd0018    addiu   $29,$29,24
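The digit conversion performed by this loop can be reproduced in portable C and tested on a host machine. The following sketch is reconstructed from the disassembly and its commentary, not copied from qtmips_binrep.c; the function name `format_hex32` and the output buffer are assumptions, and the real routine sends each character through serp_tx_byte() instead of storing it.

```c
#include <stdint.h>

/* Writes the 8 hexadecimal digits of val, most significant first,
 * followed by a terminating NUL. out must have room for 9 chars. */
void format_hex32(uint32_t val, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned int nibble = (val >> 28) & 0xfu;   /* top 4 bits, like srl by 28 */

        if (nibble < 10)
            out[i] = (char)(nibble + '0');          /* digits 0..9 */
        else
            out[i] = (char)(nibble + 'A' - 10);     /* letters A..F */

        val <<= 4;                                  /* like sll $5,$5,0x4 */
    }
    out[8] = '\0';
}
```

Calling `format_hex32(0x12ABCDEF, buf)` fills `buf` with the string "12ABCDEF". The shift by 4 at the end of each iteration brings the next nibble to the top, which is the job of the `sll` in the delay slot of the `bnez`.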