## **Computer Architectures**

## Integer Arithmetic Richard Šusta, Pavel Píša



Czech Technical University in Prague, Faculty of Electrical Engineering

Ver.1.00

### Important Introductory Note

- The goal is to understand the structure of the computer so you can make better use of its options to achieve its higher performance.
- It is also discussed interconnection of HW / SW

### • Webpages:

https://cw.fel.cvut.cz/wiki/courses/a0b36apo/start https://dcenet.felk.cvut.cz/apo/ - they will be opened

• Některé navazující předměty:

B4M35PAP - Advanced Computer Architectures

B3B38VSY - Embedded Systems

B4M38AVS - Embedded Systems Application

B4B35OSY - Operating Systems (OI)

<u>B0B35LSP – Logic Systems and Processors</u> (KyR + part of OI)

• Prerequisite: Šusta, R.: <u>APOLOS</u>, CTU-FEE 2016, 51 pg.

### Important Introductory Note

 The course is based on a world-renowned book of authors Paterson, D., Hennessey, V.: Computer Organization and Design, The HW/SW Interface. Elsevier, ISBN: 978-0-12-370606-5



# David Andrew Patterson

<u>University of California, Berkeley</u> Works: RISC processor MIPS, RAID, Clusters



### John Leroy Hennessy

10th President of <u>Stanford University</u> Works: RISC processors MIPS, DLX a MMIX



# Computers

# where they are heading...

n

### Budoucnost počítačů

Can weather forecast be predicted?

Can the development of computer technology be predicted?



### Moore's Law

**Gordon Moore**, founder of Intel, in 1965: "*The number of transistors on integrated circuits doubles approximately every two years* "



### The cost of production is growing with decreasing design rule



### Today PC Computer Base Platform – Motherboard



### **Block Diagram of Components Interconnection**



### **Block Diagram of Components Interconnection**



### **Block Diagram of Components Interconnection**





[Arnold S. Berger: Hardware Computer Organization for the Software Professional]



### 5 units:

- A processing unit that contains an arithmetic logic unit and processor registers;
- A control unit that contains an instruction register and program counter;
- Memory that stores data and instructions
- External mass storage
- Input and output mechanisms



- Android 5.0 (Lollipop)
- 2 GB RAM
- 16 GB user RAM user
- 1920 x 1080 display
- 8-core CPU (chip Exynos 5410):
  - 4 cores 1.6 GHz ARM Cortex-A15
  - 4 cores 1.2 GHz ARM Cortex-A7



#### AE0B36APO Computer Architectures

15

Source: http://www.techinsights.com/about-techinsights/overview/blog/samsung-galaxy-s4-teardown/



AE0B36APO **Computer Architectures** 

Source: http://www.techinsights.com/about-techinsights/overview/blog/samsung-galaxy-s4-teardown/

X-ray image of Exynos 5410 hip from the side :



### •We see that this is QDP (Quad die package)

To increase capacity, chips have multiple stacks of dies. A **die**, in the context of integrated circuits, is a small block of semiconducting material on which a given functional circuit is fabricated. [Wikipedia]

AE0B36APO Computer Architectures

Zdroj: http://gamma0burst.tistory.com/m/600

### Chip Exynos 5410 – here, we see DRAM



#### AE0B36APO Computer Architectures

Source: http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/documents/pages/computational-photography-part-2

### Chip Exynos 5410



- Note the different sizes of 4 cores A7 and 4 cores A15
- On the chip, other components are integrated outside the processor: the GPU, Video coder and decoder, and more. This is SoC (System on Chip)

#### AE0B36APO Computer Architectures

Source: http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/documents/pages/computational-photography-part-2, http://gamma0burst.tistory.com/m/600



### Common concept



• The processor performs stored memory (ROM, RAM) instructions to operate peripherals, to respond to external events and to process data.

### **Example of Optimization**

### Autonomous cars



Source: <u>http://www.nvidia.com/object/autonomous-cars.html</u>

• Many artificial intelligence tasks are based on deep neural networks (deep neural networks)

### Neural network passage -> matrix multiplication

- How to increase calculation?
- The results of one of many experiments
  - Naive algorithm  $(3 \times \text{for}) 3.6 \text{ s} = 0.28 \text{ FPS}$
  - Optimizing memory access 195 ms = 5.13 FPS (necessary knowledge of HW)
  - 4 cores– 114 ms = 8.77 FPS (selection of a proper synchronization)
  - GPU (256 processors) 25 ms = 40 FPS (knowledge of data transfer between CPU and coprocessors)
- Source: Naive algorithm, library Eigen (1 core), 4 cores (2 physical on i7-2520M), GPU results Joela Matějka from <a href="http://industrialinformatics.cz/">http://industrialinformatics.cz/</a>

### And Other Systems?



- Using GPUs, we process 40 fps.
- but cars have enough power for them...

 But in an embedded device, it is sometimes necessary to reduce its consumption and cost. There are used very simple processors or microcontrollers, sometimes without real number operations, and programmed with lowlevel C language.



#### There [were/are/will be] many manufactures of processors

http://research.microsoft.com/en-us/um/people/gbell/CyberMuseum\_contents/Microprocessor\_Evolution\_Poster.jpg



### Reasons to study computer architectures

- To invent/design new computer architectures
- To be able to integrate selected architecture into silicon
- To gain knowledge required to design computer hardware/systems (big ones or embedded)
- To understand generic questions about computers, architectures and performance of various architectures
- To understand how to use computer hardware efficiently (i.e. how to write good software)
  - It is not possible to efficiently use resources provided by any (especially by modern) hardware without insight into their constraints, resource limits and behavior
  - It is possible to write some well paid applications without real understanding but this requires abundant resources on the hardware level. But no interesting and demanding tasks can be solved without this understanding.

### Another motivation

### you will later work with this board







Language C still remains in the game



It is easy to see by formal-logical methods that there exist certain [instruction sets] that are in abstract adequate to control and cause the execution of any sequence of operation. The really decisive considerations from the present point of view, in selecting an [instruction set], are more of a practical nature: simplicity of the equipment demanded by the [instruction set], and **the clarity of its application to the actually important problems together with the speed of its handling of those problems**.

[Burks, Goldstine, and von Neumann, 1947]

### Computer



Reference: John Kubiatowicz: EECS 252 Graduate Computer Architecture, Lecture 1. University of California, Berkeley

### 1<sup>st</sup> lecture

- How they are stored on your computer
  - INTEGER numbers, with or without sign?
- How to perform basic operations
  - Adding, Subtracting,
  - Multiplying

### Non-positional numbers ©



http://diameter.si/sciquest/E1.htm

# 

### The value is the sum: 1 333 331

AE0B36APO Computer Architectures

### **Terminology basics**

- Positional (place-value) notation
- Decimal/radix point
- z ... base of numeral system
- smallest representable number  $\varepsilon$
- Module =  $\mathbb{Z}$ , one increment/unit higher than biggest representable number for given encoding/notation
- A, the representable number for given n and m selection, where k is natural number in range []0,z<sup>n+m+1</sup>-1]
  - The representation and value

$$A \sim a_n a_{n-1} \dots a_0, a_1 \dots a_{-m}$$
  

$$A = a_n z^n + a_{n-1} z^{n-1} + \dots + a_0 + a_1 z^{-1} \dots a_{-m} z^{-m}$$

$$\begin{bmatrix} a_n & a_{n-1} & \dots & a_0 & a_{-1} & \dots & a_{-m} \\ \hline \\ \varepsilon = z^{-m} & & \uparrow \text{ radix point} \end{bmatrix}$$

n

Λ

\_1

$$0 \leq A = k \cdot arepsilon < \mathcal{Z}$$

# **Unsigned integers**

Language C: unsigned int

#### Integer number representation (unsigned, non-negative)

- The most common numeral system base in computers is z=2
- The value of **a**<sub>i</sub> is in range {0,1,...z-1}, i.e. {0,1} for base 2
- This maps to true/false and unit of information (bit)
- We can represent number 0 ... 2<sup>n</sup>-1 when n bits are used
- Which range can be represented by one byte?

1B (byte) ... 8 bits,  $2^8 = 256_d$  combinations, values 0 ...  $255_d = 0b1111111_b$ 

- Use of multiple consecutive bytes
  - 2B ... 2<sup>16</sup> = 65536<sub>d</sub>, 0 ... 65535<sub>d</sub> = 0xFFFF<sub>h</sub> ,(h ... hexadecimal, base 16, a in range 0, ... 9, A, B, C, D, E, F)
  - 4B ...  $2^{32} = 4294967296_d$ , 0 ...  $4294967295_d = 0xFFFFFF_h$

#### Unsigned integer



#### **Unsigned 4-bit numbers**

#### Assumptions:we'll assume a 4 bit machine word



Cumbersome subtraction

# Signed numbers

Language C: int signed int

- The most frequent code
- The sum of two opposite numbers with the same absolute value is 00000000H!

| Decimal value | 4 bit two's compliment |
|---------------|------------------------|
| 6             | 0110                   |
| -6            | 1010                   |

#### Dvojkový doplněk – pokračování...

Pokud N bude počet bitů:
 <-2<sup>N-1</sup>, 2<sup>N-1</sup> -1>



| Binární hodnota  | Dvojkový<br>doplněk       |
|------------------|---------------------------|
| <b>0</b> 0000000 | 0 <sub>(10)</sub>         |
| <b>0</b> 0000001 | 1 <sub>(10)</sub>         |
|                  | •                         |
| <b>0</b> 1111101 | 125 <sub>(10)</sub>       |
| <b>0</b> 1111110 | 126 <sub>(10)</sub>       |
| <b>0</b> 1111111 | 127 <sub>(10)</sub>       |
| <b>1</b> 0000000 | -128 <sub>(10)</sub>      |
| <b>1</b> 0000001 | -127 <sub>(10)</sub>      |
| <b>1</b> 0000010 | -126 <sub>(10)</sub>      |
| :                | :                         |
| <b>1</b> 1111101 | -3 <sub>(10)</sub>        |
| <b>1</b> 1111110 | -2 <sub>(10)</sub>        |
| <b>1</b> 1111111 | <b>-1</b> <sub>(10)</sub> |

#### Two's complement - examples

- Examples:
  - 0D = 0000000H,
  - 1D = 0000001H,
  - 2D = 0000002H,
  - 3D = 0000003H,

- -1D = FFFFFFFFH,
- -2D = FFFFFFFEH,
- -3D = FFFFFFDH,

#### Number Representations

#### **Twos Complement** (In Czech: Druhý doplněk)



Only one representation for 0

One more negative number than positive number

[Seungryoul Maeng:Digital Systems]

#### Two's complement – addition and subtraction

- Addition
  - 000000 0000 0111в ≈ 7ь Symbols use: 0=0н, 0=0в
  - <u>+ 0000000 0000 0110</u> **≈ 6** р
  - 000000 0000 1101в≈ **13**D
- Subtraction can be realized as addition of negated number
  - 000000 0000 0111в ≈ 7<sub>D</sub>
  - + FFFFFF 1111 1010в≈ -6р
  - 000000 0000 0001в **≈ 1**р
- Question for revision: how to obtain negated number in two's complement binary arithmetics?

## **Other Possibilities**

#### Integer – biased representation

- Known as excess-K or offset binary as well
- Transform to the representation  $-K \dots 0 \dots 2^{n}-1-K$ D(A) = A+K
- Usually K=Z/2
- Advantages
  - Preserves order of original set in mapped set/representation
- Disadvantages
  - Needs adjustment by -K after addition and +K after subtraction processed by unsigned arithmetic unit
  - Requires full transformation before and after multiplication

#### Number Systems

#### **Excess-K**, offset binary or biased representation



One 0 representation, we can select count of negative numbers - *used e.g. for exponents of real numbers*.

Integer arithmetic unit are not designed to calculate with Excess-K numbers

[Seungryoul Maeng:Digital Systems]

#### Integer – sign-magnitude code

- Sign and magnitude of the value (absolute value)
- Natural to humans -1234, 1234
- One (usually most significant MSB) bit of the memory location is used to represent the sign
- Bit has to be mapped to meaning
- Common use 0 ≈ "+", 1 ≈ "-"
- Disadvantages:
  - When location is k bits long then only k-1 bits hold magnitude and each operation has to separate sign and magnitude
  - Two representations of the value 0





#### Sign and Magnitude Representation.



| Binary value     | Code                 |
|------------------|----------------------|
| <b>0</b> 0000000 | +0 <sub>(10)</sub>   |
| <b>0</b> 0000001 | 1 <sub>(10)</sub>    |
| :                | :                    |
| <b>0</b> 1111101 | 125 <sub>(10)</sub>  |
| <b>0</b> 1111110 | 126 <sub>(10)</sub>  |
| <b>0</b> 1111111 | 127 <sub>(10)</sub>  |
| <b>1</b> 0000000 | -0 <sub>(10)</sub>   |
| <b>1</b> 0000001 | -1 <sub>(10)</sub>   |
| <b>1</b> 0000010 | -2 <sub>(10)</sub>   |
| :                | :                    |
| <b>1</b> 1111101 | -125 <sub>(10)</sub> |
| <b>1</b> 1111110 | -126 <sub>(10)</sub> |
| <b>1</b> 1111111 | -127 <sub>(10)</sub> |
|                  | 50                   |

#### Number Systems

#### Sign and Magnitude Representation



Cumbersome addition/subtraction

Sign+Magnitude usually used only for float point numbers

#### Integers – ones' complement

• Transform to the representation

$$-2^{n-1}+1 \dots 0 \dots 2^{n-1}-1$$

 $D(A) = A \qquad \text{iff } A \ge 0$ 

D(A) = Z-1-|A| iff A<0 (i.e. subtract from all ones)

- Advantages
  - Symmetric range
  - Almost continuous, requires hot one addition when sign changes
- Disadvantage
  - Two representations of value 0
  - More complex hardware
- Negate (-A) value can be computed by bitwise complement (flipping) of each bit in representation

#### **Ones Complement**





| Binary value     | Code                 |
|------------------|----------------------|
| <b>0</b> 0000000 | 0 <sub>(10)</sub>    |
| <b>0</b> 0000001 | 1 <sub>(10)</sub>    |
| :                | :                    |
| <b>0</b> 1111101 | 125 <sub>(10)</sub>  |
| <b>0</b> 1111110 | 126 <sub>(10)</sub>  |
| <b>0</b> 1111111 | 127 <sub>(10)</sub>  |
| <b>1</b> 0000000 | -127 <sub>(10)</sub> |
| <b>1</b> 0000001 | -126 <sub>(10)</sub> |
| <b>1</b> 0000010 | -125 <sub>(10)</sub> |
| :                | :                    |
| <b>1</b> 1111101 | -2 <sub>(10)</sub>   |
| <b>1</b> 1111110 | -1 <sub>(10)</sub>   |
| <b>1</b> 1111111 | -0 <sub>(10)</sub>   |

#### Number Systems

#### **Ones Complement** (In Czech: První doplněk)



Still two representations of 0! This causes some problems Some complexities in addition, nowadays nearly not used



## **OPERATION WITH INTEGERS**

AE0B36APO Computer Architectures

#### Direct realization of adder as logical function

|           | Number of logic operations |
|-----------|----------------------------|
| bit width | for calculating sum        |
| 1         | 3                          |
| 2         | 22                         |
| 3         | 89                         |
| 4         | 272                        |
| 5         | 727                        |
| 6         | 1567                       |
| 7         | 3287                       |
| 8         | 7127                       |
| 9         | 17623                      |
| 10        | 53465                      |
| 11        | 115933                     |

Complexity is higher than O(2<sup>n</sup>)

The calculation was performed by BOOM logic minimizer created at the Department of Computer Science CTU-FEE

#### 1bit Full Adder

| Α            | 0  | 0  | 1  | 1  | 0  | 0  | 1  | 1  |
|--------------|----|----|----|----|----|----|----|----|
| +B           | 0  | 1  | 0  | 1  | 0  | 1  | 0  | 1  |
| Sum          | 00 | 01 | 01 |    | 00 | 01 | 01 | 10 |
| + Carry-In   | 0  | 0  | 0  | 0  | 1  | 1  | 1  | 1  |
| CarryOut Sum | 00 | 01 | 01 | 10 | 01 | 10 | 10 | 11 |



#### Adder



#### Simple Adder



### Simplest N-bit adder

we chain 1-bit full adders

"Carry" ripple through their chain

Minimal number of logical elements

Delay is given by the last Cout - 2\*(N-1)+ 3 gates of the last adder

= (2 N+1) times propagation delay of 1 gate

32bit CLA "carry look-ahead" adder The carry-lookahead adder calculates one or more carry bits before the sum, which reduces the wait time to calculate the result of the larger value bits





#### **Increment / Decrement**

Very fast operations that do not need an adder!

The last bit is always negated, and the previous ones are negated according to the end 1 / 0

| Dec. | Binary<br>8 4 2 1  | +1                 | Binary<br>8 4 2 1  | -1                 |
|------|--------------------|--------------------|--------------------|--------------------|
| 0    | 0000               | 0001               | 0000               | 1111               |
| 1    | 0001               | 0010               | 0001               | 0000               |
| 2    | 0010               | 001 <mark>1</mark> | 001 <mark>0</mark> | 00 <mark>01</mark> |
| 3    | 0011               | 0100               | 0011               | 001 <mark>0</mark> |
| 4    | 0100               | 010 <mark>1</mark> | 0100               | 0011               |
| 5    | 010 <mark>1</mark> | 0110               | 0101               | 010 <mark>0</mark> |
| 6    | 0110               | 011 <mark>1</mark> | 0110               | 01 <mark>01</mark> |
| 7    | 0111               | 1000               | 0111               | 011 <mark>0</mark> |
| 8    | 1000               | 100 <mark>1</mark> | 1000               | 0111               |
| 9    | 100 <mark>1</mark> | 10 <mark>10</mark> | 1001               | 100 <mark>0</mark> |
| 10   | 1010               | 101 <mark>1</mark> | 101 <mark>0</mark> | 10 <mark>01</mark> |
| 11   | 10 <mark>11</mark> | 1100               | 1011               | 101 <mark>0</mark> |
| 12   | 1100               | 110 <mark>1</mark> | 11 <mark>00</mark> | 1011               |
| 13   | 110 <mark>1</mark> | 11 <mark>10</mark> | 1101               | 110 <mark>0</mark> |
| 14   | 1110               | 111 <mark>1</mark> | 111 <mark>0</mark> | 11 <mark>01</mark> |
| 15   | 1111               | 0000               | 1111               | 111 <mark>0</mark> |

#### Special Case +1/-1

S0=not A0 S1=A1 xor A0S2=A2 xor (A1 and A0)Eq:  $S_i = A_i$  xor  $(A_{i-1} \text{ and } A_{i-2} \text{ and } \dots A_1 \text{ and } A_0)$ ; i=0...n-1 S0=not A0 ∴ <mark>}+</mark>→S S1=A1 xor (not A0)S2=A2 xor (not A1 and not A0)

Eq:  $S_i = A_i$  xor (not  $A_{i-1}$  and ... and not  $A_0$ ); i=0...n-1

The number of circuits is given by the arithmetic series, with the complexity O ( $n^2$ ) where n is the number of bits. The operation can be performed in parallel for all bits, and for the both +1/-1 operations, we use a circuit that differs only by negations.

#### Addition / Subtraction HW



Source: X36JPO, A. Pluháček

AE0B36APO Computer Architectures

#### Unsigned binary numbers multiplication

|              |              |              |              |              | ļ            | 4            |              |    |              | E            | 3            |              |
|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|----|--------------|--------------|--------------|--------------|
|              |              |              |              | 1            | 1            | 0            | 1            | •  | 1            | 0            | 1            | 1            |
|              |              |              |              |              |              |              |              |    | $\downarrow$ | $\downarrow$ | $\downarrow$ | $\downarrow$ |
|              |              |              |              | 0            | 0            | 0            | 0            |    |              |              |              |              |
|              |              |              |              | 1            | 1            | 0            | 1            |    |              | de la        | 1            | 1            |
|              |              |              | 0            | 1            | 1            | 0            | 1            |    |              |              |              |              |
|              |              |              | 1            | 1            | 0            | 1            | _            |    | P            | 1            | 1            |              |
|              |              | 1            | 0            | 0            | 1            | 1            |              |    |              |              |              |              |
|              |              | 0            | 0            | 0            | 0            | _            |              | P  | £1           | 0            |              |              |
|              | 0            | 1            | 0            | 0            | 1            |              |              |    |              |              |              |              |
|              | 1            | 1            | 0            | 1            |              |              | P            | £1 | 1            |              |              |              |
| 1            | 0            | 0            | 0            | 1            | •            |              |              |    |              |              |              |              |
| $\downarrow$ |    |              |              |              |              |
| 1            | 0            | 0            | 0            | 1            | 1            | 1            | 1            |    |              |              |              |              |
| <u> </u>     | c            | 1            |              | _            | C            | 0            |              |    |              |              |              |              |

#### Sequential hardware multiplier (32b case)



The speed of the multiplier is horrible

#### Algorithm for Multiplication

```
A = multiplicand;
MQ = multiplier;
AC = 0;
for(int i=1; i <= n; i++) // n – represents number of bits
{
   if (MQ_0 = = 1) AC = AC + A; // MQ_0 = LSB of MQ
   SR (shift AC MQ by one bit right and insert information about
   carry from the MSB from previous step)
end.
```

when loop ends AC MQ holds 64-bit result

Example of the multiply X by Y

Multiplicand x=110 and multiplier y=101.

| i | operation    | AC  | MQ  | A   | comment                 |
|---|--------------|-----|-----|-----|-------------------------|
|   |              | 000 | 101 | 110 | initial setup           |
| 1 | AC = AC + MB | 110 | 101 |     | start of the cycle      |
|   | SR           | 011 | 010 |     |                         |
| 2 | nothing      | 011 | 010 |     | because of $MQ_0 = = 0$ |
|   | SR           | 001 | 101 |     |                         |
| 3 | AC = AC + MB | 111 | 101 |     |                         |
|   | SR           | 011 | 110 |     | end of the cycle        |

The whole operation:  $x \times y = 110 \times 101 = 011110$ , ( $6 \times 5 = 30$ )

## Multiplication in two's compliment

Can be implemented, but there is a problem ...

# The intended product is generally not the same as the product of two's numbers!

Details are already outside the intended APO range.

The best way is the multiplication of their absolute values and decision about its sign.

#### Wallace tree based multiplier

Q=X .Y, X and Y are considered as and 8bit unsigned numbers ( $x_7 x_6 x_5 x_4 x_3 x_2 x_1 x_0$ ). ( $y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0$ ) =

| 0               | 0                                                                                         | 0               | 0                             | 0               | 0                             | 0                             | 0                             | $x_7y_0$              | $x_6y_0$                      | $x_5y_0$              | $x_4y_0$ | $x_3y_0$              | $x_2y_0$              | $x_1y_0$ | $x_0y_0$ | P0 |
|-----------------|-------------------------------------------------------------------------------------------|-----------------|-------------------------------|-----------------|-------------------------------|-------------------------------|-------------------------------|-----------------------|-------------------------------|-----------------------|----------|-----------------------|-----------------------|----------|----------|----|
| 0               | 0                                                                                         | 0               | 0                             | 0               | 0                             | 0                             | $x_7y_1$                      | $x_6y_1$              | $x_5y_1$                      | $x_4y_1$              | $x_3y_1$ | $x_2y_1$              | $x_1y_1$              | $x_0y_1$ | 0        | P1 |
| 0               | 0                                                                                         | 0               | 0                             | 0               | 0                             | $x_7y_2$                      | $x_6y_2$                      | $x_5y_2$              | $x_4y_2$                      | $x_3y_2$              | $x_2y_2$ | $x_1y_2$              | $x_0y_2$              | 0        | 0        | P2 |
| 0               | 0                                                                                         | 0               | 0                             | 0               | $x_7y_3$                      | $x_6y_3$                      | $x_5y_3$                      | $x_4y_3$              | x <sub>3</sub> y <sub>3</sub> | $x_2y_3$              | $x_1y_3$ | $x_0y_3$              | 0                     | 0        | 0        | P3 |
| 0               | 0                                                                                         | 0               | 0                             | $x_7y_4$        | $x_6y_4$                      | $x_5y_4$                      | $x_4y_4$                      | $x_3y_4$              | $x_2y_4$                      | $x_1y_4$              | $x_0y_4$ | 0                     | 0                     | 0        | 0        | P4 |
| 0               | 0                                                                                         | 0               | $x_7y_5$                      | $x_6y_5$        | x <sub>5</sub> y <sub>5</sub> | $x_4y_5$                      | x <sub>3</sub> y <sub>5</sub> | $x_2y_5$              | $x_1y_5$                      | $x_0y_5$              | 0        | 0                     | 0                     | 0        | 0        | P5 |
| 0               | 0                                                                                         | $x_7y_6$        | x <sub>6</sub> y <sub>6</sub> | $x_5y_6$        | $x_4y_6$                      | x <sub>3</sub> y <sub>6</sub> | $x_2y_6$                      | $x_1y_6$              | $x_0y_6$                      | 0                     | 0        | 0                     | 0                     | 0        | 0        | P6 |
| 0               | $x_7y_7$                                                                                  | $x_6y_7$        | x <sub>5</sub> y <sub>7</sub> | $x_4y_7$        | x <sub>3</sub> y <sub>7</sub> | $x_2y_7$                      | $x_1y_7$                      | $x_0y_7$              | 0                             | 0                     | 0        | 0                     | 0                     | 0        | 0        | P7 |
| Q <sub>15</sub> | Q <sub>14</sub>                                                                           | Q <sub>13</sub> | Q <sub>12</sub>               | Q <sub>11</sub> | <b>Q</b> <sub>10</sub>        | <b>Q</b> <sub>9</sub>         | $Q_8$                         | <b>Q</b> <sub>7</sub> | $Q_6$                         | <b>Q</b> <sub>5</sub> | $Q_4$    | <b>Q</b> <sub>3</sub> | <b>Q</b> <sub>2</sub> | $Q_1$    | $Q_0$    |    |
|                 | The sum of P0+P1++P7 gives result of X and Y multiplication.<br>Q = X .Y = P0 + P1 + + P7 |                 |                               |                 |                               |                               |                               |                       |                               |                       |          |                       |                       |          |          |    |

#### Parallel adder of 9 numbers



#### **Decadic Carry-save adder**



#### 1bit Carry Save Adder

| A          | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
|------------|---|---|---|---|---|---|---|---|
| +B         | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| Z=Carry-In | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| Sum        | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| C=Cout     | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |





#### 3-bit Carry-save adder



#### Wallace tree based fast multiplier

The basic element is an CSA circuit (Carry Save Adder)



AE0B36APO Computer Architectures