
ARITHMETIC

INTRODUCTION

 A basic operation in all digital computers is the


addition or subtraction of two numbers.

 Arithmetic operations occur at the machine instruction


level.

 They are implemented, along with basic logic


functions such as AND, OR, NOT, and EXCLUSIVE-
OR, in the arithmetic and logic unit (ALU) subsystem
of the processor.

 The time needed to perform an addition operation


affects the processor’s performance.

 Multiply and divide operations, which require more


complex circuitry than either addition or subtraction
operations, also affect performance.

 Compared with arithmetic operations, logic operations


are simple to implement using combinational
circuitry. They require only independent Boolean
operations on individual bit positions of the operands,
whereas carry/borrow signals must propagate laterally
across bit positions in arithmetic operations.

REVIEW OF NUMBER REPRESENTATIONS

 Three systems are widely used for representing both


positive and negative numbers:

1. Sign and Magnitude

2. 1’s – complement

3. 2’s – complement

 For all three systems, the leftmost bit is 0 for positive


numbers and 1 for negative numbers.

 Binary, Signed Integer Representations

b3 b2 b1 b0    Sign and Magnitude    1's Complement    2's Complement
0000 +0 +0 +0
0001 +1 +1 +1
0010 +2 +2 +2
0011 +3 +3 +3
0100 +4 +4 +4
0101 +5 +5 +5
0110 +6 +6 +6
0111 +7 +7 +7
1000 -0 -7 -8
1001 -1 -6 -7
1010 -2 -5 -6
1011 -3 -4 -5
1100 -4 -3 -4
1101 -5 -2 -3
1110 -6 -1 -2
1111 -7 -0 -1

 Positive values have identical representations in all
systems.

 In the sign-and-magnitude system, negative values are


represented by changing the MSB of the positive
representation to 1.

Example:

+5 = 0101

-5 = 1101

 In the 1’s-complement system, negative values are


obtained by complementing each bit of the
corresponding positive representation.

Example:

+3 = 0011

-3 = 1100

Clearly, the same operation, bit complementing, is


done in converting a negative number to the
corresponding positive value.

 In the 2’s-complement system, a negative number is
obtained by subtracting the positive number from 2^n.
Hence, the 2’s-complement representation is obtained
by adding 1 to the 1’s-complement representation.

Example:

+3 = 0011

-3 = 1101

 Note that there are two distinct representations for +0


and -0 in both sign-and-magnitude and 1’s-
complement, but the 2’s-complement has only one
representation for 0. For 4-bit numbers, the value -8
is representable in the 2’s-complement system but not
in the others.

 Among the three systems, the 2’s-complement system


yields the most efficient logic circuit implementation of
addition and subtraction operations, and it is the one most
often used in computers.
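
As an illustration (not from the original notes), a small Python sketch that reproduces the three 4-bit encodings used in the table and examples above; the helper names are arbitrary:

    def sign_magnitude(value, n=4):
        """Sign bit followed by the magnitude of the value."""
        sign = '1' if value < 0 else '0'
        return sign + format(abs(value), '0{}b'.format(n - 1))

    def ones_complement(value, n=4):
        """Negative values: complement every bit of the positive pattern."""
        pattern = format(abs(value), '0{}b'.format(n))
        if value < 0:
            pattern = ''.join('1' if b == '0' else '0' for b in pattern)
        return pattern

    def twos_complement(value, n=4):
        """Negative values: subtract the magnitude from 2^n (1's complement plus 1)."""
        return format(value % (1 << n), '0{}b'.format(n))

    print(sign_magnitude(-5), ones_complement(-3), twos_complement(-3))
    # -> 1101 1100 1101, matching the examples above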

ADDITION OF POSITIVE NUMBERS

 Addition of 1-bit numbers

0 + 0 = 0      1 + 0 = 1      0 + 1 = 1      1 + 1 = 1 0  (carry-out of 1)

The sum of 1 and 1 requires the 2-bit vector 1 0 to


represent the value 2.

 In order to add multiple-bit numbers, bit pairs are


added starting from the low-order (right) end of the bit
vectors, propagating carries toward the high-order
(left) end.

Example:

      X     7        0 1 1 1
    + Y   + 6      + 0 1 1 0
                   ---------
      Z    13        1 1 0 1

Legend for stage i: inputs xi and yi, carry-in ci; outputs sum si and carry-out ci+1.

 The truth table for the sum and carry-out functions for
adding two equally weighted bits xi and yi:

xi yi ci si ci+1
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

 The logic equation for the sum bit si is given by:

si = xi' yi' ci + xi' yi ci' + xi yi' ci' + xi yi ci     (a prime denotes a complemented variable)

 The logic equation for the carry-out bit ci+1 is given


by:

ci+1 = yi ci + xi ci + xi yi

 A straightforward, 2-level, combinational logic circuit
implementation of the truth table for addition is shown
below:

[Figure: two-level AND-OR networks. One network forms si from the four product terms of xi, yi, and ci listed above; the other forms ci+1 from the product terms yi ci, xi ci, and xi yi.]

 Simplified block representation of an adder


(sometimes called a full-adder)

[Figure: adder block (A) with inputs xi, yi, and carry-in ci, and outputs si and carry-out ci+1.]

 A cascaded connection of n adder blocks can be used
to add two n-bit numbers.

[Figure: n-bit ripple-carry adder built from n adder blocks; the LSB position receives x0, y0, and c0, each carry-out ci+1 feeds the next stage, and the MSB position produces sn-1 and cn.]

Since the carries must propagate, or ripple, through


this cascade, the configuration is called an n-bit
ripple-carry adder.
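
As an illustrative sketch (not part of the original notes), the ripple-carry scheme can be modelled in Python; each stage applies the si and ci+1 equations derived earlier, and the carry simply ripples through the loop:

    def full_adder(x, y, c):
        """One stage: sum bit and carry-out from the equations for si and ci+1."""
        s = x ^ y ^ c
        c_out = (x & y) | (x & c) | (y & c)
        return s, c_out

    def ripple_carry_add(x_bits, y_bits, c0=0):
        """Add two equal-length bit lists given LSB-first; returns sum bits and cn."""
        carry = c0
        sum_bits = []
        for x, y in zip(x_bits, y_bits):      # LSB to MSB, carries ripple leftward
            s, carry = full_adder(x, y, carry)
            sum_bits.append(s)
        return sum_bits, carry

    # 0111 (7) + 0110 (6) = 1101 (13), bits given LSB first
    print(ripple_carry_add([1, 1, 1, 0], [0, 1, 1, 0]))   # -> ([1, 0, 1, 1], 0)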

 A cascade of k n-bit adders:

[Figure: k n-bit adder blocks cascaded to add kn-bit operands; the carry-out cn of the rightmost block ripples into the next block, and so on, producing sum bits s0 through skn-1 and the final carry ckn.]

DESIGN OF FAST ADDERS

 The n-bit ripple-carry adder may have too much delay


in developing its outputs, s0 through sn-1 and cn.

 Suppose that the delay from ci to ci+1 of any Adder


block is 1 ns (assuming a gate delay of 0.5 ns). An n-
bit addition can be performed in the time it takes the
carry signal to reach the cn-1 position plus the time it
takes to develop sn-1. Assuming the last delay is 1.5
ns, a 32-bit addition takes (31 x 1 ns) + 1.5 ns = 32.5
ns.

 Two approaches can be taken to reduce this delay to


the desired 10-ns range. The first approach is to use
faster electronic circuit technology in implementing
the ripple-carry logic design. The second approach is
to use a different logic network structure.

Logic structures for fast adder design must speed up


the generation of the carry signals. The logic
expressions for si (sum) and ci+1 (carry-out) of stage i
are:

si = xi' yi' ci + xi' yi ci' + xi yi' ci' + xi yi ci

ci+1 = xi yi + yi ci + xi ci

By doing the following:

ci+1 = xi yi + (yi + xi) ci

Let Gi = xi yi and Pi = xi + yi

ci+1 = Gi + Pi ci

The expressions Gi and Pi are called the generate and


propagate functions for stage i. Take note that both
functions are dependent only on the X and Y inputs
and not on any carry input.

Expanding ci in terms of i–1 subscripted variables:

ci = xi-1 yi-1 + (yi-1 + xi-1 ) ci-1

ci = Gi-1 + Pi-1 ci-1

Substituting into the ci+1 equation:

ci+1 = Gi + Pi ci

ci+1 = Gi + Pi [Gi-1 + Pi-1 ci-1]

ci+1 = Gi + Pi Gi-1 + Pi Pi-1 ci-1

Continuing this type of expansion, the final expression
for any carry variable is:

ci+1 = Gi + PiGi-1 + PiPi-1Gi-2 + … +

PiPi-1…P1G0 + PiPi-1…P0c0

Thus all carries can be obtained three gate delays


after the input operands X, Y, and c0 are available,
because only one gate delay is needed to develop all
Pi and Gi signals, followed by two gate delays in the
AND-OR circuit for ci+1.

After another three gate delays (one delay to invert


each carry and two further delays to form each sum bit
as shown earlier), all sum bits are available.

Therefore, independent of n, the n-bit addition process


requires only six levels of logic.

Example:

Assume a 4-bit adder:

c1 = G0 + P0c0

c2 = G1 + P1G0 + P1P0c0

c3 = G2 + P2G1 + P2P1G0 + P2P1P0c0

c4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0

It will take 1 gate delay (0.5 ns) to produce all the


generate and propagate functions after the availability
of the X and Y inputs.

After the generate and propagate functions become


available, it will take two gate delays (1 ns) to
produce all the carry signals (c1, c2, c3, c4).

After all the carry signals become available, it will


take three gate delays (1.5 ns) to produce all the sum
bits.

Therefore the total time for a 4-bit addition will take


only 3 ns. It also takes the same time to perform any
n-bit addition.
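
For illustration, the lookahead carries can be computed in Python directly from the expanded expressions (a sketch assuming LSB-first bit lists, not a gate-level model):

    def lookahead_carries(x, y, c0=0):
        """Carries c1..cn from the expanded lookahead expressions, e.g.
        c4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 c0.
        No carry ripples between stages."""
        G = [xi & yi for xi, yi in zip(x, y)]
        P = [xi | yi for xi, yi in zip(x, y)]
        carries = []
        for i in range(len(x)):
            c = G[i]
            for j in range(i - 1, -1, -1):            # Pi ... Pj+1 Gj terms
                term = G[j]
                for k in range(j + 1, i + 1):
                    term &= P[k]
                c |= term
            tail = c0                                  # Pi ... P0 c0 term
            for k in range(i + 1):
                tail &= P[k]
            carries.append(c | tail)                   # carries[i] holds c(i+1)
        return carries

    # 0111 (7) + 0110 (6), LSB first: carries c1..c4
    print(lookahead_carries([1, 1, 1, 0], [0, 1, 1, 0]))   # -> [0, 1, 1, 0]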

Fast adders that form carry functions are called carry-
lookahead adders.

A practical problem with the carry-lookahead


approach is the gate fan-in constraints. The
expression for ci+1 requires i + 2 inputs to the largest
AND term and i + 2 inputs to the OR term.

Consider the addition of two 8-bit numbers. For the


ripple-carry adder, the total time for an 8-bit addition
is (7 x 1 ns) + 1.5 ns = 8.5 ns. For the carry-
lookahead adder, the total time is 3 ns. However, the
carry-lookahead adder requires a fan-in of nine for the
basic gates.

A solution to this problem is to implement the carry-


lookahead technique on a per block basis.

For the 8-bit addition:

[Figure: 8-bit adder made of two 4-bit adder blocks; the right block adds bits 0-3 with carry-in c0 and produces c4, which is the carry-in of the left block adding bits 4-7 and producing c8 and sum bits s0 through s7.]

The carries inside each 4-bit adder block are formed


by lookahead circuits. However, they still ripple
between blocks.

Because of the lookahead circuits in the rightmost


adder, c4 will be formed after 3 gate delays (1.5 ns).

After c4 becomes available, the leftmost block will


produce the remaining part of the sum in 2.5 ns
(because of the lookahead circuits). Therefore, the
total time for an 8-bit addition is 4.0 ns.

SIGNED ADDITION AND SUBTRACTION

 Out of the three methods of representing signed


numbers, the 2’s complement representation is the
best method in terms of efficiently implementing
addition and subtraction.

 The rules for governing the addition and subtraction


of n-bit signed numbers using the 2’s complement
representation system are:

1. To add two numbers, add their representations in


an n-bit adder, ignoring the carry-out signal from
the MSB position. The sum will be the
algebraically correct value in 2’s complement
representation as long as the answer is in the
range –2^(n-1) through +2^(n-1) – 1.

2. To subtract two numbers X and Y, that is to


perform X – Y, form the 2’s complement of Y and
then add it to X, as in rule 1. Again, the result
will be the algebraically correct value in 2’s
complement representation as long as the answer
is in the range –2^(n-1) through +2^(n-1) – 1.

Examples:

Addition:

  0010 (+2) + 0011 (+3) = 0101 (+5)        0100 (+4) + 1010 (-6) = 1110 (-2)

  1011 (-5) + 1110 (-2) = 1001 (-7)        0111 (+7) + 1101 (-3) = 0100 (+4)

Subtraction (add the 2's complement of the subtrahend):

  0010 (+2) - 0100 (+4)  =  0010 + 1100  =  1110 (-2)

  0110 (+6) - 0011 (+3)  =  0110 + 1101  =  0011 (+3)

  1001 (-7) - 1011 (-5)  =  1001 + 0101  =  1110 (-2)

  1001 (-7) - 0001 (+1)  =  1001 + 1111  =  1000 (-8)

  0010 (+2) - 1101 (-3)  =  0010 + 0011  =  0101 (+5)

  1101 (-3) - 1001 (-7)  =  1101 + 0111  =  0100 (+4)

 Binary addition-subtraction logic network:

[Figure: binary addition-subtraction network. Each bit yi of the Y operand passes through an EX-OR gate controlled by the Add/Sub line before entering the n-bit adder along with xi; the Add/Sub line also drives the carry-in c0.]

The Add/Sub control wire is set to 0 for addition.


This allows the Y vector to be applied unchanged to
one of the adder inputs along with the carry-in signal,
c0, of 0.

When the Add/Sub control wire is set to 1 for


subtraction, the Y vector is 1’s complemented (that is,
each bit is complemented) by the EX-OR gates, and c0
is set to 1 to complete the 2’s complementation of Y.

 When the result does not fall within the representable
range, then an arithmetic overflow has occurred.

Take note that overflow can occur only when adding


two numbers that have the same sign. When both
operands X and Y have the same sign, an overflow
occurs when the sign of S does not agree with the
signs of X and Y. In an n-bit adder, the overflow
signal is defined by the logical expression:

Overflow = xn-1 yn-1 sn-1' + xn-1' yn-1' sn-1     (primes denote complemented bits)

A computer must be able to detect an overflow. It is


customary to dedicate a condition code flag as the
indicator, and it is possible to have this flag cause an
interrupt when an add or Subtract instruction results in
an overflow. Then the programmer must decide how
to correct the problem.

Another overflow indicator:

There is an overflow when there is a carry into


the MSB and no carry out of the MSB or vice-
versa. For subtraction, it is set to 1, when the
MSB needs a borrow and there is no borrow from
the MSB, or vice-versa.
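
A hedged Python sketch of rules 1 and 2 together with the sign-based overflow test above (the function name and interface are assumptions of the sketch):

    def add_signed(x, y, n=4, subtract=False):
        """n-bit 2's-complement add/subtract; returns (result, overflow).

        Subtraction adds the 2's complement of y, and the carry out of
        the MSB position is ignored, exactly as in the rules above."""
        mask = (1 << n) - 1
        xs, ys = x & mask, ((-y if subtract else y) & mask)
        s = (xs + ys) & mask                  # discard the carry-out of the MSB
        # overflow when both operands have the same sign but the sum's sign differs
        sx, sy, ss = xs >> (n - 1), ys >> (n - 1), s >> (n - 1)
        overflow = (sx & sy & (1 - ss)) | ((1 - sx) & (1 - sy) & ss)
        value = s - (1 << n) if ss else s     # interpret the n-bit pattern as signed
        return value, bool(overflow)

    print(add_signed(0b0111, 0b0001))                   # (-8, True): +7 + 1 overflows
    print(add_signed(0b1001, 0b0001, subtract=True))    # (-8, False): -7 - 1 = -8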

MULTIPLICATION OF POSITIVE NUMBERS

 The usual algorithm for multiplying integers by hand


for the binary system:

        1 1 0 1     (13)   Multiplicand M
      x 1 0 1 1     (11)   Multiplier Q
        -------
        1 1 0 1
      1 1 0 1
    0 0 0 0
  1 1 0 1
---------------
1 0 0 0 1 1 1 1     (143)  Product P

 This algorithm applies to unsigned numbers and to


positive numbers.

 The product of two n-digit numbers can be


accommodated in 2n digits, so the product of two 4-bit
numbers fits into 8 bits.

 In the binary system, multiplication of the


multiplicand by one bit of the multiplier is easy.

If the multiplier bit is 1, the multiplicand is entered in


the appropriate position to be added to the partial
product. If the multiplier bit is 0, then 0s are entered.

 In early computers, because of logic costs, the adder


circuitry in the ALU was used to perform
multiplication sequentially.

 Block diagram for the sequential circuit binary
multiplier:
[Figure: sequential multiplier circuit. Register A (initially 0), carry flip-flop C, and multiplier register Q form a right-shift register; an n-bit adder adds multiplicand register M to A when the Add/Noadd control (derived from q0 by the control sequencer) is asserted, with a multiplexer selecting M or 0.]

Example: M = 1101 (13), Q = 1011 (11)

                        C    A       Q
Initially               0   0000    1011

1st cycle    Add        0   1101    1011
             Shift      0   0110    1101

2nd cycle    Add        1   0011    1101
             Shift      0   1001    1110

3rd cycle    No add     0   1001    1110
             Shift      0   0100    1111

4th cycle    Add        1   0001    1111
             Shift      0   1000    1111

Product = 1000 1111 (143)

The sequential multiplier circuit performs
multiplication by using a single adder n times.

Registers A and Q combined hold PPi while multiplier


bit qi generates the signal Add/Noadd. This signal
controls the addition of the multiplicand, M, to PPi to
generate PP(i + 1).

The product is computed in n cycles.

The partial product grows in length by 1 bit per cycle


from the initial vector PP0, of n 0s in register A.

At the start, the multiplier is loaded into register Q,


the multiplicand into register M, and C and A are
cleared to 0. At the end of each cycle, C, A, and Q are
shifted right one bit position to allow for growth of the
partial product as the multiplier is shifted out of
register Q.

Because of this shifting, multiplier bit qi appears at the


LSB position of Q to generate the Add/Noadd signal
at the correct time. After they are used, the multiplier
bits are discarded by the right-shift operation.

After n cycles, the high-order half of the product is


held in register A and the low-order half is in register
Q.
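
The register-level behaviour described above can be sketched in Python as follows (an illustrative model of the C, A, and Q registers, not the actual control sequencer):

    def sequential_multiply(m, q, n=4):
        """Shift-and-add multiplication of two n-bit unsigned numbers,
        mirroring the C/A/Q register trace above."""
        a, c = 0, 0
        mask = (1 << n) - 1
        for _ in range(n):
            if q & 1:                        # Add/Noadd decided by q0
                total = a + m
                c, a = total >> n, total & mask
            # shift C, A, Q right one position as a single (2n+1)-bit value
            combined = ((c << 2 * n) | (a << n) | q) >> 1
            c, a, q = 0, (combined >> n) & mask, combined & mask
        return (a << n) | q                  # high half in A, low half in Q

    print(sequential_multiply(0b1101, 0b1011))   # -> 143 (1000 1111)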

SIGNED-OPERAND MULTIPLICATION

 Consider first the case of a positive multiplier and a


negative multiplicand. In adding a negative
multiplicand to a partial product, it is necessary to
extend the sign-bit value of the multiplicand to the left
as far as the product will extend.

Example:

          1 0 0 1 1     (-13)
        x 0 1 0 1 1     (+11)
          ---------
1 1 1 1 1 1 0 0 1 1
1 1 1 1 1 0 0 1 1
0 0 0 0 0 0 0 0
1 1 1 0 0 1 1
0 0 0 0 0 0
-------------------
1 1 0 1 1 1 0 0 0 1     (-143)

 For a negative multiplier, a straightforward solution is


to form the 2’s complement of both the multiplier and
the multiplicand and proceed as in the case of a
positive multiplier.

This is possible because complementation of both


operands does not change the value or the sign of the
product.

 Booth Algorithm

A powerful algorithm for signed-number


multiplication, the Booth algorithm generates a 2n-bit
product and treats both positive and negative numbers
uniformly.

Consider a multiplication operation in which the


multiplier is positive and has a single block of 1s, such
as 0011110. To derive the product, one can add four
appropriately shifted versions of the multiplicand, as
in the standard procedure.

However, the number of required operations can be


reduced by regarding the multiplier as the difference
between two numbers:

0100000 (32)
- 0000010 (2)
0011110 (30)

This suggests that the product can be generated by


adding 2^5 times the multiplicand to the 2’s
complement of 2^1 times the multiplicand.

For convenience, the sequence of required operations


can be described by recording the preceding multiplier
as 0 +1 0 0 0 –1 0.

Normal multiplication algorithm:

0101101
0011110
0000000
0101101
0101101
0101101
0101101
0000000
0000000
00010101000110

Booth multiplication algorithm:

0101101
0 +1 0 0 0 -1 0
00000000000000
1111111010011
000000000000
00000000000
0000000000
000101101
00000000
00010101000110

In general, in the Booth scheme, -1 times the shifted
multiplicand is selected when moving from 0 to 1, and
+1 times the shifted multiplicand is selected when
moving from 1 to 0, as the multiplier is scanned from
right to left.

Multiplier              Version of multiplicand
Bit i     Bit i-1       selected by bit i

  0         0                0 x M
  0         1               +1 x M
  1         0               -1 x M
  1         1                0 x M

The Booth algorithm clearly extends to any number of


1s in a multiplier, including the situation in which a
single 1 is considered a block.

Example:

0 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 0 0


0 +1 -1 +1 0 -1 0 +1 0 0 -1 +1 -1 +1 0 -1 0 0
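
An illustrative Python helper for Booth recoding (the multiplier is given MSB-first as a bit string; the name is an assumption of this sketch):

    def booth_recode(bits):
        """Booth-recode a 2's-complement multiplier.

        Scanning right to left: -1 on a 0-to-1 transition, +1 on a 1-to-0
        transition, with an implied 0 to the right of the LSB."""
        b = [int(x) for x in bits] + [0]        # append the implied bit on the right
        return [b[i + 1] - b[i] for i in range(len(bits))]

    print(booth_recode('001011001110101100'))
    # -> [0, 1, -1, 1, 0, -1, 0, 1, 0, 0, -1, 1, -1, 1, 0, -1, 0, 0]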

The Booth algorithm can also be used for negative
numbers.

01101 (+13)
x 11010 (-6)

01101
0 -1 +1 -1 0
0000000000
111110011
00001101
1110011
000000
1 1 1 0 1 1 0 0 1 0 (-78)

The transformation 011…110 → +100…0-10 is called


skipping over 1s. This term is derived from the case
in which the multiplier has its 1s grouped into a few
contiguous blocks; only a few versions of the
multiplicand, that is, the summands, must be added to
generate the product, thus speeding up the
multiplication operation.

However, in the worst case – that of alternating 1s and
0s in the multiplier – each bit of the multiplier selects
a summand. In fact, this results in more summands
than if the Booth algorithm were not used.

Examples:

Worst-case Multiplier:

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

+1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

Ordinary Multiplier:

1 1 0 0 0 1 0 1 1 0 1 1 1 1 0 0

0 -1 0 0 +1 -1 +1 0 -1 +1 0 0 0 -1 0 0

Good Multiplier:

0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1

0 0 0 0 +1 0 0 0 0 -1 0 0 0 +1 0 -1

The Booth algorithm has two attractive features:

1. It handles both positive and negative multipliers


uniformly.

2. It achieves some efficiency in the number of


additions required when the multiplier has a few
large blocks of 1s.

The speed gained by skipping over 1s depends on


the data.

On the average, the speed of doing multiplication


with the Booth algorithm is the same as with the
normal algorithm.

FAST MULTIPLICATION

 A multiplication speedup technique will now be


discussed that guarantees at most n/2 summands and
that uniformly handles the signed-operand case. This
is twice as fast as the worst-case Booth algorithm
situation.

 Recall that in the Booth technique, multiplier bit qi selects a


summand as a function of the bit qi-1 on its right. The
summand selected is shifted i binary positions to the
left of the LSB position of the product before the
summand is added. Call this summand position SPi.

 The basic idea of the speedup technique is to use the


bits i + 1 and i to select a summand, as a function of bit
i – 1, to be added at SPi. The n/2 summands are thus
selected by bit pairs (x1, x0), (x3, x2), (x5, x4), and so on.
The technique is called the bit-pair recoding method.

Example:

Multiplier 11010, sign-extended on the left, with an implied 0 to the right of the LSB:

                         1   1   1   0   1   0  (0)

    Booth recoding:      0   0  -1  +1  -1   0
    Bit-pair recoding:       0      -1      -2

Consider the multiplier 11010. Its Booth


representation is given as 0 –1 +1 –1 0. By
grouping the Booth-recoded selectors, one can
obtain a single, appropriately shifted summand
for each pair.

The rightmost Booth pair, (-1, 0), is equivalent to


–2 x (multiplicand M) at SP0. The next pair, (-1,
+1), is equivalent to –1 x M at SP2; finally, the
left most pair of zeros is equivalent to 0 x M at
SP4.

Restating these selections in terms of the original


multiplier bits, the case of (1,0) with 0 on the
right selects –2 x M; the case of (1,0) with 1 on
the right selects –1 x M; and the case of (1,1)
with 1 on the right selects 0 x M.

 The complete set of eight cases is shown below:

Multiplier bit pair     Multiplier bit on the right     Multiplicand selected at SPi
   i+1      i                     i-1

    0       0                      0                          0 x M
    0       0                      1                         +1 x M
    0       1                      0                         +1 x M
    0       1                      1                         +2 x M
    1       0                      0                         -2 x M
    1       0                      1                         -1 x M
    1       1                      0                         -1 x M
    1       1                      1                          0 x M

Examples:

0 1 1 0 1

+1 -1 +1

1 0 1 1 1

-1 +2 -1
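
A Python sketch consistent with the bit-pair recoding table above (illustrative helper; each selected digit equals -2*b(i+1) + b(i) + b(i-1)):

    def bit_pair_recode(bits):
        """Bit-pair recoding of a 2's-complement multiplier given MSB-first;
        returns one digit in {-2, -1, 0, +1, +2} per bit pair."""
        if len(bits) % 2:                      # sign-extend to an even length
            bits = bits[0] + bits
        b = [int(x) for x in bits] + [0]       # implied 0 to the right of the LSB
        digits = []
        for i in range(0, len(bits), 2):       # pairs taken from the MSB side
            hi, lo, right = b[i], b[i + 1], b[i + 2]
            digits.append(-2 * hi + lo + right)
        return digits

    print(bit_pair_recode('11010'))   # -> [0, -1, -2]
    print(bit_pair_recode('10111'))   # -> [-1, 2, -1]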

 Multiplication Examples:

01101 (+13)
x 11010 (- 6)

01101
0 -1 -2
1111100110
11110011
000000
1 1 1 0 1 1 0 0 1 0 (- 78)

01010 (+10)
x 10111 (- 9)

01010
- 1 +2 - 1
1111110110
00010100
110110
1 1 1 0 1 0 0 1 1 0 (- 90)

INTEGER DIVISION

 Longhand division examples:

Decimal: 274 / 13 = 21, remainder 1

         21
    13 ) 274
         26
         ---
          14
          13
         ---
           1

Binary: 100010010 / 1101 = 10101, remainder 1

             10101
  1101 ) 100010010
          1101
          ----
           10000
            1101
            ----
              1110
              1101
              ----
                 1

 A circuit that implements division by this longhand


method operates as follows:

It positions the divisor appropriately with respect


to the dividend and performs a subtraction. If the
remainder is zero or positive, a quotient bit of 1 is
determined, the remainder is extended by another
bit of the dividend, the divisor is repositioned,
and another subtraction is performed.

On the other hand, if the remainder is negative, a


quotient bit of 0 is determined, the dividend is
restored by adding back the divisor, and the
divisor is repositioned for another subtraction.

 The logic circuit arrangement that implements this
restoring-division technique is shown below:

[Figure: circuit for restoring division. An (n+1)-bit register A and the n-bit dividend register Q form a left-shift register; an (n+1)-bit adder/subtractor combines A with divisor register M, and a control sequencer selects add or subtract and sets the quotient bit q0.]

An n-bit divisor is loaded into register M and an n-bit
positive dividend is loaded into register Q at the start
of the operation. Register A is set to 0.

After the division is complete, the n-bit quotient is in


register Q and the remainder is in register A. The
required subtractions are facilitated by using 2’s
complement arithmetic. The extra bit position at the
left end of both A and M accommodates the sign bit
during subtractions.

 The algorithm that follows performs the division:

Do the following n times:

Shift A and Q left one binary position.

Subtract M from A, and place the answer


back in A.

If the sign of A is 1, set q0 to 0 and add M


back to A (that is, restore A); otherwise, set
q0 to 1.
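
An illustrative Python model of this restoring-division loop (register widths and names are assumptions of the sketch):

    def restoring_divide(dividend, divisor, n=4):
        """Restoring division of an n-bit positive dividend by an n-bit divisor,
        following the shift / subtract / restore steps listed above."""
        a, q, m = 0, dividend, divisor
        for _ in range(n):
            # shift A and Q left one position; the MSB of Q moves into A
            a = (a << 1) | (q >> (n - 1))
            q = (q << 1) & ((1 << n) - 1)
            a -= m                          # subtract the divisor
            if a < 0:                       # negative: set q0 = 0 and restore A
                a += m
            else:                           # non-negative: set q0 = 1
                q |= 1
        return q, a                         # quotient, remainder

    print(restoring_divide(0b1000, 0b0011))   # 8 / 3 -> (2, 2)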

 A restoring-division example:

           10
    11 ) 1000
          11
          ---
           10

                           A          Q
Initially              0 0 0 0 0    1 0 0 0        M = 0 0 0 1 1

1st cycle   Shift       0 0 0 0 1   0 0 0 _
            Subtract    1 1 1 1 0              (negative)
            Set q0 = 0, restore A
                        0 0 0 0 1   0 0 0 0

2nd cycle   Shift       0 0 0 1 0   0 0 0 _
            Subtract    1 1 1 1 1              (negative)
            Set q0 = 0, restore A
                        0 0 0 1 0   0 0 0 0

3rd cycle   Shift       0 0 1 0 0   0 0 0 _
            Subtract    0 0 0 0 1              (positive)
            Set q0 = 1
                        0 0 0 0 1   0 0 0 1

4th cycle   Shift       0 0 0 1 0   0 0 1 _
            Subtract    1 1 1 1 1              (negative)
            Set q0 = 0, restore A
                        0 0 0 1 0   0 0 1 0

            Remainder = 0 0 0 1 0 (2)     Quotient = 0 0 1 0 (2)

 This algorithm can be improved by avoiding the need
for restoring A after an unsuccessful subtraction
(subtraction is said to be unsuccessful if the result is
negative).

Consider the sequence of operations that takes place


after the subtraction operation in the preceding
algorithm. If A is positive, we shift left and subtract
M; that is, we perform 2A – M. If A is negative, we
restore it by performing A + M, and then we shift it
left and subtract M. This is equivalent to performing
2A + M. The q0 bit is appropriately set to 0 or 1 after
the correct operation has been performed. All of these
can be summarized in the following nonrestoring-
division algorithm:

Step 1: Do the following n times:

If the sign of A is 0, shift A and Q left one bit


position and subtract M from A;
otherwise shift A and Q left and add M
to A.

If the sign of A is 0, set q0 to 1; otherwise,


set q0 to 0.

Step 2: If the sign of A is 1, add M to A.
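
A matching Python sketch of the nonrestoring algorithm, including the corrective addition of Step 2 (illustrative code, not the hardware sequencer):

    def nonrestoring_divide(dividend, divisor, n=4):
        """Nonrestoring division: shift, then add or subtract M depending on
        the sign of A; one corrective addition may be needed at the end."""
        a, q, m = 0, dividend, divisor
        for _ in range(n):
            msb_q = q >> (n - 1)
            q = (q << 1) & ((1 << n) - 1)
            if a >= 0:                       # sign of A is 0: shift and subtract
                a = ((a << 1) | msb_q) - m
            else:                            # sign of A is 1: shift and add
                a = ((a << 1) | msb_q) + m
            if a >= 0:                       # set the new quotient bit
                q |= 1
        if a < 0:                            # Step 2: final corrective addition
            a += m
        return q, a

    print(nonrestoring_divide(0b1000, 0b0011))   # 8 / 3 -> (2, 2)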

 A nonrestoring-division example:

                           A          Q
Initially              0 0 0 0 0    1 0 0 0        M = 0 0 0 1 1

1st cycle   Shift       0 0 0 0 1   0 0 0 _
            Subtract    1 1 1 1 0
            Set q0 = 0  1 1 1 1 0   0 0 0 0

2nd cycle   Shift       1 1 1 0 0   0 0 0 _
            Add         1 1 1 1 1
            Set q0 = 0  1 1 1 1 1   0 0 0 0

3rd cycle   Shift       1 1 1 1 0   0 0 0 _
            Add         0 0 0 0 1
            Set q0 = 1  0 0 0 0 1   0 0 0 1

4th cycle   Shift       0 0 0 1 0   0 0 1 _
            Subtract    1 1 1 1 1
            Set q0 = 0  1 1 1 1 1   0 0 1 0        Quotient = 0 0 1 0 (2)

Finally     Add M       1 1 1 1 1
                      + 0 0 0 1 1
                        0 0 0 1 0                  Remainder = 0 0 0 1 0 (2)

Remainder

FLOATING-POINT NUMBERS AND OPERATIONS

 Until now, the discussions have dealt with signed,


fixed-point numbers and have conveniently
considered them integers, that is, having an implied
binary point at the right end of each number.

 It is also possible to assume that the binary point is


just to the right of the sign bit, thus representing a
fraction.

 In the 2’s-complement system, the signed value F,


represented by the n-bit binary fraction:

B = b0.b-1b-2 … b- (n-1)

is given by:

F(B) = -b0 x 2^0 + b-1 x 2^-1 + b-2 x 2^-2 + … + b-(n-1) x 2^-(n-1)

where the range of F is:

-1 ≤ F ≤ 1 – 2^-(n-1)
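
As a quick check of this formula, an illustrative Python helper that evaluates a 2's-complement fraction given as a bit string:

    def fraction_value(bits):
        """Value of a 2's-complement fraction b0.b-1b-2..., given as a string
        with the binary point immediately after the sign bit b0."""
        b = [int(x) for x in bits]
        return -b[0] + sum(bit * 2.0 ** -i for i, bit in enumerate(b[1:], start=1))

    print(fraction_value('1100'))   # -1 + 1/2  = -0.5
    print(fraction_value('0110'))   # 1/2 + 1/4 =  0.75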

 Consider the range of values representable in


a 32-bit, signed, fixed-point format. Interpreted as
integers, the value range is approximately 0 to 2.15 x
109. If interpreted as fractions, the range is
approximately 4.55 x 10-10 to 1.

 Neither of these ranges is sufficient for scientific
calculations, which might involve parameters like
Avogadro’s number (6.0247 x 10^23) or Planck’s
constant (6.6254 x 10^-27).

 This means that a computer must be able to represent


numbers and operate on them in such a way that the
position of the binary point is variable and is
automatically adjusted as computation proceeds.

 In such a case, the binary point is said to float, and the


numbers are called floating-point numbers. This
distinguishes them from fixed-point numbers, whose
binary point is always in the same position.

 In the decimal scientific notation, numbers may be


written as 6.0246 x 10^23, 6.6254 x 10^-27, -1.0341 x 10^2,
-7.3000 x 10^-14, and so on. These numbers are said to
be given to five significant digits. The scale factors
(10^23, 10^-27, and so on) indicate the position of the
decimal point with respect to the significant digits.

 By convention, when the decimal point is placed to


the right of the first (nonzero) significant digit, the
number is said to be normalized.

 A floating-point representation is one in which a


number is represented by its sign, a string of
significant digits, commonly called the mantissa, and
an exponent to an implied base for the scale factor.

 The standard for representing floating-point numbers
in 32 bits has been developed and specified in detail
by the IEEE.

[Figure: IEEE single-precision format, 32 bits]

   S          E'                          M
   sign       8-bit signed exponent in    23-bit mantissa fraction
              excess-127 representation

   Sign of number: 0 signifies +, 1 signifies -

value represented = ±1.M x 2^(E'-127)

The sign of the number is given in the first bit,


followed by a representation for the exponent (to the
base 2) of the scale factor. Instead of the signed
exponent, E, the value actually stored in the exponent
field is an unsigned integer E’ = E + 127. This is
called the excess-127 format. Thus E’ is in the range
0 ≤ E' ≤ 255.

Examples:

E' (binary)      E' (decimal)     Actual exponent E
00000001 1 -126
00000010 2 -125
. . .
. . .
. . .
01111111 127 0
10000000 128 +1
10000001 129 +2
. . .
. . .
. . .
11111101 253 +126
11111110 254 +127

The end values of E’, namely, 0 and 255, are used to


indicate floating-point values of exact 0 and infinity,
respectively. Thus, the range of E’ for normal values
is 0 < E' < 255. This means that the actual exponent,
E, is in the range –126 ≤ E ≤ 127.

The excess–x representation for exponents enables


efficient comparison of the relative sizes of two
floating-point numbers.

The last 23 bits represent the mantissa. Since binary
normalization is used, the most significant bit of the
mantissa is always equal to 1. This bit is not
explicitly represented; it is assumed to be to the
immediate left of the binary point. The 23 bits stored
in the M field represent the fractional part of the
mantissa, that is, the bits to the right of the binary
point.

Example:

0 00101000 001010 . . . 0

value represented = +1.001010 ... 0 x 2^-87

 The 32-bit standard representation given is called a


single-precision representation because it occupies a
single 32-bit word. The scale factor of this
representation has a range of 2^-126 to 2^+127, which is
approximately 10^±38. The 24-bit mantissa provides
approximately the same precision as a 7-digit decimal
value.
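
For illustration, the S, E', and M fields of a single-precision value can be extracted in Python with the standard struct module (the helper name is an assumption of this sketch):

    import struct

    def decode_single(value):
        """Unpack a Python float into the IEEE single-precision S, E', M fields."""
        (word,) = struct.unpack('>I', struct.pack('>f', value))
        s = word >> 31
        e_prime = (word >> 23) & 0xFF          # excess-127 exponent
        m = word & 0x7FFFFF                    # 23-bit mantissa fraction
        return s, e_prime, m

    s, e_prime, m = decode_single(-6.25)
    print(s, e_prime - 127, bin(m))
    # prints: 1 2 0b10010000000000000000000
    # -6.25 = -1.1001 x 2^2, so E' = 129 and the stored fraction begins 1001...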

 To provide more precision and range for floating-
point numbers, the IEEE standard also specifies a
double-precision format for floating-point number
representation:

[Figure: IEEE double-precision format, 64 bits]

   S          E'                           M
   sign       11-bit signed exponent in    52-bit mantissa fraction
              excess-1023 representation

   Sign of number: 0 signifies +, 1 signifies -

The 11-bit excess-1023 exponent E’ of the double-


precision format has the range 0 < E’ < 2047 for
normal values. Thus the actual exponent E is in the
range –1022 ≤ E ≤ 1023, providing scale factors that
range from 2^-1022 to 2^1023 (approximately 10^±308).

The 53-bit mantissa provides a precision equivalent to


about 16 decimal digits.

 If a number is not normalized, it can always be put in
normalized form by shifting the fraction and adjusting
the exponent.

Example:

excess-127 exponent

0 10001000 0010110 . . .

(There is no implicit 1 to the left of the binary point)

value represented = +0.0010110 ... x 2^9

Unnormalized Value

0 10000101 0110 . . .

value represented = +1.0110 ... x 2^6

Normalized Value

 Arithmetic Operations on Floating-Point Numbers

Example of a floating point addition:

  2.9400 x 10^2
+ 4.3100 x 10^4

Since the exponents of the two numbers differ,


mantissas must be shifted with respect to each other
before they are added or subtracted. Normally, the
number with the smaller exponent is adjusted so that it
will have an exponent equal to the other number.

The result, of course, will have an exponent equal to


the higher exponent.

  0.0294 x 10^4
+ 4.3100 x 10^4
---------------
  4.3394 x 10^4

Add/Subtract Rule

1. Choose the number with the smaller exponent


and shift its mantissa right a number of steps
equal to the difference in exponents.

2. Set the exponent of the result equal to the larger


exponent.

3. Perform addition/subtraction on the mantissas


and determine the sign of the result.

4. Normalize the resulting value if necessary.
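
A minimal Python sketch of these four steps, using decimal mantissas as in the example above (the function name and the five-significant-digit precision are assumptions of the sketch):

    def fp_add(m_a, e_a, m_b, e_b, digits=5):
        """Add two decimal floating-point numbers given as (mantissa, exponent)."""
        # Step 1: shift the mantissa of the number with the smaller exponent
        if e_a < e_b:
            m_a /= 10 ** (e_b - e_a)
        else:
            m_b /= 10 ** (e_a - e_b)
        # Step 2: the result takes the larger exponent
        e_r = max(e_a, e_b)
        # Step 3: add the aligned mantissas
        m_r = m_a + m_b
        # Step 4: normalize so that 1 <= |mantissa| < 10
        while abs(m_r) >= 10:
            m_r /= 10
            e_r += 1
        while m_r != 0 and abs(m_r) < 1:
            m_r *= 10
            e_r -= 1
        return round(m_r, digits - 1), e_r

    print(fp_add(2.94, 2, 4.31, 4))   # -> (4.3394, 4), i.e. 4.3394 x 10^4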

 Although the mantissas of the initial operands and


final results are limited to 24 bits, including the
implicit leading 1, it is important to retain extra bits,
often called guard bits, during the intermediate steps.
This yields maximum accuracy in the results.

Removing guard bits in generating the final results is


called truncation. There are three ways to truncate:

1. Chopping. The guard bits are simply removed


and no changes are made in the retained bits.

Example: 6-bit to 3-bit truncation

All fractions in the range 0.b-1b-2b-3000 to


0.b-1b-2b-3111 are truncated to 0.b-1b-2b-3.

2. Von Neumann Rounding. If the bits to be


removed are all 0s, they are simply dropped, with
no changes in the retained bits. However, if any
of the bits to be removed are 1, the least
significant bit of the retained bits is set to 1.

Example: 6-bit to 3-bit truncation

All 6-bit fractions with b-4b-5b-6 not equal to


000 are truncated to 0.b-1b-2 1 (the last retained bit is set to 1).

3. Rounding. A 1 is added to the LSB position of


the bits retained if there is a 1 in the MSB
position of the bits to be removed.

Example:

0.b-1b-2b-31… is rounded to 0.b-1b-2b-3 + 0.001

0.b-1b-2b-30… is rounded to 0.b-1b-2b-3
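
The three truncation methods can be sketched on fraction bit strings as follows (illustrative Python; k is the number of retained bits):

    def chop(frac, k):
        """Chopping: keep the first k fraction bits, drop the guard bits."""
        return frac[:k]

    def von_neumann(frac, k):
        """Von Neumann rounding: if any dropped bit is 1, set the last retained bit to 1."""
        kept, dropped = frac[:k], frac[k:]
        return kept if '1' not in dropped else kept[:-1] + '1'

    def round_nearest(frac, k):
        """Rounding: add 1 at the LSB of the retained bits when the first dropped bit is 1."""
        kept, dropped = frac[:k], frac[k:]
        if dropped and dropped[0] == '1':
            kept = format(int(kept, 2) + 1, '0{}b'.format(k))   # may overflow and need renormalizing
        return kept

    print(chop('101101', 3), von_neumann('101101', 3), round_nearest('101101', 3))
    # -> 101 101 110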

Implementation of Floating-Point Operations

[Figure: hardware for floating-point addition/subtraction. An 8-bit subtractor compares EA' and EB'; its magnitude n = |EA' - EB'| drives a SHIFTER that shifts the mantissa of the number with the smaller exponent n bit positions to the right, while its sign drives a SWAP network and a MUX that selects the larger exponent as the tentative result exponent E'. A combinational CONTROL network, using SA, SB, and the Add/Subtract operation, decides whether the mantissa adder/subtractor adds or subtracts and sets the result sign SR. A leading-zeros detector produces the shift count X used by the normalize-and-round unit, and an 8-bit subtractor forms E' - X, giving the 32-bit result R = A + B with fields SR, ER', MR.]

Let the signs, exponents, and mantissas of operands A
and B be represented by SA, EA, MA and SB, EB, MB,
respectively.

The first step is to compare exponents to determine


how far to shift the mantissa of the number with the
smaller exponent. This shift-count value n is
determined by the 8-bit subtractor circuit in the upper
left corner of the figure.

The magnitude of the difference EA – EB, which is n, is


sent to the SHIFTER unit. The sign of the difference
resulting from the exponent comparison determines
which mantissa is to be shifted. Therefore the sign is
sent to the SWAP network in the upper right corner of
the figure. If the sign is 0, then EA ≥ EB, and the mantissa
MB is sent to the SHIFTER since the exponent of B is
smaller. Otherwise (EA < EB), the mantissa MA is sent
to the SHIFTER.

Step 2 is performed by the two-way multiplexer,


MPX, in the bottom left corner of the figure. The
exponent of the result, E, is tentatively determined as
EA if EA ≥ EB, or EB if EA < EB. This is determined by
the sign of the difference resulting from the exponent
comparison operation in step 1.

Step 3 involves the major component, the mantissa
adder-subtractor in the middle of the figure. The
CONTROL logic determines whether the mantissas
are to be added or subtracted. This is decided by the
signs of the operands (SA and SB) and the operation
(add or subtract) that is to be performed on the
operands. The CONTROL logic also determines the
sign of the result, SR.

Example:

If A is negative (SA = 1), B is positive (SB = 0),


and the operation is A – B, then the mantissas are
subtracted (the CONTROL logic will output a
SUB operation to the adder-subtractor) and the
sign of the result is negative (SR = 1).

On the other hand, if A and B are both positive,


and the operation is A – B, then the mantissas are
subtracted. The sign of the result, SR, now
depends on the mantissa subtraction operation.

Step 4 of the Add/Subtract rule consists of


normalizing the result of step 3, mantissa M. The
number of leading zeros in M determines the number
of bit shifts, X, to be applied to M. The normalized
value to be truncated is truncated to generate the 24-
bit mantissa, MR, of the result. The value X is also
subtracted from the tentative result exponent E’ to
generate the true exponent, E’R.

 Multiplication and division are somewhat easier than
addition and subtraction, in that no alignment of
mantissas is needed.

Multiply Rule

1. Add the exponents and subtract 127.

2. Multiply the mantissas and determine the sign of


the result.

3. Normalize the resulting value if necessary.

Divide Rule

1. Subtract the exponents and add 127.

2. Divide the mantissas and determine the sign of


the result.

3. Normalize the resulting value if necessary.

The addition or subtraction of 127 in the Multiply and


Divide rules results from using the excess-127
notation for exponents.
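
An illustrative Python sketch of the exponent handling these rules imply for excess-127 operands (helper names are assumptions):

    def fp_multiply_exponent(e_a_prime, e_b_prime):
        """Excess-127 exponent of a product: add the stored exponents and subtract 127,
        since (EA + 127) + (EB + 127) - 127 = (EA + EB) + 127."""
        return e_a_prime + e_b_prime - 127

    def fp_divide_exponent(e_a_prime, e_b_prime):
        """Excess-127 exponent of a quotient: subtract the stored exponents and add 127."""
        return e_a_prime - e_b_prime + 127

    # 2^3 x 2^4 = 2^7: stored exponents 130 and 131 give 130 + 131 - 127 = 134 = 7 + 127
    print(fp_multiply_exponent(130, 131))   # -> 134
    print(fp_divide_exponent(130, 131))     # -> 126, i.e. actual exponent -1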

