Another important interaction between user programs and computer architecture is in the representation of numbers. This interaction does not affect performance as much as it does portability. Users must be extremely careful when moving programs and/or data files from one system to another because numbers and other data are not always represented the same way. Recently programming languages have begun to allow users to have more control over how numbers are represented and to write code that does not depend so heavily on data representations that it fails when executed on the ``wrong'' system.
The binary number system is the starting point for representing information. All items in a computer's memory - numbers, characters, instructions, etc. - are represented by strings of 1's and 0's. These two values designate one of two possible states for the underlying physical memory. It does not matter to us which state corresponds to 1 and which corresponds to 0, or even what medium is used. In an electronic memory, 1 could stand for a positively charged region of semiconductor and 0 for a neutral region, or on a device that can be magnetized a 1 would represent a portion of the surface that has a flux in one direction, while a 0 would indicate a flux in the opposite direction. It is only important that the mapping from the set {1,0} to the two states be consistent and that the states can be detected and modified at will.
Systems usually deal with fixed-length strings of binary digits. The smallest unit of memory is a single bit, which holds a single binary digit. The next largest unit is a byte, now universally recognized to be eight bits (early systems used anywhere from six to eight bits per byte). A word is 32 bits long in most workstations and personal computers, and 64 bits in supercomputers. A double word is twice as long as a single word, and operations that use double words are said to be double precision operations.
Storing a positive integer in a system is trivial: simply write the integer in binary and use the resulting string as the pattern to store in memory. Since numbers are usually stored one per word, the number is padded with leading 0's first. For example, the number 52 is represented in a 16-bit word by the pattern 0000000000110100.
The meaning of an
-bit string
when it is interpreted as a binary number is defined by the
formula,
,
i.e. bit number i has weight:

Compiler writers and
assembly language programmers often take advantage of the binary
number system when implementing arithmetic operations. For
example, if the pattern of bits is ``shifted left'' by one, the
corresponding number is multiplied by two. A left shift is
performed by moving every bit left and inserting 0's on the right
side. In an 8-bit system, for example, the pattern 00000110
represents the number 6; if this pattern is shifted left, the
resulting pattern is 00001100, which is the representation of the
number 12. In general, shifting left by
bits is equivalent to
multiplying by
.
Shifts such as these can be done in one machine
cycle, so they are much faster than multiplication instructions,
which usually takes several cycles. Other ``tricks'' are using a
right shift to implement integer division by a power of 2, in
which the result is an integer and the remainder is
ignored (e.g. 15
4 = 3) and taking the modulus
or remainder with respect to a power of
2 (see problem 8).
A fundamental relationship about
binary patterns is that there are 2
distinct
-digit
strings. For
example, for
there are
= 256 different
strings of 1's and 0's. From
this relationship it is easy to see that the largest integer that
can be stored in an
-bit word is
:
the
patterns are used to
represent the
integers in the interval
.
An overflow occurs when
a system generates a value greater than the largest integer. For
example, in a 32-bit system, the largest positive integer is
.
= 4,294,976,295. If a program tries to add
3,000,000,000 and 2,000,000,000 it will
cause an overflow. Right away we can see one source of problems
that can arise when moving a program from one system to another:
if the word size is smaller on the new system a program that runs
successfully on the original system may crash with an overflow
error on the new system.
There are two different techniques for representing negative values. One method is to divide the word into two fields, i.e. represent two different types of information within the word. We can use one field to represent the sign of the number, and the other field to represent the value of the number. Since a number can be just positive or negative, we need only one bit for the sign field. Typically the leftmost bit represents the sign, with the convention that a 1 means the number is negative and a 0 means it is positive. This type of representation is known as a sign-magnitude representation, after the names of the two fields. For example, in a 16-bit sign-magnitude system, the pattern 1000000011111111 represents the number and the pattern 0000000000000101 represents +5.
The other technique for representing both positive
and negative integers is known as two's complement. It has two
compelling advantages over the sign-magnitude representation, and
is now universally used for integers, but as we will see below
sign-magnitude is still used to represent real numbers. The two's
complement method is based on the fact that binary arithmetic in
fixed-length words is actually arithmetic over a finite cyclic
group. If we ignore overflows for a moment, observe what happens
when we add 1 to the largest possible number in an
-bit
system (this number is represented by a string of
1's):

The result is a
pattern with a leading 1 and
0's. In an
-bit
system only the low order
bits of
each result are saved, so this sum is functionally
equivalent to 0. Operations that lead to sums with very large
values ``wrap around'' to 0, i.e. the system is a finite cyclic
group. Operations in this group are defined by arithmetic
modulo 2
.
For our purposes, what is
interesting about this type of
arithmetic is that
, which is represented by a 1 followed
by
0's,
is equivalent to 0, which means
for all
between 0 and
. A
simple ``trick'' that has its roots in this fact can be applied to
the bit pattern of a number in order to calculate its additive
inverse: if we invert every bit (turn a 1 into a 0 and vice
versa) in the representation of a number
and then add 1,
we come
up with the representation of
. For example, the representation
of 5 in an 8-bit system is 00000101. Inverting every bit and
adding 1 to the result gives the pattern 11111011. This is also
the representation of 251, but in arithmetic modulo 2
we have so
this pattern is a perfectly acceptable representation of
(see
problem 7).
In practice we divide all
-bit patterns
into two groups. Patterns that begin with 0 represent the
positive integers
and patterns
beginning with 1 represent the
negative integers
. To
determine which integer is represented by
a pattern that begins with a 1, compute its complement (invert
every bit and add 1). For example, in an 8-bit two's complement
system the pattern 11100001 represents , since the complement is
.
Note that the leading bit determines the sign, just as in a
sign-magnitude system, but one cannot simply look at the
remaining bits to ascertain the magnitude of the number. In a
sign-magnitude system, the same pattern represents
.
The first step in defining a representation for real numbers is to realize that binary notation can be extended to cover negative powers of two, e.g. the string ``110.101'' is interpreted as

Thus a straightforward method for representing real numbers would be to specify some location within a word as the ``binary point'' and give bits to the left of this location weights that are positive powers of two and bits to the right weights that are negative powers of two. For example, in a 16-bit word, we can dedicate the rightmost 5 bits for the fraction part and the leftmost 11 bits for the whole part. In this system, the representation of 6.625 is 0000000011010100 (note there are leading 0's to pad the whole part and trailing 0's to pad the fraction part). This representation, where there is an implied binary point at a fixed location within the word, is known as a fixed point representation.
There is an obvious tradeoff between range and
precision in fixed point representations.
bits for
the fraction
part means there will be
numbers in the system between any two
successive integers. With 5 bit fractions there are 32 numbers in
the system between any two integers; e.g. the numbers between 5
and 6 are 5
(5.03125), 5
(5.03125), etc.
To allow more
precision, i.e. smaller divisions between successive numbers, we
need more bits in the fraction part. The number of bits in the
whole part determines the magnitude of the largest positive
number we can represent, just as it does for integers. With 11
digits in the whole part, as in the example above, the largest
number we can represent in 16 bits is
. Moving one bit from the
whole part to the fraction part in order to increase precision
cuts the range in half, and the largest number is now
.
To allow
for a larger range without sacrificing precision, computer
systems use a technique known as floating point. This
representation is based on the familiar ``scientific notation'' for
expressing both very large and very small numbers in a concise
format as the product of a small real number and a power of 10,
e.g.
. This notation
has three components: a base (10 in this
example); an exponent (in this case 23); and a mantissa (6.022).
In computer systems, the base is either 2 or 16. Since it never
changes for any given computer system it does not have to be part
of the representation, and we need only two fields to specify a
value, one for the mantissa and one for the exponent.
As an example of how a number is represented in floating point, consider again the number 6.625. In binary, it is

If a 16-bit system has a 10-bit mantissa and 6-bit exponent, the number would be represented by the string 1101010000 000010. The mantissa is stored in the first ten bits (padded on the right with trailing 0's), and the exponent is stored in the last six bits.
As the
above example illustrates, computers transform the numbers so the
mantissa is a manageable number. Just as
is
preferred to
or
in
scientific notation, in binary the mantissa should be between
and
. When the
mantissa is in this range it is
said to be normalized. The definition of the normal form varies
from system to system, e.g. in some systems a normalized mantissa
is between
and
.
Since we need to represent
both positive and negative real numbers, the complete
representation for a real number in a floating point format has
three fields: a one-bit sign, a fixed number of bits for the
mantissa, and the remainder of the bits for the exponent. Note
that the exponent is an integer, and that this integer can be
either positive or negative, e.g. we will want to represent very
small numbers such as
. Any method such
as two's complement that
can represent both positive and negative integers can be used
within the exponent field. The sign bit at the front of the
number determines the sign of the entire number, which is
independent of the sign of the exponent, e.g. it indicates
whether the number is
or
.
In the past every computer
manufacturer used their own floating point representation, which
made it a nightmare to move programs and datasets from one system
to another. A recent IEEE standard is now being widely adopted
and will add stability to this area of computer architecture. For
32-bit systems, the standard calls for a 1-bit sign, 8-bit
exponent, and 23-bit mantissa. The largest number that can be
represented is
, and the
smallest positive number (closest to
0.0) is
. Details of the
standard are presented in an appendix to
this chapter.
Figure 2
illustrates the numbers that
can be stored in a typical computer system with a floating point
representation. The figure shows three disjoint regions: positive
numbers
, 0.0, and negative
numbers
.
is the largest
number that
can be stored in the system; in the IEEE standard representation
.
is the smallest positive
number, which is
in the IEEE
standard.
Programmers need to be aware of several important
attributes of the floating point representation that are
illustrated by this figure. The first is the magnitude of the
range between
and
. There are about
integers in this range.
However there are only
different 32-bit
patterns. What this
means is there are numbers in the range that do not have
representations. Whenever a calculation results in one of these
numbers, a round-off error will occur when the system
approximates the result by the nearest (we hope) representable
number. The arithmetic circuitry will produce a binary pattern
that is close to the desired result, but not an exact
representation. An interesting illustration of just how common
these round-off errors are is the fact that 1 does not have a
finite representation in binary, but is instead the infinitely
repeating pattern
.
The next important point is that there is a
gap between
, the smallest positive number, and 0.0.
A round-off
error in a calculation that should produce a small non-zero value
but instead results in 0.0 is called an underflow. One of the
strengths of the IEEE standard is that it allows a special
denormalized form for very small numbers in order to stave off
underflows as long as possible. This is why the exponent in the
largest and smallest positive numbers are not symmetrical.
Without denormalized numbers, the smallest positive number in the
IEEE standard would be around
.
Finally, and perhaps most
important, is the fact that the numbers that can be represented
are not distributed evenly throughout the range. Representable
numbers are very dense close to 0.0, but then grow steadily
further apart as they increase in magnitude. The dark regions in
Figure 2
correspond to parts of the number line where
representable numbers are packed close together. It is easy to
see why the distribution is not even by asking what two numbers
are represented by two successive values of the mantissa for any
given exponent. To make the calculations easier, suppose we have
a 16-bit system with a 7-bit mantissa and 8-bit exponent. No
matter what the exponent is, the distance between any two
successive values of the mantissa, e.g. between
and
, will be
.
For numbers closest to 0.0, the exponent will be a negative
number, e.g.
, and the distance between two successive floating
point numbers will be
. At the other
end of the scale, when
exponents are large, the distance between two numbers will be
approximately
, namely
.