
- Job level parallelism, the highest level of parallelism, is of
more interest to administrators than to individual users. From
their point of view, what matters most is that a lab or computer
center execute as many jobs as possible in a given time period.
This can be accomplished by purchasing more computer systems so
that more jobs run at any one time, even though no single user's
job will run any faster. Once again we see the distinction between
throughput (the number of jobs per day) and latency (the time to
execute one program).
- Program level parallelism occurs when a single program is
broken down into constituent parts. For example, the matrix
product C = AB can be computed by breaking C into quadrants and
having each of four processors compute one quadrant from the
corresponding rows of A and columns of B (also refer to the
Chapter on Numerical Algebra). The entire product will be
computed roughly four times faster, since each processor can work
independently of the others.
- Instruction level parallelism is mostly invisible to users,
i.e. it is below the level of the architecture and in the domain
of computer organization. Pipelines, introduced briefly in
Section 3.6 and discussed in more detail below, are the most
common way of implementing this type of parallelism.
- Arithmetic and bit level parallelism is the lowest level and is
mainly of concern to designers of the arithmetic-logic units
inside the CPU. For example, a 64-bit sum can be computed by
adding all 64 bits at once (the carries into the more significant
bits can be predicted and computed almost as fast as the sum of
any two bits), or the architect may instead decide to break the
operation into sixteen 4-bit pieces and compute the entire sum in
16 cycles, one piece per cycle.
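
The quadrant decomposition described under program level parallelism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `block_product` and `quadrant_matmul` are hypothetical, matrices are plain lists of lists, and the four tasks run on a thread pool, which in standard CPython shows the independence of the quadrants rather than a real fourfold speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def block_product(A, B, rows, cols):
    """Compute the block of C = A*B for the given row and column ranges."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in cols]
            for i in rows]


def quadrant_matmul(A, B):
    """Multiply two n-by-n matrices by computing C's four quadrants
    as four independent tasks (hypothetical helper, for illustration)."""
    n = len(A)
    h = n // 2
    first, second = range(0, h), range(h, n)
    # One (row range, column range) pair per quadrant of C.
    tasks = [(first, first), (first, second),
             (second, first), (second, second)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(block_product, A, B, r, c) for r, c in tasks]
        c11, c12, c21, c22 = [f.result() for f in futures]
    # Stitch the quadrants back together row by row.
    upper = [left + right for left, right in zip(c11, c12)]
    lower = [left + right for left, right in zip(c21, c22)]
    return upper + lower
```

For instance, `quadrant_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. No task reads anything another task writes, which is why the quadrants can be computed independently.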
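
The 4-bit alternative mentioned under arithmetic and bit level parallelism can be modeled in software as follows. This is a minimal sketch, not a description of any particular ALU: the function name `add_in_slices` is hypothetical, each loop iteration stands in for one cycle, and the operands are assumed to be unsigned values that fit in `width` bits, with `width` a multiple of `slice_bits`.

```python
def add_in_slices(a, b, width=64, slice_bits=4):
    """Add two unsigned width-bit integers one slice at a time.

    Returns (sum modulo 2**width, carry out, number of steps taken).
    """
    mask = (1 << slice_bits) - 1
    result = 0
    carry = 0
    steps = 0
    for shift in range(0, width, slice_bits):
        # Extract the current 4-bit piece of each operand.
        a_slice = (a >> shift) & mask
        b_slice = (b >> shift) & mask
        partial = a_slice + b_slice + carry
        # Keep the low bits as this slice of the sum ...
        result |= (partial & mask) << shift
        # ... and pass the overflow on to the next slice.
        carry = partial >> slice_bits
        steps += 1
    return result, carry, steps
```

With the default arguments the loop runs 64 / 4 = 16 times, matching the 16 cycles in the text. Each slice must wait for the carry from the slice before it, which is exactly the dependency that the all-at-once adder avoids by predicting the carries.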

