A common analogy for a pipeline is the assembly line used in manufacturing. The end goal is to increase productivity - the number of instructions executed per second or the number of cars built per day - by dividing a complex operation into pieces that can be performed in parallel. Separate ``workers'' implement successive steps along the assembly line, and when an item finishes one step it is passed down the line to next step.
Pipelines are used in two major areas in computer design: instruction processing and arithmetic operations. The following requirements must be satisfied in a pipelined system:
.
One possible
breakdown of this function into stages is as follows [33]:
swap the operands. Find the difference in
exponents
.
to the right by
bits.
and
. The exponent of the
sum is
.
A very important requirement for overlapping operations this way is that there be no resource conflicts, i.e. the operands must be independent. For example, suppose a program contains the two instructions
R2 = R0 + R1 R4 = R3 + R1
Note that both instructions identify R1 as one of their
inputs. There is a potential conflict in stages two and three of
the adder pipeline because stage two might need to shift the
mantissa of R1 at the same time stage three needs to add the
mantissa of R0 to the mantissa of R1. The solution
is to make
copies of the operands, and pass the copies through the pipeline.
Thus the CPU gives the adder copies of R0 and R1 when
it starts
the pipeline for the first pair, and the second stage gets these
copies from the first stage along with a value of
.
There is
another potential problem illustrated by this example. Suppose
the second instruction is changed so one of its operands is R2,
so the pair of instructions is:
R2 = R0 + R1 R4 = R2 + R1
Now the second instruction depends on the result of the first instruction. The CPU cannot send the second pair of operands to the pipelined adder until the result of the first addition exits the last stage of the pipeline. Interactions such as these lead to periods when pipeline stages are empty. These empty time slots are often called bubbles.
Figure 5
shows a type of
diagram, known as a Gantt chart, that
is commonly used to
illustrate the operation of a pipeline. The horizontal axis
represents time. There is one row for each stage of the pipeline.
A line segment during cycle
means a stage is active in that
cycle; a blank means the stage is inactive. The figure
illustrates the pipelined floating point adder of the previous
example in the case when two successive instructions can be
overlapped and in the case where the second instruction must wait
for the first to complete. In the case of overlapped
instructions, note that in cycle
the first stage is busy with
the second instruction while the second stage is busy with the
first instruction. In each successive cycle the two instructions
are passed down the ``assembly line'' to the next stage. In the
first case, the second sum is done after 6 cycles, but in the
second case it is not finished until after the 10th cycle. The
``bubble'' in the pipeline is the 4-cycle dead period in each stage
caused by delaying the second instruction.
It should be apparent
that in the general case a pipeline of depth
can process
items in
steps when there are no bubbles. Without a
pipeline,
each application of the basic function would require
cycles,
and they would have to be executed sequentially, for a total time
of
cycles. The speedup obtained by a full pipeline is thus

When
we can safely ignore the
in the denominator, so the asymptotic
speedup, observed for large
, is a factor of
.
For example,
suppose we want to add 1000 pairs of numbers, e.g. when adding
two 1000-element vectors. If it takes 5 cycles for each addition,
a machine without a pipelined adder would require 5000 cycles.
With our 5-stage pipelined adder, the last sum will appear after
cycles, so the pipeline is
times faster.
Providing a steady
stream of independent operands that will keep a pipeline full is
the distinguishing feature of a vector processor, which can
initiate such a series of operations with a single instruction.
There are many possible sources of bubbles in pipelines. Dependencies between instructions are the main cause. For arithmetic pipelines, data dependencies arise when pairs of operations share inputs and outputs. Consider the following two add instructions:
z1 = x1 + y1 ... z2 = x2 + y2
The dependence
illustrated previously is x2 = z1, e.g. both operands are the
register R2. Other dependencies are z1 = z2 (both
instructions write to the same register) and
x1 = z2 (the output from the
second instruction may overwrite the register before the first
has had a chance to read the old value; an unlikely occurrence in
a vector machine, but a situation that must be taken into
account). Note that the instructions do not have to be
consecutive, i.e. there could be intervening instructions. A
compiler that checks for dependencies and possibly reorders
instructions has to ``look ahead'' in the code by an amount equal
to the depth of the pipeline used in the first instruction.
Instruction pipelines, used to speed up the fetch-decode-execute cycle, are also susceptible to bubbles. The most common case here is caused by branch or loop instructions, which are control dependencies. If the pipeline is ``looking ahead'' and fetching instructions it thinks the machine will want to execute, but in fact the machine branches to another location, a bubble is introduced while the fetch stage goes to get the instructions at the new location.
Pipelines have been widely used in high performance machines for many years. The CDC 6600 is a classic example of a complex instruction pipeline, using a circuit known as a ``scoreboard'' to detect data dependencies between instructions and instruction buffers to implement pipelining in the fetch-decode logic. The Cray-1 was a very successful early supercomputer in which every data processing unit was pipelined. Until the late 1980s pipelining was one of the attributes that separated ``mainframes'' and supercomputers from microprocessors, but with the advance of VLSI technology microprocessors now have room on chip for complex control circuitry. Most now have pipelined data processing units and complex instruction scheduling logic that rivals that of the CDC 6600, an early pipelined processor known for its innovative instruction processing.