A common analogy for a pipeline is the assembly line used in manufacturing. The end goal is to increase productivity - the number of instructions executed per second or the number of cars built per day - by dividing a complex operation into pieces that can be performed in parallel. Separate ``workers'' implement successive steps along the assembly line, and when an item finishes one step it is passed down the line to the next step.
Pipelines are used in two major areas in computer design: instruction processing and arithmetic operations. In either case, certain requirements must be satisfied for a pipelined system to work.
A very important requirement for overlapping operations this way is that there be no resource conflicts, i.e. the operands must be independent. For example, suppose a program contains the two instructions
    R2 = R0 + R1
    R4 = R3 + R1
Note that both instructions identify R1 as one of their inputs. There is a potential conflict in stages two and three of the adder pipeline, because stage two might need to shift the mantissa of R1 for the second instruction at the same time stage three needs to add the mantissa of R0 to the mantissa of R1 for the first instruction. The solution is to make copies of the operands and pass the copies through the pipeline. Thus the CPU gives the first stage of the adder copies of the operands for the first pair, and the second stage gets these copies from the first stage along with any values computed there.
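The copying idea can be sketched as follows; this is a minimal illustration (register values and names invented, not from the text) of why latching operand *values* into the pipeline, rather than rereading the register file, keeps in-flight operations safe:

```python
# Architectural registers (invented sample values).
registers = {"R0": 1.5, "R1": 2.25, "R3": 4.0}

# When an add enters stage 1, the pipeline latches COPIES of the
# operand values, not references to the registers themselves.
latch = {"a": registers["R0"], "b": registers["R1"]}

# Another instruction is free to touch R1 in the meantime...
registers["R1"] = 99.0

# ...because later stages work only on the latched copies.
result = latch["a"] + latch["b"]
print(result)  # 3.75, unaffected by the update to R1
```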
There is another potential problem illustrated by this example. Suppose the second instruction is changed so one of its operands is R2, the result of the first instruction, so the pair of instructions is:
    R2 = R0 + R1
    R4 = R2 + R1
Now the second instruction depends on the result of the first instruction. The CPU cannot send the second pair of operands to the pipelined adder until the result of the first addition exits the last stage of the pipeline. Interactions such as these lead to periods when pipeline stages are empty. These empty time slots are often called bubbles.
Figure 5 shows a type of diagram, known as a Gantt chart, that is commonly used to illustrate the operation of a pipeline. The horizontal axis represents time, and there is one row for each stage of the pipeline. A line segment in a row during a given cycle means that stage is active in that cycle; a blank means the stage is inactive. The figure illustrates the pipelined floating point adder of the previous example both in the case when two successive instructions can be overlapped and in the case where the second instruction must wait for the first to complete. In the overlapped case, note that in cycle 2 the first stage is busy with the second instruction while the second stage is busy with the first instruction. In each successive cycle the two instructions are passed down the ``assembly line'' to the next stage. In the first case the second sum is done after 6 cycles, but in the second case it is not finished until after the 10th cycle. The ``bubble'' in the pipeline is the 4-cycle dead period in each stage caused by delaying the second instruction.
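A chart like Figure 5 is easy to generate by hand; this small sketch (layout and symbols are my own, not the figure's) draws the occupancy of a 5-stage adder for the two cases, confirming completion in cycle 6 when overlapped and cycle 10 when the second add must wait:

```python
STAGES = 5  # depth of the pipelined adder in the example

def schedule(start_cycles):
    """Map each stage to the set of cycles it is busy, given when each
    instruction enters stage 1 (stage s is busy in cycle start + s - 1)."""
    chart = {s: set() for s in range(1, STAGES + 1)}
    for start in start_cycles:
        for s in range(1, STAGES + 1):
            chart[s].add(start + s - 1)
    return chart

def draw(chart, last_cycle):
    """Print one row per stage: '=' for an active cycle, '.' for idle."""
    for s in range(1, STAGES + 1):
        row = "".join("=" if c in chart[s] else "." for c in range(1, last_cycle + 1))
        print(f"stage {s}: {row}")

overlapped = schedule([1, 2])           # independent operands: back to back
stalled    = schedule([1, STAGES + 1])  # dependent: wait for the first result

draw(overlapped, 10)
print()
draw(stalled, 10)
print(max(overlapped[STAGES]), max(stalled[STAGES]))  # 6 10
```

The 4-cycle gap between the two marks in each row of the stalled chart is the bubble described above.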
It should be apparent that in the general case a pipeline of depth n can process m items in n + m - 1 steps when there are no bubbles: the first result appears after n cycles, and one more result emerges in every cycle after that. Without a pipeline, each application of the basic function would require n cycles, and the applications would have to be executed sequentially, for a total time of nm cycles. The speedup obtained by a full pipeline is thus

    S = nm / (n + m - 1)
When m >> n we can safely ignore the n - 1 in the denominator, so the asymptotic speedup, observed for large m, is a factor of n. For example, suppose we want to add 1000 pairs of numbers, e.g. when adding two 1000-element vectors. If it takes 5 cycles for each addition, a machine without a pipelined adder would require 5000 cycles. With our 5-stage pipelined adder, the last sum will appear after 5 + 1000 - 1 = 1004 cycles, so the pipeline is 5000/1004, or about 4.98, times faster. Providing a steady stream of independent operands that will keep a pipeline full is the distinguishing feature of a vector processor, which can initiate such a series of operations with a single instruction.
There are many possible sources of bubbles in pipelines. Dependencies between instructions are the main cause. For arithmetic pipelines, data dependencies arise when pairs of operations share inputs and outputs. Consider the following two add instructions:
    z1 = x1 + y1
    ...
    z2 = x2 + y2
The dependence illustrated previously is x2 = z1, e.g. both operands are the same register, R2, in the earlier example. Other dependencies are z1 = z2 (both instructions write to the same register) and x1 = z2 (the output from the second instruction may overwrite the register before the first has had a chance to read the old value; an unlikely occurrence in a vector machine, but a situation that must be taken into account). Note that the instructions do not have to be consecutive, i.e. there could be intervening instructions. A compiler that checks for dependencies and possibly reorders instructions has to ``look ahead'' in the code by an amount equal to the depth of the pipeline used in the first instruction.
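The three cases above can be checked mechanically. A sketch (the function and the tuple encoding are invented for illustration): each instruction is written as (output, input1, input2), with the first instruction earlier in program order, and the classification uses the standard hazard names.

```python
def dependences(first, second):
    """Classify the data dependences between two instructions,
    each encoded as (output_reg, input_reg_1, input_reg_2)."""
    z1, x1, y1 = first
    z2, x2, y2 = second
    found = []
    if z1 in (x2, y2):
        found.append("read-after-write")   # x2 = z1: second reads first's result
    if z1 == z2:
        found.append("write-after-write")  # z1 = z2: both write the same register
    if z2 in (x1, y1):
        found.append("write-after-read")   # x1 = z2: second may clobber an input
    return found

# The pair from the text: R2 = R0 + R1 followed by R4 = R2 + R1.
print(dependences(("R2", "R0", "R1"), ("R4", "R2", "R1")))  # ['read-after-write']
# Merely sharing the input R1 creates no dependence:
print(dependences(("R2", "R0", "R1"), ("R4", "R3", "R1")))  # []
```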
Instruction pipelines, used to speed up the fetch-decode-execute cycle, are also susceptible to bubbles. The most common case here is caused by branch or loop instructions, which introduce control dependencies. If the pipeline is ``looking ahead'' and fetching instructions it thinks the machine will want to execute, but in fact the machine branches to another location, a bubble is introduced while the fetch stage goes to get the instructions at the new location.
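The cost of such control bubbles can be estimated with the same kind of counting used for the adder; a rough sketch (the depth and penalty model are invented assumptions, real machines vary) where every taken branch squashes the fetched-ahead instructions and the front end restarts at the target:

```python
FETCH_DEPTH = 3  # assumed stages between fetching an instruction and executing it

def total_cycles(n_instructions, n_taken_branches):
    """Cycles to run a program: full-pipeline time plus one
    FETCH_DEPTH-cycle bubble per taken branch (simplified model)."""
    full_pipe = FETCH_DEPTH + n_instructions - 1
    bubbles = n_taken_branches * FETCH_DEPTH
    return full_pipe + bubbles

print(total_cycles(100, 0))   # 102: one instruction per cycle once full
print(total_cycles(100, 10))  # 132: ten refills of the fetch stages
```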
Pipelines have been widely used in high performance machines for many years. The CDC 6600 is a classic example of a complex instruction pipeline, using a circuit known as a ``scoreboard'' to detect data dependencies between instructions and instruction buffers to implement pipelining in the fetch-decode logic. The Cray-1 was a very successful early supercomputer in which every data processing unit was pipelined. Until the late 1980s pipelining was one of the attributes that separated ``mainframes'' and supercomputers from microprocessors, but with the advance of VLSI technology microprocessors now have room on chip for complex control circuitry. Most now have pipelined data processing units and complex instruction scheduling logic that rivals that of the CDC 6600.