The use of vector instructions pays off in two different ways. First, the machine has to fetch and decode far fewer instructions, so the control unit overhead is greatly reduced and the memory bandwidth needed to fetch this sequence of instructions is reduced by a corresponding amount. The second payoff, equally important, is that the instruction provides the processor with a regular source of data. When a vector instruction is initiated, the machine knows it will have to fetch n pairs of operands, which are arranged in a regular pattern in memory. Thus the processor can tell the memory system to start sending those pairs. With an interleaved memory, the pairs will arrive at a rate of one per cycle, at which point they can be routed directly to a pipelined functional unit for processing. Without an interleaved memory, or some other way of supplying operands at a high rate, the advantages of processing an entire vector with a single instruction would be greatly reduced.
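To make this concrete, consider the DAXPY kernel written as a scalar loop in C; this is an illustrative sketch, not taken from the text. On a scalar machine every iteration costs instruction fetches and decodes for the loads, multiply, add, store, index update, and branch. On a vector machine the whole loop collapses into a handful of vector instructions, and the regular stride-1 access pattern lets the memory system stream operand pairs at one per cycle.

```c
#include <assert.h>
#include <stddef.h>

/* DAXPY: y[i] = a*x[i] + y[i]. A scalar processor executes this loop
 * body n times, refetching and redecoding the same instructions each
 * iteration. A vector processor replaces the loop with a short sequence
 * of vector instructions (vector load, vector multiply-add, vector
 * store) and tells the memory system up front to stream the n operand
 * pairs from their regular stride-1 locations. */
void daxpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The payoff scales with n: one instruction stream startup is amortized over the entire vector rather than paid per element.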
A key division of vector processors arises from the way the instructions access their operands. In the memory-to-memory organization, the operands are fetched from memory and routed directly to the functional unit, and results are streamed back out to memory as the operation proceeds. In the register-to-register organization, operands are first loaded into a set of vector registers, each of which can hold a segment of a vector, for example 64 elements. The vector operation then proceeds by fetching the operands from the vector registers and returning the results to a vector register.
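The register-to-register organization can be sketched in C as follows. This is a simplified model under assumed parameters (a 64-element vector register, as in the text's example); the segment loop is the "strip mining" a compiler or programmer performs when a vector is longer than one register.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VLEN 64  /* elements per vector register (assumed, per the text) */

/* Register-to-register vector add, C = A + B. A long vector is processed
 * in VLEN-element segments: each segment is loaded into vector registers,
 * the vector operation reads its operands from those registers, and the
 * result register is stored back to memory. Contrast this with a
 * memory-to-memory machine, which would stream operands from memory
 * through the functional unit and back to memory with no register stop. */
void vec_add(size_t n, const double *a, const double *b, double *c) {
    double v1[VLEN], v2[VLEN], v3[VLEN];  /* model of three vector registers */
    for (size_t i = 0; i < n; i += VLEN) {
        size_t len = (n - i < VLEN) ? n - i : VLEN;  /* set vector length */
        memcpy(v1, a + i, len * sizeof(double));     /* vector load  V1 <- A */
        memcpy(v2, b + i, len * sizeof(double));     /* vector load  V2 <- B */
        for (size_t j = 0; j < len; j++)             /* vector add   V3 <- V1+V2 */
            v3[j] = v1[j] + v2[j];
        memcpy(c + i, v3, len * sizeof(double));     /* vector store C <- V3 */
    }
}
```

The register file adds load and store steps, but it lets intermediate results be reused by later vector instructions without a round trip to memory.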