next up previous

3.5.1 Vector Processors     continued...

In the register to register machines the vectors have a relatively short length, 64 in the case of the Cray family, but the startup time is far less than on the memory to memory machines. Thus these machines are much more efficient for operations involving short vectors, but for long vector operations the vector registers must loaded with each segment before the operation can continue. Register to register machines now dominate the vector computer market, with a number of offerings from Cray Research Inc., including the Y-MP and the C-90. The approach is also the basis for machines from Fujitsu, Hitachi and NEC. Clock cycles on modern vector processors range from 2.5ns (NEC SX-3) to 4.2ns (Cray C90), and single processor performance on LINPACK benchmarks is in the range of 1000 to 2000 MFLOPS (1 to 2 GFLOPS).

The basic processor architecture of the Cray supercomputers has changed little since the Cray-1 was introduced in 1976 [28]. There are 8 vector registers, named V0 through V7, which each hold 64 64-bit words. There are also 8 scalar registers, which hold single 64-bit words, and 8 address registers (for pointers) that have 20-bit words. Instead of a cache, these machines have a set of backup registers for the scalar and address registers; transfer to and from the backup registers is done under program control, rather than by lower level hardware using dynamic memory referencing patterns.

The original Cray-1 had 12 pipelined data processing units; newer Cray systems have 14. There are separate pipelines for addition, multiplication, computing reciprocals (to divide X by Y, a Cray computes ), and logical operations. The cycle time of the data processing pipelines is carefully matched to the memory cycle times. The memory system delivers one value per clock cycle through the use of 4-way interleaved memory.