3.5.6 SIMD Machines (continued)

The MasPar MP-1 was introduced a few years after the CM-1. It also has a very narrow data path, but it processes data 4 bits at a time instead of 1 bit at a time. Each processor can have up to 64 KB of local memory. One of the interesting aspects of the MP-1 is that it has two separate communication systems, and programmers can switch between them, using whichever performs better for each part of an algorithm. The first interconnection network is known as the X-net (Figure 20). It connects each processor to its 8 nearest neighbors in a 2D mesh with wraparound connections. The second is a global router, which provides point-to-point communication between any two PEs. The router is implemented by a 3-stage switching network; in a 1024-processor machine each stage contains a crossbar, and together the three stages form a crossbar. The processors are controlled by a proprietary RISC processor known as the array control unit, or ACU. The ACU has its own local memory and is used for scalar operations, while the processor array is intended for vector and array operations. An MP-1 can be configured as a square mesh or a rectangular mesh. The smallest configuration has 1,024 processors and the largest has 16,384 processors, arranged in a grid.
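The X-net topology described above can be illustrated with a short sketch. It is not MasPar code: the function name `xnet_neighbors` and the row-major `(row, col)` addressing are assumptions made for illustration, and the 32 x 32 layout in the usage line assumes a 1,024-processor machine configured as a square mesh. The sketch simply lists the 8 nearest neighbors of a PE on a 2D mesh whose edges wrap around.

```python
def xnet_neighbors(row, col, rows, cols):
    """Return the 8 nearest neighbors of PE (row, col) on a rows x cols
    mesh with wraparound connections (a torus).

    Hypothetical helper for illustration; the (row, col) addressing is
    an assumption, not the MP-1's actual PE numbering.
    """
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the PE itself
            # The modulo implements the wraparound at the mesh edges.
            neighbors.append(((row + dr) % rows, (col + dc) % cols))
    return neighbors


# Corner PE of an assumed 32 x 32 (1,024-processor) square mesh.
print(xnet_neighbors(0, 0, 32, 32))
```

Because of the wraparound, a corner PE such as (0, 0) still has 8 distinct neighbors, including PEs on the opposite edges of the mesh. Communication with any PE outside this neighborhood would go through the global router instead.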

The newest machines from Thinking Machines and MasPar are the CM-5 and MP-2, respectively. The CM-5 is described in more detail in [15]. The MP-2 has a wider internal data path than the MP-1 (32 bits vs. 4 bits) but is otherwise very similar: it uses both the X-net and the global router to connect PEs arranged in a 2D mesh. The largest MP-2, which has 16,384 processors, has a theoretical peak performance of 550 MFLOPS and reaches 473 MFLOPS on the LINPACK benchmark for parallel machines.