An early and very influential distributed memory parallel processor was the ``Cosmic Cube'', a research project carried out by members of the Physics and Computer Science departments at Caltech. This was one of the first systems to treat the interconnection network as a medium for exchanging messages, rather than as an extended bus that simply fetched single words.
Each node in the Cosmic Cube was a single-board computer with an Intel 8086 processor chip, an 8087 floating-point coprocessor, and 128KB of memory. Sixty-four boards were interconnected as a 6-dimensional hypercube (2^6 = 64). Communication over the interconnection network was fairly slow, at 2Mbps per link, and used a store-and-forward protocol, in which each intermediate node receives an entire message before passing it on. Intel's commercial version of the Cosmic Cube was the iPSC/1, which used 80286 processor chips, 512KB of memory per node, and 10Mbps communication chips, and came in configurations from 16 to 128 processors (4D up to 7D hypercubes). Other commercial hypercubes of this era included the NCUBE-1 and the FPS T-series.
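Hypercube addressing is easy to state in code. Below is a minimal Python sketch, assuming nodes are numbered 0 to 2^d - 1 and two nodes are linked exactly when their binary addresses differ in one bit. The routing rule shown is the common "e-cube" rule (correct the lowest differing address bit first). It is an illustrative assumption, because the text does not say which rule the Cosmic Cube used. The function names `neighbors` and `ecube_route` are hypothetical.

```python
def neighbors(node: int, dim: int) -> list[int]:
    """Nodes directly linked to `node`: flip each of the `dim` address bits."""
    return [node ^ (1 << k) for k in range(dim)]


def ecube_route(src: int, dst: int, dim: int) -> list[int]:
    """Nodes visited from src to dst under the (assumed) e-cube rule.

    At each hop, correct the lowest-order address bit in which the
    current node still differs from the destination.
    """
    path = [src]
    current = src
    for k in range(dim):
        if (current ^ dst) & (1 << k):
            current ^= 1 << k
            path.append(current)
    return path


if __name__ == "__main__":
    DIM = 6  # 2**6 = 64 nodes, the Cosmic Cube's size
    print(neighbors(0, DIM))                 # [1, 2, 4, 8, 16, 32]
    print(ecube_route(0, 45, DIM))           # [0, 1, 5, 13, 45]
    print(len(ecube_route(0, 63, DIM)) - 1)  # 6 hops: the hypercube's diameter
```

The number of hops equals the number of differing address bits, so no route in a 6-dimensional cube is longer than 6. Under store-and-forward switching, every intermediate node on such a path holds the whole message before forwarding it, so transfer time grows with the hop count.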
The iPSC/2 was also a hypercube-based machine, but it incorporated ``worm-hole routing'' in place of the store-and-forward packet switching used in earlier systems. A worm-hole router uses a form of circuit switching: the head of a message establishes a communication path between two processors according to fixed rules, and the rest of the message follows along that path. For example, in a two-dimensional mesh the rule might be to use vertical links until the message reaches the row containing the destination processor, and then to use horizontal links until the connection is made. Efficiency improves because the processors along the route no longer have to receive each message and decide the direction of its next step. In effect, this makes the time needed to transmit a data item from one end of the system to the other much less dependent on the diameter of the network. What one gains in efficiency, however, one loses in flexibility: because paths are fixed by the routing rules, worm-hole routing cannot exploit alternate paths that the network might provide. For example, congestion on a single link may be unavoidable even though alternate paths that could ease it are available.
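The routing rule and the efficiency argument above can be made concrete with a small Python sketch. `vertical_first_route` follows the vertical-then-horizontal rule from the mesh example. The two timing functions are the usual first-order textbook models, which assume no contention and a fixed per-hop delay `t_hop`. All function names and parameter values are hypothetical, not measured iPSC/2 figures.

```python
def vertical_first_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """(row, col) nodes visited: vertical links first, then horizontal links."""
    (r, c), (dr, dc) = src, dst
    path = [(r, c)]
    while r != dr:                 # move along the column to the destination row
        r += 1 if dr > r else -1
        path.append((r, c))
    while c != dc:                 # then move along that row to the destination
        c += 1 if dc > c else -1
        path.append((r, c))
    return path


def store_and_forward_time(hops: int, msg_bits: float, bw_bps: float, t_hop: float) -> float:
    """Each intermediate node receives the whole message before forwarding it."""
    return hops * (t_hop + msg_bits / bw_bps)


def wormhole_time(hops: int, msg_bits: float, bw_bps: float, t_hop: float) -> float:
    """The header sets up the path hop by hop; the body then streams behind it."""
    return hops * t_hop + msg_bits / bw_bps


if __name__ == "__main__":
    route = vertical_first_route((0, 0), (2, 3))
    print(route)  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
    hops = len(route) - 1
    # Hypothetical parameters: 1 KB message, 10 Mbps links, 1 us per-hop delay.
    print(store_and_forward_time(hops, 8192, 10e6, 1e-6))
    print(wormhole_time(hops, 8192, 10e6, 1e-6))
```

In the store-and-forward model the message transmission time is multiplied by the hop count. In the worm-hole model only the small per-hop delay grows with distance, which is why latency becomes much less sensitive to the network diameter. Because `vertical_first_route` is deterministic, it also illustrates the loss of flexibility: a given source and destination always use the same links.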
Following the iPSC/2 (and the iPSC/860, which was similar but used i860 RISC processors instead of 80386 processors at each node), Intel built a research machine known as the Touchstone Delta. A commercial system based on the Delta is the Paragon XP/S. Its interconnection network is a 2D mesh rather than a hypercube, and it uses specially designed message routing chips to improve communication bandwidth. Each node in the Paragon has two i860 processors, one for computation and the other for message handling. This second processor deals with incoming messages and other communication overhead, so the main processor does not have to be interrupted to handle message traffic.
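The division of labour on a Paragon node can be modelled in a few lines of Python. In this hypothetical sketch a background thread plays the role of the message-handling i860: it drains an incoming-message queue into a tagged mailbox, while the main thread keeps computing and only looks at the mailbox when it needs data. The class `Node`, its attributes, and the tag-based mailbox are illustrative assumptions, not the Paragon's actual message-passing interface.

```python
import queue
import threading


class Node:
    """Hypothetical node with a dedicated message-handling thread."""

    def __init__(self) -> None:
        self.network_in: queue.Queue = queue.Queue()  # traffic arriving from the mesh
        self.mailbox: dict[str, list] = {}            # delivered messages, keyed by tag
        self._lock = threading.Lock()
        self._msg_thread = threading.Thread(target=self._message_processor, daemon=True)
        self._msg_thread.start()

    def _message_processor(self) -> None:
        """Handle all incoming traffic so the compute thread is never interrupted."""
        while True:
            item = self.network_in.get()
            if item is None:  # shutdown sentinel
                break
            tag, payload = item
            with self._lock:
                self.mailbox.setdefault(tag, []).append(payload)

    def receive(self, tag: str) -> list:
        """Called by the compute thread when it is ready to use the data."""
        with self._lock:
            return self.mailbox.pop(tag, [])

    def shutdown(self) -> None:
        """Let the message thread finish queued traffic, then stop it."""
        self.network_in.put(None)
        self._msg_thread.join()


if __name__ == "__main__":
    node = Node()
    node.network_in.put(("halo", [1.0, 2.0]))      # traffic arrives from the network
    partial = sum(i * i for i in range(100_000))   # main thread keeps computing
    node.shutdown()                                # drain remaining traffic, stop helper
    print(node.receive("halo"))                    # [[1.0, 2.0]]
```

The sketch models only the structure of the design: in CPython the two threads do not run truly in parallel, so it says nothing about the speedup a second processor provides.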
An interesting machine that is a hybrid, with attributes of both SIMD and distributed memory MIMD machines, is the CM-5 from Thinking Machines. The basic machine consists of processing nodes at the leaves of a tree-shaped network, where each node has a SPARC microprocessor, optional vector processors, and up to 32MB of local memory. The interconnection network is based on the idea of a ``fat tree,'' a tree that has wider communication channels near the root in order to handle the higher volume of traffic expected to flow in that region of the network (Figure 19 on page 67). Each communication link in the CM-5 has a bandwidth of 20 Mbps. Each leaf node has two upward links, attached to different switches both for higher bandwidth and to provide alternative routes that avoid congestion in the network. First-level interior switches also have two upward links, but higher-level switches have four, implementing the fat-tree idea of higher bandwidth closer to the root.
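Why a fat tree needs wider channels near the root can be checked with a small Python model. The sketch assumes a generic binary fat tree, not the CM-5's actual switch layout with two or four upward links per switch. It has N = 2^levels leaves, messages climb to the lowest common ancestor and then descend, and the link above a subtree of height k has capacity 2^k leaf-link units (in an ordinary tree every link has capacity 1). All names are hypothetical.

```python
def lca_height(src: int, dst: int) -> int:
    """Height of the lowest common ancestor of two leaves (0 if src == dst)."""
    return (src ^ dst).bit_length()


def hop_count(src: int, dst: int) -> int:
    """Links traversed: up to the common ancestor, then back down."""
    return 2 * lca_height(src, dst)


def link_capacity(subtree_height: int, fat: bool) -> int:
    """Capacity of the link above a subtree, in units of one leaf link."""
    return 2 ** subtree_height if fat else 1


def worst_link_load_ratio(levels: int, fat: bool) -> float:
    """Every leaf in the left half sends one message to the right half.

    All N/2 messages must cross the link from the left subtree
    (height levels - 1) up to the root; return load / capacity.
    """
    n = 2 ** levels
    load = n // 2
    return load / link_capacity(levels - 1, fat)


if __name__ == "__main__":
    print(hop_count(0, 1), hop_count(0, 63))     # 2 12
    print(worst_link_load_ratio(6, fat=False))   # 32.0
    print(worst_link_load_ratio(6, fat=True))    # 1.0
```

With 64 leaves, an ordinary tree forces 32 messages through a single link near the root, while the idealized fat tree carries the same traffic with no link oversubscribed.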
The CM-5 also has a control network consisting of a set of control processors interconnected by another tree-shaped network. The control processors and their tree form a completely separate subsystem. Control processors are also SPARC microprocessors, but since they do little if any data processing, they have less memory than the processing nodes and no vector coprocessors. It is the control network that allows the system to operate as an SIMD or SPMD machine, by synchronizing sets of data processors when they are all working on the same program.
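A tree is a natural shape for the synchronization the control network provides. "Ready" flags can be combined on the way up, and a release can be broadcast on the way down, taking about 2 log2 P steps for P processors. The Python sketch below is a hypothetical software model of that idea. It assumes P is a power of two and uses illustrative names (`tree_combine`, `barrier`). It does not describe the CM-5's actual control-network hardware or protocol.

```python
from operator import add, and_
from typing import Callable, TypeVar

T = TypeVar("T")


def tree_combine(values: list[T], op: Callable[[T, T], T]) -> tuple[T, int]:
    """Combine leaf values pairwise, level by level, up to the root.

    Returns the root value and the number of tree levels climbed.
    """
    if len(values) == 0 or len(values) & (len(values) - 1):
        raise ValueError("this sketch assumes a power-of-two number of leaves")
    level = list(values)
    steps = 0
    while len(level) > 1:
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps


def barrier(ready: list[bool]) -> tuple[bool, int]:
    """AND the ready flags up the tree, then broadcast the result back down.

    Returns (release, total_steps); processors may proceed only when
    release is True. The down-sweep takes as many steps as the up-sweep.
    """
    all_ready, up_steps = tree_combine(ready, and_)
    return all_ready, 2 * up_steps


if __name__ == "__main__":
    print(barrier([True] * 64))               # (True, 12)
    print(tree_combine(list(range(8)), add))  # (28, 3)
```

The same up-sweep works with any associative operator, which is why a combining tree can also compute global values such as sums, not just synchronization flags.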