The history of shared memory multiprocessors goes back to the early 1970s, to two influential research projects at Carnegie-Mellon University. The first machine, named c.mmp (from the PMS notation for "computer with multiple mini-processors"), was organized around a crossbar switch that connected 16 PDP-11 processors to 16 memory banks. The second, cm*, also used PDP-11 processors, but connected them via the tree-shaped network shown in Figure 17 on page 63. The basic building block for this system was a processor cluster, which consisted of four processors, each with its own local memory. The global memory space was evenly partitioned among the memories in the system. When a processor generated a request for address i, its bus logic would check whether i was in the range of addresses held in that processor's local memory. If it was not, the request was transferred to a cluster controller, which would check whether i belonged to any other memory within that cluster. If not, the request would be routed up the tree to another level of cluster controllers. In all, 50 processors were connected by three levels of buses.
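The three-step lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of the real hardware: it assumes that "evenly partitioned" means each processor holds one contiguous, equal-sized block of the global address space, and the memory size and processor count are hypothetical values chosen to keep the example small. The function names are likewise invented for this sketch.

```python
# Toy parameters for illustration only; they are NOT cm*'s real configuration.
WORDS_PER_MEMORY = 1024      # hypothetical size of each local memory
PROCESSORS_PER_CLUSTER = 4   # cluster size given in the text
NUM_PROCESSORS = 16          # hypothetical machine size


def home_processor(address: int) -> int:
    """Return the processor whose local memory holds `address`.

    Assumes each processor owns one contiguous, equal-sized block.
    """
    if not 0 <= address < NUM_PROCESSORS * WORDS_PER_MEMORY:
        raise ValueError("address outside the global memory space")
    return address // WORDS_PER_MEMORY


def cluster_of(processor: int) -> int:
    """Return the index of the cluster that contains `processor`."""
    return processor // PROCESSORS_PER_CLUSTER


def route(requester: int, address: int) -> str:
    """Classify a memory request as 'local', 'cluster', or 'remote'."""
    home = home_processor(address)
    if home == requester:
        return "local"    # satisfied by the processor's own bus logic
    if cluster_of(home) == cluster_of(requester):
        return "cluster"  # resolved by the cluster controller
    return "remote"       # routed up the tree to another cluster
```

With these toy parameters, a request from processor 0 for address 10 is classified as local, address 1500 (held by processor 1) stays within the cluster, and address 5000 (held by processor 4) must travel up the tree.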
cm* was an early example of a non-uniform memory access (NUMA) architecture. Depending on whether an item was in a processor's local memory, within the same cluster, or in another cluster, the time to fetch an item was 3 µs, 9 µs, or 27 µs, respectively. As a reference point, a PDP-11 of this era, without the cluster interconnection logic, could fetch an item from main memory in about 2 µs.
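To see why locality matters on a NUMA machine, the sketch below computes an expected fetch time from a mix of reference types, reading the three fetch times quoted above as microseconds. The function name and the example mix of references are assumptions made for illustration; the text gives no measured reference pattern.

```python
import math

# Fetch times from the text, in microseconds.
FETCH_TIME_US = {"local": 3.0, "cluster": 9.0, "remote": 27.0}


def mean_fetch_time_us(p_local: float, p_cluster: float, p_remote: float) -> float:
    """Expected fetch time for a given mix of reference types.

    The three arguments are the fractions of references that are local,
    within the same cluster, and in another cluster; they must sum to 1.
    """
    if not math.isclose(p_local + p_cluster + p_remote, 1.0):
        raise ValueError("fractions must sum to 1")
    return (p_local * FETCH_TIME_US["local"]
            + p_cluster * FETCH_TIME_US["cluster"]
            + p_remote * FETCH_TIME_US["remote"])
```

For a hypothetical program in which 90% of references are local, 8% stay within the cluster, and 2% go to another cluster, `mean_fetch_time_us(0.9, 0.08, 0.02)` gives roughly 3.96 µs, close to the local cost. The same function shows how quickly the average degrades as the remote fraction grows.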
Figure 17: Mesh cm*.