So far the discussion of high performance computing has concentrated on increasing the amount of processing power in a system, either through parallelism, which seeks to increase the number of instructions that can be executed in a time period, or through pipelining, which improves the instruction throughput. Another, equally important, aspect of high performance computing is the organization of the memory system. No matter how fast one makes the processing unit, if the memory cannot keep up and provide instructions and data at a sufficient rate, there will be no improvement in performance. The main problem that needs to be overcome in matching memory response to processor speed is the memory cycle time, defined in section 2.2 to be the time between two successive memory operations. Processor cycle times are typically much shorter than memory cycle times. When a processor initiates a memory transfer at time t, the memory will be ``busy'' until t + tc, where tc is the memory cycle time. During this period no other device (I/O controller, another processor, or even the processor that made the request) can use the memory, since it will be busy responding to the request.
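The effect of the busy window can be made concrete with a small sketch. This is an illustrative Python model, not real hardware: a single memory module services requests one at a time, and any request issued while the module is still within its cycle time tc must stall until the module is free.

```python
# Sketch (illustrative timings): a memory module is "busy" for t_c time
# units after each request, so a request issued during that window
# stalls until the module is free again.

def service_requests(request_times, t_c):
    """Return the time each request actually starts service, given a
    memory cycle time t_c; requests that arrive while the module is
    busy are delayed."""
    free_at = 0                    # time at which the module is next free
    start_times = []
    for t in sorted(request_times):
        start = max(t, free_at)    # stall if the module is still busy
        start_times.append(start)
        free_at = start + t_c      # module busy until start + t_c
    return start_times

# A processor issues a request every cycle (1 time unit), but t_c = 4:
print(service_requests([0, 1, 2, 3], 4))   # [0, 4, 8, 12]
```

Even though the processor can issue a request every cycle, the memory serializes them one full cycle time apart, so the effective rate is set by tc, not by the processor.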
Solutions to the memory access problem have led to a dichotomy in parallel systems. In one type of system, known as a shared memory system, there is one large virtual memory, and all processors have equal access to data and instructions in this memory. The other type of system is a distributed memory, in which each processor has a local memory that is not accessible from any other processor.
The difference between shared and distributed memory is a difference in the structure of virtual memory, i.e. the memory as seen from the perspective of a processor. Physically, almost every memory system is partitioned into separate components that can be accessed independently. What distinguishes a shared memory from a distributed memory is how the memory subsystem interprets an address generated by a processor. As an example, suppose a processor executes the instruction load R0,i, which means ``load register R0 with the contents of memory location Mem[i].'' The question is, what does i mean? In a shared memory system, i is a global address, and Mem[i] to one processor is the same memory cell as Mem[i] to another processor. If both processors execute this instruction at the same time, they will both load the same information into their R0 registers. In a distributed memory system, i is a local address. If two processors both execute load R0,i, they may end up with different values in their R0 registers, since Mem[i] designates two different memory cells, one in the local memory of each processor.
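The two interpretations of the address i can be mimicked in a few lines of Python. This is a software model only (the Processor class and its load method are illustrative, not real machine state): in the shared case both processors reference the same memory object, while in the distributed case each processor holds its own copy.

```python
# Minimal sketch of how the same address i can name one cell or two,
# depending on the memory model. Plain Python, not real hardware.

shared_mem = [0] * 8               # one memory, visible to all "processors"

class Processor:
    def __init__(self, shared=None):
        # Distributed case: each processor gets its own local memory.
        self.mem = shared if shared is not None else [0] * 8
        self.R0 = None
    def load(self, i):             # models: load R0,i
        self.R0 = self.mem[i]

# Shared memory: Mem[3] is the same cell for both processors.
p, q = Processor(shared_mem), Processor(shared_mem)
shared_mem[3] = 42
p.load(3); q.load(3)
print(p.R0, q.R0)                  # 42 42

# Distributed memory: address 3 names two different cells.
p, q = Processor(), Processor()
p.mem[3], q.mem[3] = 7, 99
p.load(3); q.load(3)
print(p.R0, q.R0)                  # 7 99
```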
The distinction between shared memory and distributed memory is an important one for programmers because it determines how different parts of a parallel program will communicate. In a shared memory system it is only necessary to build a data structure in memory and pass references to the data structure to parallel subroutines. For example, a matrix multiplication routine that breaks matrices into quadrants only needs to pass the indices of each quadrant to the parallel subroutines. A distributed memory machine, on the other hand, must create copies of shared data in each local memory. These copies are created by sending a message containing the data to another processor. In the matrix multiplication example, the controlling process would have to send messages to three other processors. Each message would contain the submatrices required to compute one quadrant of the result. A drawback to this memory organization is that these messages might have to be quite large; in this example, half of each input matrix needs to be sent to each parallel subroutine.
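The message construction in the quadrant example can be sketched as follows. The function names (quadrant_message, compute_quadrant) are illustrative, but the data movement matches the text: to compute quadrant (qi, qj) of the product, a worker needs the corresponding half of the rows of A and half of the columns of B, so each message copies half of each input matrix.

```python
# Sketch of the distributed-memory style for quadrant matrix multiply:
# each "message" is a copy of the submatrices one worker needs.

def quadrant_message(A, B, qi, qj):
    """Build the message for the worker computing quadrant (qi, qj):
    a block of h rows of A and a block of h columns of B (h = n/2)."""
    n = len(A)
    h = n // 2
    A_half = [A[i][:] for i in range(qi * h, qi * h + h)]   # h x n rows of A
    B_half = [[B[k][j] for j in range(qj * h, qj * h + h)]
              for k in range(n)]                            # n x h cols of B
    return A_half, B_half

def compute_quadrant(msg):
    """Worker side: multiply the received blocks to get an h x h quadrant."""
    A_half, B_half = msg
    h, n = len(A_half), len(A_half[0])
    return [[sum(A_half[i][k] * B_half[k][j] for k in range(n))
             for j in range(h)] for i in range(h)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
top_left = compute_quadrant(quadrant_message(A, B, 0, 0))
print(top_left)        # [[19]] -- quadrant (0,0) of A x B
```

Each message holds h*n + n*h = n*n elements, i.e. half of A plus half of B, which is the communication cost the text points out; in the shared memory version the same worker would receive only the four index bounds.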
In this section we will explore the range of techniques used to connect processors to memories in high performance computers and how these techniques affect programmers. The first section is on interleaved memory, a method long used in vector processors to provide successive vector elements at a rate that matches the cycle time in the pipelined data processing units. The next two sections deal with shared memory and distributed memory organizations for parallel systems.
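The idea behind interleaved memory can be previewed with a one-line address mapping. In low-order interleaving (a common scheme, used here as an assumed example), the bank holding an address is addr mod N, so a vector stored at consecutive addresses spreads its elements across all N banks and a new access can start before the previous bank has finished its cycle.

```python
# Low-order interleaving: consecutive addresses fall in different banks,
# so successive vector elements can be fetched from independent banks.

def bank(addr, n_banks):
    """Bank that holds a given address under low-order interleaving."""
    return addr % n_banks

# Eight consecutive vector elements across 4 banks:
print([bank(a, 4) for a in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]
```

With N banks, each bank is revisited only every N accesses, which is what lets an interleaved memory keep pace with a pipelined processor whose cycle time is a fraction of the bank cycle time.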