3.3 Memory Organizations (continued)

The difference between shared and distributed memory is a difference in the structure of virtual memory, i.e. the memory as seen from the perspective of a processor. Physically, almost every memory system is partitioned into separate components that can be accessed independently. What distinguishes a shared memory from a distributed memory is how the memory subsystem interprets an address generated by a processor. As an example, suppose a processor executes the instruction load R0,i, which means "load register R0 with the contents of memory location i" (denoted Mem[i]). The question is, what does i mean? In a shared memory system, i is a global address, and Mem[i] to one processor is the same memory cell as Mem[i] to another processor. If both processors execute this instruction at the same time, they will both load the same information into their R0 registers. In a distributed memory system, i is a local address. If two processors both execute load R0,i, they may end up with different values in their R0 registers, since Mem[i] designates two different memory cells, one in the local memory of each processor.
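The two interpretations of an address can be illustrated with a toy model. The following is a minimal sketch, not a description of any real machine: the class names, the load and store methods, and the memory sizes are all hypothetical and chosen only for illustration.

```python
# Toy model of how a memory subsystem might interpret an address i.
# All names here are hypothetical; no real hardware interface is implied.

class SharedMemorySystem:
    """Address i is global: every processor reaches the same cell."""

    def __init__(self, size, num_procs):
        self.mem = [0] * size          # one address space for all processors
        self.num_procs = num_procs

    def store(self, proc, i, value):
        self.mem[i] = value            # the processor id does not affect the cell

    def load(self, proc, i):
        return self.mem[i]


class DistributedMemorySystem:
    """Address i is local: each processor has its own Mem[i]."""

    def __init__(self, size, num_procs):
        # One private memory per processor.
        self.local = [[0] * size for _ in range(num_procs)]

    def store(self, proc, i, value):
        self.local[proc][i] = value    # only this processor's memory changes

    def load(self, proc, i):
        return self.local[proc][i]


# Processor 0 writes 42 to address 3, then both processors run "load R0,3".
shared = SharedMemorySystem(size=8, num_procs=2)
shared.store(0, 3, 42)
shared_r0 = [shared.load(p, 3) for p in range(2)]

dist = DistributedMemorySystem(size=8, num_procs=2)
dist.store(0, 3, 42)
dist_r0 = [dist.load(p, 3) for p in range(2)]

print(shared_r0)  # both R0 registers hold the value written by processor 0
print(dist_r0)    # processor 1 still sees its own, untouched Mem[3]
```

In the shared model the processor id plays no role in selecting the cell, while in the distributed model it selects which local memory the address refers to.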

The distinction between shared memory and distributed memory is an important one for programmers because it determines how different parts of a parallel program will communicate. In a shared memory system it is enough to build a data structure in memory and pass references to the data structure to parallel subroutines. For example, a matrix multiplication routine that breaks matrices into quadrants only needs to pass the indices of each quadrant to the parallel subroutines. A distributed memory machine, on the other hand, must create copies of shared data in each local memory. These copies are created by sending a message containing the data to another processor. In the matrix multiplication example, the controlling process would have to send messages to three other processors. Each message would contain the submatrices required to compute one quadrant of the result. A drawback of this memory organization is that these messages might have to be quite large; in this example, half of each input matrix needs to be sent to each parallel subroutine.
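The contrast in the matrix multiplication example can be sketched in code. This is an illustrative sketch under simplifying assumptions: the function names and the message layout are hypothetical, and the four quadrant computations run one after another in a single process rather than on separate processors. The "message" is simply a Python dictionary holding copies of the data a remote worker would need.

```python
# Quadrant matrix multiplication in two styles (hypothetical helper names).

def multiply_quadrant_shared(A, B, C, rows, cols):
    """Shared-memory style: receives only index ranges and works on A, B, C in place."""
    n = len(B)
    for r in rows:
        for c in cols:
            C[r][c] = sum(A[r][k] * B[k][c] for k in range(n))


def make_message(A, B, rows, cols):
    """Distributed style: copy the rows of A and the columns of B that one quadrant needs."""
    return {
        "a_rows": [list(A[r]) for r in rows],
        "b_cols": [[B[k][c] for k in range(len(B))] for c in cols],
    }


def multiply_quadrant_distributed(message):
    """Worker side: sees only the copied data, never the original matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in message["b_cols"]]
            for row in message["a_rows"]]


n = 4
half = n // 2
A = [[r * n + c for c in range(n)] for r in range(n)]
B = [[1 if r == c else 2 for c in range(n)] for r in range(n)]
quadrants = [(range(r0, r0 + half), range(c0, c0 + half))
             for r0 in (0, half) for c0 in (0, half)]

# Shared memory: each subroutine is handed two index ranges.
C_shared = [[0] * n for _ in range(n)]
for rows, cols in quadrants:
    multiply_quadrant_shared(A, B, C_shared, rows, cols)

# Distributed memory: each subroutine is handed a message with copied data.
C_dist = [[0] * n for _ in range(n)]
message_sizes = []
for rows, cols in quadrants:
    msg = make_message(A, B, rows, cols)
    message_sizes.append(sum(len(vec) for part in msg.values() for vec in part))
    block = multiply_quadrant_distributed(msg)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            C_dist[r][c] = block[i][j]

print(message_sizes)  # number of matrix elements copied into each message
```

Each message carries half of the rows of A and half of the columns of B, so its size grows with the square of the matrix dimension, whereas the shared-memory call passes only a pair of index ranges regardless of matrix size.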