In a distributed memory system, memory is associated with individual processors, and each processor can address only its own memory. Some authors refer to this type of system as a multicomputer, reflecting the fact that the building blocks in the system are themselves small computer systems, complete with processor and memory.
There are several benefits to this organization. First, there is no bus or switch contention. Each processor can use the full bandwidth to its own local memory without interference from other processors. Second, the lack of a common bus means there is no inherent limit to the number of processors; the size of the system is constrained only by the network used to connect processors to each other. Third, there are no cache coherency problems. Each processor is in charge of its own data and can keep copies in its local cache without any risk that another processor will reference the original.
The major drawback in the distributed memory design is that interprocessor communication is more difficult. If a processor requires data from another processor's memory, it must exchange messages with the other processor. This introduces two sources of overhead: it takes time to construct and send a message from one processor to another, and a receiving processor must be interrupted in order to deal with messages from other processors.
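Both sources of overhead can be seen in a minimal sketch of message passing between two nodes. Here each "processor" is modeled as a Python object with strictly private memory and a mailbox; the `Node` class and its methods are illustrative, not a real message-passing API.

```python
import queue

class Node:
    """A node in a distributed memory machine: private memory plus a mailbox."""
    def __init__(self, node_id, mailboxes):
        self.node_id = node_id
        self.memory = {}            # local memory; no other node can read it
        self.mailboxes = mailboxes  # one inbox per node, modeling the network

    def send(self, dest, payload):
        # Overhead source 1: the message must be constructed and the data
        # copied out of local memory before it can be sent.
        msg = {"src": self.node_id, "data": dict(payload)}
        self.mailboxes[dest].put(msg)

    def receive(self):
        # Overhead source 2: the receiver must stop its own work to take
        # delivery of the message and copy it into local memory.
        msg = self.mailboxes[self.node_id].get()
        self.memory.update(msg["data"])
        return msg

mailboxes = {0: queue.Queue(), 1: queue.Queue()}
a, b = Node(0, mailboxes), Node(1, mailboxes)

a.memory["x"] = 42
a.send(1, {"x": a.memory["x"]})  # node 1 cannot read a.memory directly
msg = b.receive()
print(b.memory["x"])             # node 1 now holds its own copy: 42
```

Note that the value ends up duplicated: each node holds an independent copy in its own memory, which is exactly why there is no coherency problem and also why communication is expensive.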
Programming on a distributed memory machine is a matter of organizing a program as a set of independent tasks that communicate with each other via messages. In addition, programmers must be aware of where data is stored, which introduces the concept of locality in parallel algorithm design. An algorithm that allows data to be partitioned into discrete units and then runs with minimal communication between units will be more efficient than an algorithm that requires random access to global structures.
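The payoff of locality can be illustrated with a sketch of a partitioned reduction. Each task owns one contiguous chunk of the data, computes on it locally, and communicates only its single partial result; the chunk sizes and task count here are arbitrary choices for illustration.

```python
# Locality-aware partitioning: each task computes on its own chunk and
# communicates only one number, rather than making random accesses to a
# global structure spread across other nodes' memories.
data = list(range(1000))
n_tasks = 4
chunk = len(data) // n_tasks

# Each "task" works entirely within its own partition...
partials = [sum(data[i * chunk:(i + 1) * chunk]) for i in range(n_tasks)]

# ...and only the partial results (4 values, not 1000) cross the network.
total = sum(partials)
print(total)   # 499500
```

An algorithm that instead indexed arbitrary elements of `data` from every task would turn each access into a message exchange, and the communication cost would quickly dominate the computation.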
Semaphores, monitors, and other concurrent programming techniques are not directly applicable on distributed memory machines, but they can be implemented by a layered software approach. User code can invoke a semaphore, for example, which is itself implemented by passing a message to the node that ``owns'' the semaphore. This approach is not very efficient, however, and it has the drawback of nonuniform memory access, i.e., the latency of a memory request (in this case, reading the value of a semaphore) is proportional to the distance between the processor making the request and the memory where the value is stored.
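The layered approach can be sketched as follows: the semaphore's value lives on one owner node, and every acquire or release from another node becomes a message round trip to that owner. The owner is modeled here as a thread servicing a request queue; all names are illustrative, not a standard interface.

```python
import threading
import queue

requests = queue.Queue()   # the "network" path to the semaphore's owner node

def semaphore_owner(initial=1):
    # Runs on the node that owns the semaphore; it alone holds the value.
    value = initial
    waiting = []           # reply channels of blocked acquirers
    while True:
        op, reply = requests.get()
        if op == "stop":
            break
        if op == "acquire":
            if value > 0:
                value -= 1
                reply.put("granted")
            else:
                waiting.append(reply)      # caller blocks until a release
        elif op == "release":
            if waiting:
                waiting.pop(0).put("granted")
            else:
                value += 1

def acquire():
    # Every remote acquire costs a full message round trip to the owner,
    # so latency grows with the distance to the owning node.
    reply = queue.Queue()
    requests.put(("acquire", reply))
    return reply.get()     # blocks until the owner grants the semaphore

def release():
    requests.put(("release", None))

owner = threading.Thread(target=semaphore_owner, daemon=True)
owner.start()
granted = acquire()
print(granted)             # "granted", after one round trip
release()
requests.put(("stop", None))
owner.join()
```

The inefficiency the text describes is visible in the structure: even an uncontended acquire requires two messages (request and reply), where on a shared memory machine it would be a single atomic operation on a local word.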
Whether the shared memory style, with semaphores and related primitives, or the distributed memory style, with message passing, is easier to program is often a matter of personal preference. The message passing style fits very well with the object oriented programming methodology, and if a program is already organized in terms of objects it may be quite easy to adapt it to a distributed memory system. The decision of whether to implement a program on shared memory or distributed memory usually comes down to the amount of information that must be shared by parallel tasks. Whatever information is shared among tasks must be copied from one node to another via messages on a distributed memory system, and this overhead may reduce efficiency to the point where a shared memory system is preferable.
A PMS diagram of a simple distributed memory parallel processor is shown in Figure 9. On the left is the diagram of a single node, often called a processing element, or PE. The organization of a PE shows how messages are passed from one PE to another. As far as any one processor is concerned, the other processors are simply I/O devices. To send a message to another PE, a processor copies information into a data block in its local memory and then tells its local controller to transfer the information to an external device, much the same way a disk controller in a microcomputer writes a block to a disk drive. In this case, however, the block of data is transferred over the interconnection network to an I/O controller in the receiving node. That controller finds room for the incoming message in its local memory and then notifies the processor that a message has arrived.
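The sequence just described can be sketched with the controller modeled as an independent thread inside each PE: the processor hands a block to its local controller, the block crosses the network (a queue here), and the receiving controller deposits it in local memory and raises a notification flag. The `PE` class and its fields are illustrative assumptions, not hardware interfaces.

```python
import threading
import queue

class PE:
    """A processing element: processor, local memory, and an I/O controller."""
    def __init__(self, network, pe_id):
        self.pe_id = pe_id
        self.network = network                # interconnect: one inbox per PE
        self.local_memory = []
        self.msg_arrived = threading.Event()  # the controller's notification
        # The I/O controller runs independently of the processor.
        threading.Thread(target=self._controller, daemon=True).start()

    def send(self, dest, block):
        # The processor copies the block and hands it off for transfer,
        # much as it would ask a disk controller to write a block.
        self.network[dest].put(list(block))

    def _controller(self):
        while True:
            block = self.network[self.pe_id].get()
            self.local_memory.append(block)   # find room for the message
            self.msg_arrived.set()            # notify the local processor

network = {0: queue.Queue(), 1: queue.Queue()}
pe0, pe1 = PE(network, 0), PE(network, 1)

pe0.send(1, [1, 2, 3])
pe1.msg_arrived.wait(timeout=1)   # the receiving processor sees the notice
print(pe1.local_memory)           # [[1, 2, 3]]
```

The key structural point matches the diagram: the sending processor never touches the receiver's memory; only the receiver's own controller writes into it.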