The major drawback of the distributed memory design is that interprocessor communication is more difficult. If a processor requires data from another processor's memory, it must exchange messages with the other processor. This introduces two sources of overhead: it takes time to construct and send a message from one processor to another, and the receiving processor must be interrupted in order to deal with messages from other processors.
Programming on a distributed memory machine is a matter of organizing a program as a set of independent tasks that communicate with each other via messages. In addition, programmers must be aware of where data is stored, which introduces the concept of locality in parallel algorithm design. An algorithm that allows data to be partitioned into discrete units and then runs with minimal communication between units will be more efficient than an algorithm that requires random access to global structures.
Semaphores, monitors, and other concurrent programming techniques are not directly applicable on distributed memory machines, but they can be implemented by a layered software approach. User code can invoke a semaphore, for example, which is itself implemented by passing a message to the node that ``owns'' the semaphore. This approach is not very efficient, however, and it has the drawback of nonuniform memory access, i.e., the latency of a memory request (in this case, reading the value of a semaphore) is proportional to the distance between the processor making the request and the memory where the value is stored.