So far the discussion of high performance computing has concentrated on increasing the amount of processing power in a system, either through parallelism, which seeks to increase the number of instructions that can be executed in a time period, or through pipelining, which improves the instruction throughput. Another, equally important, aspect of high performance computing is the organization of the memory system. No matter how fast one makes the processing unit, if the memory cannot keep up and provide instructions and data at a sufficient rate there will be no improvement in performance. The main problem that needs to be overcome in matching memory response to processor speed is the memory cycle time, defined in section 2.2 to be the time between two successive memory operations. Processor cycle times are typically much shorter than memory cycle times. When a processor initiates a memory transfer at time $t$, the memory will be ``busy'' until $t + t_c$, where $t_c$ is the memory cycle time. During this period no other device --- I/O controller, other processors, or even the processor that makes the request --- can use the memory, since it will be busy responding to the request.
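The effect of the memory cycle time on performance can be seen with a small back-of-the-envelope calculation. The sketch below uses hypothetical cycle times (the 2 ns and 10 ns figures are illustrative, not drawn from any particular machine) and assumes each instruction requires one memory access:

```python
# Hypothetical cycle times, in nanoseconds, chosen only for illustration.
PROC_CYCLE = 2    # processor cycle time
MEM_CYCLE = 10    # memory cycle time t_c

def instructions_completed(interval_ns, proc_cycle, mem_cycle):
    """Instructions completed in interval_ns, assuming every instruction
    needs one memory access and the memory is busy for mem_cycle ns
    after each access begins."""
    # The slower of the two devices sets the pace: the processor cannot
    # issue a new request while the memory is still busy.
    per_instruction = max(proc_cycle, mem_cycle)
    return interval_ns // per_instruction

# In 1000 ns the processor alone could issue 1000 // 2 = 500 instructions,
# but the memory limits the system to 1000 // 10 = 100.
processor_bound = instructions_completed(1000, PROC_CYCLE, PROC_CYCLE)
memory_bound = instructions_completed(1000, PROC_CYCLE, MEM_CYCLE)
print(processor_bound)  # -> 500
print(memory_bound)     # -> 100
```

With these assumed figures the processor spends four of every five cycles stalled waiting on memory, which is exactly the gap that the memory organizations discussed next are designed to close.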
Solutions to the memory access problem have led to a dichotomy in parallel systems. In one type of system, known as a shared memory system, there is one large virtual memory, and all processors have equal access to data and instructions in this memory. The other type of system is a distributed memory system, in which each processor has a local memory that is not accessible from any other processor.