
3.3.2 Shared Memory     continued...

A slight modification to this design improves performance, although it cannot indefinitely postpone the flattening of the performance curve. If each processor has its own local cache, there is a high probability that the instruction or data it wants is already in that cache. A reasonable cache hit rate greatly reduces the number of accesses a processor makes to shared memory and thus improves overall efficiency. The ``knee'' of the performance curve, the point up to which it is still cost-effective to add processors, can now be around 20 processors, and the curve does not flatten out until around 30 processors.
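The benefit of a local cache can be made concrete with a standard effective-access-time calculation. The function and the timings below (a 10 ns cache, a 100 ns shared-memory access, a 95% hit rate) are illustrative assumptions, not figures from the text:

```python
# Sketch: average memory-access time with a per-processor cache.
# Hits are served from the fast local cache; misses cross the shared
# bus to main memory.

def effective_access_time(hit_rate, t_cache, t_memory):
    """Weighted average of cache and memory access times (same units)."""
    return hit_rate * t_cache + (1.0 - hit_rate) * t_memory

# Assumed numbers: 10 ns cache, 100 ns shared-memory access.
no_cache = effective_access_time(0.0, 10, 100)     # every access uses the bus
with_cache = effective_access_time(0.95, 10, 100)  # 95% hit rate

print(no_cache)             # 100.0
print(round(with_cache, 2)) # 14.5
```

With these assumed numbers, a 95% hit rate cuts the average access time by roughly a factor of seven, which is why the shared bus saturates at a much larger processor count.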

Giving each processor its own cache introduces a difficulty known as the cache coherency problem. In its simplest form, the problem can be illustrated by the following scenario. Suppose two processors use data item A, so a copy of A ends up in the cache of each processor. Next suppose processor 1 performs a calculation that changes A; when it is done, the new value of A is written out to main memory. When processor 2 later needs to fetch A, it finds A already in its cache, so it uses the stale cached value rather than the newly updated value calculated by processor 1. Maintaining a consistent view of shared data requires propagating a new version of the cached data to each processor whenever one of the processors updates its copy.
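The stale-read scenario above can be sketched as a small simulation. Main memory and the two private caches are modeled as plain dictionaries; the names and the write-through policy are illustrative assumptions, not a description of any real coherence protocol:

```python
# Simulation of the stale-read scenario: two processors each cache
# data item A, processor 1 updates it, and processor 2 keeps reading
# its old copy.

memory = {"A": 1}
cache = {1: {}, 2: {}}  # one private cache per processor

def read(proc, addr):
    # A cache hit returns the local copy without consulting memory.
    if addr not in cache[proc]:
        cache[proc][addr] = memory[addr]  # miss: fetch from main memory
    return cache[proc][addr]

def write(proc, addr, value):
    # Write-through: update the writer's cache and main memory,
    # but (deliberately) leave the other processor's cache untouched.
    cache[proc][addr] = value
    memory[addr] = value

read(1, "A")       # both processors bring A into their caches
read(2, "A")
write(1, "A", 42)  # processor 1 changes A and writes it back

print(memory["A"])   # 42 -- main memory holds the new value
print(read(2, "A"))  # 1  -- processor 2 still sees its stale copy
```

A coherence mechanism would have to invalidate or update processor 2's copy at the moment of the write; the simulation omits exactly that step, which is what produces the inconsistency.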