3.6 Performance Models for Vector and Parallel Machines     continued...

This version of the equation makes the contribution of the sequential portion of the computation more apparent. Here a is the fraction of the program that can be performed in parallel, and thus is the portion that is sequential. The denominator is the time to execute the program in parallel, i.e. the sum of the time spent in the sequential portion and the time spent in the parallel part, where the parallel time is a function of the speedup factor of the parallel portion. If the parallel portion exhibits perfect speedup, i.e. a factor of P when run on P processors, the equation becomes:

The efficiency of a parallel computation is the ratio of the speedup to the number of processors used to obtain that speedup:

For example, if 10 processors are used and the program runs 10 times faster, we are running at the maximum possible speed, i.e. all processors are being used to their full capacity. If the speedup is only a factor of 5, however, the efficiency is , i.e. half the computing power is lost in overhead or synchronization.

The above models do not try to characterize the execution of a parallel program. They simply measure the time required to execute a program on a given machine, and compare that time to sequential execution times. A simple model that breaks a parallel program into constituent parts is

The three components of overall execution time are , the computation time, , the time spent in communication, and , the time used to synchronize the processors at appropriate points in the algorithm. This type of analysis is very important for algorithms that will run on distributed memory machines, where locality and communication costs will play a major role in efficiency.