
4.3 Modified Amdahl's Law (continued)

Then, dividing the above two equations yields the speedup ratio, R:

Note that, because F and D appear together, we can combine their effects into one. Hereafter, we let D represent the sum of F and D. Moreover, it is generally possible to select vector lengths long enough that F is negligible; data motion, D, however, is always significant. In effect, we pay the overhead of data motion (the useless work of "gathering" data elements into contiguous memory locations) so that we can perform subsequent operations in the much faster vector hardware.

In Figure 21, we depict R versus for (representative of Cray Hardware), with D varying parametrically.

Figure 21: Modified Amdahl's Law.

Several things are noteworthy from the graph:

If we fix (nominally fixed by the architecture), we can consider equation (1) to contain two independent parameters, and D. Therefore, if we measure R for two different architectures with known but different values of , we can determine both and D.

Burns et al. [3] have performed such an experiment on a Cray Y/MP and an ETA 10-G for the Monte Carlo simulation GAMTEB, one of Los Alamos' benchmark programs [2]. We measured (by turning vectorization "on" and "off" for critical loops) values of V/S of 12 and 25 for the Cray Y/MP and the ETA 10-G, respectively.
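
The two-architecture procedure can be illustrated with a small numerical sketch. Equation (1) is not reproduced here, so the code assumes, purely for illustration, a speedup model of the form R = 1 / ((1 - f) + f/(V/S) + D), where f is a hypothetical name for the second unknown parameter (taken here to be the vectorizable fraction). The function names and the values of f and D are likewise made up; only the V/S values of 12 and 25 come from the text. Under this assumed form, 1/R is linear in f and D, so two measurements of R at different V/S determine both.

```python
# Sketch only: the model below is an ASSUMED form, not the document's equation (1).
# f (vectorizable fraction) and d (data-motion overhead) are hypothetical names.

def speedup(f, d, v_over_s):
    """Assumed model: R = 1 / ((1 - f) + f / (V/S) + D)."""
    return 1.0 / ((1.0 - f) + f / v_over_s + d)


def fit_f_and_d(r1, vs1, r2, vs2):
    """Recover f and d from two speedups measured at different V/S.

    Under the assumed model, 1/R = (1 - f) + f/(V/S) + d, so subtracting
    the two equations eliminates d and leaves a single equation for f.
    """
    inv1, inv2 = 1.0 / r1, 1.0 / r2
    f = (inv1 - inv2) / (1.0 / vs1 - 1.0 / vs2)
    d = inv1 - (1.0 - f) - f / vs1
    return f, d


# Synthetic round trip: these f and d values are invented, not measured.
f_true, d_true = 0.9, 0.05
r_a = speedup(f_true, d_true, 12.0)  # V/S = 12 (Cray Y/MP value from the text)
r_b = speedup(f_true, d_true, 25.0)  # V/S = 25 (ETA 10-G value from the text)

f_est, d_est = fit_f_and_d(r_a, 12.0, r_b, 25.0)
print(f"recovered f = {f_est:.4f}, D = {d_est:.4f}")
```

If equation (1) has a different functional form, the same idea applies (two measurements, two unknowns), but the algebra in fit_f_and_d would need to be redone for that form.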