Then, dividing the above two equations yields R, the speedup ratio:
Note that because F and D appear only together, we can combine their effects into a single term. Thus, hereafter, we let D represent the sum of F and D. Moreover, it is generally possible to select vector lengths long enough that F is negligible; data motion, D, however, is always significant. In effect, we pay the overhead of data motion (useless work ``gathering'' data elements into contiguous memory locations) so that we can perform subsequent operations in the much faster vector hardware.
In Figure 21, we depict R versus for (representative of Cray hardware), with D varying parametrically.
Figure 21: Modified Amdahl's Law.
Several things are noteworthy from the graph:
If we fix V/S (nominally fixed by the architecture), we can consider equation (1) to contain two independent parameters, and D. Therefore, if we measure R for two different architectures with known but different values of V/S, we can determine both and D.
Burns et al. [3] have performed such an experiment on a Cray Y/MP and an ETA 10-G for the Monte Carlo simulation GAMTEB---one of Los Alamos' benchmark programs [2]. We measured (by turning vectorization ``on'' and ``off'' for critical loops) values of V/S of 12 and 25 for the Cray Y/MP and the ETA 10-G, respectively.
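The two-measurement procedure described above can be sketched numerically. The sketch below is illustrative only and rests on several assumptions that are not taken from the text: it assumes a speedup model of the form R = 1 / ((1 - f) + f/(V/S) + D), where f is a hypothetical name for the vectorized fraction of the work and D is the data-motion overhead expressed as a fraction of the all-scalar run time, and it assumes f and D are the same on both machines. Equation (1) may differ in detail. The function names `speedup` and `solve_f_and_d` are likewise invented for this example, and the speedups in the demonstration are synthetic values generated from the assumed model, not measurements. Only the V/S values of 12 and 25 come from the text.

```python
def speedup(f, d, vs_ratio):
    """Speedup R under the ASSUMED model R = 1 / ((1 - f) + f / (V/S) + D).

    f        -- hypothetical name for the vectorized fraction of the work
    d        -- data-motion overhead, as a fraction of the all-scalar run time
    vs_ratio -- V/S, the vector-to-scalar speed ratio of the machine
    """
    return 1.0 / ((1.0 - f) + f / vs_ratio + d)


def solve_f_and_d(vs1, r1, vs2, r2):
    """Recover (f, d) from speedups r1, r2 measured at two V/S values.

    Under the assumed model, 1/R is linear in f and d:
        1/R = 1 - f * (1 - 1/(V/S)) + d
    so two measurements at different V/S give two linear equations.
    """
    if vs1 == vs2:
        raise ValueError("the two V/S values must differ")
    # Subtracting the two equations eliminates d.
    f = (1.0 / r1 - 1.0 / r2) / (1.0 / vs1 - 1.0 / vs2)
    # Substitute f back into the first equation to obtain d.
    d = 1.0 / r1 - 1.0 + f * (1.0 - 1.0 / vs1)
    return f, d


if __name__ == "__main__":
    # Synthetic round trip: choose f and d, generate speedups, recover them.
    f_true, d_true = 0.9, 0.05           # illustrative values, not measured
    vs_cray, vs_eta = 12.0, 25.0         # V/S values quoted in the text
    r_cray = speedup(f_true, d_true, vs_cray)
    r_eta = speedup(f_true, d_true, vs_eta)
    f_est, d_est = solve_f_and_d(vs_cray, r_cray, vs_eta, r_eta)
    print(f"recovered f = {f_est:.6f}, D = {d_est:.6f}")
```

Because the assumed model is linear in the two unknowns once it is inverted, no iterative fitting is needed. A different functional form for equation (1) would call for a different solution step, but the idea of using two known V/S values to separate the two parameters carries over.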