next up previous

4.3 Modified Amdahl's Law

Here, we modify Amdahl's Law [1] to describe the operations required to achieve the above vectorized events. As Monte Carlo processes are probabilistic, they result in random branching. This inhibits direct vectorization, as the particles which ``pass'' tests are generally located discontiguously in memory. On vector architectures, we must either: (1) pay the penalty of data motion to ``gather'' elements into contiguous locations in memory (and perhaps subsequently to ``scatter'' elements into their discontiguous original positions in memory) or (2) pay the penalty of full (100% - passed and failed) memory retrieval of elements, while masking-off elements which are not to have their values stored. On SIMD architectures, the penalty is incurred by conditionally masking-off individual processors where particles do not pass the test.

Suppose we are mapping this process to an architecture and desire to model the speedup from either: (1) scalar to vector or (2) single processor to multiprocessors. Here, the speedup is the ratio of: (1) scalar to vector or (2) single to multiprocessor execution times. Letting:

Then, the elapsed time to run the problem on a scalar processor will be:

where the elapsed time is simply the number of operations to be performed, W, divided by the scalar execution rate, S. Now, suppose that some fraction of this work, , can be done in the vector unit or in multiple processors. Additionally, suppose that the overhead associated with this is composed of two parts: (1) a fractional addition, F, due to vector start-up (or due to the spreading of the problem over multiple processors) and (2) a fractional addition, D, due to vectorized data motion (or due to inter-processor communication). Note that, these fractional additions are purely overhead, as the work done by the CPU is useless work. Then, the elapsed time to execute the problem on a parallel or vector machine is: