Here, we modify Amdahl's Law  to describe the operations required to achieve the above vectorized events. As Monte Carlo processes are probabilistic, they result in random branching. This inhibits direct vectorization, as the particles which ``pass'' tests are generally located discontiguously in memory. On vector architectures, we must either: (1) pay the penalty of data motion to ``gather'' elements into contiguous locations in memory (and perhaps subsequently to ``scatter'' elements into their discontiguous original positions in memory) or (2) pay the penalty of full (100% - passed and failed) memory retrieval of elements, while masking-off elements which are not to have their values stored. On SIMD architectures, the penalty is incurred by conditionally masking-off individual processors where particles do not pass the test.
Suppose we are mapping this process to an architecture and desire to model the speedup from either: (1) scalar to vector or (2) single processor to multiprocessors. Here, the speedup is the ratio of: (1) scalar to vector or (2) single to multiprocessor execution times. Letting:
Then, the elapsed time to run the problem on a scalar processor will be:
where the elapsed time is simply the number of operations to be performed, W, divided by the scalar execution rate, S. Now, suppose that some fraction of this work, , can be done in the vector unit or in multiple processors. Additionally, suppose that the overhead associated with this is composed of two parts: (1) a fractional addition, F, due to vector start-up (or due to the spreading of the problem over multiple processors) and (2) a fractional addition, D, due to vectorized data motion (or due to inter-processor communication). Note that, these fractional additions are purely overhead, as the work done by the CPU is useless work. Then, the elapsed time to execute the problem on a parallel or vector machine is: