As mentioned at the beginning of this section, there is at least limited potential for vectorization of a single lagged Fibonacci generator. Figure 12 illustrates vectorization applied to our generator. As the figure shows, the vector algorithm advances the register ahead by steps, so that the vector length of most of the operations is . Note that there is also a vector copy operation of length . Care must be taken that no item of data is destroyed before it is needed. The easiest way to avoid unintentionally overwriting needed data is to keep two copies of the Fibonacci register and, for each ``vector'' advance, use the old copy to construct the new one. None of the data in the old copy is destroyed until the next vector advance, when the roles are exchanged and the old copy is overwritten as the new one.

If vectorization of the Fibonacci generator is important (and it could be, if random number generation consumes a large fraction of the execution time), then clearly a long vector length is better than a short one. Processing with a vector length of 6, as in our example, would not yield much improvement over the scalar method. For vectorization to provide a meaningful improvement over scalar processing, the vector operations should be long enough to make good use of the machine hardware. For example, on Cray machines, where the vector registers are 64 words long (128 on the new models), this usually means vector lengths of tens of elements. For these machines, the generators and would be good choices, with respective vector lengths of 64 and 127.
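The two-copy scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of Figure 12: the recurrence is taken to be x_n = (x_{n-l} + x_{n-k}) mod 2^32 with lags l > k, the register is stored oldest entry first, and each vector advance is assumed to move ahead by k steps (the largest step for which every new value depends only on the old copy). The function names are hypothetical, and the list comprehension stands in for a hardware vector add.

```python
MASK32 = 0xFFFFFFFF  # keep every register entry to 32 bits


def scalar_advance(reg, l, k):
    """Advance one step. reg holds x_{n-l} .. x_{n-1}, oldest first."""
    new_value = (reg[0] + reg[l - k]) & MASK32  # x_n = x_{n-l} + x_{n-k}
    return reg[1:] + [new_value]


def vector_advance(old, l, k):
    """Advance k steps at once, building a new register from the old copy.

    Assumption of this sketch: the step size equals the smaller lag k,
    so no new value depends on another value produced in the same advance.
    The old copy is never written to.
    """
    s = k
    new = [0] * l
    # Vector copy of length l - s: the surviving entries slide to the front.
    new[:l - s] = old[s:]
    # Vector add of length s: x_{n+j} = x_{n+j-l} + x_{n+j-k}, j = 0 .. s-1.
    new[l - s:] = [(a + b) & MASK32
                   for a, b in zip(old[:s], old[l - k:l - k + s])]
    return new


if __name__ == "__main__":
    l, k = 17, 5  # lags of LFG(17,5,32)
    # Arbitrary illustrative register contents, not a recommended seeding.
    register = [(i * 2654435761 + 12345) & MASK32 for i in range(l)]
    print(vector_advance(register, l, k))
```

With lags (17, 5) this gives a vector add of length 5 and a vector copy of length 12, which shows why short lags leave little room for vector hardware to help.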
In Figure 13 we list a sample Fortran code for initializing and generating random numbers from the generator LFG(17,5,32). Note that the register is maintained as a set of 32-bit numbers, but that the number returned to the user has only 31 bits. The register is initialized using the Park and Miller LCG described in . The seed, ``iseed0,'' supplied by the user may be any integer greater than or equal to zero and less than or equal to $2^{31}-2$ = 2,147,483,646. The register is initialized in canonical form, so each value of iseed0 results in a distinct cycle of random numbers. Since the function irnd175() was written to work on 64-bit machines as well as 32-bit machines, the mask operations were included to add clarity to the code. In many situations, the 32-bit mask operation could be eliminated, since the hardware would simply ignore any overflow. The 31-bit mask could also be eliminated on any system that zero-fills on right shift operations. If the system performs a ``sign extension'' type of fill, then the 31-bit mask is required.
Figure 13: FORTRAN implementation of LFG(17,5,32).
(See exercise 8, 8, and 8.)
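The role of the two masks can be illustrated with a minimal Python sketch of an LFG(17,5,32) step. It is not a transcription of the Fortran code in Figure 13, and it makes several assumptions. The class `LFG175` and the helper `park_miller_fill` are hypothetical names. The register is filled directly with Park and Miller outputs (multiplier 16807, modulus 2^31 - 1) and is not put into canonical form. The helper accepts only seeds from 1 to 2^31 - 2, so the treatment of a zero seed is not reproduced. The returned value is assumed to be the top 31 bits of the 32-bit sum.

```python
MASK32 = 0xFFFFFFFF  # discards overflow beyond 32 bits
MASK31 = 0x7FFFFFFF  # guarantees a non-negative 31-bit result
LAG_L, LAG_K = 17, 5


def park_miller_fill(seed, count):
    """Return `count` successive Park and Miller LCG values after `seed`."""
    if not 1 <= seed <= 2**31 - 2:
        raise ValueError("this sketch requires 1 <= seed <= 2**31 - 2")
    values = []
    state = seed
    for _ in range(count):
        state = (16807 * state) % (2**31 - 1)
        values.append(state)
    return values


class LFG175:
    """Additive lagged Fibonacci generator with lags 17 and 5, modulus 2^32."""

    def __init__(self, seed):
        # Assumption: plain LCG fill, with no canonical-form adjustment.
        self.reg = park_miller_fill(seed, LAG_L)
        self.pos = 0  # index of the oldest entry, x_{n-17}

    def next31(self):
        i = self.pos
        j = (i + LAG_L - LAG_K) % LAG_L  # index of x_{n-5}
        # Python integers never overflow, so the 32-bit mask is needed here,
        # just as it is on a 64-bit machine.
        value = (self.reg[i] + self.reg[j]) & MASK32
        self.reg[i] = value  # x_n replaces the oldest entry
        self.pos = (i + 1) % LAG_L
        # The shift drops the low bit; the 31-bit mask is redundant when the
        # shift zero-fills, but matters under sign extension.
        return (value >> 1) & MASK31


if __name__ == "__main__":
    generator = LFG175(12345)
    print([generator.next31() for _ in range(5)])
```

The register is used as a circular buffer, so each step costs one add, one shift, and two masks, with no data movement. In Python the right shift of a non-negative integer always zero-fills, so the 31-bit mask never changes the result here; it is kept to mirror the portability concern described above.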