Lagged Fibonacci pseudo-random number generators have become increasingly popular in recent years. These generators are so named because of their similarity to the familiar Fibonacci sequence:
where the first two values, and , must be supplied. This formula is generalized to give a family of pseudo-random number generators of the form:
and where, instead of two initial values, initial values, , ... , , are needed in order to compute the next sequence element. In this expression the ``lags'' are and , so that the current value of is determined by the value of places ago and places ago. In addition, for most applications of interest is a power of two. That is, . (This type of pseudo-random number generator, along with several others, has been extensively tested for randomness properties by Marsaglia  and has been given high marks. The only deficiency found was related to what he calls the Birthday Spacings test, for low values of and . The interested reader is referred to Marsaglia, 1985.)
With proper choice of , , and the first values of , the period, , of this generator is equal to . Proper choice of and here means that the trinomial is primitive over the integers mod 2. The only condition on the first values is that at least one of them must be odd. Using the notation to indicate the lags and the power of two modulus, examples of two commonly used versions of these generators are:
Obviously, the value of the modulus, , does not by itself limit the period of the generator, as it does in the case of an LCG. Note also that lagged Fibonacci pseudo-random number generation is computationally simple: an integer add, a logical AND (to accomplish the mod operation), and the decrementing of two array pointers are the only operations required to produce a new random number . Furthermore, with a large enough value of , limited vectorization can be achieved. The major drawback in the case of this type of generator is the fact that words of memory must be kept current. An LCG requires only one: the last value of generated.
We now look at some of the theory of these generators, with a view toward ensuring their proper use. Conceptually, a Fibonacci generator acts the same as a linear shift register, and if we set so that , then we have a binary linear shift register. Figure 8 is a diagram of a particular binary linear shift register, where , , and every is either a 1 or a 0. (We will use this choice of and in several examples in order to illustrate in a non-trivial way some of the properties of this type of generator.) The arrows are there to depict the motion of the shift register as it makes the transition from one state to the next. This is called advancing the register. Two such advances are shown in Figure 9.
If we set equal to a higher power of two, say , i.e., , then the single-bit values of the 's in Figure 8 will instead be represented by bits each. Figure 10 is a diagram of the mod-16 linear shift register associated with the equation
In looking at this more general Fibonacci generator, , two things are worth noting:
This is no surprise - when we add two numbers, the answer in the one's place does not depend in any way on the digits in the ten's place (here, the two's place).
Since the state transformation of the shift register contents is a linear operation, a matrix equation describing it can be given. Continuing with the example of the 10-long generator, if we define and by:
then the action of the shift register can be readily described by the equation
where is the entire vector after time steps. If the vector has been given some initial set of values, then we have,
etc., and in general,
As an aside, note that , or , the identity matrix.
The above equation gives a way to leap the generator ahead, similar in fashion to the leap ahead concept discussed for LCGs. But such an operation with a Fibonacci generator requires a matrix-vector multiply involving, at best, the precomputation of possibly many multiples of . This precomputation may be expensive, even for small values of (recall that is ), and may be prohibitive for large values of . Leaping a Fibonacci generator ahead is therefore not recommended. This does not mean that such a generator cannot be ``split,'' however. We now look at an efficient method for splitting a Fibonacci generator.
Before explaining the splitting technique, notice that the period, , of a properly constructed Fibonacci generator is . Consider a given initial set of values of an -bit, -long Fibonacci register. This state is a particular bit pattern in the rectangular register. If the register is advanced times, the initial pattern will be replaced by different patterns before the initial one reappears. But the number of possible bit patterns in the register is , a number far larger than . This tells us that there are many -long cycles that are independent of one other and can be generated from the same structure. The number of such full-period cycles is . For example, with the generator of our previous example, there are separate full-period cycles and a much smaller number of less than full-period cycles (, to be precise).
The question now becomes, how do we initialize separate cycles? This problem is addressed by Mascagni, et al. , where they describe a canonical form for initializing Fibonacci generators. This canonical form is determined by and , but is independent of . To understand the use of this canonical form, consider a second view of the register shown in Figure 11. The L-shaped region along the left column and the bottom row is fixed with all zeros, except for a one in the least significant bit of word 7 (the word associated with ). The remaining bits, those in the -bit rectangular region filled with z's, are free to be chosen in any combination. Each combination of bits in the area will generate a distinct cycle of pseudo-random numbers. In other words, every possible bit pattern that can be put in the canonical form will occur in one and only one full period cycle.
In general, the canonical form for initializing Fibonacci generators requires that word be set to all zero bits and that the least significant bits of all words in the register to be set to zero, with the exception of one or two characteristic bits that depend on and . As shown in Figure 11, the canonical form for the generator requires the least significant bit of word 7 to be set to one, with all other least significant bits in the register set to zero. Table 4 lists, for several choices of , the characteristic word (or words) for which the least significant bit should be set to one in order to be in canonical form.
Two caveats with respect to the use of this canonical form should be mentioned at this point. Suppose a user were to construct and initialize a number of generators using an initialization scheme that simply numbered the initial states as one would number the integers in a normal binary representation. This method certainly would not violate the conditions under which the canonical form produces distinct cycles, and indeed the cycles of random numbers generated from each initial state would be different. But the first few random numbers from these cycles may be less than random-looking when compared both with corresponding output from other generators and with successive values from the same generator. As an example, consider the generator, where we canonically initialize one register with all zeros in the z-ed region of Figure 11 (call this generator ``A'') and a second generator with a single one in the lower right corner of the z-ed region (call this generator ``B''). Table 5 lists the first 100 pseudo-random numbers produced by both of these generators, both as integers in the range [0,15] and as floating point numbers in the range [0.0,1.0). Note that the A sequence consists of low numbers (less than 0.5) for 43 iterations and is actually either zero or one for the first 19 iterations. B is somewhat better, but still not very satisfying. In addition, A and B appear, at least until about 50 iterations, to be correlated. This appearance of non-random behavior on the part of these two sample sequences does not mean that Fibonacci sequences or this method of initializing them does not produce high quality random numbers. ``Flat spots'' are to be expected in any good pseudo-random number sequence of sufficient length, and indeed such a sequence without them would be suspect with regard to its randomness. The problem with initializing A and B in the example is that the flat spots were ``lined up'' with each other and were placed at the very beginning. Neither of these results would be a problem if our application required the use of thousands of numbers from both the A and the B sequences and if the numbers generated at the beginning were no more important than those generated later. After a short time, the two sequences would look completely uncorrelated from each other and would appear internally random as well. However, if the number of required pseudo-random numbers is small, the outcome of the numerical experiment might be unexpectedly skewed due to the initial conditions.
View First Part of Table
View Second Part of Table
The problem just demonstrated is not difficult to fix. We simply need a different way of ``numbering'' the patterns in the area of the free bits in the canonical form, one that randomizes these patterns to some extent. An LCG can be used to initialize the free bits, if one is careful to use it in such a way that a unique bit pattern results for each distinct cycle number number. Pryor et al.  describe the use of the LCG of Park and Miller  as a good method for initializing their family of Fibonacci generators. In that family, is equal to 32, so that the free bit rectangle is 31 bits high; whereas the Park and Miller generator uses a modulus of the prime number . Thus the successive values in the 31 high bits of the initial state for word 0 through word are simply the LCG successors of . This gives a simple way to initialize distinct cycles that start out with a satisfactory ``look and feel.'' As an aside, this particular LCG passes the `bitwise'' randomness tests described by Altman , for use in initializing Fibonacci generators.
The second caveat related to the use of this canonical form is also highlighted by the previous example. The astute reader may have noticed that with this method of initializing the Fibonacci register, for every advance, the least significant bit is the same for all generators. Unfortunately, this is the tradeoff vs. efficiency that was made in order to guarantee uniqueness of the cycles: the least significant bit is simply non-random relative to the other cycles. For all other bit fields, including the next-to-least significant, there is no such defect. Several possibilities arise for remedying this situation, including the use of a separate, independent generator for only the least significant bit. However, in the interest of simplicity and speed, Pryor et al. have chosen to shift off the least significant bit of the generated random number, so that the numbers returned by the generator are in the range rather than . The register is initialized and maintained as a 32-bit generator, so that no loss of period length or uniqueness of cycle is incurred. And for 32-bit machines, the returned results still include all positive integers.
As mentioned in the beginning of this section, there is at least limited potential for vectorization of a single lagged Fibonacci generator. Figure 12 is an illustration of vectorization applied to our generator. As the figure shows, the vector algorithm advances the register ahead by steps, so that the vector length of most of the operations is . Note that there is a vector copy operation of length . Care should be taken that no item of data is destroyed before it is needed. The easiest way to prevent unintentionally writing over needed data is to keep two copies of the Fibonacci register and, for each ``vector'' advance, use the old copy to construct the new one. None of the data in the old copy will be destroyed until the next vector advance, when it becomes the new copy. If vectorization of the Fibonacci generator is important - and it could be, if random number generation consumes a large fraction of the execution time - then clearly a long vector length is better than a short one. Processing with a vector length of 6, as our example has, would not yield much improvement over the scalar method. For vectorization to provide meaningful improvement over scalar processing, the vector operations should be long enough to make good use of the machine hardware. For example, on Cray machines where the vector registers are 64 words long (128 on the new models), this usually means vector lengths of tens of elements. For these machines, the generators and would be good choices, with respective vector lengths of 64 and 127.
In Figure 13 we list a sample Fortran code for initializing and generating random numbers from the generator . Note that the register is maintained as a set of 32-bit numbers, but that the number returned to the user has only 31 bits. The initialization of the register is accomplished using the Park and Miller LGC described in . The seed, ``iseed0,'' supplied by the user may be any integer greater than or equal to zero and less than or equal to = 2,147,483,646. The register is initialized in canonical form, so each value of iseed0 results in a distinct cycle of random numbers. Since the function irnd175() was written to work on 64-bit machines, as well as 32-bit machines, the mask operations were included to add clarity to the code. In many situations, the 32-bit mask operation could be eliminated, since the hardware would simply ignore any overflow. The 31-bit mask could also be eliminated on any systems that zero-fill on right shift operations. If the system performs a ``sign extension'' type of fill, then the 31-bit mask would be required.
(See exercise 10, 11, and 12.)