7 Gaussian Elimination on Band Matrices     continued...

If we use n processors to apply this algorithm recursively instead of splitting into just two systems, we can solve in steps, a speedup of , but the efficiency decreases like . This is theoretically attractive but inefficient. Because of the data movement required, it is unlikely to be fast without system support for this communication pattern.

A related approach, which avoids the two subsystems, is to eliminate only the odd-numbered unknowns from the even-numbered equations i. Again, this can be done in parallel, or in vector mode, and it results in a new system in which only the even-numbered unknowns are coupled. After having solved this reduced system, the odd-numbered unknowns can be computed in parallel from the odd-numbered equations. Of course, the trick can be repeated for the subsystem of half size, and this process is known as cyclic reduction [43,44]. Since the amount of serial work is halved in each step by completely parallel (or vectorizable) operations, this approach has been successfully applied on vector supercomputers, especially when the vector speed of the machine is significantly greater than the scalar speed [38,45,46]. For distributed memory computers the method requires too much data movement for the reduced system to be practical.

However, the method is easily generalized to one with more parallelism. Cyclic reduction can be viewed as an approach in which the given matrix L is written as a lower block bidiagonal matrix with blocks along the diagonal. In the elimination process all positions in the diagonal blocks are eliminated in parallel. An obvious idea is to subdivide the matrix into larger blocks, i.e. we write L as a block bidiagonal matrix with blocks along the diagonal (for simplicity we assume that n is a multiple of k). In practical cases k is chosen so large that the process is not repeated for the resulting subsystems, as for cyclic reduction (where k=2). This approach is referred to as a divide-and-conquer approach. For banded triangular systems it was first suggested in [47], for tridiagonal systems it was proposed in [48].