The other techniques we will discuss can all be applied to general banded
systems, for which most were originally proposed, but for ease of exposition
we will illustrate them just with a lower unit bidiagonal system Lx=b.
A straight forward parallelization approach is to eliminate
the unknown from equation i using
equation i-1, for all i in parallel. This leads to a new system in which
each
is coupled only with
. Thus, the original system splits
in two independent lower bidiagonal systems of half the size,
one for the odd-numbered
unknowns, and one for the even-numbered unknowns. This process
can be repeated recursively for both new systems,
leading to an algorithm known as recursive doubling [41].
It has been analyzed and generalized for banded systems in [42]. Its significance for modern parallel computers is limited, which we illustrate with the following examples.
Suppose we perform a single step of recursive doubling. This step can be done in parallel, but it involves slightly more arithmetic than the serial elimination process for solving Lx=b. The two resulting lower bidiagonal systems can be solved in parallel. This implies that on a two-processor system the time for a single step of recursive doubling will be slightly more than the time for solving the original system with only one processor. If we have n processors (where n is the dimension of L), then the elimination step can be done in very few time steps, and the two resulting systems can be solved in parallel, so that we have a speedup of about 2. However, this is not very practical, since during most of the time n-2 processors are idle, or formulated differently, the efficiency of the processors is rather low.