If we use **n** processors to apply this algorithm recursively instead of
splitting into just two systems, we can solve in **O(log n)** steps, a
speedup of **O(n / log n)**, but the efficiency decreases like **O(1 / log n)**.
This is theoretically attractive but inefficient. Because of the data
movement required, it is unlikely to be fast without system support
for this communication pattern.

A related approach, which avoids the two subsystems, is to eliminate
only the odd-numbered unknown **x_{i-1}** from each even-numbered equation **i**.
Again, this can be done in parallel, or in vector mode, and it results in
a new system in which only the even-numbered unknowns are coupled. After having
solved this reduced system, the odd-numbered unknowns can be computed in
parallel
from the odd-numbered equations. Of course, the trick can be repeated for the
subsystem of half size, and this process is known as *cyclic reduction*
[43,44].
Since the amount of serial work is halved in each step
by completely parallel (or vectorizable) operations, this approach has been
successfully applied on vector supercomputers, especially when the vector
speed of the machine is significantly greater than the scalar speed
[38,45,46].
For distributed memory computers
the method requires too much data movement for the reduced system to be
practical.
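
The elimination and recovery steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function name `cyclic_reduction_bidiagonal` and the storage convention (main diagonal in `d`, subdiagonal in `e` with `e[0]` unused) are assumptions made for this example. Because Python indexes from 0, the "even-numbered" equations of the text are rows 1, 3, 5, ... in the code.

```python
def cyclic_reduction_bidiagonal(d, e, b):
    """Solve L x = b for a lower bidiagonal L by cyclic reduction.

    Row i (0-based) reads e[i]*x[i-1] + d[i]*x[i] = b[i]; e[0] is ignored.
    """
    n = len(d)
    if n == 1:
        return [b[0] / d[0]]
    e = [0.0] + list(e[1:])  # make the unused entry an explicit zero

    # The text counts from 1, so its even-numbered equations are the
    # 0-based rows 1, 3, 5, ...
    even = range(1, n, 2)

    # Eliminate x[i-1] from row i using row i-1. Each i is independent of
    # the others, so these are the parallel (or vectorizable) operations.
    m = [e[i] / d[i - 1] for i in even]
    d_red = [d[i] for i in even]
    e_red = [-mi * e[i - 1] for mi, i in zip(m, even)]
    b_red = [b[i] - mi * b[i - 1] for mi, i in zip(m, even)]

    # The reduced system of half size couples only the even-numbered
    # unknowns; the same trick is repeated on it.
    x_even = cyclic_reduction_bidiagonal(d_red, e_red, b_red)

    x = [0.0] * n
    for xi, i in zip(x_even, even):
        x[i] = xi

    # Recover the odd-numbered unknowns (0-based rows 0, 2, 4, ...) from
    # their own equations; again every i is independent.
    for i in range(0, n, 2):
        prev = x[i - 1] if i > 0 else 0.0
        x[i] = (b[i] - e[i] * prev) / d[i]
    return x


print(cyclic_reduction_bidiagonal([1, 1, 1, 1], [0, 1, 1, 1], [1, 2, 3, 4]))
# prints [1.0, 1.0, 2.0, 2.0]
```

In this sketch the list comprehensions and the final loop stand in for the parallel or vector operations; only the recursion depth, roughly **log n** levels, is inherently sequential.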

However, the method is easily generalized to one with more parallelism.
Cyclic reduction can be viewed as an approach in
which the given matrix **L** is written as a lower block bidiagonal matrix with
**2 × 2** blocks along the diagonal. In the elimination process all
**(2,1)** positions in the diagonal blocks are eliminated in parallel.
An obvious idea is to subdivide the matrix into larger blocks, i.e., we
write **L** as a block bidiagonal matrix with **k × k** blocks along the
diagonal (for simplicity we assume that **n** is a multiple of **k**).
In practical cases **k** is chosen so large that the process is not repeated
for the resulting subsystems, as for cyclic reduction (where **k=2**). This
approach is referred to as a *divide-and-conquer* approach. For banded
triangular systems it was first suggested in [47];
for tridiagonal systems it was proposed in [48].
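
To make the block idea concrete, the following Python sketch shows one common way to organize a divide-and-conquer solve for a lower bidiagonal system. It is an illustration under stated assumptions, not necessarily the formulation used in [47] or [48]: the function names, the storage convention (main diagonal in `d`, subdiagonal in `e` with `e[0]` unused), and the three-phase layout are choices made for this example. Each block of size `k` is solved locally for two right-hand sides, a small serial recurrence of size `n / k` determines the last unknown of every block, and the blocks are then corrected independently.

```python
def forward_substitution(d, e, b):
    """Serial solve of a lower bidiagonal system; e[0] is ignored."""
    x = [b[0] / d[0]]
    for i in range(1, len(d)):
        x.append((b[i] - e[i] * x[i - 1]) / d[i])
    return x


def divide_and_conquer_bidiagonal(d, e, b, k):
    """Solve L x = b with L split into n/k diagonal blocks of size k."""
    n = len(d)
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    p = n // k

    # Phase 1 (independent per block): solve with the local right-hand side
    # (y) and with the column that couples the block to its left neighbour
    # (z). The first block has no left neighbour, so its column is zero.
    y, z = [], []
    for j in range(p):
        lo, hi = j * k, (j + 1) * k
        dj, ej, bj = d[lo:hi], e[lo:hi], b[lo:hi]
        y.append(forward_substitution(dj, ej, bj))
        coupling = [e[lo] if j > 0 else 0.0] + [0.0] * (k - 1)
        z.append(forward_substitution(dj, ej, coupling))

    # Phase 2 (serial, length p): the last unknown of block j depends only
    # on the last unknown of block j-1.
    last = []
    prev = 0.0
    for j in range(p):
        prev = y[j][-1] - prev * z[j][-1]
        last.append(prev)

    # Phase 3 (independent per block): correct the local solutions.
    x = []
    for j in range(p):
        left = last[j - 1] if j > 0 else 0.0
        x.extend(yi - left * zi for yi, zi in zip(y[j], z[j]))
    return x


print(divide_and_conquer_bidiagonal([1, 1, 1, 1], [0, 1, 1, 1], [1, 2, 3, 4], k=2))
# prints [1.0, 1.0, 2.0, 2.0]
```

In this layout only Phase 2 is sequential, and its length is **n / k** rather than **n**, which is why a large **k** keeps the reduced system small enough that the process need not be repeated on it.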