To illustrate, let us apply one parallel elimination step to the
lower bidiagonal system **Lx=b** to eliminate all
subdiagonal elements in all diagonal blocks.
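This yields a transformed system; for **k=4** and **n=16**,
every diagonal block is then diagonal, and the remaining off-diagonal
nonzeros are confined to a single column of each subdiagonal block.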
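
The structure produced by this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of [48]: the matrix is stored densely for readability, the names `eliminate_within_blocks` and `pattern` are hypothetical, the block size is taken to be 4 (for n=16 the block size and the number of blocks coincide), and the example matrix (unit diagonal, subdiagonal entries -1) is chosen purely for demonstration.

```python
def eliminate_within_blocks(L, b, block_size):
    """Eliminate the subdiagonal entries inside every diagonal block.

    L is a dense n-by-n lower bidiagonal matrix (list of lists), b the
    right-hand side. block_size is assumed to divide n. Copies are returned.
    """
    n = len(L)
    assert n % block_size == 0
    L = [row[:] for row in L]
    b = b[:]
    # The blocks do not depend on each other, so this outer loop is the
    # part that could run in parallel.
    for start in range(0, n, block_size):
        for i in range(start + 1, start + block_size):
            factor = L[i][i - 1] / L[i - 1][i - 1]
            if start > 0:
                # Fill-in appears only in the last column of the
                # previous block (column start - 1).
                L[i][start - 1] -= factor * L[i - 1][start - 1]
            L[i][i - 1] = 0.0  # the eliminated subdiagonal entry
            b[i] -= factor * b[i - 1]
    return L, b


def pattern(L):
    """Return the sparsity pattern, one string per row ('x' = nonzero)."""
    return ["".join("x" if v != 0 else "." for v in row) for row in L]


# Illustrative 16-by-16 system: unit diagonal, subdiagonal entries -1.
n = 16
L = [[0.0] * n for _ in range(n)]
for i in range(n):
    L[i][i] = 1.0
    if i > 0:
        L[i][i - 1] = -1.0
b = [1.0] * n

L1, b1 = eliminate_within_blocks(L, b, 4)
print("\n".join(pattern(L1)))
```

In the printed pattern the first four rows contain only their diagonal entry, while every later row contains its diagonal entry plus one entry in the last column of the preceding block.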

There are two possibilities for the next step. In the original approach [48], the fill-in in the subdiagonal blocks is eliminated in parallel, or in vector mode, for each subdiagonal block (note that each subdiagonal block has only one column with nonzero elements). It has been shown in [49] that this leads to very efficient vectorized code for machines such as those from Cray and Fujitsu.
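
A minimal sketch of this second step, assuming the system has already been reduced so that each row holds a diagonal entry and at most one coefficient in the last column of the preceding block. The function name `solve_block_reduced` and the array layout (`d` for the diagonal, `c` for the single nonzero column, zero in the first block) are hypothetical choices for illustration and are not taken from [48] or [49].

```python
def solve_block_reduced(d, c, b, block_size):
    """Solve a block-reduced lower bidiagonal system.

    Row i of the reduced system reads
        c[i] * x[start - 1] + d[i] * x[i] = b[i],
    where start is the first row of the block containing i and c[i] = 0
    in the first block. block_size is assumed to divide len(d).
    """
    n = len(d)
    assert n % block_size == 0
    x = [0.0] * n
    for start in range(0, n, block_size):
        # Last unknown of the previous block; it is already final.
        pivot = x[start - 1] if start > 0 else 0.0
        # All rows of the block depend only on `pivot`, so this inner
        # loop is a single vector operation (an axpy-like update
        # followed by an elementwise division).
        for i in range(start, start + block_size):
            x[i] = (b[i] - c[i] * pivot) / d[i]
    return x


# Small illustrative reduced system with two blocks of size 4.
d = [1.0] * 8
c = [0.0] * 4 + [-1.0] * 4
b = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
x = solve_block_reduced(d, c, b, 4)
print(x)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

The loop over blocks is sequential, since each block needs the last unknown of its predecessor, but the work inside a block has no internal dependencies. This is the property that makes the step suitable for vector hardware.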