To illustrate, let us apply one parallel elimination step to the lower bidiagonal system Lx=b, eliminating the subdiagonal elements in all diagonal blocks simultaneously. This yields a modified system, where for k=4 and n=16 we get
There are two possibilities for the next step. In the original approach, the fill-in in the subdiagonal blocks is eliminated in parallel, or in vector mode, for each subdiagonal block (note that each subdiagonal block has only one column with nonzero elements). This has been shown to lead to very efficient vectorized code for machines such as those from Cray and Fujitsu.
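The two steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cited authors' code: L is assumed to have a unit diagonal, k is taken to be the number of diagonal blocks (for k=4 and n=16 block count and block size coincide), n is assumed to be a multiple of k, and the names `block_bidiagonal_solve`, `e`, `c` and `f` are hypothetical. The array `f` holds the single nonzero fill-in column of each subdiagonal block.

```python
def block_bidiagonal_solve(e, b, k):
    """Solve L x = b for a unit lower bidiagonal L (assumed form).

    e[i] is the subdiagonal entry L[i][i-1]; e[0] is unused.
    k is the number of diagonal blocks (assumption), len(b) % k == 0.
    """
    n = len(b)
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    m = n // k                      # block size
    c = [float(v) for v in b]       # transformed right-hand side
    f = [0.0] * n                   # fill-in column of each subdiagonal block

    # Step 1: eliminate the subdiagonal inside every diagonal block.
    # The iterations over j are independent and could run in parallel.
    for j in range(k):
        s = j * m
        if j > 0:
            f[s] = e[s]             # original coupling to the previous block
        for i in range(s + 1, s + m):
            c[i] -= e[i] * c[i - 1]
            f[i] = -e[i] * f[i - 1]  # fill-in stays in one column
    # Now every row i of block j reads: x[i] + f[i] * x[s-1] = c[i].

    # Step 2: eliminate the fill-in block by block. Each block update is a
    # single vector operation (c_block - f_block * scalar).
    x = c[:]
    for j in range(1, k):
        s = j * m
        prev = x[s - 1]             # last unknown of the previous block
        x[s:s + m] = [c[i] - f[i] * prev for i in range(s, s + m)]
    return x


if __name__ == "__main__":
    n, k = 16, 4
    e = [0.0] + [-1.0] * (n - 1)    # x[i] - x[i-1] = 1
    b = [1.0] * n
    print(block_bidiagonal_solve(e, b, k))
```

In this example the recurrence reduces to x[i] = x[i-1] + 1 with x[0] = 1, so the result should be the values 1 through 16. The loop over blocks in step 2 is sequential, but each pass updates a whole block with one vector operation, which is the property the text refers to as vector mode.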