When this algorithm terminates, the diagonal and upper triangle of A contain U, the part below the diagonal contains the corresponding part of L (the diagonal of L is all ones, and so does not need to be stored. The permutation matrix P is determined implicitly by the permutations in the second line of the algorithm.
If A is stored by column, as is the case in Fortran (but not C), then since the inner loop combines rows of A, it accesses memory locations (at least) n locations apart. As described in Chapter CA,
,
this does not respect locality, and
so is likely to be slower than an algorithm which accesses A by
column (i.e. accesses consecutive memory locations).
Algorithm 6.1 is
also called kij- LU decomposition, because of the
nesting order of its
loops. All the rest of the
permutations of i, j and k lead to
valid algorithms, some of which access columns of A in the innermost loop.
The next algorithm is one of these,
and is used in the LINPACK routine sgefa [1].