When this algorithm terminates, the diagonal and upper triangle of A contain U, the part below the diagonal contains the corresponding part of L (the diagonal of L is all ones, and so does not need to be stored. The permutation matrix P is determined implicitly by the permutations in the second line of the algorithm.
If A is stored by column, as is the case in Fortran (but not C), then since the inner loop combines rows of A, it accesses memory locations (at least) n locations apart. As described in Chapter CA,
, this does not respect locality, and so is likely to be slower than an algorithm which accesses A by column (i.e. accesses consecutive memory locations). Algorithm 6.1 is also called kij- LU decomposition, because of the nesting order of its loops. All the rest of the permutations of i, j and k lead to valid algorithms, some of which access columns of A in the innermost loop. The next algorithm is one of these, and is used in the LINPACK routine sgefa .