Converting to the Level 3 BLAS involves column blocking $A = [A_1, \ldots, A_m]$
into $n \times n_b$ blocks, where $n_b$ is the block size and $m = n/n_b$.
The optimal choice of $n_b$ depends on the memory hierarchy of the machine
in question. Our approach is to compute the LU decomposition of each
$n \times n_b$ subblock of $A$ using Algorithm 6.3 in the fast memory,
and then use Level 3 BLAS to update the rest of the matrix:
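
As a rough illustration of the blocking idea, the following pure-Python sketch factors one column panel at a time and then updates the trailing matrix. It is not the algorithm as stated in the source: the names blocked_lu and lu_panel are hypothetical, the panel routine uses ordinary Gaussian elimination with partial pivoting as a stand-in for Algorithm 6.3 (pivoting is an assumption here), and the plain loops in the update step take the place of what would be Level 3 BLAS calls (a triangular solve and a matrix-matrix multiply) in a real implementation.

```python
def lu_panel(A, k, kb, piv):
    """Unblocked LU with partial pivoting on the column panel A[k:, k:k+kb].

    Hypothetical stand-in for the panel factorization; works in place.
    """
    n = len(A)
    for j in range(k, k + kb):
        # Choose the largest entry in column j on or below the diagonal.
        p = max(range(j, n), key=lambda i: abs(A[i][j]))
        if A[p][j] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        piv[j] = p
        if p != j:
            # Swap entire rows, so the interchange also reaches the columns
            # to the left (already factored) and right (not yet updated).
            A[j], A[p] = A[p], A[j]
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]              # multiplier, stored as part of L
            for c in range(j + 1, k + kb):  # update only inside the panel
                A[i][c] -= A[i][j] * A[j][c]


def blocked_lu(A, nb):
    """Blocked LU of a square matrix given as a list of rows.

    Returns (LU, piv): L (unit lower) and U packed in one matrix, plus the
    sequence of row interchanges. The input is not modified.
    """
    n = len(A)
    A = [row[:] for row in A]
    piv = list(range(n))
    for k in range(0, n, nb):
        kb = min(nb, n - k)   # the last panel may be narrower
        e = k + kb
        lu_panel(A, k, kb, piv)
        # Triangular solve U12 = L11^{-1} A12 (a TRSM-like operation).
        for j in range(k, e):
            for i in range(j + 1, e):
                lij = A[i][j]
                for c in range(e, n):
                    A[i][c] -= lij * A[j][c]
        # Trailing update A22 = A22 - L21 * U12 (a GEMM-like operation).
        for i in range(e, n):
            for c in range(e, n):
                s = 0.0
                for j in range(k, e):
                    s += A[i][j] * A[j][c]
                A[i][c] -= s
    return A, piv
```

Calling blocked_lu(A, nb) with nb = 1 reduces to the unblocked method, while larger nb moves more of the arithmetic into the matrix-matrix update, which is the part that benefits from the memory hierarchy.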