Converting to the Level 3 BLAS involves partitioning **A** by columns
into blocks whose common width
is the *block size*. The optimal choice of
block size depends on the
memory hierarchy of the machine in question: our approach is to compute
the *LU* decomposition
of each subblock of **A** using Algorithm 6.3
in the fast memory, and then use
Level 3 BLAS to update the rest of the matrix:
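As a rough illustration of this column-blocked scheme, here is a minimal sketch in plain Python. It rests on several assumptions that are not from the text: pivoting is omitted for brevity (so it is only safe for matrices that need no row exchanges), the names `panel_lu`, `blocked_lu`, `multiply_factors` and the parameter `nb` are hypothetical, and the explicit loops merely stand in for the triangular solve and matrix-matrix multiply that a Level 3 BLAS library would perform.

```python
def panel_lu(A, k, kb):
    """Unblocked LU, without pivoting, of the panel A[k:n, k:k+kb], in place."""
    n = len(A)
    for j in range(k, k + kb):
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]                  # multiplier l_ij
            for c in range(j + 1, k + kb):      # update stays inside the panel
                A[i][c] -= A[i][j] * A[j][c]


def blocked_lu(A, nb):
    """Overwrite square A with unit-lower L and upper U, nb columns at a time."""
    n = len(A)
    for k in range(0, n, nb):
        kb = min(nb, n - k)                     # last block may be narrower
        end = k + kb
        panel_lu(A, k, kb)                      # factor the current column block
        # Triangular solve (stand-in for a Level 3 BLAS call):
        # U12 = L11^{-1} * A12, by forward substitution with unit diagonal.
        for i in range(k + 1, end):
            for p in range(k, i):
                for c in range(end, n):
                    A[i][c] -= A[i][p] * A[p][c]
        # Matrix-matrix update (stand-in for a Level 3 BLAS call):
        # A22 = A22 - L21 * U12.
        for i in range(end, n):
            for p in range(k, end):
                l_ip = A[i][p]
                for c in range(end, n):
                    A[i][c] -= l_ip * A[p][c]
    return A


def multiply_factors(LU):
    """Rebuild L*U from the packed factors, to check the factorization."""
    n = len(LU)
    return [[sum((1.0 if p == i else LU[i][p]) * LU[p][j]
                 for p in range(min(i, j) + 1))
             for j in range(n)]
            for i in range(n)]
```

Calling `blocked_lu(A, nb)` on a copy of a matrix and passing the result to `multiply_factors` should reproduce the original matrix up to rounding, whatever block size is chosen. Only the panel factorization works column by column; the bulk of the arithmetic falls in the two block updates, which is what makes the choice of block size matter on a machine with a memory hierarchy.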