6.1 Gaussian Elimination on Shared Memory Machines (continued)

Converting to the Level 3 BLAS involves column blocking

$A = [A_1, \ldots, A_m]$

into $n \times n_b$ blocks, where $n_b$ is the block size and $m = n/n_b$. The optimal choice of $n_b$ depends on the memory hierarchy of the machine in question: our approach is to compute the LU decomposition of each $n \times n_b$ subblock of $A$ using Algorithm 6.3 in the fast memory, and then use Level 3 BLAS to update the rest of the matrix:
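As a rough illustration of this blocking idea, the following NumPy sketch factors one column panel at a time with an unblocked routine and then updates the trailing matrix with matrix-matrix operations. It is a sketch under stated assumptions, not the text's algorithm: the names `panel_lu` and `blocked_lu` and the parameter `nb` are hypothetical, `panel_lu` is only a stand-in for Algorithm 6.3 (assumed here to use partial pivoting), a general `np.linalg.solve` stands in for the Level 3 BLAS triangular solve, and the matrix is assumed square and nonsingular.

```python
import numpy as np

def panel_lu(P):
    """Unblocked LU with partial pivoting on a tall panel P, in place.

    Hypothetical stand-in for the unblocked algorithm the text refers to.
    Returns the pivot row chosen at each step (indices local to the panel).
    """
    rows, cols = P.shape
    steps = min(rows, cols)
    piv = np.zeros(steps, dtype=int)
    for j in range(steps):
        p = j + int(np.argmax(np.abs(P[j:, j])))  # largest entry in column j
        piv[j] = p
        if p != j:
            P[[j, p], :] = P[[p, j], :]           # swap rows inside the panel
        P[j + 1:, j] /= P[j, j]                   # multipliers (column of L)
        P[j + 1:, j + 1:] -= np.outer(P[j + 1:, j], P[j, j + 1:])
    return piv

def blocked_lu(A, nb):
    """Blocked LU of a square matrix: returns perm, L, U with A[perm] = L @ U."""
    A = np.array(A, dtype=float)                  # work on a copy
    n = A.shape[0]
    perm = np.arange(n)
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # 1. Factor the current column panel (a view, so A is updated in place).
        piv = panel_lu(A[k:, k:e])
        # 2. Apply the panel's row swaps to the columns outside the panel.
        for j, p in enumerate(piv):
            r1, r2 = k + j, k + int(p)
            if r1 != r2:
                A[[r1, r2], :k] = A[[r2, r1], :k]
                A[[r1, r2], e:] = A[[r2, r1], e:]
                perm[[r1, r2]] = perm[[r2, r1]]
        if e < n:
            # 3. Block row of U: solve L11 @ U12 = A12 (TRSM in BLAS terms).
            L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
            A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
            # 4. Trailing update A22 -= L21 @ U12 (GEMM in BLAS terms).
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    return perm, L, U

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((7, 7))
    perm, L, U = blocked_lu(M, nb=3)
    print(np.allclose(M[perm], L @ U))
```

In this sketch only step 1 works column by column; steps 3 and 4 are matrix-matrix operations, which is where a larger `nb` would be expected to pay off, provided the panel still fits in fast memory.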