next up previous

4.1 Matrix Multiplication on a Shared Memory Machine     continued...

The next possibility is dividing B and C into column blocks, and computing C block by block. Recall that we use the notation to mean the submatrix in rows i through j and columns k through l. We partition where each is , and similarly for C. Our column block algorithm is then

Assuming , so that fast memory can accommodate , and one column of A simultaneously, our memory reference count is as follows: for reading and writing each block of C once, for reading each block of B once, and for reading A N times. This yields , so that M needs to grow with n to keep q large.