The next possibility is dividing B and C into column blocks,
and computing C block by block. Recall that we use the notation
to mean the submatrix in rows i through j and columns
k through l. We partition
where each
is
,
and similarly for C. Our column block algorithm is then
Assuming , so that fast
memory can accommodate
,
and one column of A
simultaneously, our memory reference count is as follows:
for reading and writing each block
of C once,
for reading each block of B once, and
for
reading A N times. This yields
, so that M needs to
grow with n to keep q large.