The next possibility is dividing **B** and **C** into * column blocks*,
and computing **C** block by block. Recall that we use the notation
to mean the submatrix in rows **i** through **j** and columns
**k** through **l**. We partition
where each
is ,
and similarly for **C**. Our column block algorithm is then

Assuming , so that fast
memory can accommodate , and one column of **A**
simultaneously, our memory reference count is as follows:
for reading and writing each block
of **C** once, for reading each block of **B** once, and for
reading **A** **N** times. This yields , so that **M** needs to
grow with **n** to keep **q** large.