Finally, we consider * rectangular blocking*, where **A** is broken into
an block matrix with blocks ,
and **B** and **C** are similarly partitioned. The algorithm becomes

Assuming so that one block each from **A**, **B** and **C** fit
in memory simultaneously, our memory reference count is as follows:
for reading and writing each block of **C** once, for reading
**A** **N** times, and for reading **B** **N** times. This yields
, which is much better than the previous algorithms.
In [15] an analysis of this problem leads to an upper bound
for **q** near , so we cannot expect to improve much on this
algorithm for square matrices.