Finally, we consider rectangular blocking, where A is broken into an block matrix with blocks , and B and C are similarly partitioned. The algorithm becomes
Assuming so that one block each from A, B and C fit in memory simultaneously, our memory reference count is as follows: for reading and writing each block of C once, for reading A N times, and for reading B N times. This yields , which is much better than the previous algorithms. In  an analysis of this problem leads to an upper bound for q near , so we cannot expect to improve much on this algorithm for square matrices.
(See exercises 13 and 13.)