next up previous

4.1 Matrix Multiplication on a Shared Memory Machine     continued...

Finally, we consider rectangular blocking, where A is broken into an block matrix with blocks , and B and C are similarly partitioned. The algorithm becomes

Assuming so that one block each from A, B and C fit in memory simultaneously, our memory reference count is as follows: for reading and writing each block of C once, for reading A N times, and for reading B N times. This yields , which is much better than the previous algorithms. In [15] an analysis of this problem leads to an upper bound for q near , so we cannot expect to improve much on this algorithm for square matrices.

(See exercises 13 and 13.)