Exercise 1: Confirm the values of q computed for the above three algorithms.

On the other hand, this brief analysis ignores a number of practical issues:

(1)
high-level language constructs do not yet support the block layout of matrices described here;
(2)
if the fast memory consists of vector registers and has vector operations supporting saxpy but not inner products, a column-blocked code may be superior;
(3)
a real code will have to deal with nonsquare matrices, for which the optimal blocks may not be square.

This is why computer manufacturers are often in the best position to write the matrix multiplication (and other BLAS routines) for their machines: they usually have the best understanding of these machine-specific details.
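To make the blocked organization concrete, here is a minimal sketch in Python/NumPy. It assumes square matrices whose dimension n is a multiple of a hypothetical block size bs (chosen so that three bs-by-bs blocks fit in fast memory); per item (3) above, a real code would also have to handle nonsquare matrices and ragged edge blocks.

    import numpy as np

    def blocked_matmul(A, B, bs=32):
        """Blocked product C = A*B of square matrices.
        bs is a hypothetical tuning parameter; n is assumed
        to be a multiple of bs."""
        n = A.shape[0]
        C = np.zeros((n, n))
        for i in range(0, n, bs):
            for j in range(0, n, bs):
                for k in range(0, n, bs):
                    # Update block C(i,j) by the product of blocks
                    # A(i,k) and B(k,j); the three blocks involved
                    # are meant to reside in fast memory together.
                    C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
        return C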

Another, quite different algorithm is Strassen's method [17], which multiplies matrices recursively: it divides them into block matrices and multiplies the subblocks using 7 matrix multiplications (done recursively) and 15 matrix additions of half the size. This leads to an asymptotic complexity of $O(n^{\log_2 7}) \approx O(n^{2.81})$ instead of $O(n^3)$. The value of this algorithm lies not just in its asymptotic complexity but in its reduction of the problem to smaller subproblems which eventually fit in fast memory; once the subproblems fit in fast memory, standard matrix multiplication may be used. This approach has led to speedups on relatively large matrices on some machines [18]. Drawbacks are the need for significant workspace and somewhat lower numerical stability, although it is adequate for many purposes [19].
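As an illustration, here is a sketch of the recursion in Python/NumPy. It uses the classic Strassen formulas, which take 18 additions rather than the 15 of the reorganized (Winograd) variant; the dimension is assumed to be a power of two, and cutoff is a hypothetical crossover point below which the standard algorithm is used, mirroring the text's point about subproblems that fit in fast memory.

    import numpy as np

    def strassen(A, B, cutoff=64):
        """Multiply square matrices whose dimension is a power of two.
        Below the (hypothetical) cutoff, fall back to standard
        multiplication."""
        n = A.shape[0]
        if n <= cutoff:
            return A @ B
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        # The 7 half-size multiplications, each done recursively
        M1 = strassen(A11 + A22, B11 + B22, cutoff)
        M2 = strassen(A21 + A22, B11, cutoff)
        M3 = strassen(A11, B12 - B22, cutoff)
        M4 = strassen(A22, B21 - B11, cutoff)
        M5 = strassen(A11 + A12, B22, cutoff)
        M6 = strassen(A21 - A11, B11 + B12, cutoff)
        M7 = strassen(A12 - A22, B21 + B22, cutoff)
        # Combine the products into the four blocks of C
        C11 = M1 + M4 - M5 + M7
        C12 = M3 + M5
        C21 = M2 + M4
        C22 = M1 - M2 + M3 + M6
        return np.block([[C11, C12], [C21, C22]])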