
Exercise 2: A comparison of blocked implementations for the basic linear algebra subroutine and the matrix-multiplication implementation provided previously.

Consider the basic linear algebra subroutine which solves TX=B for X, where T is a given n-by-n triangular matrix, B is a given n-by-m matrix, and X is an n-by-m matrix of unknowns. Give blocked implementations for this subroutine analogous to the ones for matrix-multiplication above, and compare their ratios q of flops to memory references. How do your answers depend on m?
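As a starting point, the unblocked version of this subroutine can be sketched as below. This is only a baseline for comparison, not a blocked solution to the exercise. It assumes that T is lower triangular with nonzero diagonal entries (the exercise does not say which triangle), and the function name `solve_lower_triangular` and the list-of-rows storage are hypothetical choices made for illustration. Under those assumptions it performs roughly n^2 m flops.

```python
def solve_lower_triangular(T, B):
    """Solve T X = B by forward substitution (unblocked baseline).

    Assumption: T is an n-by-n lower triangular matrix with a nonzero
    diagonal, and B is n-by-m. Both are given as lists of row lists.
    """
    n = len(T)
    m = len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for i in range(n):          # rows of X, top to bottom
        for j in range(m):      # each right-hand-side column
            # Subtract the contributions of rows of X already computed.
            s = B[i][j]
            for k in range(i):
                s -= T[i][k] * X[k][j]
            X[i][j] = s / T[i][i]
    return X


# Small usage example with n = 2 and m = 2.
T = [[2.0, 0.0],
     [1.0, 4.0]]
B = [[2.0, 4.0],
     [5.0, 10.0]]
X = solve_lower_triangular(T, B)
print(X)
```

A blocked variant would apply the same recurrence to square sub-blocks of T and the matching blocks of X and B, so that each block is reused many times once it has been loaded into fast memory.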