Consider the basic linear algebra subroutine which
solves **TX=B** for **X**, where **T** is a given **n**-by-**n** triangular matrix,
**B** is a given **n**-by-**m** matrix, and **X** is an **n**-by-**m** matrix of
unknowns. Give blocked implementations of this subroutine analogous to
the ones for matrix multiplication above, and compare their
ratios **q** of flops to memory references. How do your answers depend on
**m**?
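As a starting point, here is a minimal sketch of one possible blocking, offered as an assumption rather than the intended answer. It assumes **T** is lower triangular with a nonzero diagonal and splits the rows into blocks of size **b**. Block row **i** of **X** is then **X_i = T_ii^{-1} (B_i - sum over j < i of T_ij X_j)**, so each step first applies matrix-multiply-style updates from the already solved block rows and then does a small triangular solve on the diagonal block. The function name `blocked_lower_trsm` and the block size `b` are hypothetical, matrices are plain Python lists of lists, and the code does not model fast and slow memory, so it does not compute **q** itself.

```python
def blocked_lower_trsm(T, B, b):
    """Solve T X = B for X by block forward substitution (illustrative sketch).

    Assumptions: T is n-by-n lower triangular with a nonzero diagonal,
    B is n-by-m, and b >= 1 is the block size (the last block may be
    smaller when b does not divide n). Returns X as a new list of lists.
    """
    n = len(T)
    m = len(B[0]) if n else 0
    X = [row[:] for row in B]  # X starts as a copy of B and is overwritten in place

    for i0 in range(0, n, b):
        i1 = min(i0 + b, n)  # block row i covers rows i0 .. i1-1

        # Update step: X_i -= T_ij X_j for every block j < i, which is already solved.
        # This is the matrix-multiply-like part of the computation.
        for j0 in range(0, i0, b):
            j1 = min(j0 + b, n)
            for r in range(i0, i1):
                for k in range(j0, j1):
                    t = T[r][k]
                    for c in range(m):
                        X[r][c] -= t * X[k][c]

        # Diagonal step: solve T_ii X_i = (updated) B_i by unblocked forward substitution.
        for r in range(i0, i1):
            for k in range(i0, r):
                t = T[r][k]
                for c in range(m):
                    X[r][c] -= t * X[k][c]
            d = T[r][r]
            for c in range(m):
                X[r][c] /= d

    return X


if __name__ == "__main__":
    # Small check: B was built as T times the known X = [[1, 2], [3, 4], [5, 6]].
    T = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [4.0, 5.0, 6.0]]
    B = [[2.0, 4.0], [10.0, 14.0], [49.0, 64.0]]
    print(blocked_lower_trsm(T, B, 2))
```

In a two-level memory model, **b** would be chosen so that the blocks touched by one update step fit in fast memory. How that constraint, and the resulting ratio **q**, behaves as **m** varies is exactly what the exercise asks you to work out.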