Let us examine the BLAS more carefully. For a more complete list of the BLAS, see section 12. Table 2 counts the number of memory references and floating point operations performed by three related BLAS. The last column gives the ratio q of flops to memory references. The significance of q is that it tells us roughly how many flops we can perform per memory reference, that is, how much useful work we can do relative to the time spent moving data. Algorithms with larger q values are therefore better building blocks for other algorithms.
Table 2 reflects a hierarchy of operations: Operations like saxpy
operate on vectors and offer the worst q values; these are called Level 1
BLAS [10], and include inner products and other simple operations.
Operations like matrix-vector multiplication operate on matrices and vectors,
and offer slightly better q values; these are called Level 2 BLAS
[11], and include solving triangular systems of equations and
rank-1 updates of matrices ($A + xy^T$, x and y column vectors). Operations
like matrix-matrix multiplication operate on pairs of matrices, and
offer the best q values; these are called Level 3 BLAS [12], and
include solving triangular systems of equations with many right hand sides.
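
To make the three levels concrete, here is a minimal pure-Python sketch of one representative operation from each level. The function names `saxpy`, `matvec`, and `matmul`, the list-of-rows matrix layout, and the update forms y = alpha*x + y, y = A*x + y, and C = A*B + C are assumptions chosen for illustration; they are not the calling sequences of the actual BLAS routines.

```python
def saxpy(alpha, x, y):
    """Level 1 style: return alpha*x + y for a scalar alpha and vectors x, y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def matvec(A, x, y):
    """Level 2 style: return A*x + y, with A stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) + yi
            for row, yi in zip(A, y)]


def matmul(A, B, C):
    """Level 3 style: return A*B + C, with all matrices stored as lists of rows."""
    cols_of_B = list(zip(*B))  # transpose B so each column is easy to iterate over
    return [[sum(a * b for a, b in zip(row, col)) + cij
             for col, cij in zip(cols_of_B, c_row)]
            for row, c_row in zip(A, C)]
```

Each level touches one more dimension of data than the one before it: vectors only, a matrix with vectors, and then matrices only. This is why the amount of arithmetic per data item grows from level to level.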
Table 2: Basic Linear Algebra Subprograms (BLAS).
Since the Level 3 BLAS have the highest q values, we endeavor to reorganize our algorithms in terms of operations like matrix-matrix multiplication, rather than saxpy (the LINPACK Cholesky is already constructed in terms of calls to saxpy).
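
The ranking of q values that motivates this reorganization can be checked with a small calculation. The sketch below assumes the commonly quoted counts for operands of dimension n: saxpy performs 2n flops on 3n + 1 memory references, matrix-vector multiplication 2n^2 flops on n^2 + 3n references, and matrix-matrix multiplication 2n^3 flops on 4n^2 references. These counts are assumptions for illustration, since the body of the table is not reproduced here, and the function names `blas_counts` and `q_values` are hypothetical.

```python
from fractions import Fraction


def blas_counts(n):
    """Return (flops, memory references) per operation for dimension n.

    The counts are assumed textbook values for y = alpha*x + y,
    y = A*x + y, and C = A*B + C; they are not taken from the table.
    """
    return {
        "saxpy (Level 1)": (2 * n, 3 * n + 1),
        "matrix-vector (Level 2)": (2 * n * n, n * n + 3 * n),
        "matrix-matrix (Level 3)": (2 * n ** 3, 4 * n * n),
    }


def q_values(n):
    """Return the exact ratio q = flops / memory references per operation."""
    return {name: Fraction(flops, refs)
            for name, (flops, refs) in blas_counts(n).items()}


if __name__ == "__main__":
    for name, q in q_values(100).items():
        print(f"{name}: q = {float(q):.2f}")
```

Under these assumed counts, q stays below 1 for saxpy and below 2 for matrix-vector multiplication no matter how large n is, while for matrix-matrix multiplication q equals n/2 and so grows with the problem size.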