3 The BLAS

Let us examine the BLAS more carefully. For a more complete list of the BLAS, see section 12. Table 2 counts the memory references and floating-point operations (flops) performed by three related BLAS. The last column gives the ratio q of flops to memory references. The significance of q is that it tells us roughly how many flops we can perform per memory reference, that is, how much useful work we do compared with the time spent moving data. Algorithms with larger q values are therefore better building blocks for other algorithms.
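To make q concrete, here is a minimal sketch that computes it from textbook operation counts for one routine at each level. It assumes n-vectors and n-by-n matrices, and that every operand is read from slow memory exactly once and every result is written back exactly once. The helper names `blas_counts` and `q_ratio` are hypothetical, and these counts may differ in detail from those in Table 2.

```python
from fractions import Fraction


def blas_counts(op: str, n: int) -> tuple[int, int]:
    """Return (memory_refs, flops) for one call of size n.

    Assumption (illustrative only): each input is read once and each
    output is written once; no caching effects are modeled.
    """
    if op == "saxpy":  # y = alpha*x + y: read x, y, alpha; write y
        return 3 * n + 1, 2 * n
    if op == "gemv":   # y = y + A*x: read A, x, y; write y
        return n * n + 3 * n, 2 * n * n
    if op == "gemm":   # C = C + A*B: read A, B, C; write C
        return 4 * n * n, 2 * n ** 3
    raise ValueError(f"unknown operation: {op}")


def q_ratio(op: str, n: int) -> Fraction:
    """Exact ratio of flops to memory references."""
    mem, flops = blas_counts(op, n)
    return Fraction(flops, mem)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, [float(q_ratio(op, n)) for op in ("saxpy", "gemv", "gemm")])
```

Under these assumptions q tends to 2/3 for saxpy and to 2 for matrix-vector multiplication as n grows, while for matrix-matrix multiplication it equals n/2. Only the Level 3 ratio improves with problem size.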

Table 2 reflects a hierarchy of operations. Operations like saxpy ($y = \alpha x + y$) operate on vectors and offer the worst q values; these are called Level 1 BLAS [10] and include inner products and other simple operations. Operations like matrix-vector multiplication operate on matrices and vectors and offer slightly better q values; these are called Level 2 BLAS [11] and include solving triangular systems of equations and rank-1 updates of matrices ($A = A + xy^T$, with x and y column vectors). Operations like matrix-matrix multiplication operate on pairs of matrices and offer the best q values; these are called Level 3 BLAS [12] and include solving triangular systems of equations with many right-hand sides.
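For concreteness, here are naive pure-Python versions of one representative from each level, plus the rank-1 update. This is a hedged sketch only: the names mirror the BLAS routines SAXPY, SGEMV, SGER and SGEMM, but the signatures here are simplified and hypothetical (row-major lists of lists, no scaling factors, strides or transpose options), and real BLAS implementations are heavily tuned.

```python
def saxpy(alpha, x, y):
    """Level 1: y <- alpha*x + y (vector-vector), updates y in place."""
    for i in range(len(y)):
        y[i] += alpha * x[i]


def gemv(A, x, y):
    """Level 2: y <- y + A*x (matrix-vector), updates y in place."""
    for i in range(len(A)):
        s = 0.0
        for j in range(len(x)):
            s += A[i][j] * x[j]
        y[i] += s


def ger(A, x, y):
    """Level 2: rank-1 update A <- A + x*y^T, updates A in place."""
    for i in range(len(x)):
        for j in range(len(y)):
            A[i][j] += x[i] * y[j]


def gemm(A, B, C):
    """Level 3: C <- C + A*B (matrix-matrix), updates C in place."""
    rows, inner, cols = len(A), len(B), len(B[0])
    for i in range(rows):
        for p in range(inner):
            a = A[i][p]
            # The innermost loop is a saxpy on row i of C.
            for j in range(cols):
                C[i][j] += a * B[p][j]


C = [[0.0, 0.0], [0.0, 0.0]]
gemm([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], C)
print(C)  # [[19.0, 22.0], [43.0, 50.0]]
```

The innermost loop of `gemm` is itself a saxpy. What distinguishes the Level 3 routine is that the whole O(n^3) computation is a single call touching only O(n^2) data, so a tuned library is free to reorder and block the loops to reuse data held in fast memory.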

Since the Level 3 BLAS have the highest q values, we endeavor to reorganize our algorithms in terms of operations like matrix-matrix multiplication rather than saxpy. (The LINPACK Cholesky routine, by contrast, is built from calls to saxpy.)
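As a hedged illustration of such a reorganization, the sketch below is a textbook blocked right-looking Cholesky factorization, not the LINPACK or LAPACK code. The function name `cholesky_blocked` and the block size `nb` are hypothetical. Each block step factors a small diagonal block, solves a triangular system for the panel below it, and then updates the trailing matrix with $A_{22} = A_{22} - L_{21} L_{21}^T$, which is a matrix-matrix multiplication. When `nb` is much smaller than n, that update performs most of the flops. A real implementation would hand steps 2 and 3 to the Level 3 BLAS routines TRSM and SYRK/GEMM. Pure Python shows the structure only, not the speed.

```python
import math


def cholesky_blocked(A, nb):
    """Overwrite symmetric positive definite A (list of lists) with lower
    triangular L such that A = L*L^T, processing nb columns at a time.
    Only the lower triangle of A is read."""
    n = len(A)
    for k in range(0, n, nb):
        e = min(k + nb, n)  # current block is columns k..e-1
        # 1. Unblocked Cholesky of the diagonal block A[k:e, k:e].
        for j in range(k, e):
            A[j][j] = math.sqrt(A[j][j] - sum(A[j][p] ** 2 for p in range(k, j)))
            for i in range(j + 1, e):
                s = A[i][j] - sum(A[i][p] * A[j][p] for p in range(k, j))
                A[i][j] = s / A[j][j]
        # 2. Triangular solve for the panel below: L21 = A21 * L11^{-T}.
        for i in range(e, n):
            for j in range(k, e):
                s = A[i][j] - sum(A[i][p] * A[j][p] for p in range(k, j))
                A[i][j] = s / A[j][j]
        # 3. Trailing update A22 <- A22 - L21 * L21^T (lower triangle only).
        #    This matrix-matrix product is where Level 3 BLAS would be called.
        for i in range(e, n):
            for j in range(e, i + 1):
                A[i][j] -= sum(A[i][p] * A[j][p] for p in range(k, e))
    # Clear the (unused) upper triangle so A holds exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = 0.0
    return A


L = cholesky_blocked([[4.0, 2.0], [2.0, 3.0]], nb=1)
print(L)  # L = [[2, 0], [1, sqrt(2)]]
```

Choosing `nb = 1` reduces this to a column-at-a-time algorithm with no matrix-matrix work, while larger blocks shift the bulk of the computation into step 3.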