6.1 Gaussian Elimination on Shared Memory Machines



next up previous
Next: 6.2 Gaussian Elimination on Up: 6 Gaussian Elimination on Previous: 6 Gaussian Elimination on

6.1 Gaussian Elimination on Shared Memory Machines

 

The innermost loop of Algorithm 6.2 can be performed by a single call to the Level 1 BLAS operation saxpy; this in done in LINPACK. To achieve higher performance, we modify this code first to use the Level 2 and then the Level 3 BLAS in its innermost loops. Again, 3! versions of these algorithms are possible, but we just describe the ones used in the LAPACK library [2]. There is obvious parallelism in the innermost loop, since each can be updated independently. To make the use of BLAS clear, we use Fortran 90 (or Matlab) notation:

 

The parallelism is evident: most work is performed is a single rank-1 update of the trailing submatrix , where each entry of can be updated in parallel. Other permutations of the nested loops lead to different algorithms, which depend on the BLAS for matrix-vector multiplication and solving a triangular system instead of rank-1 updating [30][29]; which is faster depends on the relative speed of these on a given machine.

To convert to the Level 3 BLAS involves column blocking

into blocks, where is the block size and . The optimal choice of depends on the memory hierarchy of the machine in question: our approach is to compute the LU decomposition of each subblock of using Algorithm 6.3 in the fast memory, and then use Level 3 BLAS to update the rest of the matrix:

 

Most of the work is performed in the last two lines, solving a triangular system with many right-hand sides, and matrix multiplication. Other similar algorithms may be derived by conformally partitioning , and , and equating partitions in . Algorithms 6.3 and 6.4 are available as, respectively, subroutines sgetf2 and sgetrf in LAPACK [2].

(See exercise 9.)

The LAPACK routine implementing Cholesky is sposv. Other matrix structures handled by LAPACK include

There are also double precision, complex, and double precision complex counterparts of all LAPACK routines (replace the leading 's' in the subroutine name by 'd', 'c', and 'z', respectively).



next up previous
Next: 6.2 Gaussian Elimination on Up: 6 Gaussian Elimination on Previous: 6 Gaussian Elimination on



verena@csep1.phy.ornl.gov