Since the three standard problems listed above have been studied for so long, one might hope that good numerical software libraries would be available to solve them. This is partly true, as we illustrated by LAPACK [2,7]. However, there are several reasons one cannot depend on libraries entirely. First, it is impossible to anticipate all possible structures of A that can arise in practice and write corresponding libraries (although extensive libraries for various specialized A do exist ). Second, high performance computer architectures, programming languages and compilers have been changing so much lately that library writers have not been able to keep up. There is no clear cut rule on when to use a library routine. In some cases an unoptimized or serial implementation of an algorithm which fully exploits a matrix structure may be much faster than a highly tuned, parallel implementation of a more general algorithm, and in other cases the opposite may be true. If the user's priority is ease of programming and reliability, a library routine is a good choice. If, instead, the user's priority is maximum performance on one specialized problem, and software development time is secondary, custom software is the likely choice. We will try to address the concerns of both classes of users in this chapter.
The rest of this chapter is organized as follows. Section 2 discusses the cost models we will use to understand algorithm performance. Section 3 discusses the BLAS, which we use to construct many high performance algorithms. Section 4 discusses how to implement matrix multiplication quickly on several different parallel architectures; these techniques serve as models for implementing more complicated algorithms. Section 5 discusses how to layout matrices on distributed memory machines. Section 6 discusses parallelizing Gaussian elimination on dense matrices, and gives pointers to other related algorithms. Section 7 discusses parallelizing Gaussian elimination on dense matrices, and gives pointers to other related algorithms. Section 8 discusses parallelizing Gaussian elimination on sparse matrices; this is a much harder problem, and much less software is available. Section 9 discusses iterative methods for solving Ax=b, which in contrast to Gaussian elimination take an approximate solution and iteratively improves it, rather than providing the final answer after a fixed number of steps. Section 10 shows how to bound the error in a computed solution to Ax=b, for any of the methods discussed in this chapter. Section 11 discusses how to use netlib to retrieve existing library software electronically. Finally, section 12 gives a quick references guide to the BLAS.
This is a necessarily partial survey of a large and active field. For further
reading in numerical linear algebra, see the textbooks by Golub and Van Loan
, Watkins  or Demmel .
For more details on parallel numerical linear algebra in particular, see
[3,8,9], the last of which includes a bibliography
of over 2000 references.