We begin with a brief review of Gaussian elimination, Then we show how to reorganize it for shared memory parallel machines (section 6.1) and distributed memory parallel machines (section 6.2).
To solve Ax=b, we first use Gaussian elimination to factor the nonsingular matrix A as PA=LU, where L is lower triangular, U is upper triangular, and P is a permutation matrix, i.e. PA is the same as A but with its rows in a permuted order. Then we solve the triangular systems Ly=Pb and Ux=y for the solution x. These last two operations are already Level 2 BLAS, so we concentrate on factoring PA=LU, which has the dominant number of floating point operations, . Reordering the rows of A with P is called pivoting, and is necessary for numerical stability (more on this in section 10). We use the standard partial pivoting scheme : this means L has ones on its diagonal and other entries bounded in absolute value by one. The simplest version of Gaussian elimination algorithm involves adding multiples of one row of A to others to zero out subdiagonal entries, and overwriting A with L and U: