We begin with a brief review of Gaussian elimination, Then we show how to reorganize it for shared memory parallel machines (section 6.1) and distributed memory parallel machines (section 6.2).
To solve Ax=b, we first use
Gaussian elimination to factor the nonsingular matrix A as PA=LU,
where L is lower
triangular, U is upper triangular, and P is a permutation matrix,
i.e. PA is the same as A but with its rows in a permuted order.
Then we solve
the triangular systems Ly=Pb and Ux=y for the solution x.
These last two operations are already Level 2 BLAS, so
we concentrate on factoring PA=LU, which has the dominant number
of floating point operations, .
Reordering the rows of A with P is called pivoting, and is
necessary for numerical stability (more on this in section 10).
We use the standard partial pivoting scheme [4]:
this means
L has ones on its diagonal and other entries bounded in absolute value by one.
The simplest version of Gaussian elimination algorithm involves
adding multiples of one row of A to others to zero out subdiagonal entries,
and overwriting A with L and U: