We begin with a brief review of Gaussian elimination, Then we show how to reorganize it for shared memory parallel machines (section 6.1) and distributed memory parallel machines (section 6.2).

To solve **Ax=b**, we first use
Gaussian elimination to factor the nonsingular matrix **A** as **PA=LU**,
where **L** is lower
triangular, **U** is upper triangular, and **P** is a permutation matrix,
i.e. **PA** is the same as **A** but with its rows in a permuted order.
Then we solve
the triangular systems **Ly=Pb** and **Ux=y** for the solution **x**.
These last two operations are already Level 2 BLAS, so
we concentrate on factoring **PA=LU**, which has the dominant number
of floating point operations, .
Reordering the rows of **A** with **P** is called * pivoting*, and is
necessary for numerical stability (more on this in section 10).
We use the standard * partial pivoting* scheme [4]:
this means
**L** has ones on its diagonal and other entries bounded in absolute value by one.
The simplest version of Gaussian elimination algorithm involves
adding multiples of one row of **A** to others to zero out subdiagonal entries,
and overwriting **A** with **L** and **U**: