## 9.2 Parallelism and Data Locality in Preconditioned CG

To use CG to solve Ax=b, A must be symmetric and positive definite. In other short recurrence methods, other properties of A may be required or desirable, but we will not exploit these properties explicitly here.

Most often, CG is used in combination with some kind of preconditioning [79,4,97]. This means that the matrix A is implicitly multiplied by an approximation of . Usually, K is constructed to be an approximation of A, and so that Ky=z is easier to solve than Ax=b. Unfortunately, a popular class of preconditioners, those based on incomplete factorizations of A, are hard to parallelize. We have discussed some efforts to obtain more parallelism in the preconditioner in Section 9.1. Here we will assume the preconditioner is chosen such that the time to solve Ky=z in parallel is comparable with the time to compute Ap. For CG we also require that the preconditioner K be symmetric positive definite. We exploit this to implement the preconditioner more efficiently.

The preconditioned CG algorithm is as follows.

If A or K is not very sparse, most work is done in multiplying or solving , and this is where parallelism is most beneficial. It is also completely dependent on the structures of A and K.