next up previous

9.2 Parallelism and Data Locality in Preconditioned CG     continued...

Chronopoulos and Gear [98] propose to improve further the data locality and parallelism in CG by combining s successive steps. Their algorithm is based upon the following property of CG. The residual vectors form an orthogonal basis (assuming exact arithmetic) for the Krylov subspace spanned by . Given through , the vectors , also span this subspace. Chronopoulos and Gear propose to combine s successive steps by generating first, and then to orthogonalize and update the current solution with this blockwise extended subspace. Their approach leads to slightly more flops than s successive steps of standard CG, and also one additional matrix--vector product every s steps. The implementation issues for vector register computers and distributed memory machines are discussed in great detail in [102].

The main drawback in this approach is potential numerical instability: depending on the spectrum of A, the set may converge to a vector in the direction of a dominant eigenvector, or in other words, may become dependent for large values of s. The authors claim success in using this approach without serious stability problems for small values of s. Nevertheless, it seems that s-step CG still has a bad reputation [103] because of these problems. However, a similar approach, suggested by Chronopoulos and Kim [104] for other processes such as GMRES, seems to be more promising. Several authors have pursued this direction, and we will come back to this in Section 9.3.