9.3 Parallelism and Data Locality in GMRES (continued)

The obvious way to extract more parallelism and data locality is to generate a basis r_0, A r_0, ..., A^{m-1} r_0 for the Krylov subspace first, and to orthogonalize this set afterwards; this is called m-step GMRES(m) [104]. This approach does not increase the computational work and, in contrast to CG, the numerical instability due to generating a possibly near-dependent set is not necessarily a drawback. One reason is that error cannot build up as it can in CG, because the method is restarted every m steps. In any case, the resulting set, after orthogonalization, is the basis of some subspace, and the residual is then minimized over that subspace. If, however, one wants to mimic standard GMRES(m) as closely as possible, one can generate a better (more nearly independent) starting set of basis vectors r_0, p_1(A) r_0, ..., p_{m-1}(A) r_0, where the p_j are suitable polynomials of degree j. Newton polynomials are suggested in [111] and Chebyshev polynomials in [80].
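As an illustration of why the polynomial basis helps, the following sketch (ours, not from the text) compares the conditioning of the plain power basis r_0, A r_0, ..., A^{m-1} r_0 with that of a shifted, Newton-type basis. The shifts used here are a crude illustrative choice spread over the spectrum; [111] derives the shifts from Ritz values of A, which we do not compute.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 10
A = np.diag(np.linspace(1.0, 100.0, n))      # simple diagonal test matrix
r0 = rng.standard_normal(n)

def basis(shifts):
    """Columns v_0 = r0, v_j = (A - shifts[j-1] I) v_{j-1}, each normalized."""
    V = np.empty((n, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(1, m):
        V[:, j] = A @ V[:, j - 1] - shifts[j - 1] * V[:, j - 1]
        V[:, j] /= np.linalg.norm(V[:, j])   # scale to avoid overflow
    return V

V_power = basis(np.zeros(m - 1))                    # monomial (power) basis
V_newton = basis(np.linspace(1.0, 100.0, m - 1))    # shifted, Newton-type basis

# the power basis is much closer to linearly dependent
print(np.linalg.cond(V_power), np.linalg.cond(V_newton))
```

With the power basis, successive columns are increasingly dominated by the eigenvector components belonging to the largest eigenvalues, so the set becomes nearly dependent; the shifted basis localizes each column differently over the spectrum and stays much better conditioned.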

After generating a suitable starting set, we still have to orthogonalize it. In [80], modified Gram-Schmidt is used, organized so that the communication cannot avoid being overlapped with computation as little as possible. We outline this approach, since it may be of value for other orthogonalization methods as well. Given a basis v_1, v_2, ..., v_m for the Krylov subspace, we orthogonalize by

for k = 2, 3, ..., m:
    for j = 1, ..., k-1:
        v_k <- v_k - (v_k, v_j) v_j    /* orthogonalize v_k w.r.t. v_j */
    v_k <- v_k / ||v_k||_2
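The loop above translates directly into, e.g., NumPy. This is a plain sequential sketch of modified Gram-Schmidt (ours, not the parallel code of [80]); in a distributed-memory version the rows of the vectors would be partitioned over the processors and each inner product would become a global reduction.

```python
import numpy as np

def mgs(V):
    """Modified Gram-Schmidt: return Q with orthonormal columns
    spanning the same space as the columns of V."""
    Q = np.array(V, dtype=float)
    n, m = Q.shape
    Q[:, 0] /= np.linalg.norm(Q[:, 0])
    for k in range(1, m):
        for j in range(k):                      # orthogonalize v_k w.r.t. v_j
            Q[:, k] -= (Q[:, j] @ Q[:, k]) * Q[:, j]
        Q[:, k] /= np.linalg.norm(Q[:, k])      # normalize
    return Q

# usage: Q^T Q should be the identity to rounding error
V = np.random.default_rng(1).standard_normal((50, 5))
Q = mgs(V)
print(np.max(np.abs(Q.T @ Q - np.eye(5))))
```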

In order to overlap the communication costs of the inner products, we split the j-loop into two parts. Then, for each k, we proceed as follows.
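As a rough illustration of the splitting idea, here is our own two-block simplification in NumPy; it is not the exact scheme of [80]. Within each block the inner products against v_k are computed together, so in a distributed-memory code each block would need only a single global reduction, whose latency could be hidden behind the local work of the other block. Our sequential sketch can only indicate that overlap in comments.

```python
import numpy as np

def split_mgs_step(Q, k):
    """Orthogonalize column k of Q against columns 0..k-1 in two blocks
    of the j-loop (a simplification for illustration)."""
    for block in (range(0, k // 2), range(k // 2, k)):
        js = list(block)
        if js:
            # batched inner products: one global reduction per block in a
            # parallel code; its latency could overlap with the updates
            # belonging to the other block
            dots = Q[:, js].T @ Q[:, k]
            Q[:, k] -= Q[:, js] @ dots      # apply the orthogonalization updates
    Q[:, k] /= np.linalg.norm(Q[:, k])

# usage: orthonormalize a well-conditioned random basis column by column
rng = np.random.default_rng(2)
Q = rng.standard_normal((60, 6))
Q[:, 0] /= np.linalg.norm(Q[:, 0])
for k in range(1, 6):
    split_mgs_step(Q, k)
print(np.max(np.abs(Q.T @ Q - np.eye(6))))
```

Note the trade-off: batching the inner products within a block uses the not-yet-updated v_k for that whole block (classical Gram-Schmidt within a block, modified Gram-Schmidt between blocks), which is numerically acceptable for a reasonably conditioned starting set.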