The obvious way to extract more parallelism and
data locality is to first generate a basis
v_1, Av_1, ..., A^m v_1 for the Krylov subspace,
and to orthogonalize this set afterwards; this is called
**m**-step GMRES(**m**) [104]. This approach does not
increase the computational work and, in contrast to CG, the numerical
instability due to generating a possibly near-dependent set is not
necessarily a drawback. One reason is that error cannot build up as it
does in CG, because the method is restarted every **m** steps.
In any case, the resulting set, after orthogonalization,
is a basis of some subspace, and the residual is then minimized over that
subspace. If, however, one wants to mimic
standard GMRES(**m**) as closely as possible,
one can generate a better (more independent) starting set of basis
vectors v_1, p_1(A)v_1, ...,
p_m(A)v_1, where the p_j are suitable degree **j**
polynomials.
Newton polynomials are suggested in [111]
and Chebyshev polynomials in [80].
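A small experiment can make the motivation concrete. The following sketch (not from the text; the matrix, dimensions, and shift values are all illustrative assumptions) compares the conditioning of the plain monomial basis v_1, Av_1, ..., with a Newton-type basis built from shifted products (A - s_1 I)···(A - s_j I)v_1, where the shifts s_j are crude estimates of eigenvalues of A:

```python
import numpy as np

# Illustrative only: a simple diagonal test matrix with spectrum in [1, 100].
rng = np.random.default_rng(0)
n, m = 100, 10
A = np.diag(np.linspace(1.0, 100.0, n))
v = rng.standard_normal(n)

# Monomial Krylov basis v, Av, ..., A^{m-1}v (columns normalized).
V = np.empty((n, m))
V[:, 0] = v / np.linalg.norm(v)
for j in range(1, m):
    V[:, j] = A @ V[:, j - 1]
    V[:, j] /= np.linalg.norm(V[:, j])

# Newton-type basis with shifts spread over the (assumed known) spectrum.
shifts = np.linspace(1.0, 100.0, m - 1)
W = np.empty((n, m))
W[:, 0] = v / np.linalg.norm(v)
for j in range(1, m):
    W[:, j] = A @ W[:, j - 1] - shifts[j - 1] * W[:, j - 1]
    W[:, j] /= np.linalg.norm(W[:, j])

print(f"monomial basis cond: {np.linalg.cond(V):.1e}")
print(f"Newton basis cond:   {np.linalg.cond(W):.1e}")
```

The monomial basis quickly aligns with the dominant eigenvector and becomes nearly dependent, while the shifted basis stays markedly better conditioned, which is the point of using Newton or Chebyshev polynomials here.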

After generating a suitable starting set, we still have to
orthogonalize it. In [80],
modified Gram--Schmidt is organized so that communication
can be overlapped with computation wherever possible. We outline this
approach, since it may be of value for other orthogonalization methods.
Given a basis v_1, v_2, ..., v_m for the Krylov subspace, we orthogonalize by

```
for k = 1, 2, ..., m:
    v_k := v_k / ||v_k||_2
    /* orthogonalize w.r.t. v_k */
    for j = k+1, ..., m:
        v_j := v_j - (v_j, v_k) v_k
```
In order to overlap the communication costs of the inner products,
we split the **j**-loop into two parts. Then for each **k** we proceed as
follows.
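The split can be sketched as follows (a serial illustration under our own assumptions: the halfway split point and all names are ours, and the comments only mark where a non-blocking reduction such as MPI_Iallreduce would sit on a distributed-memory machine):

```python
import numpy as np

def orthogonalize_split(V):
    """Split-loop modified Gram-Schmidt: for each k, the remaining columns
    are handled in two parts so that, in parallel, the reduction of the
    part-one inner products can be overlapped with the local work of part two."""
    V = V.copy()
    m = V.shape[1]
    for k in range(m):
        V[:, k] /= np.linalg.norm(V[:, k])
        rest = list(range(k + 1, m))
        half = len(rest) // 2          # illustrative split point
        part1, part2 = rest[:half], rest[half:]
        # part 1: local inner products; a non-blocking all-reduce would start here
        h1 = [V[:, j] @ V[:, k] for j in part1]
        # part 2: computed while the part-1 reduction is (conceptually) in flight
        h2 = [V[:, j] @ V[:, k] for j in part2]
        # both reductions complete; apply the rank-one updates
        for j, h in zip(part1 + part2, h1 + h2):
            V[:, j] -= h * V[:, k]
    return V
```

Because every update at step k uses only v_k and inner products formed before any column is modified, the split ordering produces the same result as the unsplit loop, so the overlap costs no extra arithmetic.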