Chronopoulos and Gear [98] propose to improve further the
data locality and parallelism in CG by combining **s** successive steps.
Their algorithm is based upon the following property of CG. The residual
vectors form an orthogonal basis (assuming exact arithmetic)
for the Krylov subspace spanned by . Given
through , the vectors , also
span this subspace. Chronopoulos and Gear propose to combine
**s** successive steps by generating first, and
then
to orthogonalize and update the current solution with
this blockwise extended subspace. Their approach leads to slightly more
flops than **s** successive steps of standard CG, and also
one additional matrix--vector product every **s** steps.
The implementation issues for vector register computers and
distributed memory machines are discussed in great detail in
[102].

The main drawback in this approach is potential numerical
instability: depending on the spectrum of **A**, the set
may converge to a vector in the direction
of a dominant eigenvector, or in other words, may become dependent for
large values of **s**. The authors claim success in using
this approach without serious stability problems for small values of **s**.
Nevertheless, it seems that **s**-step CG still has a bad reputation
[103]
because of these problems. However,
a similar approach, suggested by Chronopoulos and Kim [104]
for other processes such as GMRES, seems to be more promising. Several
authors have pursued this direction, and we will come back to this in
Section 9.3.