We consider another variant of CG, in which we may
overlap all communication time with useful computations.
This is just a reorganized version of the original CG scheme,
and is therefore precisely as stable. The key trick
is to delay the updating of the solution vector.
Another advantage
over the previous scheme is that no additional operations are required.
We will assume that the preconditioner **K** can be written as $\mathbf{K} = \mathbf{L}\mathbf{L}^T$.
Furthermore, **L** has a block structure,
corresponding to the grid blocks, so that any
communication can again be overlapped with computation.
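
The reorganization described above can be sketched as follows. This is a minimal serial illustration in Python with NumPy, not a parallel implementation: the function name `cg_delayed_update`, its parameters, and the stopping test on the relative residual are assumptions made for the example, and the factor **L** is passed in explicitly under the assumption that **K** is factored as **L** times its transpose. Comments mark the places where, on a distributed machine, a global reduction could plausibly be overlapped with local work.

```python
import numpy as np


def cg_delayed_update(A, b, L, x0=None, tol=1e-10, max_iter=1000):
    """Preconditioned CG with the solution update delayed by one iteration.

    Assumes A is symmetric positive definite and the preconditioner is
    K = L @ L.T with L nonsingular (hypothetical interface for illustration).
    """
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    b_norm = np.linalg.norm(b) or 1.0
    if np.linalg.norm(r) <= tol * b_norm:
        return x

    p_prev = np.zeros(n)      # p_{i-1}
    alpha_prev = 0.0          # alpha_{i-1}
    beta_prev = 0.0           # beta_{i-1}

    s = np.linalg.solve(L, r)         # s = L^{-1} r
    rho = s @ s                       # rho = r^T K^{-1} r (global reduction)

    for _ in range(max_iter):
        # The reduction for rho could overlap with this local solve.
        w = np.linalg.solve(L.T, s)   # w = L^{-T} s = K^{-1} r
        p = w + beta_prev * p_prev
        q = A @ p
        gamma = p @ q                 # global reduction
        # Delayed update: x catches up with the previous step while the
        # reduction for gamma would be in flight.
        x = x + alpha_prev * p_prev
        alpha = rho / gamma
        r = r - alpha * q
        s = np.linalg.solve(L, r)
        rho_next = s @ s
        if np.linalg.norm(r) <= tol * b_norm:
            return x + alpha * p      # apply the pending update before exit
        beta_prev = rho_next / rho
        rho = rho_next
        p_prev, alpha_prev = p, alpha

    # The last computed step has not been applied to x yet.
    return x + alpha_prev * p_prev
```

The iterates are the same as in standard preconditioned CG in exact arithmetic; only the order of operations changes, and no extra vector updates or inner products are introduced. In a real code the calls to `np.linalg.solve` would be replaced by block triangular solves that exploit the structure of **L**.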