The CG method was originally designed to minimize convex quadratic functions but, through several variations, has been extended to the general case [52][23]. The first iteration in CG is the same as in SD, but successive directions are constructed so that they form a set of mutually conjugate vectors with respect to the (positive-definite) Hessian $A$ of a general convex quadratic function $q(x)$. Whereas the rate of convergence for SD depends on the ratio of the extremal eigenvalues of $A$, the convergence properties of CG depend on the entire matrix spectrum. Faster convergence is expected when the eigenvalues are clustered. In exact arithmetic, convergence is obtained in at most $n$ steps, where $n$ is the dimension of the problem. In particular, if $A$ has only $m$ distinct eigenvalues, convergence to a solution requires at most $m$ iterations.
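The finite-termination property can be checked numerically. The sketch below (an illustrative pure-Python implementation, not from the text) runs CG on a diagonal system whose $50 \times 50$ matrix has only three distinct eigenvalues; the residual drops to essentially zero after three iterations rather than fifty.

```python
def cg_iterations(diag, b, tol=1e-10):
    """Run linear CG on the diagonal SPD system diag(d) x = b and return
    the number of iterations needed to reduce ||r|| below tol."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual r_0 = b - A x_0 with x_0 = 0
    p = list(r)          # first search direction = steepest descent
    rs = sum(ri * ri for ri in r)
    for k in range(1, n + 1):
        Ap = [d * pi for d, pi in zip(diag, p)]
        lam = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + lam * pi for xi, pi in zip(x, p)]
        r = [ri - lam * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            return k
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return n

# Matrix with only 3 distinct eigenvalues (1, 5, 10) among 50 entries
diag = [1.0] * 20 + [5.0] * 20 + [10.0] * 10
b = [1.0] * 50
print(cg_iterations(diag, b))  # terminates after 3 iterations, not 50
```

The diagonal matrix is chosen only for transparency; the same behavior holds for any symmetric positive-definite matrix with few distinct eigenvalues.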

For example, when the bound on convergence measures the size of the error $x_k - x^*$ with respect to the $A$-norm, $\|y\|_A \equiv (y^T A y)^{1/2}$, we have

$$\|x_k - x^*\|_A \;\le\; 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k} \|x_0 - x^*\|_A,$$

where $\kappa = \kappa(A)$ is the condition number of $A$, the ratio of its largest to smallest eigenvalue.

Clearly, rapid convergence is expected when $\kappa(A) \approx 1$, as for SD. Further estimates of convergence bounds can only be derived when certain properties of the eigenvalue distribution are known (e.g., a few large eigenvalues, with the remaining small eigenvalues clustered in a narrow region) [32].
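To see how sharply the bound depends on conditioning, the short sketch below (illustrative, with hypothetical parameter values) evaluates the reduction factor $2\big((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\big)^k$ for several condition numbers:

```python
import math

def cg_error_bound(kappa, k):
    """Standard CG bound on the relative A-norm error after k iterations:
    2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**k."""
    ratio = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return 2.0 * ratio ** k

# A well-conditioned system (kappa near 1) contracts the error per
# iteration far faster than an ill-conditioned one.
for kappa in (1.1, 100.0, 1e4):
    print(kappa, [cg_error_bound(kappa, k) for k in (1, 5, 20)])
```

For $\kappa = 1.1$ the factor is already below $10^{-6}$ after five iterations, while for $\kappa = 10^4$ it decays only slowly.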

When one refers to the CG method, one often means the *linear*
Conjugate Gradient; that is, the implementation for the convex quadratic
form. In this case, minimizing $q(x) = \frac{1}{2} x^T A x - b^T x$
is equivalent to solving the *linear* system $Ax = b$.
Consequently, the conjugate directions $p_k$, as well as the step
lengths $\lambda_k$, can be computed in closed form. Below we sketch such
an algorithm from a given $x_0$. We define the residual vector $r_k = b - Ax_k$ and use the vectors $p_k$ below
to denote the CG search vectors.
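The linear CG iteration can be sketched as follows (a minimal pure-Python rendering of the standard algorithm; variable names `lam` for $\lambda_k$ and `p` for $p_k$ follow the notation above, and the tolerance is an assumed default):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=None):
    """Linear CG for A x = b with A symmetric positive definite.
    Only a few vectors are stored; A enters only through products A p."""
    n = len(b)
    if max_iter is None:
        max_iter = n          # exact arithmetic terminates in at most n steps
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    p = list(r)                                          # p_0 = r_0 (SD direction)
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old <= tol * tol:
            break
        Ap = matvec(A, p)
        lam = rs_old / dot(p, Ap)                        # closed-form step length
        x = [xi + lam * pi for xi, pi in zip(x, p)]
        r = [ri - lam * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # new direction: current residual plus a multiple of the old direction,
        # chosen so that successive directions are A-conjugate
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Example: a small SPD system; the exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b, [0.0, 0.0])
# x is approximately [0.090909, 0.636364]
```

In a production setting one would of course use an optimized matrix-vector product (or a routine supplied by the user), since that is the only way $A$ enters the iteration.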

Note here that only a few vectors are stored, the product
$Ap_k$ is required but not knowledge (or storage) of $A$ *per se*, and the
cost only involves several scalar and vector operations. The value of
the step length $\lambda_k$ can be derived in the closed form above
by minimizing the quadratic function $q(x_k + \lambda p_k)$
as a function of $\lambda$ (producing
$\lambda_k = r_k^T r_k / p_k^T A p_k$) and
then using
the conjugacy relation: $p_i^T A p_j = 0$ for all $i \ne j$ [45].
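The derivation is a one-line computation; setting the directional derivative of $q$ along $p_k$ to zero gives

```latex
0 \;=\; \frac{d}{d\lambda}\, q(x_k + \lambda p_k)
  \;=\; p_k^T \big( A(x_k + \lambda p_k) - b \big)
  \;=\; -\,p_k^T r_k + \lambda\, p_k^T A p_k
\;\Longrightarrow\;
\lambda_k \;=\; \frac{p_k^T r_k}{p_k^T A p_k}
          \;=\; \frac{r_k^T r_k}{p_k^T A p_k},
```

where the last equality follows from $p_k = r_k + \beta_{k-1} p_{k-1}$ together with the orthogonality $r_k^T p_{k-1} = 0$.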

See the linear algebra chapter for further details and examples.

Minimization performance of CG is shown in Figures 10 and 14 for Rosenbrock's and Beale's functions, respectively. Note that performance for both functions with CG is better than for SD, as expected, but the paths are characteristically more `random'. For Rosenbrock's function, for example, a large step is taken between the fourth and fifth iterates, where a distant minimum was detected in the line search. The step length varies considerably from one iteration to the next. Similarly, for Beale's function, the step lengths vary widely, reaching 800.