Extensions of the linear CG method to nonquadratic problems have been developed and extensively researched [61][52][23]. In the common variants, the basic idea is to avoid matrix operations altogether and simply express the search directions recursively as

$$d_{k+1} = -g_{k+1} + \beta_{k+1} d_k$$

for $k \geq 0$, with $d_0 = -g_0$. The new iterates for the minimum point can then be set to

$$x_{k+1} = x_k + \lambda_k d_k,$$

where $\lambda_k$ is the step length. In comparing this iterative procedure to the linear CG of Algorithm 2.2, we note that $g_k = -r_k$ for a quadratic function. The parameter $\beta_{k+1}$ above is chosen so that, if $f$ were a convex quadratic and $\lambda_k$ were the exact one-dimensional minimizer of $f$ along $d_k$, the nonlinear CG would reduce to the linear CG method and terminate in at most $n$ steps in exact arithmetic.
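As an illustrative sketch (not taken from the text), the recursion above can be applied to a convex quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$, for which the gradient is $g = Ax - b$ and the exact one-dimensional minimizer along $d$ is available in closed form. With the Fletcher-Reeves choice of $\beta_{k+1}$, the iteration then coincides with linear CG and terminates in at most $n$ steps; all function and variable names here are illustrative:

```python
import numpy as np

def nonlinear_cg_quadratic(A, b, x0, tol=1e-10, max_iter=50):
    """Nonlinear CG with Fletcher-Reeves beta and exact line search,
    applied to f(x) = 0.5 x^T A x - b^T x (so the gradient is g = A x - b).
    On a convex quadratic this coincides with the linear CG method."""
    x = x0.astype(float)
    g = A @ x - b                          # gradient at x0
    d = -g                                 # d_0 = -g_0
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        Ad = A @ d
        lam = -(g @ d) / (d @ Ad)          # exact minimizer of f along d
        x = x + lam * d                    # x_{k+1} = x_k + lambda_k d_k
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves beta_{k+1}
        d = -g_new + beta * d              # d_{k+1} = -g_{k+1} + beta_{k+1} d_k
        g = g_new
    return x, k

A = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric positive-definite
b = np.array([1.0, 2.0])
x, iters = nonlinear_cg_quadratic(A, b, np.zeros(2))
print(np.allclose(A @ x, b))               # True: done in at most n = 2 steps
```

Because the example is quadratic with exact line search, the run terminates in $n = 2$ iterations, as the text predicts.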

Three of the best known formulas for $\beta_{k+1}$ are titled Fletcher-Reeves (FR), Polak-Ribière (PR), and Hestenes-Stiefel (HS) after their developers. They are given by the following formulas:

$$\beta_{k+1}^{\mathrm{FR}} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}, \qquad
\beta_{k+1}^{\mathrm{PR}} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{g_k^T g_k}, \qquad
\beta_{k+1}^{\mathrm{HS}} = \frac{g_{k+1}^T (g_{k+1} - g_k)}{d_k^T (g_{k+1} - g_k)}.$$
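The three formulas are direct to transcribe into code. In the sketch below (an illustration, with `g`, `g_new`, and `d` assumed to be NumPy arrays holding $g_k$, $g_{k+1}$, and $d_k$), note that for a quadratic with exact line search, $g_{k+1}^T g_k = 0$ and $d_k^T g_{k+1} = 0$, so all three choices coincide; they differ only on general nonquadratic functions:

```python
import numpy as np

def beta_fr(g_new, g):
    """Fletcher-Reeves: ||g_{k+1}||^2 / ||g_k||^2."""
    return (g_new @ g_new) / (g @ g)

def beta_pr(g_new, g):
    """Polak-Ribiere: g_{k+1}^T (g_{k+1} - g_k) / ||g_k||^2."""
    y = g_new - g
    return (g_new @ y) / (g @ g)

def beta_hs(g_new, g, d):
    """Hestenes-Stiefel: g_{k+1}^T (g_{k+1} - g_k) / d_k^T (g_{k+1} - g_k)."""
    y = g_new - g
    return (g_new @ y) / (d @ y)

# Gradients satisfying the exact-line-search orthogonality conditions
# (g_new perpendicular to both g and d): the three betas then agree.
g = np.array([1.0, 0.0])
g_new = np.array([0.0, 2.0])
d = np.array([-1.0, 0.0])
print(beta_fr(g_new, g), beta_pr(g_new, g), beta_hs(g_new, g, d))
# 4.0 4.0 4.0
```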
Interestingly, the last two formulas are generally preferred in practice, though the first has better theoretical global convergence properties. In fact, very recent research has focused on combining these practical and theoretical properties to construct more efficient schemes. The simple modification

$$\beta_{k+1} = \max \left\{ \beta_{k+1}^{\mathrm{PR}},\, 0 \right\},$$

for example, can be used to prove global convergence of this nonlinear CG method, even with inexact line searches [26]. A more general condition on $\beta_{k+1}$, including relaxation of its nonnegativity, has also been derived.
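This truncation is a one-line change; a minimal sketch (illustrative names, assuming NumPy arrays as before):

```python
import numpy as np

def beta_pr_plus(g_new, g):
    """Truncated Polak-Ribiere ("PR+"): fall back to beta = 0, i.e. a
    steepest descent restart, whenever the PR formula turns negative."""
    beta_pr = (g_new @ (g_new - g)) / (g @ g)
    return max(beta_pr, 0.0)

# When successive gradients are nearly parallel (little progress along d),
# the PR beta can go negative and the truncation resets the direction:
print(beta_pr_plus(np.array([0.6, 0.0]), np.array([1.0, 0.0])))  # 0.0
```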

The quality of the line search in these nonlinear CG algorithms is crucial
for preserving the mutual conjugacy property of the search directions
and for ensuring that each generated direction is one
of descent. A technique known as *restarting* is typically used
to preserve a linear convergence rate by resetting to the
steepest descent direction, for example, after a given number of line
searches (e.g., $n$). Preconditioning may also
be used, as in the linear case, to accelerate
convergence.
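The restarting idea can be sketched as follows (an illustration under assumptions, not an algorithm from the text): the direction is reset to $-g_k$ every $n$ iterations, or whenever the generated direction fails to be one of descent. The sketch uses the truncated PR beta together with a simple Armijo backtracking line search; the test function at the bottom is purely for demonstration:

```python
import numpy as np

def cg_with_restarts(f, grad, x0, n_restart=None, tol=1e-8, max_iter=2000):
    """Nonlinear CG with truncated-PR beta and Armijo backtracking.
    The direction is reset to steepest descent every n_restart iterations,
    or whenever the current direction fails to point downhill."""
    x = np.asarray(x0, dtype=float)
    n_restart = n_restart or x.size        # default restart period: n
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if k % n_restart == 0 or g @ d >= 0:
            d = -g                         # restart: steepest descent
        lam, fx, slope = 1.0, f(x), g @ d
        for _ in range(60):                # Armijo backtracking line search
            if f(x + lam * d) <= fx + 1e-4 * lam * slope:
                break
            lam *= 0.5
        x = x + lam * d
        g_new = grad(x)
        beta = max((g_new @ (g_new - g)) / (g @ g), 0.0)   # truncated PR
        d = -g_new + beta * d
        g = g_new
    return x

# Demonstration on a small convex quadratic (chosen only for testing):
A = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.ones(4)
x = cg_with_restarts(lambda z: 0.5 * z @ A @ z - b @ z,
                     lambda z: A @ z - b,
                     np.zeros(4))
print(np.allclose(x, np.linalg.solve(A, b)))
```

The descent test `g @ d >= 0` doubles as a safety restart: with an inexact line search the recursion can occasionally produce an uphill direction, and resetting to $-g$ restores descent.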

In sum, the greatest virtues of CG methods are their modest storage and computational requirements (both of order $n$), combined with much better convergence than SD. These properties have made them popular choices as linear solvers and minimizers in many applications, perhaps the only candidates for very large problems. The linear CG method is often applied to systems arising from discretizations of partial differential equations, where the matrices are frequently positive-definite, sparse, and structured [25][20].