2.4.5 Nonlinear Conjugate Gradient




 

Extensions of the linear CG method to nonquadratic problems have been developed and extensively researched [61][52][23]. In the common variants, the basic idea is to avoid matrix operations altogether and simply express the search directions recursively as

$$ d_k \;=\; -g_k + \beta_k\, d_{k-1} $$

for $k = 1, 2, \ldots$, with $d_0 = -g_0$. The new iterates for the minimum point can then be set to

$$ x_{k+1} \;=\; x_k + \lambda_k\, d_k , $$

where $\lambda_k$ is the step length. In comparing this iterative procedure to the linear CG of Algorithm 2.2, we note that $g_k = -r_k$ for a quadratic function. The parameter $\beta_k$ above is chosen so that if $f$ were a convex quadratic and $\lambda_k$ is the exact one-dimensional minimizer of $f$ along $d_k$, the nonlinear CG reduces to the linear CG method and terminates in at most $n$ steps in exact arithmetic.
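To make the recursion concrete, the following Python sketch (not part of the original text) implements the basic nonlinear CG loop under a few assumptions: the objective `f` and its gradient `grad` are supplied by the caller, the step length $\lambda_k$ comes from a simple Armijo backtracking line search rather than an exact one-dimensional minimization, and $\beta_k$ is computed with the Fletcher-Reeves formula given below.

```python
import numpy as np

def backtracking(f, g, x, d, lam=1.0, rho=0.5, c=1e-4):
    """Armijo backtracking line search: a crude stand-in for the exact
    one-dimensional minimizer of f along d."""
    fx, slope = f(x), c * (g @ d)
    while f(x + lam * d) > fx + lam * slope and lam > 1e-12:
        lam *= rho
    return lam

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=1000):
    """Nonlinear CG with the Fletcher-Reeves choice of beta (see below)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        lam = backtracking(f, g, x, d)       # step length lambda_k
        x = x + lam * d                      # x_{k+1} = x_k + lambda_k d_k
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves beta_k
        d = -g_new + beta * d                # recursive search direction
        g = g_new
    return x
```

As a quick check, `nonlinear_cg(lambda x: x @ x, lambda x: 2 * x, np.ones(5))` converges to the origin; on a genuinely nonquadratic function the behavior depends strongly on the choice of $\beta_k$ and on the quality of the line search, as discussed below.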

Three of the best known formulas for $\beta_k$ are titled Fletcher-Reeves (FR), Polak-Ribière (PR), and Hestenes-Stiefel (HS) after their developers. They are given by the following formulas:

$$ \beta_k^{FR} \;=\; \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}} , $$

$$ \beta_k^{PR} \;=\; \frac{g_k^T (g_k - g_{k-1})}{g_{k-1}^T g_{k-1}} , $$

$$ \beta_k^{HS} \;=\; \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})} . $$
Interestingly, the last two formulas are generally preferred in practice, though the first has better theoretical global convergence properties. In fact, very recent research has focused on combining these practical and theoretical properties for the construction of more efficient schemes. The simple modification

$$ \beta_k \;=\; \max\{\beta_k^{PR},\, 0\} , $$

for example, can be used to prove global convergence of this nonlinear CG method, even with inexact line searches [26]. A more general condition on $\beta_k$, including relaxation of its nonnegativity, has also been derived.
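For illustration, the three update formulas and the nonnegative PR modification translate directly into short helper functions. The sketch below is illustrative (the function names are not from the text) and assumes the current gradient `g`, the previous gradient `g_prev`, and the previous direction `d_prev` are NumPy vectors.

```python
def beta_fr(g, g_prev):
    # Fletcher-Reeves
    return (g @ g) / (g_prev @ g_prev)

def beta_pr(g, g_prev):
    # Polak-Ribiere
    return (g @ (g - g_prev)) / (g_prev @ g_prev)

def beta_hs(g, g_prev, d_prev):
    # Hestenes-Stiefel
    y = g - g_prev
    return (g @ y) / (d_prev @ y)

def beta_pr_plus(g, g_prev):
    # Nonnegative PR modification: restores global convergence
    # even with inexact line searches
    return max(beta_pr(g, g_prev), 0.0)
```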

The quality of the line search in these nonlinear CG algorithms is crucial to preserve the mutual conjugacy property of the search directions and to ensure that each generated direction is one of descent. A technique known as restarting is typically used to preserve a linear convergence rate by resetting $d_k$ to the steepest descent direction $-g_k$, for example, after a given number of line searches (e.g., every $n$ steps), as sketched below. Preconditioning may also be used, as in the linear case, to accelerate convergence.
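A minimal way to implement such a restart, under the same assumptions as the earlier loop (the name and the exact criterion here are illustrative, not from the text), is to reset to the steepest descent direction every $n$ line searches, or whenever the computed direction fails to be a descent direction ($g_k^T d_k \ge 0$).

```python
def maybe_restart(k, n, d, g):
    """Return the steepest descent direction -g if a restart is due:
    every n line searches, or when d is not a descent direction."""
    if k % n == 0 or (g @ d) >= 0:
        return -g          # restart with steepest descent
    return d               # keep the conjugate direction
```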

In sum, the greatest virtues of CG methods are their modest storage and computational requirements (both of order $n$), combined with much better convergence than SD. These properties have made them popular choices as linear solvers and minimization methods in many applications, and perhaps the only candidates for very large problems. The linear CG is often applied to systems arising from discretizations of partial differential equations, where the matrices are frequently positive-definite, sparse, and structured [25][20].


