Newton methods include several classes, including discrete Newton, quasi Newton (QN) (also termed variable metric), and truncated Newton (TN). Historically, the memory requirements and computation associated with solving a linear system directly have restricted Newton methods only: (1) to small problems, (2) to problems with special sparsity patterns, or (3) near a solution, after a gradient method has been applied. Fortunately, advances in computing technology and software are making the Newton approach feasible for a wide range of problems. Particularly, with advances in automatic differentiation , the appeal of these methods should increase further. Extensive treatments of Newton methods can be found in the literature (, for example) and only general concepts will be outlined here.
For large-scale applications, essentially two specific classes are emerging as the most powerful techniques: limited-memory quasi-Newton (LMQN) and truncated Newton methods. LMQN methods attempt to combine the modest storage and computational requirements of CG methods with the superlinear convergence properties of standard (i.e., full memory) QN methods. Similarly, TN algorithms attempt to retain the rapid quadratic convergence rate of classic Newton methods while making computational requirements feasible for large-scale functions.
All Newton methods are based on approximating the objective function locally by a quadratic model and then minimizing that function approximately. The quadratic model of the objective function at along is given by the expansion
The minimum of the right-hand side is achieved when is the minimum of the quadratic function:
Alternatively, such a Newton direction satisfies the linear system of simultaneous equations, known as the Newton equation:
In the ``classic'' Newton method, the Newton direction is used to update each previous iterate by the formula , until convergence. The reader may recognize the one-dimensional version of Newton's method for solving a nonlinear equation : . The analogous iteration process for minimizing is: . Note that the one-dimensional search vector is replaced by the Newton direction in the multivariate case. This direction is defined for nonsingular but its solution may be unstable. When is sufficiently close to a solution , quadratic convergence can be proven for Newton's method . In practice, this means that the number of digits of accuracy in the solution is approximately doubled at every step! This rapid convergence can be seen from the program output for a simple one-dimensional application of Newton's method to finding the root of (equivalently, solving or minimizing ) (see Table 3). See the linear algebra chapter for related details. Note in the double precision version the round-off in the last steps.
(See exercise 7 and 8.)
In practice, modifications of the classic Newton iteration are essential for guaranteeing global convergence, with quadratic convergence rate near the solution. First, when is not positive-definite, the search direction may not exist or may not be a descent direction. Strategies to produce a related positive-definite matrix , or alternative search directions, become necessary. Second, far away from , the quadratic approximation of (43) may be poor, and the Newton direction must be adjusted. A line search, for example, can dampen (scale) the Newton direction when it exists, ensuring sufficient decrease and guaranteeing uniform progress towards a solution. These modifications lead to the following modified Newton framework using a line search:
Newton variants are constructed by combining various strategies for the individual components above. These involve procedures for formulating or , dealing with structures of indefinite Hessians, and solving for the modified Newton search direction. For example, when is approximated by finite differences, the discrete Newton subclass emerges . When , or its inverse, is approximated by some modification of the previously constructed matrix (see below), QN methods are formed . When is nonzero, TN methods result  since the solution of the Newton system is truncated before completion.