Newton methods include several classes, including *discrete* Newton,
*quasi* Newton (QN) (also termed *variable metric*), and
*truncated* Newton (TN). Historically, the
memory
requirements and
computation associated with solving a linear system
directly have restricted Newton methods only: (1) to small problems,
(2) to problems with special sparsity patterns, or (3) near a solution,
after a gradient method has been applied. Fortunately, advances in
computing technology and software are making the Newton approach
feasible for a wide range of problems. Particularly,
with advances in automatic differentiation [53][35],
the appeal of these methods
should increase further.
Extensive treatments of Newton methods can be found in
the literature ([45][30][23][16], for example)
and only general concepts will be outlined here.

For large-scale applications,
essentially two specific classes are emerging as the
most powerful techniques:
*limited-memory*
quasi-Newton (LMQN) and truncated Newton methods.
LMQN methods attempt to combine the modest storage and computational
requirements of CG methods with the superlinear convergence properties of
standard (i.e., full memory) QN methods. Similarly,
TN algorithms attempt to retain
the rapid quadratic convergence rate of classic Newton methods while
making computational requirements feasible for large-scale functions.

All Newton methods are based on approximating the objective function locally by a quadratic model and then minimizing that function approximately. The quadratic model of the objective function at along is given by the expansion

The minimum of the right-hand side is achieved when is the minimum of the quadratic function:

Alternatively, such a *Newton direction*
satisfies the linear system of
simultaneous equations, known as the *Newton equation*:

In the ``classic'' Newton method, the Newton direction is used to update each previous iterate by the formula , until convergence. The reader may recognize the one-dimensional version of Newton's method for solving a nonlinear equation : . The analogous iteration process for minimizing is: . Note that the one-dimensional search vector is replaced by the Newton direction in the multivariate case. This direction is defined for nonsingular but its solution may be unstable. When is sufficiently close to a solution , quadratic convergence can be proven for Newton's method [45][23][16]. In practice, this means that the number of digits of accuracy in the solution is approximately doubled at every step! This rapid convergence can be seen from the program output for a simple one-dimensional application of Newton's method to finding the root of (equivalently, solving or minimizing ) (see Table 3). See the linear algebra chapter for related details. Note in the double precision version the round-off in the last steps.

Table 3 Newton's Iteration for Solving . View Table

In practice, modifications of the classic Newton iteration are essential
for guaranteeing global convergence, with quadratic convergence
rate near the solution.
First, when is not positive-definite, the search direction may not
exist or may not be a descent direction. Strategies to produce a related
positive-definite matrix , or alternative search directions,
become necessary. Second, far away from , the quadratic
approximation of (43) may be poor, and the Newton direction must
be adjusted. A line search, for example, can *dampen* (scale)
the Newton direction when it exists, ensuring sufficient decrease
and guaranteeing uniform progress towards a solution. These modifications
lead to the following *modified* Newton framework using a line
search:

Newton variants are constructed by combining various strategies for the
individual components above. These involve procedures for formulating
or , dealing with structures of
indefinite Hessians,
and solving for the *modified Newton* search direction. For example,
when is approximated by *finite differences*, the discrete Newton
subclass emerges [16]. When , or its
inverse, is approximated by some modification of the previously
constructed matrix (see below), QN methods are formed [17][16].
When
is nonzero, TN methods result
[57][56][55][47][18]
since the solution of the Newton
system is *truncated* before completion.