SD is one of the oldest and simplest methods. Today it is more important as a theoretical reference against which other methods are measured than as a practical tool. However, `steepest descent' *steps* are often incorporated into other methods (e.g., Conjugate Gradient, Newton) when roundoff destroys some desirable theoretical properties, when progress is slow, or when regions of indefinite curvature are encountered.

At each iteration of SD, the search direction is taken as $d_k = -\nabla f(x_k)$, the negative gradient of the objective function at the current iterate $x_k$. Recall that a descent direction $d_k$ satisfies $\nabla f(x_k)^T d_k < 0$. The simplest way to guarantee the negativity of this inner product is to choose $d_k = -\nabla f(x_k)$. This choice also minimizes the inner product $\nabla f(x_k)^T d$ among unit-length vectors $d$ and thus gives rise to the name *Steepest Descent*.
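The basic iteration can be sketched as follows (a minimal illustration, not part of the original text; the use of the two-dimensional Rosenbrock function, the Armijo constant, the step-halving rule, and the iteration limit are all my own choices):

```python
import math

def rosenbrock(x):
    """Two-dimensional Rosenbrock function: 100*(x2 - x1^2)^2 + (1 - x1)^2."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def rosenbrock_grad(x):
    """Analytic gradient of the two-dimensional Rosenbrock function."""
    return [
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ]

def steepest_descent(f, grad, x0, max_iter=1000, tol=1e-6):
    """Steepest descent with a backtracking (Armijo) line search."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        if gnorm < tol:
            break
        d = [-gi for gi in g]      # search direction: the negative gradient
        slope = -gnorm ** 2        # directional derivative grad(x)^T d < 0
        alpha, fx = 1.0, f(x)
        # Halve the step until the Armijo sufficient-decrease condition holds.
        while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

x_min = steepest_descent(rosenbrock, rosenbrock_grad, [-1.2, 1.0])
```

Starting from the classical point $(-1.2, 1)$, the iterates descend quickly into the parabolic valley and then crawl along it, illustrating the slow progress discussed below.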

SD is simple to implement and requires modest storage, $O(n)$.
However, progress toward a minimum may be very slow, especially near a solution. The convergence rate of SD when applied to a convex quadratic function

$$f(x) = \tfrac{1}{2}\, x^T A x - b^T x$$

(with $A$ symmetric positive-definite) is only *linear*. The associated convergence ratio is no greater than

$$\left( \frac{\kappa - 1}{\kappa + 1} \right)^2,$$

where $\kappa = \lambda_{\max}(A) / \lambda_{\min}(A)$ is the condition number of the Hessian $A$. Since the convergence ratio measures the reduction of the error at every step (for a linear rate), the relevant SD value can be arbitrarily close to 1 when $\kappa$ is large. Thus, the SD search vectors may in some cases exhibit very inefficient paths toward a solution, especially close to the solution.
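To make the linear rate concrete, the following sketch (my own illustration; the diagonal test matrix, its eigenvalues, and the starting point are arbitrary choices) applies SD with an exact line search to a two-dimensional convex quadratic and checks the per-step reduction in $f$ against the bound $((\kappa - 1)/(\kappa + 1))^2$. For a quadratic, the exact minimizing step length along $-g$ is $\alpha = g^T g / g^T A g$.

```python
# SD with exact line search on f(x) = 1/2 x^T A x, with A = diag(lam1, lam2).
lam = [1.0, 10.0]                              # eigenvalues; kappa = 10
kappa = max(lam) / min(lam)
bound = ((kappa - 1.0) / (kappa + 1.0)) ** 2   # worst-case ratio f_{k+1}/f_k

def f(x):
    return 0.5 * (lam[0] * x[0] ** 2 + lam[1] * x[1] ** 2)

x = [10.0, 1.0]                                # arbitrary starting point
ratios = []
for _ in range(20):
    g = [lam[0] * x[0], lam[1] * x[1]]         # gradient A x
    # Exact line search step for a quadratic: alpha = g^T g / g^T A g.
    alpha = sum(gi * gi for gi in g) / sum(li * gi * gi for li, gi in zip(lam, g))
    x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
    ratios.append(f(x_new) / f(x))             # per-step reduction factor
    x = x_new

# Every observed reduction factor respects the theoretical bound.
assert all(r <= bound + 1e-9 for r in ratios)
```

For $\kappa = 10$ the bound is $(9/11)^2 \approx 0.67$, so even this mildly ill-conditioned problem removes only about a third of the error per step; as $\kappa$ grows, the ratio approaches 1.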

Minimization performance for Rosenbrock's and Beale's functions with $n = 2$ is shown in Figures 9-12 and 13-16, respectively, for SD and the other methods discussed in this section.

Figure 9 Steepest Descent Minimization Path for the Two-Dimensional Rosenbrock Function.

Figure 10 Conjugate Gradient Minimization Path for the Two-Dimensional Rosenbrock Function.

Figure 11 BFGS Quasi-Newton Minimization Path for the Two-Dimensional Rosenbrock Function.

Figure 12 Truncated Newton Minimization Path for the Two-Dimensional Rosenbrock Function.

Figure 13 Steepest Descent Minimization Path for the Two-Dimensional Beale Function.

Figure 14 Conjugate Gradient Minimization Path for the Two-Dimensional Beale Function.

Figure 15 BFGS Quasi-Newton Minimization Path for the Two-Dimensional Beale Function.

Figure 16 Truncated Newton Minimization Path for the Two-Dimensional Beale Function.

We show progress by superimposing the resulting iterates from each minimization application on the contour plot. Recall that the minimum point for Rosenbrock's function lies at $x^* = (1, 1)^T$, where $f(x^*) = 0$. We clearly see in Figure 9 the characteristic behavior of the SD method: relatively good progress at first, but very slow convergence in the vicinity of the solution. The method was stopped after 1200 iterations, by which point only a modest reduction of the gradient norm had been obtained; the remaining methods achieved much smaller gradient norms.

Note from the contour plot of Beale's function, Figure 13, that the function has a narrow curving valley in the vicinity of the minimum, which occurs at $x^* = (3, 0.5)^T$. The function value at the minimum is zero, and the function increases sharply away from this valley.
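Beale's function is not written out in the text above; its standard form (stated here for reference, since it is a well-known test function) and a quick check that the minimizer $(3, 0.5)$ yields a function value of zero can be sketched as:

```python
def beale(x1, x2):
    """Beale's test function; its global minimum is at (3, 0.5) with value 0."""
    return ((1.5 - x1 + x1 * x2) ** 2
            + (2.25 - x1 + x1 * x2 ** 2) ** 2
            + (2.625 - x1 + x1 * x2 ** 3) ** 2)

# Each of the three residuals vanishes at the minimizer, so f is exactly 0 there.
assert beale(3.0, 0.5) == 0.0
# Nearby points have strictly larger values.
assert beale(3.1, 0.5) > 0.0
```

The sum-of-squares structure makes the nonnegativity of the function immediate, and the narrow curving valley seen in the contour plot corresponds to points where all three residuals are simultaneously small.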

From Figure 13, we clearly note how the SD search vectors (the negative gradient) are perpendicular to the contour lines; progress is initially rapid but then becomes very slow.