SD is one of the oldest and simplest methods. It is actually more important as a theoretical, rather than practical, reference by which to test other methods. However, `steepest descent' steps are often incorporated into other methods (e.g., Conjugate Gradient, Newton) when roundoff destroys some desirable theoretical properties, progress is slow, or regions of indefinite curvature are encountered.
At each iteration of SD, the search direction is taken as , the negative gradient of the objective function at . Recall that a descent direction satisfies . The simplest way to guarantee the negativity of this inner product is to choose . This choice also minimizes the inner product for unit-length vectors and, thus gives rise to the name Steepest Descent.
SD is simple to implement and requires modest storage, . However, progress toward a minimum may be very slow, especially near a solution. The convergence rate of SD when applied to a convex quadratic function is only linear. The associated convergence ratio is no greater than where Since the convergence ratio measures the reduction of the error at every step for a linear rate), the relevant SD value can be arbitrarily close to 1 when is large. Thus, the SD search vectors may in some cases exhibit very inefficient paths toward a solution, especially close to the solution.
Minimization performance for Rosenbrock's and Beale's function with are shown in Figures 9 and Figure 13 for SD and other methods that will be discussed in this section.
Figure 10 Conjugate Gradient Minimization Path for the Two-Dimensional Rosenbrock Function. View Figure
Figure 11 BFGS Quasi-Newton Minimization Path for the Two-Dimensional Rosenbrock Function. View Figure
Figure 12 Truncated Newton Minimization Path for the Two-Demensional Rosenbrock Function. View Figure
Figure 13 Steepest Descent Minimization Path for the Two-Dimensional Beale Function. View Figure
Figure 14 Conjugate Gradient Minimization Path for the Two-Dimensional Beale Function. View Figure
Figure 15 BFGS Quasi-Newton Minimizatin Path for the Two-Dimensional Beale Function. View Figure
Figure 16 Truncated Newton Minimization Path for the Two-Dimensional Beale Function. View Figure
We show progress by superimposing the resulting iterates from each minimization application on the contour plot. Recall that the minimum point for Rosenbrock's function lies at , where . We clearly see in Figure 9 the characteristic behavior of the SD method: relatively good progress at first, but very slow convergence in the vicinity of the solution. The method was stopped after 1200 iterations, where a gradient norm of only was obtained. For the remaining methods, gradient norms of - were realized.
Note from the contour plots of Beale's function, Figure 13, that the function has a narrow curving valley in the vicinity of the minimum, which occurs at . The function value at the minimum is zero and increases sharply, particularly as increases.
From Figure 13, we clearly note how the SD search vectors (the negative gradient) are perpendicular to the contour lines; progress is initially rapid but then becomes very slow.