SD is one of the oldest and simplest minimization methods. Today it serves
more as a theoretical reference against which other methods are
measured than as a practical algorithm. However, `steepest descent' *steps* are
often incorporated into other methods (e.g., Conjugate
Gradient, Newton) when roundoff destroys some
desirable theoretical properties, progress is slow, or
regions of indefinite curvature are encountered.

At each iteration of SD, the search direction is taken as $p_k = -g_k$,
the negative gradient of the objective function at $x_k$.
Recall that a descent direction $p_k$ satisfies $g_k^T p_k < 0$.
The simplest way to guarantee the negativity of this inner product
is to choose $p_k = -g_k$.
This choice also
minimizes the inner product $g_k^T p$ over unit-length vectors $p$
and thus gives rise to the name *Steepest Descent*.
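The iteration described above can be sketched as follows. This is a minimal illustration, not a production implementation: the function names, the backtracking (Armijo) line search, and the quadratic test problem are all choices made here for the example, not taken from the text.

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    """Minimize f by stepping along the negative gradient direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # near-stationary point: stop
            break
        p = -g                        # steepest-descent direction, g^T p < 0
        # Backtracking (Armijo) line search: halve the step until the
        # sufficient-decrease condition f(x + t*p) <= f(x) - c1*t*g^T g holds.
        t = 1.0
        while f(x + t * p) > f(x) - 1e-4 * t * (g @ g):
            t *= 0.5
        x = x + t * p
    return x

# Example: minimize the convex quadratic f(x, y) = x^2 + 10 y^2,
# whose unique minimizer is the origin.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
x_star = steepest_descent(f, grad, np.array([3.0, 1.0]))
```

Even on this mildly ill-conditioned quadratic the iterates zigzag toward the minimizer, which is the behavior that motivates the more sophisticated methods mentioned above.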