2.4.1 Derivative Programming



next up previous
Next: 2.4.2 Steepest Descent Up: 2.4 Gradient Methods Previous: 2.4 Gradient Methods

2.4.1 Derivative Programming

 

Before we continue to describe algorithmic details of gradient methods, some programming tips are appropriate. For using gradient and second derivative minimization methods, derivative subroutines must be supplied by the user. Errors in the code for the function and the derivatives are common. Such errors, especially in the derivatives, are often difficult to find since function decrease may still be realized upon minimization. Therefore, it is essential to test the derivative routines of the objective function before applying minimization routines.

We have developed one such testing procedure using a Taylor series approach. Our general subroutine TESTGH tests the compatibility of the gradient and Hessian components of a function against the function routine. Its calling sequence is:

TESTGH(N,XC,FC,GC,Y,YHY,VEC)

where N is the dimension; XC(N) the current vector of control variables; FC the function value at XC; GC(N) the gradient vector at XC; Y(N) a random perturbation vector; YHY the inner product of Y with HY (the product of the Hessian evaluated at XC and the vector Y); and VEC(N) a work vector. On input, all quantities except for VEC must be given. The vector Y should be chosen so that the function value at XC+Y is in a reasonable range for the problem (see below).

Derivatives are tested using a Taylor expansion of around a given point . The following Taylor series is formulated at where is a scalar:

 

where and are the gradient and Hessian, respectively, evaluated at . If only the gradient routines are tested, the second-order Taylor term YHY is set to zero, and the truncation error is . Our test is performed by computing this Taylor approximation at smaller and smaller values of and checking whether the truncation errors are as expected: and if the approximation is correct up to the gradient and Hessian terms, respectively. At every step we half and test if indeed the truncation errors decrease as they should (i.e., if the error corresponding to is , the error for should be if the gradient is correct, and if the Hessian is also correct.)

The output consists of a series of values for RATIO (ratio of old to new errors) printed for each until the truncation error and/or is very small. If RATIO tends to 4 as is decreased (and the error is relatively small) the gradient is correct, and if RATIO tends to 8 both the gradient and Hessian are correct. If RATIO tends to 2, which is , neither the gradient nor the Hessian are correct. If RATIO tends to unity, the errors may be too large given the perturbation vector . Thus in general, reliable values of RATIO should occur when: (1) is not too large and not too small, and (2) the difference between and the Taylor-series approximation is of reasonable magnitude. Different starting point and/or perturbation vectors can be tried for verification. The code for TESTGH can be found in file testgh.f in connection with the online version of this paper. To view this file, click on testgh.f.

For example, output from testing Rosenbrock's function for 12 variables consists of the following:

    X0 VECTOR:
         -1.20            1.00           -1.20            1.00
         -1.20            1.00           -1.20            1.00
         -1.20            1.00           -1.20            1.00
    Y VECTOR:
         -1.09            0.77           -0.88            0.64
          0.71            0.58            0.94           -0.90
         -0.62            0.77           -0.90           -0.98

         ENTERING TESTGH ROUTINE:

    THE FUNCTION VALUE AT X               =   1.45200000E+02
    THE FIRST-ORDER TAYLOR TERM,  (G, Y)  =   3.19353760E+02
    THE SECOND-ORDER TAYLOR TERM, (Y,HY)  =   5.39772665E+03
    EPSMIN =   1.42108547E-14


    EPS           F              TAYLOR          DIFF.         RATIO

 5.0000E-01  1.09854129E+03  9.79592712E+02  1.18948574E+02
 2.5000E-01  4.07080835E+02  3.93717398E+02  1.33634374E+01  8.90104621E+00
 1.2500E-01  2.28865318E+02  2.27288959E+02  1.57635878E+00  8.47740855E+00
 6.2500E-02  1.75893210E+02  1.75702045E+02  1.91165417E-01  8.24604580E+00
 3.1250E-02  1.57838942E+02  1.57815414E+02  2.35282126E-02  8.12494428E+00
 1.5625E-02  1.50851723E+02  1.50848805E+02  2.91806005E-03  8.06296382E+00
 7.8125E-03  1.47860040E+02  1.47859677E+02  3.63322099E-04  8.03160629E+00
 3.9063E-03  1.46488702E+02  1.46488657E+02  4.53255493E-05  8.01583443E+00
 1.9531E-03  1.45834039E+02  1.45834033E+02  5.66008660E-06  8.00792506E+00
 9.7656E-04  1.45514443E+02  1.45514443E+02  7.07160371E-07  8.00396463E+00
 4.8828E-04  1.45356578E+02  1.45356578E+02  8.83731524E-08  8.00198196E+00

    DIFF IS SMALL (LESS THAN   2.97291798E-08 IN ABSOLUTE VALUE)

Note that the RATIO is larger than eight when EPS is larger and then decreases steadily. A small error in the code would produce much different values. We encourage the student to try this testing routine on several subroutines that compute objective functions and their derivatives; errors should be introduced into the derivative codes systematically to examine the ability of TESTGH to detect them and provide the right diagnosis, as outlined above.

(See exercises 5 and 6.)



next up previous
Next: 2.4.2 Steepest Descent Up: 2.4 Gradient Methods Previous: 2.4 Gradient Methods



verena@csep1.phy.ornl.gov