
CHAPTER 6

UNCONSTRAINED MULTIVARIABLE OPTIMIZATION

6.1 Direct Methods
6.2 Indirect Methods - First Order
6.3 Indirect Methods - Second Order
6.4 Secant Methods
6.5 Finite Difference Approximations as Substitutes for Derivatives
6.6 Sources of Computer Codes for Unconstrained Optimization
6.7 Evaluation of Codes for Unconstrained Optimization
6.8 Diagnosis of Optimization Code Failure to Solve a Problem
References
Problems

The numerical optimization of general nonlinear multivariable objective functions requires that efficient and robust techniques be employed. Efficiency is important since these problems require an iterative solution procedure. Trial and error becomes impractical for more than three or four variables. Robustness (ability to achieve a solution) is a desirable property since a general nonlinear function is unpredictable in its behavior; there may be relative maxima or minima, saddle points, regions of convexity/concavity, etc. In some regions the optimization algorithm may yield very slow progress toward the optimum, hence requiring excessive computer time. Fortunately, we can draw upon extensive numerical experience in testing nonlinear programming algorithms for unconstrained functions to evaluate various approaches proposed for the solution of such functions.

In this chapter we discuss the solution of the unconstrained optimization problem:

Find x* = [x₁ x₂ ... xₙ]ᵀ that minimizes f(x₁, x₂, ..., xₙ) ≡ f(x)

Most iterative procedures that are effective alternate between two phases in the optimization:

(a) choosing a search direction sᵏ

(b) minimizing in that direction to some extent (or completely) to find a new point xᵏ⁺¹ = xᵏ + Δxᵏ, where Δxᵏ = xᵏ⁺¹ − xᵏ is known as the step size. For maximization, we minimize −f(x).

In addition to (a) and (b), an algorithm must specify (c) the initial starting vector x⁰ = [x₁⁰ x₂⁰ ... xₙ⁰]ᵀ and (d) the convergence criteria for termination.

From a given starting point, a search direction is determined, and f(x) is minimized in that direction. The search stops based on some criteria, and then a new search direction is determined, followed by another line search. The line search can be carried out to various degrees of precision. For example, we could use a simple successive doubling of the step size as a screening method until we detect the optimum has been bounded. At this point the screening search can be terminated and a more sophisticated method employed to yield a higher degree of accuracy. In any event, refer to the techniques discussed in Chap. 5 for ways to carry out the line search.

The NLP (nonlinear programming) methods which will be discussed in this chapter differ mainly in how they generate the search directions. Some nonlinear programming methods require information about derivative values, whereas other methods do not use derivatives and rely solely on function evaluations. Furthermore, finite difference substitutes can be used in lieu of derivatives as explained in Sec. 6.5. In most cases, methods that employ analytical derivatives use less computer time but more of your time for analysis. We shall discuss both. We first describe some simple nonderivative methods, and then present a series of methods which use derivative information. We also show how the nature of the objective function influences the effectiveness of the particular optimization algorithm.

6.1 DIRECT METHODS

Direct methods do not require the use of derivatives in determining the search direction. Under some circumstances the methods described in this section can be used effectively, but they may be inefficient compared with methods discussed in subsequent sections. They do have the advantage of being simple to understand and execute.

6.1.1 Random Search

A random search method simply selects a starting vector x⁰, evaluates f(x) at x⁰, and then randomly selects another vector x¹ and evaluates f(x) at x¹. In effect, both a search direction and a step length are chosen simultaneously. After one or more stages, the value of f(xᵏ) is compared with the best previous value of f(x) from among the previous stages, and the decision is made to continue or terminate the procedure. Variations of this form of random search involve randomly selecting a search direction and then minimizing (possibly by random steps) in that search direction as a series of cycles. Clearly, the optimal solution can be obtained with a probability of 1 only as k → ∞, but as a practical matter, if the objective function is quite flat, a suboptimal solution may be quite acceptable. Even though the method is inefficient insofar as function evaluations are concerned, it may provide a good starting point for another method. You might view random search as an extension of the case study method. Refer to Dixon and James (1980) for some practical algorithms.
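The scheme just described can be sketched in a few lines of code. This is an illustrative fragment, not an algorithm from the text; the quadratic test function, the bounds, and the trial budget are our own choices.

```python
import random

def f(x):
    # Simple two-variable test function (our choice): minimum at (0, 0)
    return 2 * x[0] ** 2 + x[1] ** 2

def random_search(f, bounds, n_trials=2000, seed=1):
    """Pick trial vectors uniformly at random; keep the best one seen."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_trials):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

x_best, f_best = random_search(f, bounds=[(-2.0, 2.0), (-2.0, 2.0)])
print(x_best, f_best)
```

A line-search variant would instead pick a random direction and then minimize along it, as described above.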

6.1.2 Grid Search

Methods of experimental design discussed in most basic statistics books can equally well be applied to minimizing f(x) (see Box and Hunter, 1962). You evaluate a series of points about a reference point selected according to some type of design, such as the ones shown in Fig. 6.1 (for an objective function of two variables). Next you move to that point which improves the objective function the most, and repeat. For n = 10, we must examine 3¹⁰ − 1 = 59,048 values of f(x) if a three-level factorial design is to be used, obviously a prohibitive number of function evaluations.
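A minimal sketch of such a design-based search, using three-level factorial moves about the current center and a quadratic test function of our own choosing, might look like this:

```python
from itertools import product

def f(x):
    # Test function (our choice): minimum at (1, 2)
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def grid_search(f, x0, delta=0.5, tol=1e-3, max_iter=200):
    """Three-level factorial moves: evaluate the 3^n - 1 neighbors of the
    current point, step to the best one, and halve delta when stuck."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        neighbors = [[xi + d * delta for xi, d in zip(x, moves)]
                     for moves in product((-1, 0, 1), repeat=len(x))
                     if any(moves)]
        best = min(neighbors, key=f)
        if f(best) < fx:
            x, fx = best, f(best)
        elif delta > tol:
            delta *= 0.5          # refine the design about the center
        else:
            break
    return x, fx

x_min, f_min = grid_search(f, [0.0, 0.0])
print(x_min, f_min)
```

Halving delta when no neighbor improves plays the role of refining the experimental design about the current best point.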

6.1.3 Univariate Search

Another simple optimization technique would be to select n fixed search directions (usually the coordinate axes) for an objective function of n variables. Then

Figure 6.1 Various grid search designs to select vectors x to evaluate f(x): (a) three-level factorial design (3² − 1 = 8 points plus center); (b) hexagon design (6 points + center); (c) two-level factorial design (2² = 4 points plus center).

f(x) is minimized in each search direction sequentially using a one-dimensional search. While this method is effective for a quadratic function of the form

f₁(x) = Σᵢ aᵢᵢxᵢ²

because the search directions line up with the principal axes as indicated in Fig. 6.2a, it does not perform satisfactorily for more general quadratic objective functions of the form

f₂(x) = Σᵢ Σⱼ aᵢⱼxᵢxⱼ

as illustrated in Fig. 6.2b. For the latter case, the step size decreases as the optimum is neared, thus slowing down the search procedure.

6.1.4 Simplex Method

The method of the "sequential simplex" formulated by Spendley, Hext, and Himsworth (1962) uses a regular geometric figure (a simplex) to select points at the vertices of the simplex at which to evaluate f(x). In two dimensions the figure is an equilateral triangle. In three dimensions this figure becomes a regular tetrahedron, and so on. Each search direction points away from the vertex having the highest value of f(x). Thus, the direction of search changes, but the step size is fixed for a given size simplex. Let us use a function of two variables to illustrate the procedure.

At each iteration, to minimize f(x), f(x) is evaluated at each of three vertices of the triangle. The direction of search is oriented away from the point with the highest value for the function through the centroid of the simplex. By making the search direction bisect the line between the other two points of the triangle, the direction will go through the centroid. A new point is selected in

Figure 6.2 Execution of a univariate search on two different quadratic functions: (a) f₁(x); (b) f₂(x).

Figure 6.3 Reflection to a new point in the Simplex method.

this reflected direction (examine Fig. 6.3), preserving the geometric shape. The objective function is then evaluated at the new point, and a new search direction is calculated. The method proceeds, rejecting one vertex at a time, until the simplex straddles the optimum. Various rules are used to prevent excessive repeating of the same cycle or simplexes.

As the optimum is approached, the last equilateral triangle will straddle the optimum point or be within a distance of the order of its own size from the optimum (examine Fig. 6.4). Therefore, the procedure cannot get closer to the

Figure 6.4 Progression to the vicinity of the optimum and oscillation around the optimum using the Simplex method of search. The original vertices are x₁⁰, x₂⁰, and x₃⁰; the next point (vertex) is x₄. Succeeding new vertices are numbered in the order generated, up to the point at which a cycle starts to repeat. The size of the simplex is then reduced and the pattern is continued (not shown).

optimum and would repeat itself, so that the simplex size must be reduced, such as by halving the length of all the sides of the simplex containing the vertex where the oscillation started. A new simplex composed of the midpoints of the ending simplex is constructed. When the simplex size is smaller than a prescribed tolerance, then the routine is stopped. Thus, the optimum position is determined to within a tolerance influenced by the size of the simplex.

Nelder and Mead (1965) described a more efficient (but more complex) version of the simplex method that permitted the geometric figures to expand and contract continuously during the search. Their method minimized a function of n variables using (n + 1) vertices of a flexible polyhedron. Details of the method together with a computer code to execute the algorithm can be found in Himmelblau (1972).
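The basic reflect-and-shrink logic can be sketched as follows. This is a simplified illustration, not the Nelder-Mead algorithm or the code in Himmelblau (1972): the anti-cycling rules are replaced here by a single shrink-toward-the-best-vertex step whenever a reflection fails to improve, and the test function is our own choice.

```python
def reflect(worst, others):
    """Reflect the worst vertex through the centroid of the remaining ones."""
    n = len(worst)
    centroid = [sum(p[i] for p in others) / len(others) for i in range(n)]
    return [2 * centroid[i] - worst[i] for i in range(n)]

def simplex_search(f, vertices, n_iter=200):
    """Fixed-shape simplex moves in the spirit of Spendley et al.;
    shrink the simplex toward the best vertex when reflection fails."""
    verts = [list(v) for v in vertices]
    for _ in range(n_iter):
        verts.sort(key=f)                 # best first, worst last
        new = reflect(verts[-1], verts[:-1])
        if f(new) < f(verts[-2]):
            verts[-1] = new               # accept the reflected point
        else:
            best = verts[0]               # halve the simplex toward the best
            verts = [[(v[i] + best[i]) / 2 for i in range(len(v))]
                     for v in verts]
    verts.sort(key=f)
    return verts[0]

f_quad = lambda x: 2 * x[0] ** 2 + x[1] ** 2
x_best = simplex_search(f_quad, [[2.0, 2.0], [2.5, 2.0], [2.0, 2.5]])
print(x_best, f_quad(x_best))
```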

6.1.5 Conjugate Search Directions

Experience has shown that directions called conjugate directions are much more effective as search directions than arbitrarily chosen search directions, such as in univariate search, or even orthogonal search directions. Two directions sⁱ and sʲ are said to be conjugate (or conjugated) with respect to each other if

(sⁱ)ᵀQ(sʲ) = 0    (6.1)

In general, a set of n linearly independent directions of search s⁰, s¹, ..., sⁿ⁻¹ is said to be conjugate with respect to a positive definite square matrix Q if

(sⁱ)ᵀQsʲ = 0    0 ≤ i ≠ j ≤ n − 1    (6.2)

In optimization the matrix Q is the Hessian matrix of the objective function, H.

For a quadratic function f(x) of n variables, in which H is a constant matrix, you are guaranteed to reach the minimum of f(x) in n stages if you minimize exactly on each stage (Fletcher and Reeves, 1964). A conjugate direction in general is not a unique direction. However, in two dimensions, if you choose an initial direction s¹ and Q, s² is fully specified as illustrated in Example 6.1 below.

Orthogonality is a special case of conjugacy because when Q = I, (sⁱ)ᵀsʲ = 0 in Eq. (6.2). If the coordinates of x are translated and rotated by suitable transformations so as to align the new principal axes of H(x) with the eigenvectors of H(x) and to place the center of the coordinate system at the stationary point of f(x) (refer to Figs. 4.12 to 4.16), then conjugacy can be interpreted as orthogonality in the space of the transformed coordinates.

Although authors and practitioners refer to a class of unconstrained optimization methods as "methods that use conjugate directions," for a general nonlinear function the conjugate directions exist only for a quadratic approximation of the function at a single stage k. Once the objective function is modeled by a new approximation at stage (k + 1), the directions on stage k are unlikely to be conjugate to any of the directions selected on stage (k + 1).


In Example 6.1 below we will need to minimize a quadratic approximation in a given search direction. What this means is to compute the value for λ in the relation xᵏ⁺¹ = xᵏ + λsᵏ that minimizes

f(x) = f(xᵏ + λsᵏ) = f(xᵏ) + ∇ᵀf(xᵏ)λsᵏ + ½(λsᵏ)ᵀH(xᵏ)(λsᵏ)    (6.3)

where Δxᵏ = λsᵏ. To get the minimum of f(xᵏ + λsᵏ), we differentiate Eq. (6.3) with respect to λ and equate the derivative to zero,

df(xᵏ + λsᵏ)/dλ = 0 = ∇ᵀf(xᵏ)sᵏ + (sᵏ)ᵀH(xᵏ)sᵏλ    (6.4)

with the result

λᵒᵖᵗ = − ∇ᵀf(xᵏ)sᵏ / [(sᵏ)ᵀH(xᵏ)sᵏ]    (6.5)

EXAMPLE 6.1 CALCULATION OF CONJUGATE DIRECTIONS

Suppose we want to minimize f(x) = 2x₁² + x₂² − 3 starting at (x⁰)ᵀ = [1 1] with the initial direction being s⁰ = [−4 −2]ᵀ. Find a conjugate direction to the initial direction s⁰.

Solution.

x⁰ = [1 1]ᵀ    H(x) = [4 0; 0 2]

We need to solve Eq. (6.2) for s¹ = [s₁¹ s₂¹]ᵀ with Q = H and s⁰ = [−4 −2]ᵀ. Because s¹ is not unique, we can pick s₁¹ = 1 and determine s₂¹:

[−16 −4][1 s₂¹]ᵀ = 0

Thus s¹ = [1 −4]ᵀ is a direction conjugate to s⁰ = [−4 −2]ᵀ.

Let us verify that we can reach the minimum of f(x) in two stages using first s⁰ and then s¹. Could we use the search directions in reverse order? From (x⁰)ᵀ = [1 1] we could carry out a numerical search in the direction s⁰ = [−4 −2]ᵀ to reach the point x¹. But, because you must minimize exactly, Eq. (6.5) can be used to calculate analytically the overall step length because f(x) is quadratic:

λ⁰ = − ∇ᵀf(x⁰)s⁰ / [(s⁰)ᵀHs⁰] = − [4 2][−4 −2]ᵀ / ([−4 −2]H[−4 −2]ᵀ) = 20/72


Then

x¹ = x⁰ + λ⁰s⁰ = [1 1]ᵀ + (20/72)[−4 −2]ᵀ = [−0.1111 0.4444]ᵀ

For the next stage, the search direction is s¹ = [1 −4]ᵀ,

λ¹ = − ∇ᵀf(x¹)s¹ / [(s¹)ᵀHs¹] = 4/36 = 0.1111

and

x² = x¹ + λ¹s¹ = [−0.1111 0.4444]ᵀ + 0.1111[1 −4]ᵀ = [0 0]ᵀ

as expected.
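The arithmetic of Example 6.1 is easy to check numerically; the following sketch verifies the conjugacy condition of Eq. (6.2) and reproduces the two exact steps computed from Eq. (6.5):

```python
# Data of Example 6.1: f(x) = 2*x1**2 + x2**2 - 3, so H = [[4, 0], [0, 2]]
H = [[4.0, 0.0], [0.0, 2.0]]
grad = lambda x: [4.0 * x[0], 2.0 * x[1]]

def quad_form(u, M, v):
    """Compute u^T M v for 2-vectors and a 2x2 matrix."""
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

def step(x, s):
    # Exact step length for a quadratic, Eq. (6.5)
    lam = -sum(g * si for g, si in zip(grad(x), s)) / quad_form(s, H, s)
    return [xi + lam * si for xi, si in zip(x, s)]

x0 = [1.0, 1.0]
s0 = [-4.0, -2.0]
s1 = [1.0, -4.0]

# s0 and s1 are conjugate: (s0)^T H s1 = 0
print(quad_form(s0, H, s1))        # 0.0

x1 = step(x0, s0)                  # approx [-0.1111, 0.4444]
x2 = step(x1, s1)                  # the minimum, approx [0, 0], in two stages
print(x1, x2)
```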

How can one calculate conjugate directions without using derivatives?

Here is the basic concept. Refer to Fig. 6.5. Start at x⁰. Locate the point xᵃ in the direction s that is the minimum of f(x). Then, start from another point x¹ ≠ x⁰, and locate the point xᵇ, in the same direction s, that is the minimum of f(x). The vector (direction) (xᵇ − xᵃ) is conjugate to s. Examine Fig. 6.5.

Proof of the simple procedure is as follows. We will first show that the gradient of f(x) at xᵃ is orthogonal to the search direction s. The point xᵃ or xᵇ is obtained by minimizing f(x⁰ + λs) with respect to λ, assuming f(x) is a quadratic function

f(x) = f(x⁰) + ∇ᵀf(x⁰)Δx⁰ + ½(Δx⁰)ᵀH(Δx⁰)

l " l g r r r e 6 . 1 l ) c l c t ' l t t i t t u l i o t t o l ' e o t t j t t g u l c rl i t ' c t r l i o r r n w i t h o u t c n l e u l u l i n g rl c r i v n l i v e , x

6.r DrREcr METHoDs 197

Then to get the exact minimum of f(x) in direction s analytically we use Eq. (6.5):

λ = − ∇ᵀf(x⁰)s / (sᵀHs)

Suppose we write Eq. (6.3) in slightly different notation (note xᵀb = bᵀx):

f(x) = a + xᵀb + ½xᵀHx    (6.3a)

∇f(x) = b + Hx

and

∇f(xᵃ) = b + Hxᵃ

which when introduced into Eq. (6.4) gives (with λs = xᵃ − x⁰)

(b + Hx⁰)ᵀs + sᵀHλs = 0
sᵀ(b + Hx⁰) + sᵀH(xᵃ − x⁰) = 0
sᵀ(b + Hxᵃ) = 0

so

sᵀ∇f(xᵃ) = 0    (6.6)

Equation (6.6) demonstrates that if f(x) is quadratic, the gradient of f(x) at its minimum in a search direction s is orthogonal to s.

Now, referring to Fig. 6.5, for the first search from x⁰

sᵀ∇f(xᵃ) = 0 = sᵀ(Hxᵃ + b)

and for the second search from x¹

sᵀ∇f(xᵇ) = 0 = sᵀ(Hxᵇ + b)

By subtraction

sᵀH(xᵇ − xᵃ) = 0    (6.7)

hence s and (xᵇ − xᵃ) are conjugate. The minimum of f(x) will be found along the second search direction (xᵇ − xᵃ), as illustrated in Fig. 6.5.
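The construction above can be demonstrated numerically. In this sketch the quadratic (with an interaction term) is our own choice, and a ternary search stands in for the exact line minimization; the computed direction xᵇ − xᵃ satisfies Eq. (6.7) to within the line-search tolerance.

```python
def line_search(f, x, s, lo=-10.0, hi=10.0, tol=1e-10):
    """Ternary search for the lambda minimizing f(x + lambda*s)."""
    point = lambda lam: [xi + lam * si for xi, si in zip(x, s)]
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(point(m1)) < f(point(m2)):
            hi = m2
        else:
            lo = m1
    return point((lo + hi) / 2)

# Quadratic with interaction (our choice): H = [[4, 1], [1, 2]]
f = lambda x: 2 * x[0] ** 2 + x[1] ** 2 + x[0] * x[1]
H = [[4.0, 1.0], [1.0, 2.0]]

s = [1.0, 0.0]
xa = line_search(f, [0.0, 0.0], s)    # minimum along s from x0
xb = line_search(f, [1.0, 1.0], s)    # minimum along s from a second point
d = [xb[0] - xa[0], xb[1] - xa[1]]    # (xb - xa) is conjugate to s

sHd = sum(s[i] * H[i][j] * d[j] for i in range(2) for j in range(2))
print(d, sHd)                          # sHd is (numerically) zero
```

No derivatives were evaluated; only two line searches along the same direction were required.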

6.1.6 Powell's Method

Powell's method (1965) locates the minimum of a function f by sequential unidimensional searches from an initial point x⁰ along a set of conjugate directions generated by the algorithm. New search directions are introduced as the search progresses.

To clarify the beginning of Powell's method, let us use a function of just two variables. The search begins at an arbitrarily chosen point (x₁⁰, x₂⁰). Two


linearly independent directions of search (s₁⁰, s₂⁰) are chosen; for example, the coordinate axes ([1 0]ᵀ, [0 1]ᵀ) are a convenient initial choice. A univariate search is carried out to find the optimal point in the x₁ coordinate direction

x₁⁰ = x₀⁰ + λ₁*s₁⁰

where λ₁* is the optimum step length. Then a univariate search is carried out from the point x₁⁰ in the x₂ coordinate direction to find the value of λ that minimizes f(x):

x₂⁰ = x₁⁰ + λ₂*s₂⁰

Lastly, one more step from x₂⁰ of size (x₂⁰ − x₀⁰) is taken corresponding to the overall step on the 0th stage. Examine Fig. 6.6.

In general, the kth stage of Powell's method employs n linearly independent search directions and can be considered to be initiated at x₀ᵏ = xₙᵏ⁻¹ as follows:

Step 1. From x₀ᵏ, determine λ₁ᵏ by a unidimensional search in the s₁ᵏ direction so that f(x₀ᵏ + λ₁ᵏs₁ᵏ) is a minimum. Let x₁ᵏ = x₀ᵏ + λ₁ᵏs₁ᵏ. From x₁ᵏ determine λ₂ᵏ so that f(x₁ᵏ + λ₂ᵏs₂ᵏ) is a minimum. Let x₂ᵏ = x₁ᵏ + λ₂ᵏs₂ᵏ. The search is continued sequentially in each direction, starting always from the last immediate point in the sequence, until all the λᵢᵏ, i = 1, ..., n, are determined. The search for λᵏ to minimize f(x) in the direction sᵏ is taken into account in step 4 below.

Step 2. Powell pointed out that the search described in step 1 can lead to linearly dependent search directions, when, for example, one of the components of

Figure 6.6 Initiation of Powell's method of optimization.

sᵏ becomes essentially zero because no progress was made in that particular direction. Two search directions thus might become essentially collinear. Hence it might be unwise to replace the old direction with a new one if by doing so the new set of directions became linearly dependent. Also, he demonstrated (for a quadratic function) that, in scaling the search directions by

(sᵢᵏ)ᵀHsᵢᵏ = 1    i = 1, ..., n

the determinant of the matrix whose columns comprise the search directions takes on its maximum value if and only if the sᵢᵏ are mutually conjugate with respect to H. He concluded that the overall direction of progress for the kth stage, sᵏ, should replace an existing search direction only if the replacement vector increased the determinant of the matrix of the normalized search directions, for then the set of directions would be more effective. Hence, after minimizing f(x) in each of the n directions as described in step 1, one additional step of size (xₙᵏ − x₀ᵏ) is taken corresponding to the total progress on the kth stage to yield the point (2xₙᵏ − x₀ᵏ). A test is then made (see step 3 below) to ascertain whether or not adding the new direction and dropping an old one decreases the determinant of the search directions.

Step 3. Let the largest reduction in f(x) in any search direction on the kth stage be denoted by

Δᵏ = max {f(xᵢ₋₁ᵏ) − f(xᵢᵏ)},    i = 1, ..., n

The search direction corresponding to this maximum change in f(x) will be designated sₘᵏ. To make the notation more compact, let f₁ = f(x₀ᵏ), f₂ = f(xₙᵏ), and f₃ = f(2xₙᵏ − x₀ᵏ), where x₀ᵏ = xₙᵏ⁻¹ and xₙᵏ = xₙ₋₁ᵏ + λₙᵏsₙᵏ = x₀ᵏ + Σᵢ λᵢᵏsᵢᵏ.

If either f₃ ≥ f₁ and/or (f₁ − 2f₂ + f₃)(f₁ − f₂ − Δᵏ)² ≥ 0.5Δᵏ(f₁ − f₃)², use exactly the same directions s₁ᵏ, ..., sₙᵏ in the (k + 1)st stage as in the kth stage, that is, sᵢᵏ⁺¹ = sᵢᵏ for i = 1, ..., n, and start at x₀ᵏ⁺¹ = xₙᵏ [or from x₀ᵏ⁺¹ = 2xₙᵏ − x₀ᵏ, whichever x yields the lowest value of f(x)].

Step 4. If the test in step 3 is not satisfied, the direction sᵏ from x₀ᵏ to xₙᵏ is searched for the minimum of f(x), which will be used as the starting point for the next stage, (k + 1). The set of directions to be used in the (k + 1)st stage is the same as in the kth stage, with the exception of the direction sₘᵏ, which is replaced by sᵏ. However, sᵏ is placed in the last column of the matrix of directions instead of in the location of sₘᵏ. Consequently, the directions to be used in the (k + 1)st stage are

[s₁ᵏ⁺¹ s₂ᵏ⁺¹ ... sₙᵏ⁺¹] = [s₁ᵏ ... sₘ₋₁ᵏ sₘ₊₁ᵏ ... sₙᵏ sᵏ]

Step 5. A satisfactory convergence criterion for Powell's method is to terminate the search at the end of any stage in which the change in each independent variable is less than the required accuracy εᵢ for i = 1, ..., n, or for

‖xₙᵏ − x₀ᵏ‖ < ε


EXAMPLE 6.2 APPLICATION OF POWELL'S METHOD

The application of Powell's algorithm to Rosenbrock's function

Minimize f(x) = 100(x₂ − x₁²)² + (1 − x₁)²

illustrates what happens in the minimization of a nonquadratic function.

Step 0. We start at x⁰ = [−1.2 1.0]ᵀ, where f(x⁰) = 24.2. The initial search directions are

s₁⁰ = [1 0]ᵀ    s₂⁰ = [0 1]ᵀ

Step 1. An initial minimization of f(x) is made using the search direction s₁⁰, that is, in the x₁ coordinate direction (the details of which are omitted), to reach the point x₁⁰ = [−0.995 1.000]ᵀ, where f(x₁⁰) = 3.990. Then a search is made using the vector s₂⁰, that is, in the x₂ coordinate direction, yielding x₂⁰ = [−0.995 0.990]ᵀ, where f(x₂⁰) = 3.980.

Step 2. Following a step of

x₃⁰ = 2x₂⁰ − x₀⁰ = 2[−0.995 0.990]ᵀ − [−1.200 1.000]ᵀ = [−0.790 0.980]ᵀ

f(x₃⁰) = 15.872

New directions of search are computed as follows:

Steps 3 and 4.

s₁¹ = s₂⁰ = [0 1]ᵀ

s₂¹ = x₂⁰ − x₀⁰ = [−0.995 0.990]ᵀ − [−1.200 1.000]ᵀ = [0.205 −0.010]ᵀ

To normalize s₂¹:

‖s₂¹‖ = [(s₂¹)ᵀ(s₂¹)]^(1/2) = 0.205,  so  s₂¹ = [0.999 −0.0488]ᵀ

We use s₂¹ because the criteria listed in step 3 are not satisfied (Δ⁰ = 24.2 − 3.99 = 20.21):

f(x₃⁰) = 15.872 < f(x⁰) = 24.2

and

[24.2 − 2(3.980) + 15.872][24.2 − 3.980 − 20.21]² < 0.5(20.21)(24.2 − 15.872)²

The results of one additional stage of the search are given in Table E6.2, starting from x₂⁰ = [−0.995 0.990]ᵀ.

Figure E6.2 illustrates the trajectory of the search. In total, the objective function was evaluated 1562 times to reach the minimum of f(x) at x* = [1.000 1.000]ᵀ, where f(x*) ≅ 3.1 × 10⁻¹⁰ ≈ 0.


Table E6.2

Stage k   (s₁ᵏ)ᵀ           (s₂ᵏ)ᵀ            xᵢᵏ, f(xᵢᵏ)

1         [0 1]            [0.205 −0.010]    x₁¹ = [−0.995 0.990]ᵀ, 3.969
                                             x₂¹ = [−0.990 0.990]ᵀ, 3.959
                                             x₃¹ = [−0.984 0.979]ᵀ, 3.948

2         [0 1]            [0.453 −0.892]    x₁² = [−0.761 0.540]ᵀ, 3.257
                                             x₂² = [−0.761 0.579]ᵀ, 3.101
                                             x₃² = [−0.702 0.462]ᵀ, 2.916

3         [0.453 −0.892]   [0.608 −0.794]    x₁³ = [−0.503 0.203]ᵀ, 2.510
                                             x₂³ = [−0.538 0.273]ᵀ, 2.196
                                             x₃³ = [−0.466 0.178]ᵀ, 2.301

Figure E6.2 Search trajectory for the Powell algorithm (the numbers designate the function evaluation sequence number).

Brent (1973) proposed a modification to Powell's method to avoid linear dependence of the search directions and still retain quadratic convergence. He reset the search directions s₁, ..., sₙ periodically to be the columns of an orthogonal matrix selected so that if f(x) were quadratic, the columns would remain conjugate. The columns of the orthogonal matrix were the normalized principal axes of f(x). He also incorporated a random step prior to starting the unidimensional search, a heuristic that helps in poorly conditioned problems.


6.1.7 Summary

Direct methods were the earliest methods proposed for unconstrained optimization. They are currently used in the chemical process industries because they are simple to understand and execute. However, they are not as efficient and robust as many of the more modern indirect methods described in the sections that follow. But, for simple two-variable problems, direct methods are often quite satisfactory.

6.2 INDIRECT METHODS - FIRST ORDER

Indirect methods, in contrast with the methods described in the previous section, do make use of derivatives in determining the search direction for optimization.

However, our classification of direct vs. indirect may not be as clear-cut as it seems, because substitution of finite differences for derivatives will render the indirect methods "derivative-free." A good search direction should reduce (for minimization) the objective function, so that if x⁰ is the original point and x¹ is the new point

f(x¹) < f(x⁰)

Such a direction s is called a descent direction and satisfies the following requirement at any point:

∇ᵀf(x)s < 0

To see why, examine the two vectors ∇f(xᵏ) and sᵏ in Fig. 6.7. The angle between them is θ, hence

∇ᵀf(xᵏ)sᵏ = |∇f(xᵏ)||sᵏ| cos θ

If θ = 90° as in Fig. 6.7, then steps along sᵏ will not reduce (improve) the value of f(x). If 0 ≤ θ < 90°, no improvement is possible and f(x) increases. Only if θ > 90° will the search direction yield smaller values of f(x), hence ∇ᵀf(xᵏ)sᵏ < 0.

We first examine the classic method of using the gradient and then examine a conjugate gradient method.

6.2.1 Gradient Method

In this section we briefly set forth the strategy of the gradient (steepest-descent/ascent) method of unconstrained optimization. This method uses only the first derivatives of the objective function in the calculations.

You will recall that the gradient is the vector at a point x that gives the (local) direction of the greatest increase in f(x) and is orthogonal to the contour of f(x) at x. For maximization, the search direction is simply the gradient (when used the algorithm is called "steepest ascent"); for minimization, the search direction is the negative of the gradient ("steepest descent"):

sᵏ = −∇f(xᵏ)    (6.8)


Figure 6.7 Identification of the region of possible search directions.

In steepest descent at the kth stage, the transition from point xᵏ to another point xᵏ⁺¹ can be viewed as given by the following expression:

xᵏ⁺¹ = xᵏ + Δxᵏ = xᵏ + λᵏsᵏ = xᵏ − λᵏ∇f(xᵏ)    (6.9)

where Δxᵏ = vector from xᵏ to xᵏ⁺¹
      sᵏ = search direction, the direction of steepest descent
      λᵏ = scalar that determines the step length in direction sᵏ

The negative of the gradient gives the direction for minimization but not the magnitude of the step to be taken, so that various steepest descent procedures are possible, depending upon the choice of λᵏ. We assume that the value of f(x) is continuously reduced. Because one step in the direction of steepest descent will not, in general, arrive at the minimum of f(x), Eq. (6.9) must be applied repetitively until the minimum is reached. At the minimum, the value of the elements of the vector gradient will each be equal to zero.

Among the many methods of selecting λᵏ that have appeared in the literature, we will mention two. One method employs a one-dimensional search along the negative gradient to select the step size. The second prespecifies the step size to be a constant value and uses this value for every iteration. Clearly, the main difficulty with the second approach is that often we do not know an appropriate step size a priori. How should it be changed as the direction of steepest descent



.:lrrrrrgcs 1tt{ how slror.rlcl it bc rccluccd ttl 1'ttrtrlrit c()llvcrgcllcc'l Sotttc illustrtt- l i o t t s w i l l c l a r i l y t h c p r o b l c m .

First, let us consider the well-scaled objective function f(x) = x₁² + x₂², whose contours are concentric circles as shown in Fig. 6.8. Suppose we calculate the gradient at the point xᵀ = [2 2]:

v/(x) :

t;:;] vf(2,2): tl H(x):

": [3

o-l

2 ) t*-1.

oor".ue that s is a vector point- l'lrc tlircction of steepest descent i. t : -

[+_1

irrg towilrd the optimum at (0,0). In facil fhe gradient at any point along a ..,,'t.,r,r of this objective function will pass through the origin (the optimum)'

( ) r r t h c o t h e r h a n d , f o r f u n c t i o n s n o t S o n i c e l y s c a l e d a n d w h i c h h a v e n o n . zr.r.o olltdiagonal terms in the Hessian matrix (corresponding to interaction Ir.r.rrrs srrch as xrxr), then the negative gradient direction is unlikely to pass llrrorrglr thc optimum. Figure 6.9 illustrates the contours of a quadratic function ,,1 lw;) virriabies which includes an interaction term. Observe that contours are Irltt.rl with rcspect to axes. Interaction terms plus poor scaling corresponding to nlrn.ow virllcys, or ridges, cause the gradient method to exhibit poor perfor- nuurec irttd slow convergence.

If f(x) is optimized exactly, as indicated by Eq. (6.6), successive steps of the steepest-descent method are orthogonal to each other, as illustrated in Fig. 6.9.

Figure 6.8 Gradient vector for f(x) = x₁² + x₂²; at x = [2  2]ᵀ the steepest-descent direction is -∇f(2, 2) = [-4  -4]ᵀ.


Figure 6.9 Steepest-descent method for a general quadratic function.

This seemingly peculiar result occurs for any f(x) because the derivative of f(x) along any line s(λ), by the chain rule of calculus, is

df/dλ = Σᵢ (∂f/∂xᵢ)(dxᵢ/dλ) = sᵀ∇f

At the end of the search we want df/dλ = 0, so that sᵀ∇f(xᵏ⁺¹) = 0. In practice, it is not desirable to optimize too accurately. While the gradient method may yield satisfactory progress in reducing f(x) in early iterations, it slows down significantly in later iterations because of the short step sizes taken.
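This orthogonality of successive steepest-descent directions under an exact line search is easy to check numerically. The sketch below (the quadratic f(x) = ½xᵀQx and the matrix Q are illustrative choices) uses the step length λ = gᵀg / gᵀQg, which minimizes f exactly along s = -g for a quadratic:

```python
import numpy as np

# Steepest descent with an exact line search on f(x) = 0.5 x^T Q x.
# For a quadratic, the minimizing step length along s = -g is
# lambda = (g^T g) / (g^T Q g), so successive gradients are orthogonal.
Q = np.array([[8.0, -2.0],
              [-2.0, 2.0]])
x = np.array([1.0, 1.0])

grads = []
for _ in range(4):
    g = Q @ x                      # gradient of 0.5 x^T Q x
    grads.append(g)
    lam = (g @ g) / (g @ (Q @ g))  # exact line-search step length
    x = x - lam * g                # move along s = -g

# inner products of successive search directions: all (numerically) zero
dots = [abs(grads[i] @ grads[i + 1]) for i in range(len(grads) - 1)]
print(dots)
```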

The steepest-descent algorithm can be summarized in the following steps:

1. Choose an initial or starting point x⁰. Thereafter at the point xᵏ:

2. Calculate (analytically or numerically) the partial derivatives

∂f(xᵏ)/∂xⱼ        j = 1, ..., n

3. Calculate the search vector

sᵏ = -∇f(xᵏ)

4. Use the relation xᵏ⁺¹ = xᵏ + λᵏsᵏ to obtain the value of xᵏ⁺¹. To get λᵏ, use Eq. (6.5) or minimize f(x) numerically as described in Chap. 5. Note that in the "fixed-step" gradient method, one sets λᵏ = λ, a constant, for all iterations.

5. Compare f(xᵏ⁺¹) with f(xᵏ); if the change in f(x) is smaller than some tolerance, stop. If not, return to step 2 and set k = k + 1. Termination can also be specified by setting a tolerance on the size of λᵏ (λᵏ < γ) or by specifying some tolerance on the norm of ∇f(xᵏ).
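The five steps can be sketched as follows; the backtracking loop is a crude stand-in for the unidimensional search of step 4, and the test function is an illustrative well-scaled quadratic:

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
    """Steps 1-5 above: search along s = -grad f(x), shrink the step
    until f decreases, and stop when f changes very little."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = -grad(x)                    # step 3: search vector
        lam = 1.0
        while f(x + lam * s) >= f(x) and lam > 1e-12:
            lam *= 0.5                  # crude substitute for the 1-D search
        x_new = x + lam * s             # step 4
        if abs(f(x_new) - f(x)) < tol:  # step 5: termination test
            return x_new
        x = x_new
    return x

# illustrative well-scaled problem: f(x) = x1^2 + x2^2, minimum at (0, 0)
xmin = steepest_descent(lambda x: x @ x, lambda x: 2 * x, [2.0, 2.0])
print(xmin)
```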

A proccdure of strictly steepest descent can terminate at any type of sta- lirrrrirry point, i.e., at a point where the elements of the gradient of /(x) are zero.

'l'lrus you must ascertain if the presumed minimum is indeed a local minimum (i.c.. l solution) or a saddle point. If it is a saddle point, it is necessary to crrrl'rloy a nongradient method to move away from the point, after which the rrrinirnization may continue as before. The stationary point may be tested by cxirrrrirring the Hessian matrix of the objective function as described in Chap. 4.

ll tlrc llcssian matrix is not positive definite, the stationary point is a saddle

;rrrirrl. l)cr(urbation from the stationary point followed by optimizatton should l c r r t l lo x * i l ' x * i s a m i n i m u m ( o r a maximum).

'l'lrc birsic difficulty with steepest descent is that the method is too sensitive lo thc sclrling of ./(x), so that convergence is very slow and what amounts to oscillirtion in thc x space can easily occur. For these reasons steepest descent/

rrsccrrl is not a very effective optimization technique, hence we turn to some o l l t e r i n d i r c c t m e t h o d s .

6.2.2 Conjugate Gradient

This method was devised by Fletcher and Reeves (1964), and if f(x) is quadratic and is minimized exactly in each search direction, it has the desirable features of quadratic convergence [refer to Eq. (5.3) with p = 2] and of using conjugate directions (refer to Sec. 6.1.5). The method represents a major improvement over steepest descent with only a marginal increase in computational effort. The conjugate-gradient method essentially combines current information about the gradient vector with that of gradient vectors from previous iterations (a memory feature) to obtain the new search direction. What you do is to compute the search direction by a linear combination of the current gradient and the previous search direction. The main advantage of this method is that it requires only a small amount of information to be stored on each stage of calculation, and thus it can be executed on small computers or even calculators. The computation steps are listed below.

Step 1. At x⁰ calculate f(x⁰). Let

s⁰ = -∇f(x⁰)


Step 2. Save ∇f(x⁰) and compute

x¹ = x⁰ + λ⁰s⁰

by minimizing f(x) with respect to λ in the s⁰ direction (i.e., carry out a unidimensional search for λ⁰).

Step 3. Calculate f(x¹), ∇f(x¹). The new search direction is a linear combination of s⁰ and ∇f(x¹):

s¹ = -∇f(x¹) + s⁰ [∇ᵀf(x¹)∇f(x¹) / ∇ᵀf(x⁰)∇f(x⁰)]

For the kth iteration the relation is

sᵏ⁺¹ = -∇f(xᵏ⁺¹) + sᵏ [∇ᵀf(xᵏ⁺¹)∇f(xᵏ⁺¹) / ∇ᵀf(xᵏ)∇f(xᵏ)]    (6.10)

For a quadratic function it can be shown that these successive search directions are conjugate. After n iterations (k = n), the procedure cycles again with xⁿ⁺¹ becoming x⁰.

Step 4. Test for convergence to the minimum of f(x). If convergence is not attained, return to step 3.

Step n. Terminate the algorithm when ||sᵏ|| is less than some prescribed tolerance.

Note that if the ratio of the inner products of the gradients from stage k + 1 relative to stage k is very small, the conjugate-gradient method will behave much like the steepest-descent method. Some of the same difficulties that occur with Powell's method arise also with conjugate gradients; the linear dependence of search directions can be resolved by periodically restarting the conjugate-gradient method with a steepest-descent search (step 1 above). The proof that Eq. (6.10) given above yields conjugate directions and quadratic convergence has been given by Fletcher and Reeves (1964).
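For a quadratic f(x) = ½xᵀQx, steps 1 to 3 with an exact line search reach the minimum in at most n iterations because the directions generated by Eq. (6.10) are conjugate. A sketch (the matrix Q is an illustrative choice):

```python
import numpy as np

def fletcher_reeves_quadratic(Q, x0):
    """Fletcher-Reeves conjugate gradient on f(x) = 0.5 x^T Q x with an
    exact line search; by conjugacy it reaches the minimum in at most
    n iterations."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x
    s = -g                                   # step 1: steepest descent start
    for _ in range(x.size):                  # n iterations suffice
        lam = -(g @ s) / (s @ (Q @ s))       # exact minimizer along s
        x = x + lam * s                      # step 2
        g_new = Q @ x
        beta = (g_new @ g_new) / (g @ g)     # ratio of inner products, Eq. (6.10)
        s = -g_new + beta * s                # step 3
        g = g_new
    return x

Q = np.array([[8.0, -2.0], [-2.0, 2.0]])     # f(x) = 4x1^2 + x2^2 - 2x1x2
xmin = fletcher_reeves_quadratic(Q, [1.0, 1.0])
print(xmin)  # → approximately [0, 0] after n = 2 iterations
```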

EXAMPLE 6.3 APPLICATION OF THE FLETCHER-REEVES CONJUGATE-GRADIENT ALGORITHM

We will solve the same problem as solved in Example 6.2 (Rosenbrock's function):

Minimize  f(x) = 100(x₂ - x₁²)² + (1 - x₁)²

starting at x⁰ = [-1.2  1.0]ᵀ. The first few stages of the Fletcher-Reeves procedure are listed in Table E6.3, using the same line search as in Example 6.2.

Table E6.3 Results for Example 6.3 using the Fletcher-Reeves method

Iteration      f(x)            x₁            x₂          ∂f(x)/∂x₁     ∂f(x)/∂x₂
    0       24.2            -1.2           1.0          -215.6        -88.00
    1        4.377945       -1.050203      1.061141      -21.65        -8.357
    2        4.127422       -1.029913      1.069013       -0.6437       1.658
    3        3.947832       -0.963870      0.898872      -15.56        -6.034
    4        3.165142       -0.777190      0.612212       -1.002       -1.6415
    5        1.247687       -0.079213     -0.025322       -3.071       -5.761
    6        0.556612        0.254058      0.063189       -1.354       -0.2711
    7        0.147607        0.647165      0.403619        3.230       -3.040
    8        0.024667        0.843083      0.710119       -0.0881      -0.1339
    9        0.0000628       0.995000      0.989410        0.2348      -0.1230
   10        1.617×10⁻¹⁵     1.000000      1.000000    -1.60×10⁻⁸   -3.12×10⁻⁸

The trajectory is similar to that illustrated in Fig. E6.2 but uses fewer function evaluations.

For additional details concerning the application of conjugate-gradient methods, especially to large-scale and sparse problems, refer to Buckley (1978), Shanno (1978), Fletcher (1980), Hestenes (1980), Gill, Murray, and Wright (1981), Dembo, et al. (1982), and Griewank and Toint (1982).

6.3 INDIRECT METHODS-SECOND ORDER

From one viewpoint the search direction of steepest descent can be interpreted as being orthogonal to a linear approximation of (tangent to) the objective function at point xᵏ; examine Fig. 6.10a. Now suppose we make a quadratic approximation of f(x) at xᵏ:

f(x) ≈ f(xᵏ) + ∇ᵀf(xᵏ)Δxᵏ + ½(Δxᵏ)ᵀH(xᵏ)Δxᵏ    (6.11)

where H(xᵏ) is the Hessian matrix of f(x) defined in Chap. 4 (the matrix of second partial derivatives with respect to x evaluated at xᵏ). Then it will be possible to take into account the curvature of f(x) at xᵏ in determining a search direction, as described next for Newton's method and its analogs.

6.3.1 Newton's Method

Newton's method makes use of the second-order (quadratic) approximation of f(x) at xᵏ, and thus employs second-order information about f(x), that is, information obtained from the second partial derivatives of f(x) with respect to the


Figure 6.10 Comparison of steepest descent with Newton's method from the viewpoint of objective-function approximation: (a) steepest descent uses a first-order approximation (linearization) of f(x) at xᵏ, with s = -∇f(xᵏ); (b) Newton's method uses a second-order (quadratic) approximation of f(x) at xᵏ.

independent variables. Thus, it is possible to take into account the curvature of f(x) at xᵏ and identify better search directions than can be obtained via the gradient method. Examine Fig. 6.10b.

The minimum of f(x) in the direction from xᵏ is obtained by differentiating the quadratic approximation of f(x) with respect to each of the components of x and equating the resulting expressions to zero to give

∇f(x) = ∇f(xᵏ) + H(xᵏ)Δxᵏ = 0    (6.12)

xᵏ⁺¹ - xᵏ = Δxᵏ = -[H(xᵏ)]⁻¹∇f(xᵏ)    (6.13)

where [H(xᵏ)]⁻¹ is the inverse of the Hessian matrix H(xᵏ). Equation (6.13) reduces to Eq. (5.5) for a one-dimensional search.

Note that both the direction and step length are specified as a result of Eq. (6.12). If f(x) is actually quadratic, only one step is required to reach the minimum of f(x). However, for a general nonlinear objective function, the minimum


of f(x) will not be reached in one step, so that Eq. (6.13) can be modified to conform to Eq. (6.9) by introducing a parameter for the step length into Eq. (6.13):

xᵏ⁺¹ - xᵏ = Δxᵏ = -λᵏ[H(xᵏ)]⁻¹∇f(xᵏ)    (6.14)

Observe that the search direction s is now given (for minimization) by

sᵏ = -[H(xᵏ)]⁻¹∇f(xᵏ)    (6.15)

and that the step length is λᵏ. The step length λᵏ can be evaluated numerically as described in Chap. 5 or analytically via Eq. (6.5). Equation (6.14) is applied iteratively until some termination criteria are satisfied. For Newton's method, λ = 1 on each step.

Also note that to evaluate Δxᵏ in Eq. (6.13), a matrix inversion is not necessarily required. You can take its precursor, Eq. (6.12), and solve the following set of linear equations for Δxᵏ:

H(xᵏ)Δxᵏ = -∇f(xᵏ)    (6.16)

a procedure that often leads to less round-off error than calculating s via an inversion of a matrix.
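A minimal sketch of Eq. (6.16) in code: the Newton step is obtained from a linear solve rather than an explicit inverse (the convex quadratic test function and starting point are illustrative):

```python
import numpy as np

def newton_step(grad, hess, x):
    """Solve H(x) dx = -grad f(x), Eq. (6.16), instead of forming the
    inverse Hessian explicitly (generally less round-off error)."""
    return np.linalg.solve(hess(x), -grad(x))

# illustrative convex quadratic: f(x) = x1^2 + 2 x2^2 + x1 x2
grad = lambda x: np.array([2 * x[0] + x[1], 4 * x[1] + x[0]])
hess = lambda x: np.array([[2.0, 1.0], [1.0, 4.0]])

x0 = np.array([1.0, 1.0])
x1 = x0 + newton_step(grad, hess, x0)   # one full Newton step, lambda = 1
print(x1)  # → approximately [0, 0], the minimum, in a single step
```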

EXAMPLE 6.4 APPLICATION OF NEWTON'S METHOD TO A CONVEX QUADRATIC FUNCTION

We will minimize the function

f(x) = 4x₁² + x₂² - 2x₁x₂

starting at x⁰ = [1  1]ᵀ with λ = 1.

∇f(x) = [8x₁ - 2x₂   2x₂ - 2x₁]ᵀ        ∇f(x⁰) = [6  0]ᵀ

H(x) = [8  -2; -2  2]        H⁻¹(x) = (1/12)[2  2; 2  8]

Δx⁰ = -H⁻¹∇f(x⁰) = -(1/12)[2  2; 2  8][6  0]ᵀ = [-1  -1]ᵀ

x¹ = x⁰ + Δx⁰ = [1  1]ᵀ + [-1  -1]ᵀ = [0  0]ᵀ

f(x¹) = 0; hence x¹ is the minimum.


Figure E6.4

Instead of taking the inverse of H, we could solve Eq. (6.16):

[8  -2; -2  2][Δx₁⁰  Δx₂⁰]ᵀ = -[6  0]ᵀ

which gives

Δx₁⁰ = -1        Δx₂⁰ = -1

as before. The search direction s⁰ = -H⁻¹∇f(x⁰) is shown in Fig. E6.4.

EXAMPLE 6.5 APPLICATION OF NEWTON'S METHOD AND QUADRATIC CONVERGENCE

If we minimize the nonquadratic function

f(x) = (x₁ - 2)⁴ + (x₁ - 2)²x₂² + (x₂ + 1)²

from the starting point (1, 1), can you show that Newton's method exhibits quadratic convergence? Hint: Show that

||xᵏ⁺¹ - x*|| ≤ c||xᵏ - x*||²        (see Sec. 5.3)


Figure E6.5

Solution. Newton's method produces the following sequences of values for x₁, x₂, and f(xᵏ) (you should try to verify the calculations shown below; the trajectory is traced in Fig. E6.5).

Iteration       x₁             x₂             f(xᵏ)
    0        1.000000       1.000000       6.000
    1        1.000000      -0.500000       1.500
    2        1.391304      -0.695652       4.09×10⁻¹
    3        1.745944      -0.948798       6.49×10⁻²
    4        1.986278      -1.048208       2.53×10⁻³
    5        1.998734      -1.000170       1.63×10⁻⁶
    6        1.9999996     -1.000002       2.75×10⁻¹²

You can calculate between iterations 2 and 3 that c = 0.55, and between 3 and 4 that c = 0.74. Hence, quadratic convergence can be demonstrated numerically.
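This quadratic convergence can be checked numerically. The sketch below iterates full Newton steps on the function of Example 6.5 and records the error ratios ||xᵏ⁺¹ - x*|| / ||xᵏ - x*||², which remain bounded:

```python
import numpy as np

# Newton iterates for f(x) = (x1-2)^4 + (x1-2)^2 x2^2 + (x2+1)^2,
# minimum at x* = (2, -1); the bounded ratio ||x_{k+1}-x*|| / ||x_k-x*||^2
# is the quadratic-convergence test of Sec. 5.3.
def grad(x):
    a = x[0] - 2.0
    return np.array([4 * a**3 + 2 * a * x[1]**2,
                     2 * a**2 * x[1] + 2 * (x[1] + 1)])

def hess(x):
    a = x[0] - 2.0
    return np.array([[12 * a**2 + 2 * x[1]**2, 4 * a * x[1]],
                     [4 * a * x[1], 2 * a**2 + 2]])

xstar = np.array([2.0, -1.0])
x = np.array([1.0, 1.0])
ratios = []
for _ in range(6):
    e_old = np.linalg.norm(x - xstar)
    x = x + np.linalg.solve(hess(x), -grad(x))   # full Newton step
    ratios.append(np.linalg.norm(x - xstar) / e_old**2)
print(x, ratios)
```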


Newton's method is the fastest method of minimization (or maximization) when it succeeds. But it has the disadvantages of:

1. The method does not necessarily find the global solution if multiple local solutions exist, but this is a characteristic of all the methods described in this chapter.

2. It requires matrix inversion or else the solution of a set of n linear equations.
3. It requires both analytical first and second partial derivatives, which may not be practical to obtain.

4. The method may proceed to a saddle point if H(x) is not positive definite.

Difficulty 3 can be ameliorated by using (properly) finite-difference approximations as substitutes for derivatives (see Sec. 6.5). Global convergence can be enhanced by

(a) Back-tracking if a full Newton step is unsatisfactory because it is too long and f(x) does not decrease, and/or

(b) Use of trust regions, that is, estimating a region in which the local quadratic model can be trusted to be an adequate representation of the objective function, and taking a step to minimize the model in this region. See Sec. 6.3.2.

To avoid difficulty 4, you must be very careful to use a technique that continuously guarantees a positive definite H(xᵏ) or inverse H⁻¹(xᵏ). Some standard digital computer programs for matrix inversion or for solving sets of linear equations may cause the condition number of H or H⁻¹ to burgeon and hence may be unsatisfactory. The value of the objective function is guaranteed to be improved on each cycle only if the Hessian matrix of the objective function is positive definite. H(x) is positive definite for strictly convex functions, but for general functions H(x) changes from point to point, and Newton's method may lead to search directions which have few or no components that point in a direction that reduces f(x). Recall that a real symmetric matrix is positive definite if and only if its eigenvalues are positive. Consequently, in minimization, we observe from Figs. 4.12 to 4.16 that when the eigenvalues of H(xᵏ) are all positive, the quadratic approximation corresponds locally to a circular or elliptical hollow having a minimum. On the other hand, if a pair of eigenvalues have opposite signs, we observe from Fig. 4.14 that a saddle exists that does not have a local minimum. In this latter case the direction given by Newton's method will indicate that the best search direction is toward the saddle point instead of away from it in the "downhill" direction.

Newton's method does have the distinct advantage of quadratic convergence (see Sec. 5.3) in the vicinity of the minimum of the objective function if H(x) is positive definite and if the objective function can be approximated reasonably well by a quadratic function, namely the region in which most descent methods perform the poorest. Far away from the minimum, other methods may temporarily converge faster than Newton's method.


We next consider what to do if the Hessian matrix of the quadratic approximating model is not locally positive definite (for a minimum).

6.3.2 Forcing the Hessian Matrix to be Positive Definite

Marquardt (1963), Levenberg (1944), and others have suggested that the Hessian matrix of f(x) be modified on each stage of the search as needed to ensure that the modified H(x), denoted H̃(x), is positive definite and well-conditioned. The procedure adds elements to the diagonal elements of H(x):

H̃(x) = H(x) + βI    (6.17)

where β is a positive constant large enough to make H̃(x) positive definite when H(x) is not. Also it is possible to use

[H̃(x)]⁻¹ = [H⁻¹(x) + γI]    (6.18)

where γ is a scalar of sufficient value for the same purpose. To ascertain the value of γ to use, you might estimate the smallest (most negative) eigenvalue of H(x)

Table 6.1 A modified Marquardt method

Step 1
Pick x⁰, the starting point. Let ε = convergence criterion.

Step 2
Set k = 0. Let β⁰ = 10³.

Step 3
Calculate ∇f(xᵏ).

Step 4
Is ||∇f(xᵏ)|| < ε? If yes, terminate. If no, continue.

Step 5
Calculate sᵏ = -[Hᵏ + βᵏI]⁻¹∇f(xᵏ).

Step 6
Calculate xᵏ⁺¹ = xᵏ + λᵏsᵏ.

Step 7
Is f(xᵏ⁺¹) < f(xᵏ)? If yes, go to step 8. If no, go to step 9.

Step 8
Set βᵏ⁺¹ = ½βᵏ and k = k + 1. Go to step 3.

Step 9
Set βᵏ = 2βᵏ. Go to step 3.


and make β > -min {αᵢ}, where αᵢ is an eigenvalue of H(x). Note that with β sufficiently large, βI can overwhelm H(x) and the minimization approaches a steepest-descent search.

A simpler procedure that may result in a suitable value of β is to apply the Gill and Murray modified Cholesky factorization as follows:

H(xᵏ) + D = LLᵀ    (6.19)

where D is a diagonal matrix with nonnegative elements [dᵢᵢ = 0 if H(xᵏ) is positive definite] and L is a lower triangular matrix. Upper bounds on the elements in D are calculated using the Gershgorin circle theorem [see Dennis and Schnabel (1983) for details].
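A crude stand-in for the Gill-Murray factorization is to attempt a Cholesky factorization and, on failure, add increasing multiples of the identity until it succeeds. This sketch (the starting value and the doubling rule for β are arbitrary choices, not the Gill-Murray algorithm itself) returns a positive definite H + βI:

```python
import numpy as np

def force_positive_definite(H, beta0=1e-3):
    """Return H + beta*I with the first beta (from a doubling sequence)
    for which the Cholesky factorization succeeds; a crude stand-in for
    the modified Cholesky factorization of Eq. (6.19)."""
    try:
        np.linalg.cholesky(H)
        return H                      # already positive definite, beta = 0
    except np.linalg.LinAlgError:
        pass
    beta = beta0
    while True:
        try:
            H_mod = H + beta * np.eye(H.shape[0])
            np.linalg.cholesky(H_mod) # fails unless H_mod is positive definite
            return H_mod
        except np.linalg.LinAlgError:
            beta *= 2.0

# indefinite example (eigenvalues 3 and -1): needs beta > 1
H = np.array([[1.0, 2.0], [2.0, 1.0]])
H_pd = force_positive_definite(H)
print(np.linalg.eigvalsh(H_pd))  # all positive
```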

A simple algorithm based on an arbitrary adjustment of β (a modified Marquardt method) is listed in Table 6.1.
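The steps of Table 6.1 can be sketched as follows; here the inner loop returns directly to the step-5 calculation rather than step 3 (equivalent, since the gradient is unchanged), and the fixed step length λ, iteration limit, and safeguard on β are illustrative choices:

```python
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-5, lam=1.0, max_iter=500):
    """Modified Marquardt method of Table 6.1: beta is halved after a
    successful step (step 8) and doubled when f fails to decrease
    (step 9)."""
    x = np.asarray(x0, dtype=float)
    beta = 1.0e3                                             # step 2
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:                          # step 4
            break
        H = hess(x)
        while True:
            s = np.linalg.solve(H + beta * np.eye(x.size), -g)  # step 5
            if f(x + lam * s) < f(x) or beta > 1e15:         # step 7 (+ safeguard)
                break
            beta *= 2.0                                      # step 9
        x = x + lam * s                                      # step 6
        beta *= 0.5                                          # step 8
    return x

# Rosenbrock's function, as in Example 6.6, with analytic derivatives
f = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                           200 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                           [-400 * x[0], 200.0]])
xmin = marquardt(f, grad, hess, [-1.2, 1.0])
print(xmin)
```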

EXAMPLE 6.6 APPLICATION OF MARQUARDT'S METHOD

The algorithm listed in Table 6.1 is to be applied to Rosenbrock's function

f(x) = 100(x₂ - x₁²)² + (1 - x₁)²

starting at x⁰ = [-1.2  1.0]ᵀ with H⁰ = H(x⁰).

A quadratic interpolation subroutine was used to minimize in each search direction. Table E6.6 lists the values of f(x), x, ∇f(x), and the elements of [H(x) + βI]⁻¹ for each stage of the minimization. A total of 96 function evaluations and 16 calls to the gradient evaluation subroutine were needed, and the CPU time on a Cyber 170 computer was 19 milliseconds.

Table E6.6 Marquardt's method. The table lists, for each stage of the minimization, f(x), x₁, x₂, ∂f(x)/∂x₁, ∂f(x)/∂x₂, and the elements h̃₁₁, h̃₁₂, and h̃₂₂ of [H(x) + βI]⁻¹. The iterates proceed from x⁰ = [-1.2  1.0]ᵀ, where ∇f(x⁰) = [-215.6  -88.0]ᵀ, to x = [1.0  1.0]ᵀ, where the gradient is essentially zero.


6.3.3 Movement in the Search Direction

Up to this point we have focused on calculating H, H⁻¹, H̃, or H̃⁻¹, from which the search direction s can be ascertained via Eq. (6.15), or Δx from Eq. (6.16) (for minimization). In this section we discuss briefly how far to proceed in the search direction, i.e., how to select a step length, for a general function f(x). If Δx is calculated from Eq. (6.13) or (6.16), λ = 1 and the step is a Newton step. If λ ≠ 1, then any procedure can be used to calculate λ, as discussed in Chap. 5.

LINE SEARCH. The oldest and simplest method of calculating λ to obtain Δx is via a unidimensional line search. In a given direction that reduces f(x), take a step, or a sequence of steps yielding an overall step, that reduces f(x) to some acceptable degree. This operation can be carried out by any of the one-dimensional search techniques described in Chap. 5. Early investigators always minimized f(x) as accurately as possible in a search direction s, but subsequent experience, and to some extent theoretical results, have indicated that such a concept is invalid. Good algorithms first calculate a full Newton step (λ = 1) to get xᵏ⁺¹, and if f(xᵏ) is not reduced, back-track in some systematic way toward xᵏ. Failure to take the full Newton step in the first iteration leads to loss of the advantages of Newton's method near the minimum, where convergence is slow. To avoid very small decreases in f(x) values relative to the lengths of the steps taken, most algorithms require that the average rate of descent from xᵏ to xᵏ⁺¹ be at least some prescribed fraction of the initial rate of descent in the search direction. Mathematically this means (Armijo, 1966)

f(xᵏ + λsᵏ) ≤ f(xᵏ) + αλ∇ᵀf(xᵏ)sᵏ    (6.20)

Examine Fig. 6.11. In practice α is often chosen to be very small, about 10⁻⁴, so that just a small decrease in the function value suffices.

Back-tracking can be accomplished in any of the ways outlined in Chap. 5, but with the objective of locating an xᵏ⁺¹ for which f(xᵏ⁺¹) < f(xᵏ), while moving

Figure 6.11 Range of acceptable values for choice of λ to satisfy criterion (6.20) with α = 0.01. The bounding lines are f(xᵏ) + αλ∇ᵀf(xᵏ)sᵏ and f(xᵏ) + λ∇ᵀf(xᵏ)sᵏ.

as far as possible in the direction sᵏ from xᵏ. The minimum of f(xᵏ + λsᵏ) does not have to be found exactly. As an example of one procedure, at xᵏ, where λ = 0, you know two pieces of information about f(xᵏ + λsᵏ): the values of f(xᵏ) and ∇ᵀf(xᵏ)sᵏ. After the Newton step (λ = 1) you know the value of f(xᵏ + sᵏ). From these three pieces of information you can make a quadratic interpolation to get the value λ̂ at which the objective function f(λ) has a minimum:

λ̂ = -∇ᵀf(xᵏ)sᵏ / {2[f(xᵏ + sᵏ) - f(xᵏ) - ∇ᵀf(xᵏ)sᵏ]}    (6.21)

After λ̂ is obtained, if additional back-tracking is needed, cubic interpolation can be carried out. It is a good idea, if λ̂ is too small, say λ̂ < 0.1, to try λ = 0.1 instead.
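A back-tracking sketch that combines the Armijo criterion of Eq. (6.20) with the quadratic interpolation of Eq. (6.21); the halving used after the first reduction is a simple stand-in for the cubic interpolation mentioned in the text, and the λ̂ < 0.1 safeguard follows the suggestion above:

```python
import numpy as np

def backtrack(f, x, s, g, alpha=1e-4, lam_min=1e-10):
    """Armijo back-tracking, Eq. (6.20): accept lambda when
    f(x + lam*s) <= f(x) + alpha*lam*g^T s.  The first reduction uses
    the quadratic interpolation of Eq. (6.21); later reductions halve
    lambda (a stand-in for cubic interpolation)."""
    fx = f(x)
    gTs = g @ s                         # initial rate of descent along s
    lam = 1.0                           # always try the full Newton step
    first = True
    while f(x + lam * s) > fx + alpha * lam * gTs:
        if first:
            lam_hat = -gTs / (2.0 * (f(x + s) - fx - gTs))   # Eq. (6.21)
            lam = max(lam_hat, 0.1)     # if lam-hat is too small, try 0.1
            first = False
        else:
            lam *= 0.5
        if lam < lam_min:
            break
    return lam

# illustrative check: f(x) = x^2, step s = -3 from x = 1; Eq. (6.21)
# gives lam-hat = 1/3, which is the exact minimizer along s
f = lambda x: float(x @ x)
x = np.array([1.0]); s = np.array([-3.0]); g = np.array([2.0])
lam = backtrack(f, x, s, g)
print(lam, x + lam * s)
```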

TRUST REGIONS. The name trust region refers to the region in which the quadratic model can be "trusted" to represent f(x) reasonably well. In the unidimensional line search, the search direction is retained but the step length is reduced if the Newton step proves to be unsatisfactory. In the trust-region approach, a shorter step length is selected and then the search direction determined. We can only sketch the basic concept here without going into details, because many heuristic factors are introduced into specific algorithms to make them effective. Refer to the following references for details: Powell (1970a,b), Fletcher (1980), Moré (1977), Gander (1981), Gay (1981), and Sorenson (1982).

The concept of the trust-region approach is to estimate the length of a maximal successful step from xᵏ. In other words, ||Δx|| ≤ ρ, the bound on the step. Figure 6.12 shows f(x), the quadratic model of f(x), and the desired trust region. First, an initial estimate of ρ, the step bound, has to be determined. If knowledge about the problem does not help, Powell suggested using the distance to the minimizer of the quadratic model of f(x) in the direction of steepest descent from xᵏ, the so-called Cauchy point. Next, some curve or piecewise-linear function is determined with an initial direction of steepest descent, so that the tentative point xᵏ⁺¹ lies on the curve within the bound ρ. Figure 6.12 shows s as a straight line of one segment. The trust region is updated, and the sequence is continued. Heuristic parameters are usually required, such as minimum and maximum step lengths, scaling of s, and so forth.
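A minimal sketch of the Cauchy-point calculation mentioned above, that is, the minimizer of the quadratic model along the steepest-descent direction, clipped to the step bound ρ (the clipping formula is the standard one; the gradient, Hessian, and radius below are illustrative data):

```python
import numpy as np

def cauchy_point(g, H, rho):
    """Minimizer of the quadratic model m(p) = f + g^T p + 0.5 p^T H p
    along the steepest-descent direction -g, clipped to the trust
    region ||p|| <= rho."""
    gnorm = np.linalg.norm(g)
    gHg = g @ (H @ g)
    if gHg <= 0:
        tau = 1.0                      # model unbounded along -g: go to boundary
    else:
        tau = min(1.0, gnorm**3 / (rho * gHg))
    return -tau * (rho / gnorm) * g

# quadratic model with g = [4, 4], H = 2*I, trust radius rho = 1:
# the unconstrained minimizer along -g lies outside the region,
# so the Cauchy point sits on the boundary, ||p|| = rho
g = np.array([4.0, 4.0])
H = 2.0 * np.eye(2)
p = cauchy_point(g, H, rho=1.0)
print(p, np.linalg.norm(p))
```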

6.3.4 Termination

No single stopping criterion will suffice for Newton's method or any of the optimization methods described in this chapter. The following simultaneous criteria are recommended to avoid scaling problems:

|f(xᵏ⁺¹) - f(xᵏ)| / |f(xᵏ)| < ε₁    (6.22a)

or, as f(xᵏ) → 0,

|f(xᵏ⁺¹) - f(xᵏ)| < ε₂    (6.22b)

|Δxᵢ / xᵢ| < ε₃    (6.23a)

or, as xᵢ → 0,

|xᵢᵏ⁺¹ - xᵢᵏ| < ε₄    (6.23b)

||sᵏ⁺¹|| < ε₅    (6.24a)

||∇f(xᵏ)|| < ε₆    (6.24b)


Figure 6.12 Representation of the trust region to select the step length. Solid lines are contours of f(x). Dashed lines are contours of the convex quadratic approximation of f(x) at xᵏ. The dotted circle is the trust-region boundary, within which δ is the step length. xᴺ is the minimum of the quadratic model for which H(x) is positive definite.


6.3.5 Summary of Newton's Method

The basic Newton algorithm (or quasi-Newton algorithm if λᵏ ≠ 1 or if H̃ is used) is as follows:

Step 0. Select x⁰.


Step k

(1) Calculate f(xᵏ), ∇f(xᵏ), H(xᵏ) [or H̃(xᵏ)], or H⁻¹(xᵏ) [or H̃⁻¹(xᵏ)].

(2a) Calculate sᵏ = -H⁻¹(xᵏ)∇f(xᵏ) and a tentative Δx̂ᵏ = λᵏsᵏ, where λᵏ = 1,

or

(2b) Calculate Δx̂ᵏ by solving the linear equations H(xᵏ)Δx̂ᵏ = -∇f(xᵏ).

(3) Calculate a tentative x̂ᵏ⁺¹ = xᵏ + Δx̂ᵏ. If f(x̂ᵏ⁺¹) > f(xᵏ) or Δx̂ᵏ is too large, reduce the step length as described in Sec. 6.3.3 until x̂ᵏ⁺¹ is acceptable.

Step k + 1

(1) Calculate f(xᵏ⁺¹), ∇f(xᵏ⁺¹).

(2) Test for convergence as indicated in Sec. 6.3.4. If convergence is not achieved, return to step k.
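Putting the summary together as a sketch (the test function is the one from Example 6.5; the step-halving rule in step (3) is a simple choice for reducing the step length):

```python
import numpy as np

def newton(f, grad, hess, x0, eps=1e-8, max_iter=100):
    """Damped Newton per the summary above: solve H dx = -g, step (2b);
    halve the tentative step while f fails to decrease, step (3); and
    stop on a gradient-norm test as in Sec. 6.3.4."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        dx = np.linalg.solve(hess(x), -g)     # step (2b)
        lam = 1.0                             # try the full Newton step first
        while f(x + lam * dx) > f(x) and lam > 1e-12:
            lam *= 0.5                        # step (3): reduce step length
        x = x + lam * dx
    return x

# nonquadratic test function from Example 6.5, minimum at (2, -1)
f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4 * (x[0] - 2)**3 + 2 * (x[0] - 2) * x[1]**2,
                           2 * (x[0] - 2)**2 * x[1] + 2 * (x[1] + 1)])
hess = lambda x: np.array([[12 * (x[0] - 2)**2 + 2 * x[1]**2,
                            4 * (x[0] - 2) * x[1]],
                           [4 * (x[0] - 2) * x[1],
                            2 * (x[0] - 2)**2 + 2]])
xmin = newton(f, grad, hess, [1.0, 1.0])
print(xmin)  # → approximately [2, -1]
```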

6.3.6 Relation Between Conjugate Gradient Methods and Quasi-Newton Methods

Toint (1984) has demonstrated that the connection between the conjugate-gradient method and the quasi-Newton methods is as follows. Suppose that ∇f(xᵏ) is "preconditioned" by multiplying by Hᵏ, using an analog of Eq. (6.10):

sᵏ⁺¹ = -[Hᵏ∇f(xᵏ)] + γᵏsᵏ    (6.25)

where

γᵏ = -∇ᵀf(xᵏ)Δgᵏ / (sᵏ)ᵀΔgᵏ    (6.26)

and Δgᵏ = ∇f(xᵏ⁺¹) - ∇f(xᵏ). After some algebraic manipulations it can be shown for a quadratic function that

sᵏ⁺¹ = -(H̃)⁻¹∇f(xᵏ)    (6.27)

where H̃ is calculated by Eq. (6.28) (a formula to be discussed in the next section). Thus, a preconditioned conjugate-gradient method gives the same search directions as the BFGS update for a quadratic function. For a nonlinear function this conclusion does not apply, of course, but preconditioning improves the performance of conjugate-gradient methods.

6.4 SECANT METHODS

By analogy with the secant technique for functions of a single variable, the procedure described in this section minimizes f(x) using only values of f(x) and ∇f(x); the Hessian matrix of f(x) is approximated from combinations of the two. Of course, ∇f(x) may be evaluated by finite-difference substitutes, hence only function values would be employed in the minimization, but the secant itself in

