
CHAPTER 8
NONLINEAR PROGRAMMING WITH CONSTRAINTS

8.1 The Lagrange Multiplier Method
8.2 Necessary and Sufficient Conditions for a Local Minimum
8.3 Quadratic Programming
8.4 The Generalized Reduced-Gradient Method
8.5 Penalty Function and Augmented Lagrangian Methods
8.6 Sequential (Successive, Recursive) Quadratic Programming
8.7 Random Search Methods
8.8 Comparative Evaluation of General Nonlinear Programming Codes
8.9 Successive Linear Programming
8.10 Optimization of Dynamic Processes
8.11 Diagnosis of the Failure of Optimization Codes to Solve Problems
References
Problems

298 LINEAR PROGRAMMING AND APPLICATIONS

The Hogshooter plant, located in the United States, has a range of operation from 120 × 10^6 to 200 × 10^6 lb/year. The variable cost structure is rather complicated due to the effects of extreme reaction conditions, separation tower limitations, and several by-products, which are affected by environmental considerations. These considerations cause a discontinuity in the incremental variable cost curve at 140 × 10^6 lb/year, as given by the equations below:

VC = 3.9 + (x - 120)(0.005)    120 ≤ x ≤ 140
VC = 4.6 + (x - 140)(0.01)     140 < x ≤ 200

There are three main customers for the DAB, located in Europe (C1), the Far East (C2), and the United States (C3), respectively. The matrix below shows the transportation costs (per lb) and the total demand of the customers (C1, C2, C3), with plant locations denoted A1 (Frag), A2 (Swung-Lo), and A3 (Hogshooter). The closest pairing geographically is A1-C1, A2-C2, and A3-C3.

          A1     A2     A3     Total demand
    C1    0.2    0.7    0.6        140
    C2    0.7    0.3    0.8        100
    C3    0.6    0.8    0.2        170

Use an iterative method based on successive linearization of the objective function to determine the optimum distribution plan for the product, DAB. Use an LP code to minimize total cost at each iteration.
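Before setting up the LP at each iteration, the two-branch incremental cost above can be encoded directly; a minimal sketch (the function name is illustrative, not from the text):

```python
def incremental_cost(x):
    """Incremental variable cost at production rate x (10^6 lb/year),
    with the discontinuity at x = 140 described above."""
    if 120 <= x <= 140:
        return 3.9 + (x - 120) * 0.005
    if 140 < x <= 200:
        return 4.6 + (x - 140) * 0.01
    raise ValueError("x outside the plant operating range")

# The first branch reaches 3.9 + 20*0.005 = 4.0 at the breakpoint,
# while just above 140 the cost jumps to 4.6.
print(incremental_cost(140), incremental_cost(141))
```

In a successive-linearization scheme, this function would supply the (locally constant) marginal cost used in each LP subproblem's objective.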

300 NONLINEAR PROGRAMMING WITH CONSTRAINTS

In Chap. 1, you saw some examples of the constraints that occur in optimization problems. In any problem, constraints should be identified as either inequality constraints or equality constraints, and as linear or nonlinear.

Chapter 7 described the Simplex method for solving problems with linear objective functions subject to linear constraints. This chapter treats more difficult problems involving minimization (or maximization) of a nonlinear objective function subject to linear and/or nonlinear constraints:

Minimize f(x)    x = [x1 x2 ... xn]'
Subject to h_j(x) = 0    j = 1, 2, ..., m    (8.1)
           g_j(x) ≥ 0    j = m + 1, ..., p

The inequality constraints in problem (8.1) are frequently transformed into equality constraints as explained in the introduction to Sec. 8.4 so that we focus first on problems involving only equality constraints.

One method of handling just one or two linear or nonlinear equality constraints is to solve explicitly for one variable and eliminate that variable from the problem formulation by substitution in all the functions and equations of the problem. In many problems, elimination of a single equality constraint is superior to an approach in which the constraint is retained and a constrained optimization procedure is executed. For example, suppose you want to minimize the following objective function, which is subject to a single equality constraint:

Minimize f(x) = 4x1^2 + 5x2^2    (8.2a)
Subject to 2x1 + 3x2 = 6    (8.2b)

Either x1 or x2 can be eliminated without difficulty. Let us solve for x1:

x1 = (6 - 3x2)/2    (8.3)

and then eliminate it in Eq. (8.2a). If the expression for x1 is substituted into the objective function, the new equivalent objective function in terms of the single variable x2 is

f(x2) = 14x2^2 - 36x2 + 36    (8.4)

The constraint in the original problem has been eliminated, and f(x2) is an unconstrained function with one degree of freedom (one independent variable).

We can now minimize the objective function (8.4), say by setting the first derivative equal to zero and solving for the optimum value of x2:

df(x2)/dx2 = 28x2 - 36 = 0    x2* = 1.286


Once x2* is obtained, x1* can be obtained directly via the constraint (8.2b):

x1* = (6 - 3x2*)/2 = 1.071

The geometric interpretation of the above problem requires visualizing the objective function as the surface of a paraboloid in three-dimensional space. See Fig. 8.1. The projection of the intersection of the paraboloid and the plane representing the constraint onto the f(x2)-x2 plane is a parabola. We then find the minimum of the resulting parabola. The elimination procedure described above is tantamount to projecting the intersection locus onto the x2 axis. The intersection locus could also be projected onto the x1 axis (by elimination of x2).

Would you obtain the same result for x* as before?

In problems in which there are n variables and m equality constraints, we could attempt to eliminate m variables by direct substitution. If all the equality constraints can be removed, and there are no inequality constraints, the objective function can then be differentiated with respect to each of the remaining (n - m) variables and the derivatives set equal to zero. Or, a computer code for unconstrained optimization can be employed to obtain x*. If the objective function is convex (as in the example above) and the constraints form a convex region, then there is only one stationary point, which is the global minimum.
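The elimination procedure of Eqs. (8.2)-(8.4) is easy to verify numerically; a sketch (function names are illustrative):

```python
def f(x1, x2):
    # objective of Eq. (8.2a)
    return 4 * x1**2 + 5 * x2**2

def x1_from_constraint(x2):
    # Eq. (8.3): solve 2*x1 + 3*x2 = 6 for x1
    return (6 - 3 * x2) / 2

def f_reduced(x2):
    # the one-variable objective of Eq. (8.4) obtained by substitution
    return f(x1_from_constraint(x2), x2)

# Stationary point of the reduced function: 28*x2 - 36 = 0
x2_opt = 36 / 28
x1_opt = x1_from_constraint(x2_opt)
print(round(x1_opt, 3), round(x2_opt, 3))   # 1.071 1.286
```

Note that `f_reduced` agrees with 14x2^2 - 36x2 + 36 for any x2, confirming the algebra of Eq. (8.4).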

Figure 8.1 Graphical representation of a function of two variables reduced to a function of one variable by substitution for one variable in the constraint. Contours of f(x) are projected onto the x1-x2 plane.


Unfortunately, very few problems in practice assume this simple form or even permit the elimination of all equality constraints.

Consequently, in this chapter we will discuss the four major approaches for solving nonlinear programming problems with constraints:

1. Lagrange multiplier methods
2. Iterative linearization methods
3. Iterative quadratic programming methods
4. Penalty function methods

As we will show in this chapter, the last three methods are better for general problems than the classical method based on Lagrange multipliers.

8.1 THE LAGRANGE MULTIPLIER METHOD

Let us now consider an optimization problem involving two variables and a single equality constraint. The statement of the problem is as follows:

Minimize f(x1, x2)    (8.5a)

Subject to h(x1, x2) = e  (e is a constant)    (8.5b)

i.e.,

Subject to h(x1, x2) - e = 0    (8.5c)

At the constrained optimum the following two necessary conditions must hold simultaneously:

df = 0 = (∂f/∂x1) dx1 + (∂f/∂x2) dx2    (8.6)

dh = 0 = (∂h/∂x1) dx1 + (∂h/∂x2) dx2    (8.7)

If f(x1, x2) were an unconstrained function, both partial derivatives of f(x) would have to be zero (that is, ∂f/∂x1 = 0 and ∂f/∂x2 = 0 for all dx1 and dx2) at the optimum. However, the variables x1 and x2 are constrained (dx1 and dx2 are not independent), so we cannot arbitrarily set both partial derivatives of f(x) equal to zero. However, f(x) must be an extremum at the optimum, hence df(x) = 0. The second condition, namely dh(x) = 0, exists because the constraint h(x1, x2) was assumed to be equal to a constant.

In Eqs. (8.6) and (8.7), we can envision that dx1 and dx2 are differential perturbations or variations from the optimum point x1* and x2*, and rewrite the equations in the form of two equations in two unknown variables:

a11 dx1 + a12 dx2 = 0    (8.8a)
a21 dx1 + a22 dx2 = 0    (8.8b)

where a11 = ∂f/∂x1, a12 = ∂f/∂x2, a21 = ∂h/∂x1, and a22 = ∂h/∂x2, all evaluated at the optimum.


Because the right-hand sides of both equations are 0, either the trivial solution (dx1 = dx2 = 0) or a nontrivial (nonunique) solution can be chosen (refer to Appendix B). In order for a nontrivial solution to exist, the determinant of the coefficient matrix (a_ij) in (8.8) must be 0, or a11·a22 - a21·a12 = 0. Expressing this condition in another way:

a11/a21 = a12/a22    (8.9)

Therefore, in general

a11 = -ω·a21    and    a12 = -ω·a22    (8.10)

where ω is a parameter that relates the partial derivatives of f(x) and h(x); ω is commonly referred to as a Lagrange multiplier. If an augmented objective function, called the Lagrangian, is defined as

L(x, ω) = f(x) + ω·h(x)    [note h(x) = 0]

then Eqs. (8.10) can be written in the following way (for the two variables x1 and x2):

∂f/∂x1 + ω·∂h/∂x1 = ∂L/∂x1 = 0    (8.11a)
∂f/∂x2 + ω·∂h/∂x2 = ∂L/∂x2 = 0    (8.11b)

In the conditions (8.11) describing the optimum, we have not explicitly included the effect of the constraint. Therefore, in addition to the necessary conditions for a constrained optimum, the following equation must hold:

∂L/∂ω = h(x) = 0    (8.12)

In summary, the procedure of using Lagrange multipliers to locate an optimum point requires that an additional variable ω be defined as well as a Lagrange function. One of the necessary conditions for an extremum of f(x) is that the partial derivatives of L(x, ω) with respect to x1, x2, and ω must vanish (rather than those solely with respect to the original two variables). Therefore, we have increased the dimensionality of the optimization problem by one, corresponding to the one equality constraint, but we have eliminated the constraint explicitly. In addition, if the constraint h(x) = 0 is satisfied, then the values of f(x) and L(x, ω) at the constrained optimum are the same.


EXAMPLE 8.1 USE OF LAGRANGE MULTIPLIERS

We want to

Minimize f(x) = 4x1^2 + 5x2^2    (a)
Subject to h(x) = 0 = 2x1 + 3x2 - 6    (b)

Solution. Let

L(x, ω) = 4x1^2 + 5x2^2 + ω(2x1 + 3x2 - 6)    (c)

Apply the necessary conditions (8.11) and (8.12):

∂L(x, ω)/∂x1 = 8x1 + 2ω = 0    (d)
∂L(x, ω)/∂x2 = 10x2 + 3ω = 0    (e)
∂L(x, ω)/∂ω = 2x1 + 3x2 - 6 = 0    (f)

By substitution, x1 = -ω/4 and x2 = -3ω/10, and therefore Eq. (f) becomes

2(-ω/4) + 3(-3ω/10) - 6 = 0

ω = -30/7    x1* = 1.071    x2* = 1.286
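Because conditions (d)-(f) are linear, x1, x2, and ω can also be found by solving one 3x3 linear system. A sketch using exact rational arithmetic (the elimination routine is illustrative; any linear solver would do):

```python
from fractions import Fraction

# Conditions (d)-(f) of Example 8.1 as A [x1, x2, w]' = b:
#   8*x1         + 2*w = 0
#         10*x2  + 3*w = 0
#   2*x1 + 3*x2        = 6
A = [[Fraction(8), Fraction(0), Fraction(2)],
     [Fraction(0), Fraction(10), Fraction(3)],
     [Fraction(2), Fraction(3), Fraction(0)]]
b = [Fraction(0), Fraction(0), Fraction(6)]

# Gaussian elimination (no pivoting needed for this small system)
n = 3
for k in range(n):
    for i in range(k + 1, n):
        m = A[i][k] / A[k][k]
        for j in range(k, n):
            A[i][j] -= m * A[k][j]
        b[i] -= m * b[k]
x = [Fraction(0)] * n
for i in range(n - 1, -1, -1):
    s = sum(A[i][j] * x[j] for j in range(i + 1, n))
    x[i] = (b[i] - s) / A[i][i]

x1, x2, w = x
print(x1, x2, w)   # 15/14 9/7 -30/7
```

The exact values 15/14 and 9/7 reproduce the rounded 1.071 and 1.286 quoted above.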

We must check to see that x* is a minimum and not just a stationary point (or a maximum), because the Lagrangian function itself exhibits a saddle point with respect to x and ω at the optimum (often referred to as a "minimax" problem; see Bazaraa and Shetty, 1979, Sec. 6.2). The sufficient conditions to check for a minimum are fairly involved and are listed in Sec. 8.2. Because in Example 8.1 the three equations arising from the necessary conditions were simple linear equations, we did not have to resort to solving a system of three nonlinear equations. However, if the character of the objective function and constraints were such that (d), (e), and (f) in Example 8.1 were nonlinear equations, then the method of Lagrange multipliers would not be particularly useful, because the solution of a set of nonlinear equations is as difficult, or more so, than the direct solution of the original optimization problem by one of the methods described in Secs. 8.4 and 8.6 below.

The theory of Lagrange multipliers can readily be extended to constrained optimization problems involving more than two variables. In general, if the objective function contains n decision variables, and there are m equality constraints involving the decision variables (the m constraints must be independent and m must be less than n), then the Lagrange multiplier formulation of an optimization problem requires the use of m Lagrange multipliers. The final unconstrained augmented objective function, the Lagrangian, will be a function of n + m variables, and n + m necessary conditions corresponding to each of these n + m variables must be derived and solved in order to obtain a solution at the constrained optimum.

The general nonlinear programming problem can be converted to a problem that contains only equality constraints as follows:

Minimize f(x)
Subject to h_j(x) = 0    j = 1, ..., m    (8.13)
           g_j(x) - σ_j^2 = 0    j = m + 1, ..., p

By subtracting the square of a slack variable, σ_j^2, from g_j(x), j = m + 1, ..., p, we can guarantee that the inequality constraints are satisfied but convert them to equality constraints. Note that by use of a squared slack variable, σ_j^2, which is nonnegative for any σ_j, we avoid adding the inequality constraints σ_j ≥ 0 to the nonlinear programming problem, as is done in linear programming. Then we can define the Lagrange function

L(x, ω) = f(x) + Σ_{j=1..m} ω_j·h_j(x) + Σ_{j=m+1..p} ω_j·[g_j(x) - σ_j^2]    (8.14)

where the ω_j, j = 1, ..., p, are weighting factors independent of x, identifiable as Lagrange multipliers. The necessary and sufficient conditions (Dennis, 1959) for x* to be a solution of the nonlinear programming problem (8.13) are that f(x*) be convex, that the constraint set be convex in the vicinity of x*, and that the following set of equations, yielding a stationary solution of (8.14), be satisfied at x*:

∂L(x*)/∂x_i = 0    for i = 1, ..., n

∂L(x*)/∂ω_j = 0    for j = 1, ..., p    (8.15)

∂L(x*)/∂σ_j = -2ω_j·σ_j = 0    for j = m + 1, ..., p

ω_j ≤ 0    j = 1, ..., p

For a maximum of f(x), replace ω_j ≤ 0 with ω_j ≥ 0. If the set of equations (8.15) is a nonlinear set, it is preferable to select one of the other optimization methods discussed in subsequent sections.
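Conditions (8.15) couple each multiplier to its slack through ω_j·σ_j = 0: either the constraint is active (σ_j = 0) or its multiplier vanishes. A one-variable sketch (the data are illustrative, not from the text): minimize (x - c)^2 subject to g(x) = 25 - x^2 ≥ 0.

```python
def solve(c):
    """Stationary point of L = (x - c)^2 + w*(25 - x^2 - s^2) for the
    illustrative problem: minimize (x - c)^2 subject to 25 - x^2 >= 0."""
    if abs(c) <= 5:
        # Unconstrained minimum x = c is feasible: w = 0 and the squared
        # slack s^2 absorbs the constraint value g(x).
        x, w = c, 0.0
    else:
        # Constraint active: sigma = 0, x sits on the boundary.
        x = 5.0 if c > 0 else -5.0
        # dL/dx = 2(x - c) - 2*w*x = 0  =>  w = (x - c)/x
        w = (x - c) / x
    s2 = 25 - x**2          # slack squared, equal to g(x) at the solution
    return x, w, s2

print(solve(4))   # inactive constraint: multiplier 0, positive slack
print(solve(6))   # active constraint: zero slack, w = -0.2 <= 0 (a minimum)
```

In both cases the complementarity product ω·σ is zero, matching the third line of (8.15), and the active-constraint multiplier has the sign required for a minimum.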


EXAMPLE 8.2 APPLICATION OF THE LAGRANGE MULTIPLIER METHOD WITH NONLINEAR INEQUALITY CONSTRAINTS

Solve the problem

Minimize f(x) = x1·x2    (a)
Subject to g(x) = 25 - x1^2 - x2^2 ≥ 0

by the Lagrange multiplier method.

Solution. The Lagrange function is

L(x, ω) = x1·x2 + ω(25 - x1^2 - x2^2 - σ^2)    (b)

The necessary conditions for a stationary point are

∂L/∂x1 = x2 - 2ω·x1 = 0
∂L/∂x2 = x1 - 2ω·x2 = 0    (c)
∂L/∂ω = 25 - x1^2 - x2^2 - σ^2 = 0
∂L/∂σ = -2ω·σ = 0

The simultaneous solutions of Eqs. (c) for ω = 0 and for ω ≠ 0 are listed in Table E8.2. How would you calculate these values?

Columns two and three of Table E8.2 list the components of x* that are the stationary solutions of problem (a). Note that the solutions for ω < 0 are minima, those for ω > 0 are maxima, and ω = 0 gives a saddle point of problem (a). Figure E8.2 illustrates the functions in problem (a). The contours of the objective function (hyperbolas) are represented by broken lines, and the feasible region is bounded by the shaded area enclosed by the circle g(x) = 0. Points B and C correspond to the two minima, D and E to the two maxima, and A to the saddle point of f(x).

Table E8.2 Solutions of problem (a) by the Lagrange multiplier method

    ω       x1       x2      Point    σ      f(x)     Remarks
    0        0        0        A      5        0      Saddle
  -0.5    +3.54    -3.54       B      0     -12.5     Minimum
  -0.5    -3.54    +3.54       C      0     -12.5     Minimum
  +0.5    +3.54    +3.54       D      0     +12.5     Maximum
  +0.5    -3.54    -3.54       E      0     +12.5     Maximum
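The entries of Table E8.2 follow from Eqs. (c): the two gradient equations give x2 = 2ωx1 and x1 = 2ωx2, so either x = (0, 0) with ω = 0, or 4ω^2 = 1 with the constraint active. A sketch enumerating the cases:

```python
import math

def f(x1, x2):
    return x1 * x2

# w = 0: the saddle at the origin (sigma = 5 absorbs the whole constraint)
points = [(0.0, 0.0, 0.0)]
# 4*w**2 = 1: the constraint is active (sigma = 0) and x2 = 2*w*x1
for w in (-0.5, 0.5):
    x1 = math.sqrt(25 / (1 + 4 * w * w))   # from x1^2 + x2^2 = 25
    x2 = 2 * w * x1
    points.append((w, x1, x2))
    points.append((w, -x1, -x2))           # mirror-image solution

for w, x1, x2 in points:
    print(f"w = {w:+.1f}   x = ({x1:+.3f}, {x2:+.3f})   f = {f(x1, x2):+.1f}")
```

The loop reproduces the five stationary points A-E and the objective values 0 and ±12.5.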


Figure E8.2

Lagrange multipliers have an important special interpretation in optimization, as discussed in Sec. 7.8. The Lagrange multiplier for a given constraint indicates how much the objective function will change for a differential change in the constraint constant [the parameter e in Eq. (8.5c)]. Look again at Eq. (8.5b). We would like to determine how much the value of the objective function at the optimum will change as e varies. Form the partial derivative of the Lagrangian with respect to the parameter e:

∂L/∂e = (∂/∂e)[f(x) + ω(h(x) - e)] = -ω    (8.16)

Equation (8.16) can be approximated using difference notation:

ΔL = -ω·Δe

Therefore, a small change in the constant e results in a change in L that is proportional to the Lagrange multiplier. Since, as previously stated, the Lagrangian L is equal to f when the constraint is satisfied (h = 0), then the Lagrange


multiplier can be interpreted as giving the change in f (as well as L) for a change in the constant e at the optimal solution.

If we designate by ω* the value of the Lagrange multiplier at x*, the optimal solution, where h(x*) = e and f(x*) = L(x*, ω*), you can see that the values of x* and ω* are explicit functions of e, which is interpreted as the quantity of a limiting resource. In economics and operations research, Lagrange multipliers are termed shadow prices of the constraints because the change in the optimal value of the objective function per unit increase in the right-hand side of the equality constraint is given by ω*. The value of f(x) increases or decreases from f(x*) with an increase or decrease in e depending on the sign of ω*.

EXAMPLE 8.3 SENSITIVITY ANALYSIS USING LAGRANGE MULTIPLIERS

Consider the following general quadratic objective function with a linear constraint:

Minimize f(x) = c1·x1^2 + c2·x2^2    (a)
Subject to a1·x1 + a2·x2 = e

What is the change in f(x) with respect to e at the stationary point?

Solution. The Lagrangian is

L(x, ω) = c1·x1^2 + c2·x2^2 + ω(a1·x1 + a2·x2 - e)    (b)

Some of the necessary conditions are

∂L/∂x1 = 0 = 2c1·x1 + a1·ω    (c)
∂L/∂x2 = 0 = 2c2·x2 + a2·ω
∂L/∂ω = 0 = a1·x1 + a2·x2 - e

These expressions lead to the following relations:

e = -(a1^2/(2c1))·ω - (a2^2/(2c2))·ω

ω* = -e / [(a1^2/(2c1)) + (a2^2/(2c2))]

x1* = a1·e / (2c1·[(a1^2/(2c1)) + (a2^2/(2c2))])

x2* = a2·e / (2c2·[(a1^2/(2c1)) + (a2^2/(2c2))])


At x*, with a1·x1* + a2·x2* - e = 0,

f(x*) = L(x*, ω*) = [(a1^2·e^2/(4c1)) + (a2^2·e^2/(4c2))] / [(a1^2/(2c1)) + (a2^2/(2c2))]^2    (d)

The sensitivity of f(x*) with respect to e can be found by differentiation:

∂f(x*)/∂e = e·[(a1^2/(2c1)) + (a2^2/(2c2))] / [(a1^2/(2c1)) + (a2^2/(2c2))]^2
          = e / [(a1^2/(2c1)) + (a2^2/(2c2))] = -ω*    (e)

the expected result.
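The identity ∂f(x*)/∂e = -ω* of Example 8.3 can be checked with a central finite difference; a sketch with illustrative values c1 = 1, c2 = 2, a1 = 3, a2 = 4:

```python
c1, c2, a1, a2 = 1.0, 2.0, 3.0, 4.0
D = a1**2 / (2 * c1) + a2**2 / (2 * c2)    # common denominator in Example 8.3

def f_opt(e):
    """Optimal objective value as a function of the right-hand side e."""
    w = -e / D                 # omega* from the necessary conditions
    x1 = -a1 * w / (2 * c1)    # x1*
    x2 = -a2 * w / (2 * c2)    # x2*
    return c1 * x1**2 + c2 * x2**2

e, de = 10.0, 1e-5
w_star = -e / D
# Central difference of f_opt; essentially exact here since f_opt is
# quadratic in e.
dfde = (f_opt(e + de) - f_opt(e - de)) / (2 * de)
print(round(dfde, 6), round(-w_star, 6))   # both ~1.176471
```

Any other choice of coefficients would show the same agreement, since f(x*) = e^2/(2D) for this problem.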

The Lagrange method is quite helpful in analyzing parameter sensitivities in problems with multiple constraints. In a typical refinery, a number of different products are manufactured; these products usually must meet (or exceed) certain specifications in terms of purity as required by the customers. Suppose we carry out a constrained optimization for an objective function that includes the many variables which occur in the refinery, including variables in the fluid catalytic cracker, in the distillation column, and so on, and arrive at some economic optimum subject to the constraints on product purity. With the optimum values of the variables plus Lagrange multipliers corresponding to the product purity, we could then pose the question: How will the profits change if the product specification is either relaxed or made more stringent? To answer this question simply requires examining the Lagrange multiplier for each constraint. As an example, consider the case in which there are three major products (A, B, and C) and the Lagrange multipliers corresponding to each of the three demand constraints were calculated to be:

ω_A = -0.001
ω_B = -1.0
ω_C = -0.007

These values for the ω show (ignoring scaling) that the profit of the refinery is much more sensitive to the production of product B than to that of the other two products.

8.2 NECESSARY AND SUFFICIENT CONDITIONS FOR A LOCAL MINIMUM

In this section we list the necessary and sufficient conditions for x* to be a local minimum of the general constrained nonlinear programming problem

Minimize f(x)    x = [x1 x2 ... xn]'
Subject to h_j(x) = 0    j = 1, ..., m    (8.17)
           g_j(x) ≥ 0    j = m + 1, ..., p


The evolution of the concepts underlying these conditions (stated below) and the proofs related to the conditions are fairly involved and can be found in Gill, Murray, and Wright (1981), Sec. 3.4, and Ben-Israel, Ben-Tal, and Zlobec (1981), Chap. 5.

Recall from Sec. 8.1 that a Lagrange function can be formed including the objective function and the equality constraints. Here we include the inequality constraints with additional multipliers:

L(x, ω, u) = f(x) + Σ_{j=1..m} ω_j·h_j(x) - Σ_{j=m+1..p} u_j·g_j(x)    (8.18)

(If the inequality constraints had been expressed as g_j(x) ≤ 0 in Eq. (8.17), then we would add the terms involving the products of the Lagrange multipliers u_j and g_j(x) in the right-hand sum instead of subtracting them.) The ω_j can be positive or negative, but the u_j must be ≥ 0. For proof, see Bazaraa and Shetty (1979), Chap. 4.

Table 8.1 lists one form of the necessary and sufficient conditions. You can change the original problem (8.1) or (8.17) into a maximization problem by letting the new objective function be -f(x).

The first-order necessary conditions, conditions (a), (c), (d), (e), (f), and (g) in Table 8.1, are the well-known Kuhn-Tucker conditions for optimality.

These conditions serve as the basis for the design of some algorithms and as termination criteria for others. Recall from Sec. 6.2 that a valid search direction must satisfy the condition ∇'f(x)·s < 0. Any small movement along a search direction s for an NLP problem with constraints must also satisfy the constraints. Examine Fig. 8.2, which illustrates the concepts involved in two dimensions. At x*, the true minimum, -∇f(x*) can be expressed as a nonnegative (that is, u_j* ≥ 0) linear combination of the gradients of the active constraints at x*, or

-∇f(x*) = u1*·[-∇g1(x*)] + u2*·[-∇g2(x*)]

All feasible directions of search lie in the cone comprised of the negative gradients of the active constraints at x*.

Why this statement is so is illustrated by Figs. 8.2a and 8.2b. A small move in a direction s making an angle of less than 90° with -∇f(x*) will decrease f(x), so that at a valid x* no feasible search direction can have an angle of less than 90° from -∇f(x*). Suppose that -∇f(x*) lay above -∇g1(x*) as in Fig. 8.2b. Then -∇f(x*) would make an angle of less than 90° with respect to g1(x) and hence allow a feasible move. Similarly, if -∇f(x*) lies below -∇g2(x*), a feasible move (at least differentially) is possible with respect to g2(x). Because neither outcome should occur at an optimal point, a necessary condition is that -∇f(x*) is restricted to fall within the cone generated by the negative gradients of the active constraints.


Table 8.1 The necessary and sufficient conditions for x* to be a local minimum of (8.17)

The necessary conditions for x* to be a local minimum of f(x) are:

(a) f(x), h_j(x), g_j(x) are all twice differentiable at x*.

(b) The so-called "second-order constraint qualification" holds (it contains information about the curvature of the constraints that is taken into account at x*, as explained in Example 8.4 below); sufficient conditions for this requirement are that the gradients of the binding constraints (g_j(x*) = 0), ∇g_j(x*), and of the equality constraints, ∇h_j(x*), because h_j(x*) = 0, are linearly independent (refer to Fig. E8.5 for an illustration).

(c) The Lagrange multipliers exist; they do if (b) holds.

(d) The constraints are satisfied at x*:
    (1) h_j(x*) = 0
    (2) g_j(x*) ≥ 0

(e) The Lagrange multipliers u_j* (at x*) for the inequality constraints are not negative (the ω_j can be positive or negative):
    u_j* ≥ 0

(f) The binding (active) inequality constraints are zero; the inactive inequality constraints are > 0, and the associated u_j's are 0 at x*:
    u_j*·g_j(x*) = 0

(g) The Lagrangian function is at a stationary point:
    ∇L(x*, ω*, u*) = 0

(h) The Hessian matrix of L is positive semidefinite for those v's for which v'·∇g_j(x*) = 0 and v'·∇h_j(x*) = 0, that is, for all the active constraints:
    v'·[∇²L(x*, ω*, u*)]·v ≥ 0

The sufficient conditions for x* to be a local minimum are:

(a') The necessary conditions (a), (b) by implication, (c), (d), (e), (f), and (g).

(b') Plus a modification of necessary condition (h): the Hessian matrix of L is positive definite for those vectors v such that

    v'·∇g_j(x*) = 0 }  for the binding constraints
    v'·∇h_j(x*) = 0 }
    v'·∇g_j(x*) > 0    for the inactive constraints

    v'·[∇²L(x*, ω*, u*)]·v > 0

When stated in algebraic terms, the above concepts are:

-∇f(x*) = Σ_j u_j*·[-∇g_j(x*)]    u_j* ≥ 0

To include in the sum all the inequality constraints, even those with g_j(x*) > 0, the complementarity condition (f) arises, namely

u_j* ≥ 0 if g_j(x*) = 0
u_j* = 0 if g_j(x*) > 0
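For two active constraints in two dimensions, the cone condition amounts to solving a 2x2 linear system for u1 and u2 and checking that both are nonnegative. A sketch (the gradients are illustrative, not taken from Fig. 8.2):

```python
def kkt_multipliers(grad_f, grad_g1, grad_g2):
    """Solve grad_f = u1*grad_g1 + u2*grad_g2 for (u1, u2) in two dimensions.
    -grad_f lies in the cone of the -grad_g's exactly when u1 >= 0 and u2 >= 0."""
    g1x, g1y = grad_g1
    g2x, g2y = grad_g2
    fx, fy = grad_f
    det = g1x * g2y - g2x * g1y          # assumes linearly independent gradients
    u1 = (fx * g2y - g2x * fy) / det     # Cramer's rule
    u2 = (g1x * fy - fx * g1y) / det
    return u1, u2

# Illustrative data: at x*, grad f = (2, 2), with active constraints whose
# gradients are (1, 0) and (0, 1) under the g_j >= 0 convention.
u1, u2 = kkt_multipliers((2.0, 2.0), (1.0, 0.0), (0.0, 1.0))
print(u1, u2)    # both nonnegative, so -grad f lies inside the cone
```

Reversing the sign of one component of grad f pushes -∇f outside the cone, and the corresponding multiplier comes out negative, flagging a non-optimal point.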


Figure 8.2a Interpretation of the Lagrange multipliers as weighting factors of the gradients of the active inequality constraints g_j(x) = 0. The cone generated by the -∇g_j(x*) contains -∇f(x*), and the product u_j*·g_j(x*) = 0 for all the inequality constraints. Condition (g) arises because

∇L(x*, u*) = ∇f(x*) - Σ_j u_j*·∇g_j(x*)

and the terms on the right-hand side are equal to each other at the optimum.

Condition (b) must be satisfied because of certain rare cases, such as instances in which the gradients of the constraints fall along the same line in the same direction or in the opposite direction. To properly characterize the topography of a local minimum, it is necessary to take into account the curvature of the constraints at the presumed x*. If no reduction in f(x) can be obtained along a smooth arc extending from x* into the feasible region (i.e., a point on the arc satisfies all constraints), then x* is a local minimum of f(x). To demonstrate that feasible perturbations from x* do not reduce f(x), the constraint functions must meet certain theoretical conditions called constraint qualifications. Several ways exist in which these constraint qualifications can be specified, but the most practical is condition (b) in Table 8.1. In effect, (b) means that the matrix whose rows are the gradients of the active constraints, namely the Jacobian matrix of the linearized constraints at x*, is of full rank.


Figure 8.2b Interpretation of the Lagrange multipliers as weighting factors of the gradients of the active inequality constraints g_j(x) = 0. Case when -∇f(x*) is not orthogonal to s and lies above -∇g1(x*).

Condition (h) means that the Lagrange function is convex with respect to those tangents to the active constraints at x*. Example 8.4 illustrates the significance of condition (h).

If you can demonstrate that the sufficient conditions are satisfied, then x* is guaranteed to be a local minimum. Of course, a local minimum can exist even if the sufficient conditions are not met, so the direct use of the sufficient conditions as the strategy to search for a minimum rarely proves effective.

A special case of a nonlinear programming problem known as a convex programming problem exists as follows:

Minimize f(x)    f(x) is convex
Subject to h_j(x) = 0    j = 1, ..., m    h_j(x) are linear
           g_j(x) ≥ 0    j = m + 1, ..., p    g_j(x) are concave
           x_i ≥ 0

For these conditions the constraints form a convex set, and it can be shown that the local minimum is also the global minimum (Avriel, 1976, Sec. 4.5). Analogously,


a local maximum is also the global maximum if the objective function is concave and the constraints form a convex set.

EXAMPLE 8.4 TEST FOR NECESSARY AND SUFFICIENT CONDITIONS

Fiacco and McCormick (1968) used the following problem to illustrate the necessity of considering the curvature of the constraints in determining whether x* is a local minimum. Examine Fig. E8.4. The problem is to

Minimize f(x) = (x1 - 1)^2 + x2^2

Subject to either (a) g1(x) = -x1 + x2^2/4 ≥ 0
or (b) g2(x) = -x1 + x2^2 ≥ 0


Figure E8.4


To illustrate how the necessary conditions might be used to find a possible solution to the above problem, we will find a solution when constraint (a) applies. To determine x* we set ∇xL = 0:

2x1 - 2 + u = 0    (b)
2x2 - (u/2)·x2 = 0    (c)

Is u = 0 or is u > 0? If u = 0, then x1 = 1 and x2 = 0 (this point is the unconstrained optimum). However, g1(x) is not satisfied there because

-1 + 0 < 0

hence u > 0 and g1(x) = 0. Thus x1 = x2^2/4 and u = 2 - 2x1 from Eq. (b). Introduction of these two relations into Eq. (c) yields x2 = 0, x1 = 0, u = 2, or x2^2 = -4, x1 = -1, u = 4, but the latter triple is invalid (x2 must be real), so that x* = [0 0]' and u* = 2. Inspection of the contours of f(x) indicates that x* = [0 0]' is a local minimum of f(x) subject to g1(x). Check to see that ∇²L is positive definite.

However, if g2(x) is substituted for g1(x), the situation changes drastically. A smooth arc exists from (0, 0) into the feasible region along which f(x) can be reduced, yet (0, 0) meets conditions (a), (b), (c), (d), (e), (f), and (g) in Table 8.1, as can be seen as follows:

(a) f(x) and g2(x) are twice differentiable at (0, 0)
(b) Only one constraint exists, so (b) is satisfied
(c) Hence u exists
(d) g2(0, 0) = 0 is satisfied
(g) L(x, u) = (x1 - 1)^2 + x2^2 - u(-x1 + x2^2)

hence at (0, 0), ∂L/∂x1 = 2(0 - 1) + û = 0, or û = 2 (û is the presumed optimal multiplier)

(e) û = 2 > 0
(f) û·g2(0, 0) = 0 because g2(0, 0) = 0

However, ∇²L(0, 0) is not positive semidefinite:

∇L(x, u) = [2(x1 - 1) + u]   = [0]
           [2x2 - 2u·x2  ]     [0]

∇²L(x, u) = [2    0       ]
            [0    2(1 - u)]

∇²L(0, 0, 2) = [2    0]
               [0   -2]

The vector v that satisfies v'·∇g2(0, 0) = 0 is found from

[v1  v2]·[-1]  = 0    or    -v1 + (0)·v2 = 0
          [0]

hence v1 = 0 and v2 can be any value. Thus v is a vector at x1 = 0 along the x2 axis, and

[0  v2]·[2    0]·[0 ]  = -2·v2^2 < 0
         [0   -2] [v2]

so condition (h) fails.

Let us now again consider minimizing f(x) subject to constraint g2(x), but with x* located at (1/2, √2/2). If we set ∇L(1/2, √2/2, u) = 0, we find

2(1/2 - 1) + u = 0    or    u* = 1

Thus

∇²L(1/2, √2/2, 1) = [2  0]
                    [0  0]

which also is not positive definite. However, when we calculate the vector v satisfying v'·∇g2(x*) = 0,

[v1  v2]·[-1 ]  = 0    or    -v1 + √2·v2 = 0
          [√2]

we conclude that v1 = √2·v2, so that v' = [√2·v2  v2].

Then

v'·∇²L·v = [√2·v2  v2]·[2  0]·[√2·v2]  = 4·v2^2 > 0
                        [0  0] [v2   ]

so that condition (h) is satisfied and the sufficient condition (b') is also satisfied.
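The curvature effect in Example 8.4 can be confirmed by evaluating f(x) along each constraint boundary: with g1 active, x1 = x2^2/4 and the restricted objective has a minimum at x2 = 0; with g2 active, x1 = x2^2 and f decreases as we leave x2 = 0, the boundary minimum sitting at (1/2, √2/2). A sketch:

```python
import math

def f(x1, x2):
    return (x1 - 1)**2 + x2**2

def on_g1(t):
    # f restricted to the boundary of g1: x1 = t**2/4
    return f(t * t / 4, t)

def on_g2(t):
    # f restricted to the boundary of g2: x1 = t**2
    return f(t * t, t)

eps = 1e-3
# Along g1, (0, 0) is a local minimum of the restricted f:
assert on_g1(0) < on_g1(eps) and on_g1(0) < on_g1(-eps)
# Along g2, f decreases as we move away from x2 = 0:
assert on_g2(eps) < on_g2(0)
# The boundary minimum for g2 sits at x2 = sqrt(1/2):
t_star = math.sqrt(0.5)
assert on_g2(t_star) < on_g2(t_star + eps)
assert on_g2(t_star) < on_g2(t_star - eps)
print(round(t_star**2, 4), round(t_star, 4), round(on_g2(t_star), 4))
```

The printed point is x* = (0.5, 0.7071) with f(x*) = 0.75, smaller than f(0, 0) = 1, confirming that (0, 0) is not a minimum when the more sharply curved constraint g2 is used.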

EXAMPLE 8.5 A TEST FOR A MINIMUM

Determine if the potential minimum x* = [1.000 4.900]' is indeed a local minimum of the problem

Minimize f(x) = 4x1 - x2^2 - 12
Subject to h1(x) = 25 - x1^2 - x2^2 = 0
g1(x) = 10x1 - x1^2 + 10x2 - x2^2 - 34 ≥ 0
g2(x) = (x1 - 3)^2 + (x2 - 1)^2 ≥ 0
g3(x) = x1 ≥ 0
g4(x) = x2 ≥ 0

Figure E8.5 shows the contours of these functions in the x1-x2 plane.

Solution. Test each of the necessary and sufficient conditions:

1. That the functions are twice differentiable at x* can be seen by inspection.


Figure E8.5 Contours of f(x), h1(x) = 0, and g1(x) = 0.

2. Are the constraints satisfied?

h1: 25 - 1 - (4.9)^2 = 25 - 1 - 24.0 ≈ 0
g1: 10 - 1 + 49 - 24.0 - 34 ≈ 0    yes, g1 = 0 and is binding
g2: (1 - 3)^2 + (4.9 - 1)^2 > 0    yes, g2 > 0 and is not active
g3 > 0, g4 > 0

Hence the constraints are satisfied within reasonable precision.

3. Does the second-order constraint qualification hold? Check for linear independence of ∇h1 and ∇g1:

c1·[(10 - 2x1)]  + c2·[-2x1]  = [0]
   [(10 - 2x2)]       [-2x2]    [0]

At x* = (1, 4.9):

c1·[8  ]  + c2·[-2  ]  = [0]
   [0.2]       [-9.8]    [0]

A solution to this set of two homogeneous equations exists only if the determinant of the coefficients is zero. Is

det [8    -2  ]  = 0?
    [0.2  -9.8]

No (the determinant is (8)(-9.8) - (0.2)(-2) = -78.0), hence c1 = c2 = 0 is the only solution of the homogeneous equations. Consequently the gradients of the active constraints are linearly independent, and the first- and second-order constraint qualifications are satisfied at x*.


4. Are the complementarity conditions uj* gj(x*) = 0 satisfied?

g1(x*) = 0
u2* must be zero because g2(x*) > 0
u3* must be zero because g3(x*) > 0
u4* must be zero because g4(x*) > 0

5. Is ∇L(x*, ω*, u*) = 0?

[4; −2x2] + ω1[−2x1; −2x2] − u1[(10 − 2x1); (10 − 2x2)] = [0; 0]

At x* = [1.000  4.900]ᵀ:

4 − 2ω1* − 8u1* = 0
−9.8 − 9.8ω1* − 0.2u1* = 0

We use Cramer's rule to solve for ω1* and u1*:

u1* = det[−2  −4;  −9.8  9.8] / det[−2  −8;  −9.8  −0.2] = −58.8 / −78.0 = 0.754

ω1* = det[−4  −8;  9.8  −0.2] / det[−2  −8;  −9.8  −0.2] = 79.2 / −78.0 = −1.016

∇L is 0 for ω1* = −1.016 and u1* = 0.754. Note that u1 > 0 is satisfied; ω1 can be positive or negative.

6. Is the Hessian matrix of L(x*, or*, u*) positive definite for those v satisfying vrvglx*; : 0 and vrvhlx*; : 97

∂²L/∂x1² = −2ω1 + 2u1

∂²L/∂x2² = −2 − 2ω1 + 2u1

The cross partial derivatives vanish.

∇²L(x*, ω*, u*) = [−2(−1.016) + 2(0.754)   0;   0   −2 − 2(−1.016) + 2(0.754)]

= [3.54  0;  0  1.54]

so that ∇²L is positive definite at x*.

However, if you try to find the vector(s) v satisfying

[v1  v2][(10 − 2(1));  (10 − 2(4.900))] = 0

[v1  v2][−2(1);  −2(4.900)] = 0

that is,

8v1 + 0.2v2 = 0
−2v1 − 9.8v2 = 0

the only vector satisfying the two equations is v = [0  0]ᵀ.

Consequently, we can show that

vᵀ∇²L(x*, ω*, u*)v ≥ 0   (necessary condition)

but not

vᵀ∇²L(x*, ω*, u*)v > 0   (sufficient condition)

The difficulty here is that the original problem has two variables and two binding independent constraints at x*, so that in truth the problem is one of solving two simultaneous equations; zero degrees of freedom exist for such an optimization problem.

You can see that the computations needed to evaluate the sufficient conditions might be quite extensive for large-scale problems. Furthermore, because the results may be inconclusive, you may not be able to show that the sufficient conditions are satisfied, but on the other hand you cannot show that they are not satisfied. Most computer codes do not carry out a test for sufficiency once x* has been computed.
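The multiplier calculation in step 5 of Example 8.5 is easy to verify mechanically. A minimal sketch in plain Python, taking the coefficients directly from the two stationarity equations at x* = (1, 4.9):

```python
# Stationarity equations of Example 8.5 at x* = (1, 4.9):
#    4  - 2.0*w1 - 8.0*u1 = 0
#  -9.8 - 9.8*w1 - 0.2*u1 = 0
# solved by Cramer's rule, then the Hessian diagonal of L is formed.

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Cramer's rule for  a11*x + a12*y = b1,  a21*x + a22*y = b2."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det

w1, u1 = solve_2x2(-2.0, -8.0, -4.0, -9.8, -0.2, 9.8)
print(round(w1, 3), round(u1, 3))     # -1.015 0.754 (text rounds w1 to -1.016)

# Diagonal entries of grad^2 L (the cross partials vanish):
d1 = -2.0 * w1 + 2.0 * u1
d2 = -2.0 - 2.0 * w1 + 2.0 * u1
print(round(d1, 2), round(d2, 2))     # 3.54 1.54 -> positive definite
```

Both diagonal entries are positive, confirming the positive definiteness found in step 6.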

8.3 QUADRATIC PROGRAMMING

Quadratic programming (QP) is the name given to the procedure that minimizes a quadratic function of n variables subject to m linear inequality or equality constraints, or both. A quadratic programming problem is the simplest form of a nonlinear programming problem with inequality constraints.

In this text we discuss QP as a subproblem used to solve general nonlinear programming problems. In addition, a number of practical optimization problems are naturally posed as QP problems, such as constrained least squares, optimal control of linear systems with quadratic cost functions, and the solution of algebraic and differential equations. The techniques proposed for the solution of a quadratic programming problem bear many similarities to those used in solving the linear programming problems discussed in Chap. 7. Specifically, each inequality constraint must either be satisfied as an equality (i.e., be binding) or not be involved in the solution of the QP problem, so that once the binding constraints are identified, the QP technique can reduce to a vertex-searching procedure examining the intersection of linear equations as in LP. This idea is the fundamental link between the two methods.

In compact notation, the quadratic programming problem is

Minimize f(x) = cᵀx + ½xᵀQx
Subject to Ax ≥ b                                (8.19)
x ≥ 0


where c is a vector of constant coefficients, A is an (m × n) matrix, and it is generally assumed that Q is a symmetric matrix.

Because the constraints in (8.19) are linear and presumably independent, the constraint qualification condition (Sec. 8.2) is always satisfied; hence the Kuhn-Tucker conditions are the necessary conditions for an optimal solution of the QP problem. In addition, if Q is positive definite, the Kuhn-Tucker conditions are also sufficient conditions for an extremum, and a solution meeting these conditions yields the global optimum. If Q is not positive definite, the problem may have unbounded solutions and local minima. For (8.19) the Kuhn-Tucker conditions can be written down directly from conditions (a), (c), (e), (f), and (g) in Table 8.1.

We start with the Lagrange function

L = xᵀc + ½xᵀQx − vᵀ(Ax − b) − uᵀx

and equate the gradient of L (with respect to x) to zero [note that vᵀ(Ax − b) = (Ax − b)ᵀv = (xᵀAᵀ − bᵀ)v and uᵀx = xᵀu]:

∇ₓL = c + Qx − Aᵀv − u = 0

The nonnegative slack variables y can be inserted into the constraints Ax − b ≥ 0 so that Ax − b − y = 0, or y = Ax − b, as explained in Sec. 7.4. Then the Kuhn-Tucker conditions reduce to the following set of linear equations:

c + Qx − Aᵀv − u = 0                             (8.20)
Ax − b − y = 0                                   (8.21)
x ≥ 0   u ≥ 0   v ≥ 0   y ≥ 0                    (8.22)
uᵀx = 0                                          (8.23)
vᵀy = 0                                          (8.24)

(or uᵀx + vᵀy = 0)

where the ui and vj are the Lagrange multipliers and the yj are the slack variables. The set of variables (x*, u*, v*, y*) that satisfies (8.20) to (8.24) is the optimal solution to (8.19).

It can be shown in a straightforward way (Cottle, 1968) that if (x, u, v, y) is the solution of the QP problem (8.19), then f(x) in (8.19) is equivalent to

f = ½(cᵀx + bᵀv)                                 (8.25)

Minimization of function (8.25), subject to (8.20) to (8.24), forms what is called the linear complementarity problem (Eaves, 1971), the solution of which is the solution of the original QP problem (8.19). The term complementarity arises from the idea that in (8.23), if one of the "complementary pair" (ui, xi) is zero, then the "complementary condition" (8.23) is satisfied. Note that with the exception of Eq. (8.23), the Kuhn-Tucker conditions are nothing more than constraints for a linear programming problem involving 2(n + m) variables. All the complementary restriction says is that it is not permissible for both xi and ui to be basic variables in a basic feasible solution.
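Conditions (8.20) to (8.24) are purely mechanical to check once a candidate solution is in hand. The sketch below uses a small hypothetical QP (Q = 2I, c = (−2, −4), and the single constraint x1 + x2 ≤ 1 written in Ax ≥ b form as −x1 − x2 ≥ −1; these data are illustrative, not from the text), with the claimed solution x* = (0, 1), v* = 2, u* = 0:

```python
# Tiny instance of (8.19): minimize c^T x + 0.5 x^T Q x, Ax >= b, x >= 0.
Q = [[2.0, 0.0], [0.0, 2.0]]
c = [-2.0, -4.0]
A = [[-1.0, -1.0]]          # -x1 - x2 >= -1, i.e. x1 + x2 <= 1
b = [-1.0]
x, u, v = [0.0, 1.0], [0.0, 0.0], [2.0]   # claimed KKT point

# (8.20): c + Qx - A^T v - u = 0
r = [c[i] + sum(Q[i][j] * x[j] for j in range(2))
     - sum(A[k][i] * v[k] for k in range(1)) - u[i] for i in range(2)]
# (8.21): slack y = Ax - b
y = [sum(A[k][j] * x[j] for j in range(2)) - b[k] for k in range(1)]
# (8.23)-(8.24): complementarity u^T x + v^T y = 0
comp = sum(ui * xi for ui, xi in zip(u, x)) + v[0] * y[0]
print(r, y, comp)           # [0.0, 0.0] [0.0] 0.0
```

All residuals vanish and the nonnegativity conditions (8.22) hold by inspection, so (x, u, v, y) satisfies the full Kuhn-Tucker set.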


A problem in which the linear constraints Ax ≥ b are replaced with Ax = b in (8.19) is also often called a quadratic programming problem. The Kuhn-Tucker conditions are the same as (8.20) to (8.24) except that y = 0. For a mixture of equality and inequality constraints, an element of y would have to be retained for each inequality. Also, by way of interest, if Q is at least positive semidefinite, a dual problem of (8.19) can be formulated:

Maximize F = −½zᵀQz + bᵀw
Subject to Qz − Aᵀw + c ≥ 0                      (8.19a)
w ≥ 0
z unrestricted

and max F = min f at x*. For further details refer to Cottle (1963).
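The relation max F = min f can be checked numerically on a small hypothetical instance (Q = 2I, c = (−2, −4), one constraint x1 + x2 ≤ 1 so that A = [−1  −1], b = [−1]; illustrative data, not from the text). Setting z equal to the primal solution (0, 1) and w to its multiplier 2 makes the two objectives coincide:

```python
# Hypothetical QP: Q = 2I, c = (-2, -4), A = [[-1, -1]], b = [-1].
c, b = [-2.0, -4.0], [-1.0]

def primal_obj(x):
    # f(x) = c^T x + 0.5 x^T Q x with Q = 2I
    return c[0] * x[0] + c[1] * x[1] + x[0] ** 2 + x[1] ** 2

def dual_obj(z, w):
    # F(z, w) = -0.5 z^T Q z + b^T w
    return -(z[0] ** 2 + z[1] ** 2) + b[0] * w

# Dual feasibility (8.19a): Qz - A^T w + c = (0 + 2 - 2, 2 + 2 - 4) = (0, 0) >= 0
print(primal_obj([0.0, 1.0]), dual_obj([0.0, 1.0], 2.0))   # -3.0 -3.0
```

Equal primal and dual objective values at feasible points certify optimality of both.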

Pang (1983) reviewed various methods that have been used to solve quadratic programming problems, and identified four families of methods:

1. LP Simplex-related 2. Active set strategy

3. Internal representation of the feasible set 4. Shrinking ellipsoid

The first two classifications cover most of the existing methods, but as Pang has pointed out, many methods are essentially the same and therefore it makes little difference as to which method is used as long as the computer code is properly prepared. Some practitioners prefer to use the dual formulation of the QP problem because the origin is a feasible point, and a phase I procedure (finding an initial feasible point) can be avoided. Table 8.2 lists some quadratic

Table 8.2 Quadratic programming codes

Code      Source                                              Computers            Language    Price
CONQUA    Econometric Institute, Erasmus University,          DEC 2060             Fortran     10,000 Dutch guilders
          P.O. Box 1738, 3000 DR Rotterdam, The Netherlands
DUAL QP   OCI, P.O. Box 144, Park Ridge, NJ 07656             IBM 370, 4300, 3000  Fortran     Inquire
QPSOL     Office of Technology Licensing, Stanford            Numerous             Fortran 66  $1000 (academic
          University, Stanford, CA 94305                                                       discount exists)
ZQPCVX    M. J. D. Powell, DAMTP, Silver St.,                 IBM 3081             Fortran     None
          Cambridge, CB3 9EW England


programming codes. In practice, you can use a general nonlinear programming code to solve quadratic programming problems reasonably efficiently without requiring a separate quadratic programming code.
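As an illustration of that last point, here is a sketch using one general NLP routine (SciPy's SLSQP, a plausible choice among many) on a small QP with Q = 2I, c = (−2, −4), and the constraint x1 + x2 ≤ 1; the problem data are illustrative, not from the text:

```python
# Solve  min -2*x1 - 4*x2 + x1^2 + x2^2
# s.t.   x1 + x2 <= 1,  x >= 0
# with a general NLP code rather than a dedicated QP code.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return -2 * x[0] - 4 * x[1] + x[0] ** 2 + x[1] ** 2

res = minimize(f, x0=np.array([0.5, 0.5]), method="SLSQP",
               bounds=[(0, None), (0, None)],
               constraints=[{"type": "ineq", "fun": lambda x: 1 - x[0] - x[1]}])
print(res.x)    # close to the analytic optimum (0, 1)
```

A dedicated QP code would exploit the constant Hessian, but for small problems a general routine is entirely adequate, as the text notes.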

We turn now to techniques that have been effective in solving the general constrained nonlinear programming problem (8.1).

8.4 THE GENERALIZED REDUCED GRADIENT METHOD

Certainly, the most direct approach to solving the general nonlinear programming problem would be to linearize the problem and successively apply linear programming techniques by such means as:

1. Formulate a model with a nominal operating point and linearize all the constraints and the objective function about the operating point so that they fit the linear programming format. Use linear programming once to solve the linearized problem.

2. Successively linearize the constraints and objective function of a nonlinear problem as successive improved feasible solutions are reached. Once the solution of the nominal LP is obtained, if the nominal optimal point proves to be nonfeasible, locate the nearest (or some) feasible point, and again linearize the constraints and objective function about this new point.

3. Linearize the functions in a piecewise fashion so that a series of straight-line segments are used to approximate the objective function and constraints.

There is no guarantee of convergence for any of these methods. Refer to Sec. 8.9 for more information on this approach.
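A bare-bones sketch of the second approach, using SciPy's linprog as the LP solver on an illustrative problem (minimize x1² + x2² subject to x1 + x2 ≥ 1, 0 ≤ xi ≤ 1). The shrinking step bound that keeps each linearization local, the step schedule, and the problem data are all assumptions for the sketch, not prescriptions from the text:

```python
# Successive linearization: at each iterate, minimize the linearized
# objective grad^T x over the (already linear) constraints, restricted
# to a box around the current point that shrinks as iterations proceed.
import numpy as np
from scipy.optimize import linprog

x = np.array([1.0, 0.0])      # feasible starting point
step = 0.5
for _ in range(30):
    grad = 2 * x                                       # gradient of f at x
    box = [(max(0.0, xi - step), min(1.0, xi + step)) for xi in x]
    # linprog minimizes grad^T x subject to A_ub x <= b_ub:
    # x1 + x2 >= 1 becomes -x1 - x2 <= -1.
    res = linprog(grad, A_ub=[[-1.0, -1.0]], b_ub=[-1.0], bounds=box)
    x = res.x
    step *= 0.7                                        # shrink the step bound
print(x)    # approaches the true optimum (0.5, 0.5)
```

Without the shrinking box the LP would jump between vertices and never settle, which is one concrete form of the convergence failure mentioned above.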

In practice the best current general algorithm (best on the basis of many tests; see Sec. 8.7) using iterative linearization is the Generalized Reduced Gradient algorithm (GRG). The GRG algorithm (Abadie and Carpentier, 1969; Faure and Huard, 1965; Abadie and Guigou, 1969; Liebman et al., 1986) is an extension of the Wolfe algorithm (Wolfe, 1962) for linear constraints, modified to accommodate both a nonlinear objective function and nonlinear constraints. In essence the method employs linear or linearized constraints, defines new variables that are normal to the constraints, and expresses the gradient (or other search direction) in terms of this normal basis. (Wolfe has shown that the original reduced gradient method is related to the Simplex method of linear programming.) Although the problem solved by the GRG method is

Minimize f(x)          x = [x1  x2  ...  xn]ᵀ
Subject to hj(x) = 0        j = 1, ..., m        (8.1c)
Li ≤ xi ≤ Ui           i = 1, ..., n

where Li and Ui are the lower and upper bounds on xi, respectively. The upper and lower bounds are treated as separate vectors rather than being classified as inequality constraints because they are treated differently in determining the step length in a search direction.


Nonlinear inequality constraints can be accommodated by subtracting (or adding, as the case may be) the square of a slack variable from each inequality constraint, thus:

hj(x) = gj(x) − σj² = 0

and permitting the bounds on the σj's to be −∞ < σj < ∞. (The σj's are added to the set of n variables.) An alternate method is to subtract (or add) the slack variable itself and make the slack variable nonnegative by adding bounds to the problem:

hj(x) = gj(x) − σj = 0
Subject to: σj ≥ 0
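A quick numeric check of the squared-slack device, using an illustrative constraint g(x) = 1 − x1² − x2² ≥ 0 (not from the text). For any strictly feasible x the slack σ = ±√g(x) makes the converted equality h(x, σ) = g(x) − σ² vanish, and either sign works, which is why σ can be left unbounded:

```python
import math

def g(x):                       # illustrative inequality constraint, g(x) >= 0
    return 1.0 - x[0] ** 2 - x[1] ** 2

def h(x, s):                    # converted equality, s playing the role of sigma
    return g(x) - s ** 2

x = [0.6, 0.0]                  # g(x) = 0.64 > 0: strictly feasible
s = math.sqrt(g(x))
print(h(x, s), h(x, -s))        # both ~0 (to roundoff); either sign of s works
```

At a binding constraint, g(x) = 0 forces σ = 0, so the equality form reproduces the original inequality exactly.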

Finally, what is called the "active constraint set" strategy might be used, that is, at each stage of the search the active inequality constraints are added and the inactive ones removed from the equality constraint set, the selection based on Lagrange multiplier estimates. Recall that in problem (8.1c), in general you cannot directly reduce the dimensionality of the problem by substitution because the nonlinear equality constraints implicitly connect the variables.

Hence the equations cannot be solved for a set of dependent variables that can be substituted into the objective function, leaving only independent variables.

However, a method of local linearization permits a reduction of dimensionality to take place.

Reduced-gradient algorithms seek to maintain feasibility on a set of the nonlinear constraints while reducing the value of the objective function. Thus, the dimensionality of the search is reduced to the total number of variables less the number of independent constraints. As you might expect, one difficulty with this procedure is returning to a feasible point when the solution of a subproblem with linearized active constraints yields a nonfeasible point with respect to the original constraints. Differences among the generalized reduced-gradient algorithms result from differences in viewpoint as to how to carry out the search, how to reduce the objective function, and how to return to a feasible point.

8.4.1 Concept of the Reduced Gradient

Suppose that the m constraints in problem (8.1a) are linear or are linearized approximations to hj(x) = 0. Imposition of these constraints reduces the number of degrees of freedom associated with x from n to (n − m). In carrying out the search for the solution of problem (8.1a), only (n − m) variables will be independent variables; m variables will be dependent ones. This reduction in dimensionality is described more formally in Appendix C in terms of two subspaces, but here we just use a simple example involving two variables to explain the concept of the reduced gradient.

Consider the following problem:

Minimize f(x1, x2)
Subject to h(x1, x2) = 0                         (8.26)
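To make the idea concrete, here is a numeric sketch with illustrative choices f = x1² + x2² and h = x1 + x2 − 1 (assumed for the sketch, not the text's functions). Taking x1 as the dependent variable, the chain rule along h = 0 gives the reduced gradient df/dx2 = ∂f/∂x2 − (∂f/∂x1)(∂h/∂x2)/(∂h/∂x1):

```python
# Reduced gradient for a two-variable problem of form (8.26), with
# f = x1^2 + x2^2 and h = x1 + x2 - 1 = 0; x1 is the dependent variable.

def reduced_gradient(x1, x2):
    df = (2.0 * x1, 2.0 * x2)        # gradient of f
    dh = (1.0, 1.0)                  # gradient of h
    return df[1] - df[0] * dh[1] / dh[0]

print(reduced_gradient(0.8, 0.2))    # ~ -1.2: f decreases as x2 increases
print(reduced_gradient(0.5, 0.5))    # 0.0: the constrained minimum
```

A single number thus summarizes the direction of descent in the one remaining degree of freedom; the reduced gradient vanishes exactly at the constrained optimum (0.5, 0.5).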
