# Full text of "System Modeling and Optimization XX [electronic resource] : IFIP TC7 20th Conference on System Modeling and Optimization July 23-27, 2001, Trier, Germany"


Edited by E. W. Sachs and R. Tichatschke. Springer Science+Business Media, LLC.

SYSTEM MODELING AND OPTIMIZATION XX

IFIP - The International Federation for Information Processing

IFIP was founded in 1960 under the auspices of UNESCO, following the First World Computer Congress held in Paris the previous year. An umbrella organization for societies working in information processing, IFIP's aim is two-fold: to support information processing within its member countries and to encourage technology transfer to developing nations. As its mission statement clearly states, IFIP's mission is to be the leading, truly international, apolitical organization which encourages and assists in the development, exploitation and application of information technology for the benefit of all people. IFIP is a non-profitmaking organization, run almost solely by 2500 volunteers. It operates through a number of technical committees, which organize events and publications. IFIP's events range from an international congress to local seminars, but the most important are:
• The IFIP World Computer Congress, held every second year;
• Open conferences;
• Working conferences.
The flagship event is the IFIP World Computer Congress, at which both invited and contributed papers are presented. Contributed papers are rigorously refereed and the rejection rate is high. As with the Congress, participation in the open conferences is open to all and papers may be invited or submitted. Again, submitted papers are stringently refereed. The working conferences are structured differently. They are usually run by a working group and attendance is small and by invitation only. Their purpose is to create an atmosphere conducive to innovation and development. Refereeing is less rigorous and papers are subjected to extensive group discussion. Publications arising from IFIP events vary.
The papers presented at the IFIP World Computer Congress and at open conferences are published as conference proceedings, while the results of the working conferences are often published as collections of selected and edited papers. Any national society whose primary activity is in information processing may apply to become a full member of IFIP, although full membership is restricted to one society per country. Full members are entitled to vote at the annual General Assembly. National societies preferring a less committed involvement may apply for associate or corresponding membership. Associate members enjoy the same benefits as full members, but without voting rights. Corresponding members are not represented in IFIP bodies. Affiliated membership is open to non-national societies, and individual and honorary membership schemes are also offered.

SYSTEM MODELING AND OPTIMIZATION XX
IFIP TC7 20th Conference on System Modeling and Optimization
July 23-27, 2001, Trier, Germany

Edited by
E. W. Sachs, Department of Mathematics, University of Trier / Virginia Polytechnic Institute and State University, Germany / USA
R. Tichatschke, Department of Mathematics, University of Trier, Germany

SPRINGER SCIENCE+BUSINESS MEDIA, LLC

Library of Congress Cataloging-in-Publication Data: A C.I.P. Catalogue record for this book is available from the Library of Congress. System Modeling and Optimization XX, edited by E. W. Sachs and R. Tichatschke. ISBN 978-1-4757-6669-1. ISBN 978-0-387-35699-0 (eBook). DOI 10.1007/978-0-387-35699-0. Copyright © 2003 by Springer Science+Business Media New York. Originally published by Kluwer Academic Publishers in 2003. All rights reserved.
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission from the Publisher, Springer Science+Business Media, LLC, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed on acid-free paper.

Contents

Foreword vii

Invited Speakers
- On Approximate Robust Counterparts of Uncertain Semidefinite and Conic Quadratic Programs (Aharon Ben-Tal, Arkadi Nemirovski) 1
- Global Convergence of a Hybrid Trust-Region SQP-Filter Algorithm for General Nonlinear Programming (Nick Gould, Philippe L. Toint) 23
- Some Aspects of Nonlinear Semidefinite Programming (Florian Jarre) 55
- Implicit Filtering and Nonlinear Least Squares Problems (C. T. Kelley) 71
- Data Mining via Support Vector Machines (O. L. Mangasarian) 91
- Properties of Oligopolistic Market Equilibria in Linearized DC Power Networks with Arbitrage and Supply Function Conjectures (Jong-Shi Pang, Benjamin F. Hobbs, Christopher J. Day) 113
- Risk Control and Optimization for Structural Facilities (Rüdiger Rackwitz) 143
- Probability Objectives in Stochastic Programs with Recourse (Rüdiger Schultz) 169

Contributed Papers
- Parametric Sensitivity Analysis: A Case Study in Optimal Control of Flight Dynamics (Christof Büskens, Kurt Chudej) 189
- Solving Quadratic Multicommodity Problems through an Interior-Point Algorithm (Jordi Castro) 199
- Stability and Local Growth near Bounded-Strong Optimal Controls (Ursula Felgenhauer) 213
- Graph Isomorphism Algorithm by Perfect Matching (Kazuma Fukuda, Mario Nakamori) 229
- A Reduced SQP Algorithm for the Optimal Control of Semilinear Parabolic Equations (Roland Griesse) 239
- On Numerical Problems Caused by Discontinuities in Controls (Christian Großmann, Antje Noack, Reiner Vanselow) 255
- Solutions Differentiability of Parametric Optimal Control for Elliptic Equations (Kazimierz Malanowski) 271
- Shape Optimization for Dynamic Contact Problems with Friction (A. Myśliński) 287
- Optimal Shape Design Using Domain Transformations and Continuous Sensitivity Equation Methods (Lisa Stanley) 301
- Adjoint Calculation Using Time-Minimal Program Reversals for Multi-Processor Machines (Andrea Walther, Uwe Lehmann) 317

Foreword

The 20th IFIP TC7 Conference on System Modeling and Optimization took place at the University of Trier, Germany, from July 23 to 27, 2001. This volume contains selected papers written by participants of the conference, some of whom gave invited presentations. The conference was attended by 128 participants from 28 countries and four continents. The organizers are grateful to all participants for their contribution to the success of this conference. During the five days of the meeting, 10 invited and 94 contributed presentations were given. The talks were of high scientific quality and displayed the wide range of the area of system modeling and optimization.
Over the course of the 20 TC7 meetings, held biennially, the conferences have documented the progress and development of this important research area. Also during this conference, one could follow important research in well-established areas, but also become acquainted with new research fields in optimization. The conference was supported by the International Federation for Information Processing (IFIP), in particular through the Technical Committee 7, which selected the conference site, and the program committee, which put together an interesting program. Their support and help are greatly appreciated. Financial and technical support for this meeting came from the host institution, the University of Trier, and the government of the home state, Rheinland-Pfalz. Furthermore, the Deutsche Forschungsgemeinschaft (DFG), Siemens AG, and GeneralCologne Re Capital generously supported this conference. Many committees, institutions and individuals contributed to the success of this conference. We thank the program committee of TC7, in particular P. Kall (chairman of TC7), and the administration of the University of Trier. The organization of the conference would not have been possible without the help and support of many individuals; among them were H. Beewen, F. Leibfritz, J. Maruhn, U. Morbach, M. Pick, M. Ries, M. Schulze, C. Schwarz, T. Voetmann. We also appreciate the valuable assistance of the publisher, in particular Y. Lambert, in the preparation of the proceedings. Trier, May 2003. E. W. Sachs and R. Tichatschke

ON APPROXIMATE ROBUST COUNTERPARTS OF UNCERTAIN SEMIDEFINITE AND CONIC QUADRATIC PROGRAMS

Aharon Ben-Tal, Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, abental@ie.technion.ac.il
Arkadi Nemirovski, Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, nemirovs@ie.technion.ac.il

(This research was partially supported by the Israeli Ministry of Science grant # 0200-1-98 and the Israel Science Foundation grant # 683/99-10.0. Part of this research was conducted while the authors were visiting the Faculty of Information Technology and Systems (ITS), Department ISA.)

Abstract: We present efficiently verifiable sufficient conditions for the validity of specific NP-hard semi-infinite systems of semidefinite and conic quadratic constraints arising in the framework of Robust Convex Programming, and demonstrate that these conditions are "tight" up to an absolute constant factor. We discuss applications in Control, specifically the construction of a quadratic Lyapunov function for a linear dynamic system under interval uncertainty.

1. Introduction

The subject of this paper is "tractable approximations" of intractable semi-infinite convex optimization programs arising as robust counterparts of uncertain conic quadratic and semidefinite problems. We start by specifying the relevant notions. Let $K$ be a cone in $\mathbf{R}^m$ (closed, pointed, convex and with a nonempty interior). A conic program associated with $K$ is an optimization program of the form

$$\min_x \left\{ f^T x \mid Ax - b \in K \right\}; \qquad \text{(CP)}$$

here $x \in \mathbf{R}^n$. An uncertain conic problem is a family

$$\left\{ \min_x \left\{ f^T x \mid Ax - b \in K \right\} \;\middle|\; (f, A, b) \in \mathcal{U} \right\} \qquad \text{(UCP)}$$

of conic problems with common $K$ and data $(f, A, b)$ running through a given uncertainty set $\mathcal{U}$.
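To fix intuitions, here is a minimal illustrative sketch (not from the paper) of conic and robust feasibility for the simplest cone, the nonnegative orthant $K = \mathbf{R}^m_+$, with the uncertainty set crudely replaced by a finite list of data scenarios; all names and numbers are invented for illustration.

```python
import numpy as np

def in_orthant(v, tol=1e-9):
    """Membership test for the cone K = R^m_+ (componentwise nonnegativity)."""
    return bool(np.all(v >= -tol))

def robust_feasible(x, scenarios):
    """x is robust feasible iff Ax - b lies in K for every data scenario (A, b)."""
    return all(in_orthant(A @ x - b) for A, b in scenarios)

# Nominal data plus two perturbed scenarios (a crude finite stand-in for U).
A0 = np.eye(2)
b0 = np.zeros(2)
scenarios = [(A0, b0),
             (A0 + 0.1 * np.eye(2), b0 + 0.5),
             (A0 - 0.1 * np.eye(2), b0 + 0.5)]

x = np.array([2.0, 2.0])   # satisfies Ax >= b in every scenario
y = np.array([0.2, 0.2])   # violated once b is shifted up by 0.5
```

For genuinely infinite uncertainty sets this scenario check is only a lower bound on robustness, which is exactly why the paper studies tractable reformulations.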
In fact, we can always get rid of uncertainty in $f$ and assume that $f$ is "certain", i.e., common for all data from $\mathcal{U}$; indeed, we can always rewrite the problems of the family as

$$\min_{t,x} \left\{ t \;\middle|\; \begin{pmatrix} Ax - b \\ t - f^T x \end{pmatrix} \in \widehat{K} = K \times \mathbf{R}_+ \right\}.$$

Thus, we lose nothing (and gain a lot, as far as notation is concerned) when assuming from now on that $f$ is certain, so that $n$, $K$, $f$ form the common "structure" of the problems from the family, while $A$, $b$ are the data of particular problems ("instances") from the family. The Robust Optimization methodology developed in [1, 2, 3, 5, 8, 9] associates with (UCP) its Robust Counterpart (RC)

$$\min_x \left\{ f^T x \;\middle|\; Ax - b \in K \;\;\forall (A, b) \in \mathcal{U} \right\}. \qquad \text{(R)}$$

Feasible/optimal solutions of (R) are called robust feasible, resp. robust optimal, solutions of the uncertain problem (UCP); the importance of these solutions is motivated and illustrated in [1, 2, 3, 5, 8, 9]. Accepting the concept of robust feasible/optimal solutions, the crucial question is how to build these solutions. Note that (R) is a semi-infinite conic program and as such can be computationally intractable. In this respect, there are "good cases", where the RC is equivalent to an explicit computationally tractable convex optimization program, as well as "bad cases", where the RC is NP-hard (see [3, 5] for "generic examples" of both types). In "bad cases", the Robust Optimization methodology recommends replacing the computationally intractable robust counterpart by a tractable approximation. An approximate robust counterpart of (UCP) is a conic problem

$$\min_{x,u} \left\{ f^T x \;\middle|\; Px + Qu + r \in \widehat{K} \right\} \qquad \text{(AR)}$$

such that the projection $X(\mathrm{AR})$ of the feasible set of (AR) onto the plane of $x$-variables is contained in the feasible set of (R); thus, (AR) is "more conservative" than (R). An immediate question is how to measure the "conservativeness" of (AR), the ultimate goal being to use a "moderately conservative" computationally tractable approximate RC instead of the "true" (intractable) RC. A natural way to measure the quality of an approximate RC is as follows. Assume that the uncertainty set $\mathcal{U}$ is of the form

$$\mathcal{U} = \left\{ (A, b) = (A^n, b^n) + \mathcal{V} \right\},$$

where $(A^n, b^n)$ is the "nominal data" and $\mathcal{V}$ is the perturbation set, which we assume from now on to be a convex compact set symmetric w.r.t. the origin. Under our assumptions, (UCP) can be treated as a member of the parametric family

$$\left\{ \min_x \left\{ f^T x \mid Ax - b \in K \right\} \;\middle|\; (A, b) \in \mathcal{U}_\rho = \left\{ (A, b) = (A^n, b^n) + \rho \mathcal{V} \right\} \right\} \qquad (\mathrm{UCP}_\rho)$$

of uncertain conic problems, where $\rho \ge 0$ can be viewed as the "level of uncertainty". Observing that the robust feasible set $X_\rho$ of $(\mathrm{UCP}_\rho)$ shrinks as $\rho$ increases, and that (AR) is an approximation of (R) if and only if $X(\mathrm{AR}) \subseteq X_1$, a natural way to measure the quality of (AR) is to look at the quantity

$$\rho(\mathrm{AR{:}R}) = \inf \left\{ \rho \ge 1 : X(\mathrm{AR}) \supseteq X_\rho \right\},$$

which we call the conservativeness of the approximation (AR) of (R). Thus, the fact that (AR) is an approximation of (R) with conservativeness $\le \alpha$ means that
(i) if $x$ can be extended to a feasible solution of (AR), then $x$ is a robust feasible solution of (UCP);
(ii) if $x$ cannot be extended to a feasible solution of (AR), then $x$ is not robust feasible for the uncertain problem $(\mathrm{UCP}_\alpha)$ obtained from $(\mathrm{UCP}) = (\mathrm{UCP}_1)$ by increasing the level of uncertainty by the factor $\alpha$.
Note that in real-world applications the level of uncertainty is normally known only "up to a factor of order of 1"; thus, we have basically the same reasons to use the "true" robust counterpart as to use its approximation with $\rho(\mathrm{AR{:}R})$ of order of 1. The goal of this paper is to overview recent results on tractable approximate robust counterparts with "O(1)-conservativeness", specifically, results on semidefinite problems affected by box uncertainty and on conic quadratic problems affected by ellipsoidal uncertainty.
We present the approximation schemes, discuss their quality, illustrate the results by some applications (specifically, in Lyapunov Stability Analysis/Synthesis for uncertain linear dynamic systems with interval uncertainty) and establish links of some of the results with recent developments in the area of semidefinite relaxations of difficult optimization problems.

2. Uncertain SDP with box uncertainty

Let $\mathbf{S}^m$ be the space of real symmetric $m \times m$ matrices and $\mathbf{S}^m_+$ be the cone of positive semidefinite $m \times m$ matrices, let $A^\ell[x] : \mathbf{R}^n \to \mathbf{S}^m$, $\ell = 0, \dots, L$, be affine mappings, and let $C[x]$ be a symmetric matrix affinely depending on $x$. Consider the uncertain semidefinite program

$$\min_x \left\{ f^T x \;\middle|\; C[x] \succeq 0, \;\; A[x] \succeq 0, \;\; \exists (u : \|u\|_\infty \le \rho) : A[x] = A^0[x] + \sum_{\ell=1}^L u_\ell A^\ell[x] \right\}; \qquad (\mathrm{USD}[\rho])$$

here and in what follows, for $A, B \in \mathbf{S}^m$ the relation $A \succeq B$ means that $A - B \in \mathbf{S}^m_+$. Note that $(\mathrm{USD}[\rho])$ is the general form of an uncertain semidefinite program affected by "box" uncertainty (one where the uncertainty set is an affine image of a multi-dimensional cube). Note also that the Linear Matrix Inequality (LMI) $C[x] \succeq 0$ represents the part of the constraints which is "certain" - not affected by the uncertainty. The robust counterpart of $(\mathrm{USD}[\rho])$ is the semi-infinite semidefinite program

$$\min_x \left\{ f^T x \;\middle|\; C[x] \succeq 0, \;\; A^0[x] + \sum_{\ell=1}^L u_\ell A^\ell[x] \succeq 0 \;\;\forall (u : \|u\|_\infty \le \rho) \right\}. \qquad (\mathrm{R}[\rho])$$

It is known (see, e.g., [11]) that in general $(\mathrm{R}[\rho])$ is NP-hard; this is so already for the associated analysis problem "given $x$, check whether it is feasible for $(\mathrm{R}[\rho])$", and even in the case when all the "edge matrices" $A^\ell[x]$, $\ell = 1, \dots, L$, are of rank 2. At the same time, we can easily point out a tractable approximation of $(\mathrm{R}[\rho])$, namely, the semidefinite program

$$\min_{x, \{X^\ell\}} \left\{ f^T x \;\middle|\; C[x] \succeq 0, \;\; X^\ell \succeq \pm A^\ell[x], \; \ell = 1, \dots, L, \;\; A^0[x] - \rho \sum_{\ell=1}^L X^\ell \succeq 0 \right\}. \qquad (\mathrm{AR}[\rho])$$

This indeed is an approximation: the $x$-component of a feasible solution to $(\mathrm{AR}[\rho])$ clearly is feasible for $(\mathrm{R}[\rho])$.
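For a tiny number $L$ of perturbation directions, the semi-infinite constraint of $(\mathrm{R}[\rho])$ can still be verified exactly by brute force: the matrix is affine in $u$, $\lambda_{\min}$ is concave in the matrix, so the minimum over the box $\|u\|_\infty \le \rho$ is attained at a vertex. The following sketch (illustrative, not from the paper; the instance data is invented) is the exponential-time baseline that the tractable approximation replaces.

```python
import itertools
import numpy as np

def robust_psd(A0, perts, rho, tol=1e-9):
    """Exact (exponential-time) check of predicate (I[rho]):
    A0 + sum_l u_l * A_l >= 0 for all |u_l| <= rho.
    lambda_min is concave in the (affine-in-u) matrix, so the minimum
    over the box is attained at one of the 2^L vertices u in {-rho,+rho}^L."""
    for signs in itertools.product((-rho, rho), repeat=len(perts)):
        A = A0 + sum(s * P for s, P in zip(signs, perts))
        if np.linalg.eigvalsh(A)[0] < -tol:   # smallest eigenvalue
            return False
    return True

# Toy instance: nominal matrix plus one rank-2 and one rank-1 edge matrix.
A0 = 2.0 * np.eye(2)
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A2 = np.diag([1.0, 0.0])
```

The cost is $2^L$ eigenvalue computations, i.e., hopeless beyond a handful of perturbations, which is precisely the motivation for $(\mathrm{AR}[\rho])$.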
Surprisingly enough, this fairly simplistic approximation turns out to be pretty tight, provided that the edge matrices $A^\ell[x]$, $\ell \ge 1$, are of small rank:

Theorem 1 [6] Let $A_0, A_1, \dots, A_L$ be $m \times m$ symmetric matrices. Consider the following two predicates:

$$(\mathrm{I}[\rho]): \quad A_0 + \sum_{\ell=1}^L u_\ell A_\ell \succeq 0 \quad \forall (u : \|u\|_\infty \le \rho);$$

$$(\mathrm{II}[\rho]): \quad \exists X_1, \dots, X_L : \; X_\ell \succeq \pm A_\ell, \; \ell = 1, \dots, L, \quad \rho \sum_{\ell=1}^L X_\ell \preceq A_0; \qquad (1)$$

here $\rho \ge 0$ is a parameter. Then
(i) If $(\mathrm{II}[\rho])$ is valid, so is $(\mathrm{I}[\rho])$;
(ii) If $(\mathrm{II}[\rho])$ is not valid, neither is $(\mathrm{I}[\vartheta(\mu)\rho])$, where $\mu = \max_{1 \le \ell \le L} \mathrm{Rank}(A_\ell)$ (note $1 \le \ell$ in the max) and $\vartheta(\cdot)$ is a universal function given by

$$\vartheta^{-1}(k) = \min_{\alpha} \left\{ (2\pi)^{-k/2} \int_{\mathbf{R}^k} \Big| \sum_{i=1}^k \alpha_i u_i^2 \Big| \exp\{-u^T u / 2\}\, du \;:\; \sum_{i=1}^k |\alpha_i| = 1 \right\}. \qquad (2)$$

Note that $\vartheta(1) = 1$, $\vartheta(2) = \frac{\pi}{2} \approx 1.57$, $\vartheta(3) = 1.73\ldots$, $\vartheta(4) = 2$, and

$$\vartheta(k) \le \frac{\pi \sqrt{k}}{2} \quad \forall k. \qquad (3)$$

Corollary 2.1 Consider the robust counterpart $(\mathrm{R}[\rho])$ of an uncertain SDP affected by interval uncertainty, along with the approximate robust counterpart $(\mathrm{AR}[\rho])$ of the problem, and let

$$\mu = \max_{1 \le \ell \le L} \max_x \mathrm{Rank}(A^\ell[x])$$

(note $1 \le \ell$ in the max). Then $(\mathrm{AR}[\rho])$ is an at most $\vartheta(\mu)$-conservative approximation of $(\mathrm{R}[\rho])$, where $\vartheta$ is given by (2). In particular:
■ The suprema $\rho^*$ and $\hat{\rho}$ of those $\rho \ge 0$ for which $(\mathrm{R}[\rho])$, respectively $(\mathrm{AR}[\rho])$, is feasible are linked by the relation $\hat{\rho} \le \rho^* \le \vartheta(\mu)\hat{\rho}$.
■ The optimal values $f_*(\rho)$ of $(\mathrm{R}[\rho])$ and $\hat{f}(\rho)$ of $(\mathrm{AR}[\rho])$ are linked by the relation $f_*(\rho) \le \hat{f}(\rho) \le f_*(\vartheta(\mu)\rho)$, $\rho \ge 0$.

The essence of the matter is that the quality of the approximate robust counterpart as stated by Corollary 2.1 depends solely on the ranks of the "basic perturbation matrices" $A^\ell[x]$, $\ell \ge 1$, and is independent of any other sizes of the problem. Fortunately, there are many applications where the ranks of the basic perturbations are small, so that the quality of the approximation is not too bad. As an important example, consider the Lyapunov Stability Analysis problem.

Lyapunov Stability Analysis.
Consider an uncertain time-varying linear dynamic system

$$\dot{z}(t) = A(t) z(t), \qquad (4)$$

where all we know about the matrix $A(t)$ of the system is that it is a measurable function of $t$ taking values in a given compact set $\mathcal{U}$ which, for the sake of simplicity, is assumed to be an interval matrix:

$$A(t) \in \mathcal{U} = \mathcal{U}_\rho = \left\{ A \in \mathbf{R}^{n \times n} : |A_{ij} - A^*_{ij}| \le \rho D_{ij}, \; i, j = 1, \dots, n \right\}; \qquad (5)$$

here $A^*$ corresponds to the "nominal" time-invariant system, and $D$ is a given "scale matrix" with nonnegative entries. In applications, the very first question about (4) is whether the system is stable, i.e., whether it is true that, for every measurable function $A(\cdot)$ taking values in $\mathcal{U}_\rho$, every trajectory $z(t)$ of (4) converges to 0 as $t \to \infty$. The standard sufficient condition for the stability of (4) - (5) is the existence of a common quadratic Lyapunov stability certificate for all matrices $A \in \mathcal{U}_\rho$, i.e., the existence of an $n \times n$ matrix $X \succ 0$ such that

$$A^T X + X A \prec 0 \quad \forall A \in \mathcal{U}_\rho.$$

Indeed, if such a certificate $X$ exists, then $A^T X + X A \preceq -\alpha X$ for a certain $\alpha > 0$ and all $A \in \mathcal{U}_\rho$. As an immediate computation demonstrates, the latter inequality implies that $\frac{d}{dt}\big( z^T(t) X z(t) \big) \le -\alpha\, z^T(t) X z(t)$ for all $t$ and all trajectories of (4), provided that $A(t) \in \mathcal{U}_\rho$ for all $t$. The resulting differential inequality, in turn, implies that $z^T(t) X z(t) \le \exp\{-\alpha t\}\, z^T(0) X z(0) \to 0$ as $t \to +\infty$; since $X \succ 0$, it follows that $z(t) \to 0$ as $t \to \infty$. Note that by homogeneity reasons a stability certificate, if it exists, can always be normalized by the requirements

$$(a)\; X \succeq I, \qquad (b)\; A^T X + X A \preceq -I \quad \forall A \in \mathcal{U}_\rho. \qquad (\mathrm{L}[\rho])$$

Thus, whenever $(\mathrm{L}[\rho])$ is solvable, we can be sure that system (4) - (5) is stable. Although this condition is in general only sufficient and not necessary for stability (it is necessary only when $\rho = 0$, i.e., in the case of a certain time-invariant system), it is commonly used to certify stability. It is, however, not always easy to check the condition itself, since $(\mathrm{L}[\rho])$ is a semi-infinite system of LMIs.
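Numerically, the certificate condition can be tested directly at the vertices of the interval box, since $\lambda_{\max}(A^T X + X A)$ is convex in $A$. The sketch below (illustrative, not the paper's method; the nominal system and scale matrix are invented) builds a candidate $X$ from the nominal Lyapunov equation $A^{*T} X + X A^* = -I$ via the Kronecker/vec identity and then runs the exponential vertex check.

```python
import itertools
import numpy as np

def lyapunov_certificate(Astar):
    """Solve the nominal Lyapunov equation Astar^T X + X Astar = -I
    using the Kronecker/vec identity (row-major vec convention)."""
    n = Astar.shape[0]
    M = np.kron(Astar.T, np.eye(n)) + np.kron(np.eye(n), Astar.T)
    X = np.linalg.solve(M, -np.eye(n).reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)   # symmetrize against roundoff

def certifies(X, Astar, D, rho, tol=1e-9):
    """Check A^T X + X A < 0 at every vertex of the box
    {A : |A_ij - Astar_ij| <= rho * D_ij}; by convexity of lambda_max
    in A this is equivalent to checking the whole box."""
    idx = [(i, j) for i in range(D.shape[0])
                  for j in range(D.shape[1]) if D[i, j] > 0]
    for signs in itertools.product((-1.0, 1.0), repeat=len(idx)):
        A = Astar.copy()
        for s, (i, j) in zip(signs, idx):
            A[i, j] += s * rho * D[i, j]
        if np.linalg.eigvalsh(A.T @ X + X @ A)[-1] >= -tol:
            return False
    return True

Astar = np.array([[-1.0, 0.0], [0.0, -2.0]])   # stable nominal system
D = 0.1 * np.ones((2, 2))                      # uniform scale matrix
X = lyapunov_certificate(Astar)                # here X = diag(1/2, 1/4)
```

With $N$ uncertain entries the vertex check costs $2^N$ eigenvalue computations, the blow-up that the tractable LMI approximation avoids.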
Of course, since the LMI $(\mathrm{L}[\rho].b)$ is linear in $A$, this semi-infinite system of LMIs is equivalent to the usual - finite - system of LMIs

$$(a)\; X \succeq I, \qquad (b)\; A_\nu^T X + X A_\nu \preceq -I, \;\; \nu = 1, \dots, 2^N, \qquad (6)$$

where $N$ is the number of uncertain entries in the matrix $A$ (i.e., the number of pairs $i, j$ such that $D_{ij} \neq 0$) and $A_1, \dots, A_{2^N}$ are the vertices of $\mathcal{U}_\rho$. However, the size of (6) is not polynomial in $n$, except in the (unrealistic) case when $N$ is fixed once and for all or grows logarithmically slowly with $n$. In general, it is NP-hard to check whether (6), or, which is the same, $(\mathrm{L}[\rho])$, is feasible [11]. Now note that with the interval uncertainty (5), the troublemaking semi-infinite LMI $(\mathrm{L}[\rho].b)$ is nothing but the robust counterpart

$$\left[ -I - A^{*T} X - X A^* \right] + \sum_{i,j=1}^n u_{ij} D_{ij} \left[ e_j (X e_i)^T + (X e_i) e_j^T \right] \succeq 0 \quad \forall \left( u = \{u_{ij}\} : -\rho \le u_{ij} \le \rho \right) \qquad (7)$$

of the uncertain LMI

$$\left[ \left\{ A^T X + X A \preceq -I \right\} : A \in \mathcal{U}_\rho \right];$$

here $e_i$ are the standard basic orths in $\mathbf{R}^n$. Consequently, we can approximate $(\mathrm{L}[\rho])$ by a tractable system of LMIs

$$\begin{array}{ll} (a) & X \succeq I, \\ (b1) & X^{ij} \succeq \pm \underbrace{D_{ij} \left[ e_j (X e_i)^T + (X e_i) e_j^T \right]}_{A^{ij}[X]} \quad \forall (i, j : D_{ij} > 0), \\ (b2) & \underbrace{\left[ -I - A^{*T} X - X A^* \right]}_{A^0[X]} - \rho \sum_{i,j : D_{ij} > 0} X^{ij} \succeq 0 \end{array} \qquad (\mathrm{AL}[\rho])$$

in matrix variables $X$, $\{X^{ij}\}$. Invoking Corollary 2.1, we see that the relations between $(\mathrm{AL}[\rho])$ and $(\mathrm{L}[\rho])$ are as follows:
1. Whenever $X$ can be extended to a feasible solution of the system $(\mathrm{AL}[\rho])$, $X$ is feasible for $(\mathrm{L}[\rho])$ and thus certifies the stability of the uncertain dynamic system (4), the uncertainty set for the system being $\mathcal{U}_\rho$;
2. If $X$ cannot be extended to a feasible solution of the system $(\mathrm{AL}[\rho])$, then $X$ does not solve the system $(\mathrm{L}[\frac{\pi}{2}\rho])$ (note that the ranks of the basic perturbation matrices $A^{ij}[X]$ are at most 2, and $\vartheta(2) = \frac{\pi}{2}$).

It follows that the supremum $\hat{\rho}$ of those $\rho > 0$ for which $(\mathrm{AL}[\rho])$ is solvable is a lower bound for the Lyapunov stability radius of the uncertain system (4), i.e., for the supremum $\rho^*$ of those $\rho > 0$ for which all matrices from $\mathcal{U}_\rho$ share a common Lyapunov stability certificate, and that this lower bound is tight up to the factor $\frac{\pi}{2}$, provided, of course, that $A^*$ is stable (or, which is the same, that $\rho^* > 0$). Note that the bound $\hat{\rho}$ on the Lyapunov stability radius is efficiently computable; it is the optimal value in the Generalized Eigenvalue Problem of maximizing $\rho$ in the variables $\rho, X, \{X^{ij}\}$ under the constraints $(\mathrm{AL}[\rho])$. We have considered a specific application of Theorem 1 in Control. There are many other applications of this theorem to systems of LMIs arising in Control and affected by interval data uncertainty. Usually the structure of such a system ensures that when a single data entry is perturbed, the right hand side of every LMI is perturbed by a matrix of small rank, which is the favourable case for our approximation scheme.

Simplifying the approximation. A severe computational shortcoming of the approximation $(\mathrm{AR}[\rho])$ is that its sizes, although polynomial in the sizes of the approximated system $(\mathrm{R}[\rho])$ and the uncertainty set, are pretty large, since the approximation has an additional $m \times m$ matrix variable $X^\ell$ and two $m \times m$ LMIs $X^\ell \succeq \pm A^\ell[x]$ per each of the basic perturbations. It turns out that under favourable circumstances the sizes of the approximation can be reduced dramatically. This size reduction is based upon the following two facts:

Lemma 2.1 [6] (i) Let $a \neq 0$, $b$ be two vectors. A matrix $X$ satisfies the relation

$$X \succeq \pm \left[ a b^T + b a^T \right]$$

if and only if there exists $\lambda \ge 0$ such that

$$X \succeq \lambda a a^T + \lambda^{-1} b b^T$$

(when $\lambda = 0$, the term $\lambda^{-1} b b^T$ is, by definition, the zero matrix when $b = 0$ and is undefined otherwise).
(ii) Let $S$ be a symmetric $m \times m$ matrix of rank $k > 0$, so that $S = P^T R P$ with a nonsingular $k \times k$ matrix $R$ and a $k \times m$ matrix $P$ of rank $k$. A matrix $X$ satisfies the relation

$$X \succeq \pm S$$

if and only if there exists a $k \times k$ matrix $Y$ such that $Y \succeq \pm R$ and $X \succeq P^T Y P$.

The simplification of $(\mathrm{AR}[\rho])$, based on Lemma 2.1, is as follows.
Let us partition the basic perturbation matrices $A^\ell[x]$ into two groups: those with $A^\ell[x]$ actually depending on $x$, and those with $A^\ell[x]$ independent of $x$. Assume that:

(A) The basic perturbation matrices depending on $x$, let them be $A^1[x], \dots, A^M[x]$, are of the form

$$A^\ell[x] = a_\ell b_\ell^T[x] + b_\ell[x] a_\ell^T, \quad \ell = 1, \dots, M, \qquad (8)$$

where the $a_\ell$ are vectors and the $b_\ell[x]$ are affine in $x$.

Note that the assumption holds true, e.g., in the case of the Lyapunov Stability Analysis problem under interval uncertainty, see $(\mathrm{AL}[\rho])$. The basic perturbation matrices $A^\ell$ with $\ell > M$ are independent of $x$, and we can represent these matrices as

$$A^\ell = P_\ell^T B_\ell P_\ell, \qquad (9)$$

where the $B_\ell$ are nonsingular symmetric $k_\ell \times k_\ell$ matrices and $k_\ell = \mathrm{Rank}(A^\ell)$. Observe that when speaking about the approximate robust counterpart $(\mathrm{AR}[\rho])$, we are not interested at all in the additional matrix variables $X^\ell$; all that matters is the projection of the feasible set of $(\mathrm{AR}[\rho])$ onto the plane of the original design variables $x$. In other words, as far as the approximating properties are concerned, we lose nothing when replacing the constraints in $(\mathrm{AR}[\rho])$ with any other system $\mathcal{S}$ of constraints in the variables $x$ and, perhaps, additional variables, provided that the projection of the feasible set of the new system onto the plane of $x$-variables is the same as the one for $(\mathrm{AR}[\rho])$. Invoking Lemma 2.1, we see that the latter property is possessed by the system of constraints

$$\begin{array}{ll} (a) & C[x] \succeq 0, \\ (b) & \lambda_\ell \ge 0, \; \ell = 1, \dots, M, \\ (c) & Y_\ell \succeq \pm B_\ell, \; \ell = M+1, \dots, L, \\ (d) & A^0[x] - \rho \left[ \sum\limits_{\ell=1}^M \left( \lambda_\ell a_\ell a_\ell^T + \lambda_\ell^{-1} b_\ell[x] b_\ell^T[x] \right) + \sum\limits_{\ell=M+1}^L P_\ell^T Y_\ell P_\ell \right] \succeq 0 \end{array} \qquad (10)$$

in the variables $x$, $\{\lambda_\ell\}$, $\{Y_\ell\}$. By the Schur Complement Lemma, (10) is equivalent to the system of LMIs

$$\begin{array}{ll} (a) & C[x] \succeq 0, \\ (b) & Y_\ell \succeq \pm B_\ell, \; \ell = M+1, \dots, L, \\ (c) & \begin{pmatrix} A^0[x] - \rho \sum\limits_{\ell=1}^M \lambda_\ell a_\ell a_\ell^T - \rho \sum\limits_{\ell=M+1}^L P_\ell^T Y_\ell P_\ell & \sqrt{\rho}\, b_1[x] & \cdots & \sqrt{\rho}\, b_M[x] \\ \sqrt{\rho}\, b_1^T[x] & \lambda_1 & & \\ \vdots & & \ddots & \\ \sqrt{\rho}\, b_M^T[x] & & & \lambda_M \end{pmatrix} \succeq 0 \end{array} \qquad (11)$$

in the variables $x$, $\{\lambda_\ell \in \mathbf{R}\}$, $\{Y_\ell\}$. Consequently, $(\mathrm{AR}[\rho])$ is equivalent to the semidefinite program of minimizing the objective $f^T x$ under the constraints (11). Note that the resulting problem is typically much better suited for numerical processing than $(\mathrm{AR}[\rho])$.
Indeed, the first $M$ of the $m \times m$ matrix variables $X^\ell$ arising in the original problem are now replaced with $M$ scalar variables $\lambda_\ell$, while the remaining $L - M$ of the $X^\ell$'s are replaced with $k_\ell \times k_\ell$ matrix variables $Y_\ell$; normally, the ranks $k_\ell$ of the basic perturbation matrices are much smaller than the sizes $m$ of these matrices, so that this transformation reduces the design dimension of the problem quite significantly. As for the LMIs, the $2L$ "large" (of size $m \times m$) LMIs $X^\ell \succeq \pm A^\ell[x]$ of the original problem are now replaced with $2(L - M)$ "small" (of sizes $k_\ell \times k_\ell$) LMIs (11.b) and a single "very large" - of size $(m + M) \times (m + M)$ - LMI (11.c). Note that the latter LMI, although large, is of a very simple "arrow" structure and is extremely sparse.

Links with quadratic maximization over the unit cube. It turns out that Theorem 1 has direct links with the problem of maximizing a positive definite quadratic form over the unit cube. The link is given by the following simple observation:

Proposition 2.1 Let $Q$ be a positive definite $m \times m$ matrix. Then the reciprocal $\rho(Q)$ of the quantity

$$\omega(Q) = \max_x \left\{ x^T Q x : \|x\|_\infty \le 1 \right\}$$

equals the maximum of those $\rho \ge 0$ for which all matrices from the matrix box

$$C_\rho = Q^{-1} + \left\{ \Delta = \Delta^T : |\Delta_{ij}| \le \rho \right\}$$

are positive semidefinite.

Proof. $\omega(Q)$ is the minimum of those $\omega$ for which the ellipsoid $\{x : x^T Q x \le \omega\}$ contains the unit cube, or, which is the same, the minimum of those $\omega$ for which the polar of the ellipsoid (which is the ellipsoid $\{\xi : \omega\, \xi^T Q^{-1} \xi \le 1\}$) is contained in the polar of the cube (which is the set $\{\xi : \|\xi\|_1 \le 1\}$). In other words,

$$\rho(Q) = \omega^{-1}(Q) = \max \left\{ \rho : \xi^T Q^{-1} \xi \ge \rho \|\xi\|_1^2 \;\; \forall \xi \right\}.$$

Observing that for evident reasons

$$\|\xi\|_1^2 = \max_\Delta \left\{ \xi^T \Delta \xi : \Delta = \Delta^T, \; |\Delta_{ij}| \le 1, \; i, j = 1, \dots, m \right\},$$

we conclude that

$$\rho(Q) = \max \left\{ \rho : Q^{-1} \succeq \rho \Delta \;\; \forall \left( \Delta = \Delta^T : |\Delta_{ij}| \le 1, \; i, j = 1, \dots, m \right) \right\},$$

as claimed. ∎
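Proposition 2.1 can be sanity-checked numerically on a small matrix (an illustrative sketch, not from the paper; the instance $Q$ is invented). Both sides are computed by enumeration: $\omega(Q)$ at the vertices $x \in \{\pm 1\}^m$ of the cube (a convex quadratic attains its maximum at a vertex), and $\rho(Q) = 1 / \max_\Delta \lambda_{\max}(Q^{1/2} \Delta Q^{1/2})$ at the sign-vertices of the set of symmetric $\Delta$ with $|\Delta_{ij}| \le 1$ ($\lambda_{\max}$ is convex in $\Delta$).

```python
import itertools
import numpy as np

def omega(Q):
    """max of x^T Q x over the unit cube ||x||_inf <= 1, by vertex enumeration."""
    m = Q.shape[0]
    return max(np.array(x) @ Q @ np.array(x)
               for x in itertools.product((-1.0, 1.0), repeat=m))

def rho(Q):
    """Largest rho with Q^{-1} - rho*Delta PSD for all symmetric |Delta_ij| <= 1.
    Q^{-1} - rho*Delta >= 0  <=>  rho * lambda_max(Q^{1/2} Delta Q^{1/2}) <= 1,
    and lambda_max is convex in Delta, so it peaks at a sign-vertex."""
    m = Q.shape[0]
    w, V = np.linalg.eigh(Q)
    Qh = V @ np.diag(np.sqrt(w)) @ V.T        # symmetric square root of Q
    pairs = [(i, j) for i in range(m) for j in range(i, m)]
    worst = -np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=len(pairs)):
        Delta = np.zeros((m, m))
        for s, (i, j) in zip(signs, pairs):
            Delta[i, j] = Delta[j, i] = s
        worst = max(worst, np.linalg.eigvalsh(Qh @ Delta @ Qh)[-1])
    return 1.0 / worst

Q = np.array([[2.0, 0.5], [0.5, 1.0]])
```

On this instance one finds $\omega(Q) \cdot \rho(Q) = 1$ to machine precision, matching the proposition.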
Since the "edge matrices" of the matrix box $C_\rho$ are of ranks 1 or 2, namely

$$\left\{ e_i e_i^T, \; 1 \le i \le m; \quad e_i e_j^T + e_j e_i^T, \; 1 \le i < j \le m \right\}$$

(where the $e_i$ are the standard basic orths), Theorem 1 says that the efficiently computable quantity

$$\hat{\rho} = \sup \left\{ \rho : \exists \{X^{ij}\}_{1 \le i \le j \le m} : X^{ij} \succeq \pm A^{ij}, \;\; Q^{-1} - \rho \sum_{1 \le i \le j \le m} X^{ij} \succeq 0 \right\}$$

(with $A^{ii} = e_i e_i^T$ and $A^{ij} = e_i e_j^T + e_j e_i^T$ for $i < j$) is a lower bound, tight within the factor $\vartheta(2) = \frac{\pi}{2}$, for the quantity $\rho(Q)$, and consequently the quantity $\hat{\omega}(Q) = \hat{\rho}^{-1}$ is an upper bound, tight within the same factor $\frac{\pi}{2}$, for the maximum $\omega(Q)$ of the quadratic form $x^T Q x$ over the unit cube:

$$\omega(Q) \le \hat{\omega}(Q) \le \frac{\pi}{2}\, \omega(Q). \qquad (12)$$

On closer inspection (see [6]), it turns out that $\hat{\omega}(Q)$ is nothing but the standard semidefinite bound

$$\hat{\omega}(Q) = \max_X \left\{ \mathrm{Tr}(Q X) : X \succeq 0, \; X_{ii} \le 1, \; i = 1, \dots, m \right\} = \min_\lambda \left\{ \sum_i \lambda_i : \mathrm{Diag}\{\lambda\} \succeq Q \right\} \qquad (13)$$

on $\omega(Q)$. The fact that the bound (13) satisfies (12) was originally established by Yu. Nesterov [13] via a completely different construction which heavily exploits the famous MAXCUT-related "random hyperplane" technique of Goemans and Williamson [10]. Surprisingly, the re-derivation of (12) we have presented, although it uses randomization arguments, seems to have nothing in common with the random hyperplane technique.

Theorem 1: sketch of the proof. We believe that not only the statement, but also the proof of Theorem 1 is rather instructive; this is why we sketch the proof here. We intend to focus on the most nontrivial part of the Theorem, which is the claim that when $(\mathrm{II}[\rho])$ is not valid, neither is $(\mathrm{I}[\vartheta(\mu)\rho])$ (as for the remaining statements of Theorem 1, note that (i) is evident, while the claim that the function (2) satisfies (3) is more or less straightforward). Thus, assume that $(\mathrm{II}[\rho])$ is not valid; we should prove that then $(\mathrm{I}[\vartheta(\mu)\rho])$ also is not valid, where $\vartheta(\cdot)$ is given by (2). The fact that $(\mathrm{II}[\rho])$ is not valid means exactly that the optimal value in the semidefinite program

$$\min_{t, \{X_\ell\}} \left\{ t : X_\ell \succeq \pm A_\ell, \; \ell = 1, \dots, L; \;\; \rho \sum_{\ell=1}^L X_\ell \preceq A_0 + t I \right\}$$

is positive.
Applying semidefinite duality (which is a completely mechanical process) we, after simple manipulations, conclude that in this case there exists a matrix $W \succeq 0$, $\mathrm{Tr}(W) = 1$, such that

$$\rho \sum_{\ell=1}^L \left\| \lambda\!\left( W^{1/2} A_\ell W^{1/2} \right) \right\|_1 > \mathrm{Tr}(W A_0), \qquad (14)$$

where $\lambda(B)$ is the vector of eigenvalues of a symmetric matrix $B$. Now observe that if $B$ is a symmetric $m \times m$ matrix and $\xi$ is an $m$-dimensional Gaussian random vector with zero mean and unit covariance matrix, then

$$\mathbf{E}\left\{ \left| \xi^T B \xi \right| \right\} \ge \vartheta^{-1}(\mathrm{Rank}(B)) \left\| \lambda(B) \right\|_1. \qquad (15)$$

Indeed, in the case when $B$ is diagonal, this relation is a direct consequence of the definition (2) of $\vartheta(\cdot)$; the general case reduces immediately to the case of diagonal $B$ due to the rotational invariance of the distribution of $\xi$. Since the matrices $W^{1/2} A_\ell W^{1/2}$ are of ranks not exceeding $\mu = \max_{1 \le \ell \le L} \mathrm{Rank}(A_\ell)$, (14) combines with (15) to imply that

$$\mathbf{E}\left\{ \rho\, \vartheta(\mu) \sum_{\ell=1}^L \left| \xi^T W^{1/2} A_\ell W^{1/2} \xi \right| \right\} > \mathbf{E}\left\{ \xi^T W^{1/2} A_0 W^{1/2} \xi \right\}.$$

It follows that there exists a realization $\zeta = W^{1/2} \xi$ such that, setting $u_\ell = -\rho\, \vartheta(\mu)\, \mathrm{sign}(\zeta^T A_\ell \zeta)$, we get

$$\zeta^T \left( A_0 + \sum_{\ell=1}^L u_\ell A_\ell \right) \zeta < 0,$$

so that the matrix $A_0 + \sum_\ell u_\ell A_\ell$ is not positive semidefinite, while by construction $|u_\ell| \le \vartheta(\mu)\rho$. Thus, $(\mathrm{I}[\vartheta(\mu)\rho])$ indeed is not valid.

3. Approximate robust counterparts of uncertain convex quadratic problems

Recall that a generic convex quadratically constrained problem is

$$\min_x \left\{ c^T x : x^T A_i^T A_i x \le 2 b_i^T x + c_i, \; i = 1, \dots, m \right\}; \qquad \text{(QP)}$$

here $x \in \mathbf{R}^n$ and the $A_i$ are $m_i \times n$ matrices. The data of an instance is $(c, \{A_i, b_i, c_i\}_{i=1}^m)$. When speaking about uncertain problems of this type, we may focus on the robust counterpart of the system of constraints (since we have agreed to treat the objective as certain). In fact we can restrict ourselves to building an (approximate) robust counterpart of a single convex quadratic constraint, since the robust counterpart is a "constraint-wise" construction. Thus, we intend to focus on building an approximate robust counterpart of a single constraint

$$x^T A^T A x \le 2 b^T x + c \qquad (16)$$

with the data $(A, b, c)$.
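As a brute-force baseline (an illustrative sketch, not from the paper; the toy instance is invented), robust feasibility of (16) under a box of perturbations of $A$ can be checked exactly by vertex enumeration: $\|A(z)x\|_2^2$ is convex in the perturbation vector $z$, so its maximum over the box sits at a vertex. The cost is exponential in the number of perturbation directions, which is what the tractable counterpart constructed below avoids.

```python
import itertools
import numpy as np

def robust_quad_feasible(x, A_nom, b_nom, c_nom, A_pert, rho):
    """Exact vertex check of x^T A(z)^T A(z) x <= 2 b^T x + c under box
    uncertainty A(z) = A_nom + sum_j z_j * A_pert[j], |z_j| <= rho
    (b and c kept certain in this toy instance).  ||A(z)x||^2 is convex
    in z, so its maximum over the box is attained at a vertex."""
    rhs = 2.0 * b_nom @ x + c_nom
    for z in itertools.product((-rho, rho), repeat=len(A_pert)):
        A = A_nom + sum(zj * Pj for zj, Pj in zip(z, A_pert))
        if (A @ x) @ (A @ x) > rhs + 1e-9:
            return False
    return True

A_nom = np.eye(2)
A_pert = [np.array([[1.0, 0.0], [0.0, 0.0]])]   # one rank-1 perturbation direction
b_nom = np.zeros(2)
c_nom = 4.0
x = np.array([1.0, 1.0])
```

Here $x$ survives perturbation level $\rho = 0.5$ but fails at $\rho = 1$, where the vertex $z = 1$ gives $\|(2, 1)\|_2^2 = 5 > 4$.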
We assume that the uncertainty set is “parame- terized” by a vector of perturbations: U = Up=^{A, b, c) = (A", 6", c") + c^) : C € pvj , (17) here V is a convex compact symmetric w.r.t. the origin set in (“the set of standard perturbations”) and p > 0 is the “perturbation level”. In what follows, we shall focus on the case when V is given as an intersection of ellipsoids centered at the origin: V = {c e I C^QiC < 1, * = 1, k} , (18) k where Qz ^ 0 and Qi y We will be interested also in two particular i=l cases of general ellipsoidal uncertainty (18), namely, in the cases of • simple ellipsoidal uncertainty k = 1] • box uncertainty: k = L and ^ ■••5 L. Note that the ellipsoidal robust counterpart of an uncertain quadratic constraint affected by uncertainty (18) is, in general, NP-hard. In- deed, already in the case of box uncertainty to verify robust feasibility of a given candidate solution is not easier than to maximize a convex quadratic form over the unit cube, which is known to be NP-hard. Thus, all we can hope for in the case of uncertainty (18) is a “computation- ally tractable” approximate robust counterpart with a moderate level of conservativeness, and we are about to build such a counterpart. 3.1 Building the robust counterpart of (16) — (18) We build an approximate robust counterpart via the standard semidef- inite relaxation scheme. For x G R^, let / x^b^ \ a[x] = A^x^ A[x] — p ^A^x^ A^x ^ ..., A^x , b[x] = p (c^ \x^b^ ) (19) d - f : I , e[x] = 2x^b^ + c^, L SO that for all ( one has tT r A- + pJ2CcA^ X = = {a[x] + ^[s]C)^ (a[®] + ^NC) , Approximate Robust Counterparts of Uncertain SDP and CQP 15 2x^ + 2{b[x] +d)^ C + e[x\. 
From these relations it follows that

(a) x is robust feasible for (16) - (18)

if and only if

    (a[x] + A[x] \zeta)^T (a[x] + A[x] \zeta) \le 2 (b[x] + d)^T \zeta + e[x]   for all (\zeta : \zeta^T Q_i \zeta \le 1, i = 1, ..., k),

if and only if

    \zeta^T A^T[x] A[x] \zeta + 2 \zeta^T [A^T[x] a[x] - b[x] - d] \le e[x] - a^T[x] a[x]   for all (\zeta : \zeta^T Q_i \zeta \le 1, i = 1, ..., k),

if and only if

    \zeta^T A^T[x] A[x] \zeta + 2 t \zeta^T [A^T[x] a[x] - b[x] - d] \le e[x] - a^T[x] a[x]   for all (\zeta, t : \zeta^T Q_i \zeta \le 1, i = 1, ..., k, t^2 = 1),

if and only if

(b) \zeta^T A^T[x] A[x] \zeta + 2 t \zeta^T [A^T[x] a[x] - b[x] - d] \le e[x] - a^T[x] a[x]   for all (\zeta, t : \zeta^T Q_i \zeta \le 1, i = 1, ..., k, t^2 \le 1).   (20)

Thus, x is robust feasible if and only if the quadratic inequality (20.b) in the variables (\zeta, t) is a consequence of the system of quadratic inequalities \zeta^T Q_i \zeta \le 1, i = 1, ..., k, t^2 \le 1. An evident sufficient condition for (20.b) to hold true is the possibility to majorate the left hand side of (20.b), for all (\zeta, t), by a sum \sum_i \lambda_i \zeta^T Q_i \zeta + \mu t^2 with nonnegative weights \lambda_i, \mu satisfying the relation \sum_i \lambda_i + \mu \le e[x] - a^T[x] a[x]. Thus, we come to the implication

    (a) there exist \mu \ge 0 and \lambda_i \ge 0, i = 1, ..., k, such that
        \sum_i \lambda_i + \mu \le e[x] - a^T[x] a[x]   and
        \zeta^T A^T[x] A[x] \zeta + 2 t \zeta^T [A^T[x] a[x] - b[x] - d] \le \sum_i \lambda_i \zeta^T Q_i \zeta + \mu t^2   for all (\zeta, t)
    implies
    (b) x is robust feasible.   (21)

A routine processing of condition (21.a), which we skip here, demonstrates that the condition is equivalent to the solvability of the system of LMIs

    [ e[x] - \sum_{i=1}^k \lambda_i    [-b[x] - d]^T              a^T[x]  ]
    [ -b[x] - d                        \sum_{i=1}^k \lambda_i Q_i  -A^T[x] ]  \succeq 0,   (22)
    [ a[x]                             -A[x]                       I       ]

    \lambda_i \ge 0, i = 1, ..., k,

in the variables x, \lambda. We arrive at the following

Proposition 3.1 The system of LMIs (22) is an approximate robust counterpart of the uncertain convex quadratic constraint (16) - (18).

The level of conservativeness \Omega of (22) can be bounded as follows:

Theorem 2 [7] (i) In the case of a general-type ellipsoidal uncertainty (18), one has

    \Omega \le \hat{\Omega} = sqrt( 3.6 + 2 ln( \sum_{i=1}^k Rank(Q_i) ) ).   (23)

Note that the right hand side in (23) is < 6, provided that \sum_{i=1}^k Rank(Q_i) \le 10,853,519.

(ii) In the case of box uncertainty (\zeta^T Q_i \zeta = \zeta_i^2, i = 1, ..., k, k = L = dim \zeta), one has \Omega \le \pi / 2.

(iii) In the case of simple (k = 1) ellipsoidal uncertainty (18), one has \Omega = 1, i.e., (22) is equivalent to the robust counterpart of (16) - (18).
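The bound in (23), as reconstructed here, grows only logarithmically in the total rank of the Q_i, which is why it stays below 6 even for the curiously large threshold quoted after the theorem (that threshold is, up to rounding, exp(16.2)). A quick numerical sketch, under the assumption that (23) reads as above:

```python
# Illustrative check (assumption: the conservativeness bound of (23) is
# sqrt(3.6 + 2*ln(total rank)), as reconstructed in the text above).
import math

def omega_hat(total_rank):
    """Reconstructed right-hand side of (23)."""
    return math.sqrt(3.6 + 2.0 * math.log(total_rank))

# The bound grows only logarithmically with the total rank of the Q_i:
print(omega_hat(10) < omega_hat(10_853_519) < 6.0)  # True
```

So even for an intersection of ellipsoids whose ranks sum to about ten million, the approximate counterpart overestimates the required protection level by a factor of less than 6.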
An instrumental role in the proof is played by the following fact, which seems to be interesting in its own right:

Theorem 3 [7] Let R_0, R_1, ..., R_k be symmetric n x n matrices such that R_1, ..., R_k \succeq 0 and there exist nonnegative weights \lambda_i such that \sum_{i=1}^k \lambda_i R_i \succ 0. Consider the optimization program

    OPT = \max_y { y^T R_0 y : y^T R_i y \le 1, i = 1, ..., k }   (24)

along with the semidefinite program

    SDP = \min_\mu { \sum_{i=1}^k \mu_i : \sum_{i=1}^k \mu_i R_i \succeq R_0, \mu \ge 0 }.   (25)

Then (25) is solvable, its optimal value majorates the one of (24), and there exists a vector y_* such that

    y_*^T R_0 y_* = SDP,   y_*^T R_i y_* \le \Omega^2, i = 1, ..., k,

where

    \Omega^2 = 3.6 + 2 ln( \sum_{i=1}^k Rank(R_i) )                 if R_0 = q q^T is dyadic,
    \Omega^2 = 8 ln 2 + 4 ln n + 2 ln( \sum_{i=1}^k Rank(R_i) )     otherwise.   (26)

In particular,

    OPT \le SDP \le \Omega^2 OPT.

4. Approximate robust counterparts of uncertain conic quadratic problems

The constructions and results of the previous section can be extended to the essentially more general case of conic quadratic problems. Recall that a generic conic quadratic problem (another name: SOCP, Second-Order Cone Problem) is

    \min_x { f^T x : || A_i x + b_i ||_2 \le a_i^T x + \beta_i, i = 1, ..., m },   (CQP)

here x \in R^n, and the A_i are m_i x n matrices; the data of (CQP) is the collection (f, {A_i, b_i, a_i, \beta_i}_{i=1}^m). As always, we assume the objective to be "certain" and thus may restrict ourselves to building an approximate robust counterpart of a single conic quadratic constraint

    || A x + b ||_2 \le a^T x + \beta   (27)

with data (A, b, a, \beta). We assume that the uncertainty set is parameterized by a vector of perturbations and that the uncertainty is "side-wise": the perturbations affecting the left- and the right-hand-side data of (27) run independently of each other through the respective uncertainty sets:

    U = { (A, b) = (A^0, b^0) + \sum_{\ell=1}^L \zeta_\ell (A^\ell, b^\ell) : \zeta \in \rho V^{left} }
        x { (a, \beta) = (a^0, \beta^0) + \sum_{r=1}^R \eta_r (a^r, \beta^r) : \eta \in \rho V^{right} }.   (28)

In what follows, we focus on the case when V^{left} is given as an intersection of ellipsoids centered at the origin:

    V^{left} = { \zeta \in R^L : \zeta^T Q_i \zeta \le 1, i = 1, ..., k },   (29)

where Q_i \succeq 0 and \sum_{i=1}^k Q_i \succ 0.
We will be interested also in two particular cases of the general ellipsoidal uncertainty (29), namely, in the cases of

• simple ellipsoidal uncertainty: k = 1;
• box uncertainty: k = L and \zeta^T Q_i \zeta = \zeta_i^2, i = 1, ..., L.

As for the "right hand side" perturbation set V^{right}, we allow a much more general geometry; namely, we assume only that V^{right} is bounded, contains 0 and is semidefinite-representable:

    V^{right} = { \eta : there exists u such that P(\eta) + Q(u) - S \succeq 0 },   (30)

where P(\eta), Q(u) are symmetric matrices depending linearly on \eta, u, respectively. We assume also that the LMI in (30) is strictly feasible, i.e., that P(\eta) + Q(u) - S \succ 0 for appropriately chosen \eta, u.

4.1 Building the approximate robust counterpart of (27) - (30)

For x \in R^n, let

    a[x] = A^0 x + b^0,   A[x] = \rho [A^1 x + b^1, A^2 x + b^2, ..., A^L x + b^L],   (31)

so that for all \zeta one has

    (A^0 + \rho \sum_\ell \zeta_\ell A^\ell) x + (b^0 + \rho \sum_\ell \zeta_\ell b^\ell) = a[x] + A[x] \zeta.

Since the uncertainty is side-wise, x is robust feasible for (27) - (30) if and only if there exists \tau such that the left hand side in (27), evaluated at x, is at most \tau for every realization of the left-hand-side data, while the right hand side, evaluated at x, is at least \tau for every realization of the right-hand-side data. The latter condition can be processed straightforwardly via semidefinite duality, with the following result:

Proposition 4.1 A pair (x, \tau) is such that \tau \le a^T x + \beta for all realizations (a, \beta) of the right-hand-side data if and only if it can be extended to a solution (x, \tau, V) of the system of LMIs

    \tau \le x^T a^0 + \beta^0 + Tr(S V),
    P^*(V) = \rho ( x^T a^1 + \beta^1, ..., x^T a^R + \beta^R )^T,   (32)
    Q^*(V) = 0,
    V \succeq 0,

in the variables x, \tau, V. Here, for a linear mapping A(z) = \sum_k z_k A_k taking values in the space of m x m symmetric matrices, A^*(Z) = (Tr(Z A_1); ...; Tr(Z A_{dim z})) : S^m -> R^{dim z} is the mapping conjugate to A(.). In view of the above observations, we have:

(a) x is robust feasible for (27) - (30)

if and only if

    there exist \tau, V such that (x, \tau, V) solves (32) and
    || a[x] + A[x] \zeta ||_2 \le \tau   for all (\zeta : \zeta^T Q_i \zeta \le 1, i = 1, ..., k)
if and only if

    there exist \tau, V such that (x, \tau, V) solves (32) and
    || \pm a[x] + A[x] \zeta ||_2 \le \tau   for all (\zeta : \zeta^T Q_i \zeta \le 1, i = 1, ..., k),

if and only if

    there exist \tau, V such that (x, \tau, V) solves (32) and
    || t a[x] + A[x] \zeta ||_2 \le \tau   for all (\zeta, t : \zeta^T Q_i \zeta \le 1, i = 1, ..., k, t^2 = 1),

if and only if

(b) there exist \tau, V such that (1) (x, \tau, V) solves (32), (2) \tau \ge 0, and (3)
    || t a[x] + A[x] \zeta ||_2 \le \tau   for all (\zeta, t : \zeta^T Q_i \zeta \le 1, i = 1, ..., k, t^2 \le 1).   (33)

Thus, x is robust feasible if and only if (33.b) holds true. Now observe that

    there exist \mu, \lambda_1, ..., \lambda_k such that
    (a) \mu \ge 0, \lambda_i \ge 0, i = 1, ..., k,
    (b) \mu + \sum_{i=1}^k \lambda_i \le \tau,
    (c) \tau [ \mu t^2 + \sum_i \lambda_i \zeta^T Q_i \zeta ] \ge || t a[x] + A[x] \zeta ||_2^2   for all (t, \zeta)
    implies
    || t a[x] + A[x] \zeta ||_2 \le \tau   for all (\zeta, t : \zeta^T Q_i \zeta \le 1, i = 1, ..., k, t^2 \le 1).   (34)

Via the Schur complement one verifies that, for nonnegative \tau, \mu, {\lambda_i}, the relation (34.c) is equivalent to

    [ \mu      0                            a^T[x] ]
    [ 0        \sum_{i=1}^k \lambda_i Q_i   A^T[x] ]   \succeq 0
    [ a[x]     A[x]                         \tau I ]

(for \tau > 0 this is immediate, and the case \tau = 0 follows by a limiting argument).

References

[1] A. Ben-Tal, A. Nemirovski, "Stable Truss Topology Design via Semidefinite Programming", SIAM Journal on Optimization 7 (1997), 991-1016.
[2] A. Ben-Tal, A. Nemirovski, "Robust solutions to uncertain linear programs", OR Letters 25 (1999), 1-13.
[3] A. Ben-Tal, A. Nemirovski, "Robust Convex Optimization", Mathematics of Operations Research 23 (1998).
[4] S. Boyd, L. El Ghaoui, E. Feron, V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, volume 15 of Studies in Applied Mathematics, SIAM, Philadelphia, 1994.
[5] A. Ben-Tal, L. El Ghaoui, A. Nemirovski, "Robust Semidefinite Programming", in: R. Saigal, H. Wolkowicz, L. Vandenberghe, Eds., Handbook on Semidefinite Programming, Kluwer Academic Publishers, 2000, 139-162.
[6] A. Ben-Tal, A. Nemirovski, "On tractable approximations of uncertain linear matrix inequalities affected by interval uncertainty", SIAM J. on Optimization, 2001, to appear.
[7] A. Ben-Tal, A.
Nemirovski, C. Roos, "Robust solutions of uncertain quadratic and conic quadratic problems", Research Report #2/01, Minerva Optimization Center, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel. http://iew3.technion.ac.il:8080/subhome.phtml?/Home/research
[8] L. El-Ghaoui, H. Lebret, "Robust solutions to least-square problems with uncertain data matrices", SIAM J. of Matrix Anal. and Appl. 18 (1997), 1035-1064.
[9] L. El-Ghaoui, F. Oustry, H. Lebret, "Robust solutions to uncertain semidefinite programs", SIAM J. on Optimization 9 (1998), 33-52.
[10] M.X. Goemans, D.P. Williamson, "Improved approximation algorithms for Maximum Cut and Satisfiability problems using semidefinite programming", Journal of the ACM 42 (1995), 1115-1145.
[11] A. Nemirovski, "Several NP-hard problems arising in Robust Stability Analysis", Math. Control Signals Systems 6 (1993), 99-105.
[12] A. Nemirovski, C. Roos, T. Terlaky, "On maximization of quadratic form over intersection of ellipsoids with common center", Mathematical Programming 86 (2000), 463-473.
[13] Yu. Nesterov, "Semidefinite relaxation and non-convex quadratic optimization", Optimization Methods and Software 12 (1997), 1-20.
[14] Yu. Nesterov, "Nonconvex quadratic optimization via conic relaxation", in: R. Saigal, H. Wolkowicz, L. Vandenberghe, Eds., Handbook on Semidefinite Programming, Kluwer Academic Publishers, 2000, 363-387.
[15] Y. Ye, "Approximating quadratic programming with bounds and quadratic constraints", Math. Programming 47 (1999), 219-226.

GLOBAL CONVERGENCE OF A HYBRID TRUST-REGION SQP-FILTER ALGORITHM FOR GENERAL NONLINEAR PROGRAMMING

Nick Gould
Rutherford Appleton Laboratory
Computational Science and Engineering Department
Chilton, Oxfordshire, England
gould@rl.ac.uk

Philippe L.
Toint
Department of Mathematics
University of Namur
61, rue de Bruxelles, B-5000 Namur, Belgium
philippe.toint@fundp.ac.be

Abstract
Global convergence to first-order critical points is proved for a variant of the trust-region SQP-filter algorithm analyzed in (Fletcher, Gould, Leyffer and Toint). This variant allows the use of two types of step strategies: the first decomposes the step into its normal and tangential components, while the second replaces this decomposition by a stronger condition on the associated model decrease.

1. Introduction

We analyze an algorithm for solving optimization problems where a smooth objective function is to be minimized subject to smooth nonlinear constraints. No convexity assumption is made. More formally, we consider the problem

    minimize   f(x)
    subject to c_E(x) = 0,   (1.1)
               c_I(x) >= 0,

where f is a twice continuously differentiable real-valued function of the variables x \in R^n and c_E(x) and c_I(x) are twice continuously differentiable vector-valued constraint functions on R^n. Let c(x)^T = (c_E(x)^T, c_I(x)^T). The class of algorithms that we discuss belongs to the class of trust-region methods and, more specifically, to that of filter methods introduced by Fletcher and Leyffer (1997), in which the use of a penalty function, a common feature of the large majority of the algorithms for constrained optimization, is replaced by the introduction of a so-called "filter". A global convergence theory for an algorithm of this class is proposed in Fletcher, Leyffer and Toint (1998), in which the objective function is locally approximated by a linear function, leading, at each iteration, to the (exact) solution of a linear program. This algorithm therefore mixes the use of the filter with sequential linear programming (SLP).
Similar results are shown in Fletcher, Leyffer and Toint (2000), where the approximation of the objective function is quadratic, leading to sequential quadratic programming (SQP) methods, but at the price of finding a global minimizer of the possibly nonconvex quadratic programming subproblem, which is known to be a very difficult task. Convergence of SQP filter methods was also considered in Fletcher, Gould, Leyffer and Toint (1999), where the SQP step was decomposed into "normal" and "tangential" components. Although this latter procedure is computationally well-defined and considerably less complex than finding the global minimum of a general quadratic program, it may sometimes be costly, and a simpler strategy, where the step is computed "as a whole", can also be of practical interest whenever possible. The purpose of this paper, a companion of Fletcher et al. (1999), is to analyze a hybrid algorithm that uses the decomposition of the step into normal and tangential components as infrequently as possible.

2. A Hybrid Trust-Region SQP-Filter Algorithm

For the sake of completeness and clarity, we review briefly the main constituent parts of the SQP algorithm discussed in Fletcher et al. (1999). Sequential quadratic programming methods are iterative. At a given iterate x_k, they implicitly apply Newton's method to solve (a local version of) the first-order necessary optimality conditions by solving the quadratic programming subproblem QP(x_k) given by

    minimize   f_k + <g_k, s> + (1/2) <s, H_k s>
    subject to c_E(x_k) + A_E(x_k) s = 0,   (2.1)
               c_I(x_k) + A_I(x_k) s >= 0,

where f_k = f(x_k), g_k = g(x_k) := \nabla_x f(x_k), where A_E(x_k) and A_I(x_k) are the Jacobians of the constraint functions c_E and c_I at x_k, and where H_k is a symmetric matrix. We will not immediately be concerned about how H_k is obtained, but we will return to this point in Section 3.
Assuming a suitable value of H_k can be found, the solution of QP(x_k) then yields a step s_k. If s_k = 0, then x_k is first-order critical for problem (1.1).

2.1 The filter

Unfortunately, due to the locally convergent nature of Newton's iteration, the step s_k may not always be very good. Thus, having computed (in a so far unspecified manner) a step s_k from our current iterate x_k, we need to decide whether the trial point x_k + s_k is any better than x_k as an approximate solution to our original problem (1.1). This is achieved by using the notion of a filter, itself based on that of dominance. If we define the feasibility measure

    \theta(x) = max[ 0, max_{i \in E} |c_i(x)|, max_{i \in I} -c_i(x) ],   (2.2)

we say that a point x_1 dominates a point x_2 whenever \theta(x_1) \le \theta(x_2) and f(x_1) \le f(x_2). Thus, if iterate x_k dominates iterate x_j, the latter is of no real interest to us, since x_k is at least as good as x_j on account of both feasibility and optimality. All we need to do now is to remember iterates that are not dominated by any other iterates, using a structure called a filter. A filter is a list F of pairs of the form (\theta_i, f_i) such that either \theta_i < \theta_j or f_i < f_j for i \ne j. Fletcher et al. (1999) propose to accept a new trial iterate x_k + s_k only if it is not dominated by any other iterate in the filter and x_k. In the vocabulary of multi-criteria optimization, this amounts to building elements of the efficient frontier associated with the bi-criteria problem of reducing infeasibility and the objective function value.
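These notions of dominance and of a filter admit a compact sketch (ours, not the authors' implementation); it already includes the margin rule (2.3) introduced below, with our own names for the quantities involved:

```python
# A minimal sketch of the filter mechanism of Section 2.1 (illustrative):
# a list of (theta, f) pairs with the margin acceptability rule (2.3).
class Filter:
    def __init__(self, gamma_theta=1e-4):
        self.gamma = gamma_theta
        self.pairs = []          # accepted (theta, f) pairs

    def acceptable(self, theta, f):
        """Margin test (2.3): acceptable iff, for every stored pair, the
        point improves feasibility or optimality by the margin."""
        return all(theta <= (1.0 - self.gamma) * tj or f <= fj - self.gamma * tj
                   for (tj, fj) in self.pairs)

    def add(self, theta, f):
        """Add (theta, f) and drop the pairs it renders redundant."""
        self.pairs = [(tj, fj) for (tj, fj) in self.pairs
                      if not (tj >= theta and
                              fj - self.gamma * tj >= f - self.gamma * theta)]
        self.pairs.append((theta, f))

F = Filter()
F.add(1.0, 10.0)
print(F.acceptable(0.5, 20.0))        # much better feasibility: True
print(F.acceptable(0.99995, 9.99995)) # inside the margin: False
```

The second query is rejected even though it is (marginally) better in both measures: this is exactly the "margin" refinement discussed next, which prevents accepting points arbitrarily close to filter entries.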
We may describe this concept by associating with each iterate x_k its (\theta, f)-pair (\theta_k, f_k), and accepting x_k + s_k only if its (\theta, f)-pair does not lie, in the two-dimensional space spanned by constraint violation and objective function value, above and on the right of a previously accepted pair (including that associated with x_k). While the idea of not accepting dominated trial points is simple and elegant, it needs to be refined a little in order to provide an efficient algorithmic tool. In particular, we do not wish to accept x_k + s_k if its (\theta, f)-pair is arbitrarily close to that of x_k or that of a point already in the filter. Thus Fletcher et al. (1999) set a small "margin" around the border of the dominated part of the (\theta, f)-space in which we shall also reject trial points. Formally, we say that a point x is acceptable for the filter if and only if

    \theta(x) \le (1 - \gamma_\theta) \theta_j   or   f(x) \le f_j - \gamma_\theta \theta_j   for all (\theta_j, f_j) \in F,   (2.3)

for some \gamma_\theta \in (0, 1). We also say that x is "acceptable for the filter and x_k" if (2.3) holds with F replaced by F \cup {(\theta_k, f_k)}. We thus consider moving from x_k to x_k + s_k only if x_k + s_k is acceptable for the filter and x_k. As the algorithm progresses, Fletcher et al. (1999) add (\theta, f)-pairs to the filter. If an iterate x_k is acceptable for F, this is done by adding the pair (\theta_k, f_k) to the filter and by removing from it every other pair (\theta_j, f_j) such that \theta_j \ge \theta_k and f_j - \gamma_\theta \theta_j \ge f_k - \gamma_\theta \theta_k. We also refer to this operation as "adding x_k to the filter", although, strictly speaking, it is the (\theta, f)-pair which is added. We conclude this introduction to the notion of a filter by noting that, if a point x_k is in the filter or is acceptable for the filter, then any other point x such that

    \theta(x) \le (1 - \gamma_\theta) \theta_k   and   f(x) \le f_k - \gamma_\theta \theta_k

is also acceptable for the filter and x_k.

2.2 The composite SQP step

Of course, the step s_k must be computed, typically by solving, possibly approximately, a variant of (2.1).
In the trust-region approach, one takes into account the fact that (2.1) only approximates our original problem locally: the step s_k is thus restricted in norm to ensure that x_k + s_k remains in a trust region centred at x_k, where we believe this approximation to be adequate. In other words, we replace QP(x_k) by the subproblem TRQP(x_k, \Delta_k) given by

    minimize   m_k(x_k + s)
    subject to c_E(x_k) + A_E(x_k) s = 0,
               c_I(x_k) + A_I(x_k) s >= 0,   (2.4)
               ||s|| <= \Delta_k,

for some (positive) value of the trust-region radius \Delta_k, where we have defined

    m_k(x_k + s) = f_k + <g_k, s> + (1/2) <s, H_k s>,   (2.5)

and where || . || denotes the Euclidean norm. This latter choice is purely for ease of exposition. We could equally use a family of iteration-dependent norms || . ||_k, so long as we require that all members of the family are uniformly equivalent to the Euclidean norm. The interested reader may verify that all subsequent developments can be adapted to this more general case by introducing the constants implied by this uniform equivalence wherever needed. Remarkably, most of the existing SQP algorithms assume that an exact local solution of QP(x_k) or TRQP(x_k, \Delta_k) is found, although attempts have been made by Dembo and Tulowitzki (1983) and Murray and Prieto (1995) to design conditions under which an approximate solution of the subproblem is acceptable. In contrast, the algorithm of Fletcher et al. (1999) is in the spirit of the composite-step SQP methods pioneered by Vardi (1985), Byrd, Schnabel and Shultz (1987), and Omojokun (1989), and later developed by several authors, including Biegler, Nocedal and Schmid (1995), El-Alem (1995, 1999), Byrd, Gilbert and Nocedal (2000a), Byrd, Hribar and Nocedal (2000b), Bielschowsky and Gomes (1998), Liu and Yuan (1998) and Lalee, Nocedal and Plantenga (1998).
It decomposes the step s_k into the sum of two distinct components: a normal step n_k, such that x_k + n_k satisfies the constraints of TRQP(x_k, \Delta_k), and a tangential step t_k, whose purpose is to obtain reduction of the objective function's model while continuing to satisfy those constraints. The step s_k is then called composite. More formally, we write

    s_k = n_k + t_k   (2.6)

and assume that

    c_E(x_k) + A_E(x_k) n_k = 0,   c_I(x_k) + A_I(x_k) n_k >= 0,   (2.7)

    ||n_k|| <= \Delta_k,   (2.8)

and

    c_E(x_k) + A_E(x_k) s_k = 0,   c_I(x_k) + A_I(x_k) s_k >= 0.   (2.9)

Of course, this is a strong assumption, since in particular (2.7) or (2.8)/(2.9) may not have a solution. We shall return to this possibility shortly. Given our assumption, there are many ways to compute n_k and t_k. For instance, we could compute n_k from

    n_k = P_k[x_k] - x_k,   (2.10)

where P_k is the orthogonal projector onto the feasible set of QP(x_k). No specific choice for n_k is made, but one instead assumes that n_k exists when the maximum violation of the nonlinear constraints at the k-th iterate, \theta_k = \theta(x_k), is sufficiently small, and that n_k is then reasonably scaled with respect to the values of the constraints. In other words, Fletcher et al. (1999) assume that

    n_k exists and ||n_k|| <= \kappa_n \theta_k   whenever   \theta_k <= \delta_n,   (2.11)

for some constants \kappa_n > 0 and \delta_n > 0. This assumption is also used by Dennis, El-Alem and Maciel (1997) and Dennis and Vicente (1997) in the context of problems with equality constraints only. It can be shown not to impose conditions on the constraints or the normal step itself that are unduly restrictive (see Fletcher et al. (1999) for a discussion). Having defined the normal step, we are in position to use it if it falls within the trust region, that is if ||n_k|| <= \Delta_k. In this case, we write

    x_k^N = x_k + n_k,   (2.12)

and observe that n_k satisfies the constraints of TRQP(x_k, \Delta_k), and thus also of QP(x_k).
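For the special case of equality constraints only, one natural realization of the projection idea in (2.10) is the minimum-norm solution of the linearized constraints. This sketch is our illustration (not the algorithm analyzed in the paper, which leaves the choice of n_k open):

```python
# Sketch (our illustration, equality constraints only): a normal step as the
# minimum-norm solution of the linearized constraints
#     c_E(x_k) + A_E(x_k) n = 0,
# i.e. n_k = -A_E^+ c_E(x_k), which realizes the projection idea of (2.10).
import numpy as np

def normal_step(c_E, A_E):
    # np.linalg.lstsq returns the minimum-norm solution when the
    # linear system is underdetermined
    n, *_ = np.linalg.lstsq(A_E, -c_E, rcond=None)
    return n

A = np.array([[1.0, 1.0, 0.0]])       # one linearized constraint in R^3
c = np.array([2.0])                    # current violation c_E(x_k)
nk = normal_step(c, A)
print(nk)                              # minimum-norm solution: (-1, -1, 0)
print(np.allclose(A @ nk + c, 0.0))    # linearized constraints hold: True
```

Such a step exists and is well scaled with the violation whenever the Jacobian has full row rank near the iterate, which is one way the assumption (2.11) can arise in practice.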
It is crucial to note, at this stage, that such an n_k may fail to exist, because the constraints of QP(x_k) may be incompatible, in which case P_k is undefined, or because all feasible points for QP(x_k) may lie outside the trust region. Let us continue to consider the case where this problem does not arise, and a normal step n_k has been found with ||n_k|| <= \Delta_k. We then have to find a tangential step t_k, starting from x_k^N and satisfying (2.8) and (2.9), with the aim of decreasing the value of the objective function. As always in trust-region methods, this is achieved by computing a step that produces a sufficient decrease in m_k, which is to say that we wish m_k(x_k^N) - m_k(x_k + s_k) to be "sufficiently large". Of course, this is only possible if the maximum size of t_k is not too small, which is to say that x_k^N is not too close to the trust-region boundary. We formalize this condition by replacing our condition that ||n_k|| <= \Delta_k by the stronger requirement that

    ||n_k|| <= \kappa_\Delta \Delta_k min[ 1, \kappa_\mu \Delta_k^{\mu_k} ],   (2.13)

for some \kappa_\Delta \in (0, 1], some \kappa_\mu > 0 and some \mu_k \in [0, 1). If condition (2.13) does not hold, Fletcher et al. (1999) presume that the computation of t_k is unlikely to produce a satisfactory decrease in m_k, and proceed just as if the feasible set of TRQP(x_k, \Delta_k) were empty. If n_k can be computed and (2.13) holds, TRQP(x_k, \Delta_k) is said to be compatible for \mu_k. In this case at least a sufficient model decrease seems possible, in the form of a familiar Cauchy-point condition. In order to formalize this notion, we recall that the feasible set of QP(x_k) is convex, and we can therefore introduce the first-order criticality measure

    \chi_k = | min_t { <g_k + H_k n_k, t> : A_E(x_k) t = 0, c_I(x_k) + A_I(x_k)(n_k + t) >= 0, ||t|| <= 1 } |   (2.14)

(see Conn, Gould, Sartenaer and Toint, 1993).
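The "familiar Cauchy-point condition" just alluded to is usually enforced by a backtracking search along the projected-gradient path, which is easiest to see in the special case of bound constraints, where the projection is a componentwise clip. The following is our simplified illustration, not the paper's procedure:

```python
# Sketch (bound constraints only, our simplification): a Cauchy-type point
# found by backtracking along the projected-gradient path
#     x(alpha) = P[x0 - alpha * g],
# accepting alpha as soon as the model decrease is a fraction of the
# linear decrease.  Model: m(x) = <g, x - x0> + 0.5 <x - x0, H (x - x0)>.
import numpy as np

def cauchy_point(x0, g, H, lo, hi, kappa=0.1, alpha=1.0, backtrack=0.5, iters=50):
    def model(x):
        s = x - x0
        return g @ s + 0.5 * s @ H @ s
    x = x0
    for _ in range(iters):
        x = np.clip(x0 - alpha * g, lo, hi)       # projection onto the box
        if model(x) <= kappa * (g @ (x - x0)):    # sufficient decrease
            return x
        alpha *= backtrack
    return x0

x0 = np.zeros(2)
g = np.array([1.0, -2.0])
H = np.eye(2)
xc = cauchy_point(x0, g, H, lo=-1.0, hi=1.0)
s = xc - x0
print(xc, g @ s + 0.5 * s @ H @ s)   # the model decreases at the Cauchy point
```

Any tangential step whose model reduction matches that of such a point satisfies a sufficient-decrease requirement of the type stated next.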
Note that this function is zero if and only if x_k^N is a first-order critical point of the linearized "tangential" problem

    minimize   <g_k + H_k n_k, t> + (1/2) <H_k t, t>
    subject to A_E(x_k) t = 0,   (2.15)
               c_I(x_k) + A_I(x_k)(n_k + t) >= 0,

which is equivalent to QP(x_k) with s = n_k + t. The sufficient decrease condition then consists in assuming that there exists a constant \kappa_tmd > 0 such that

    m_k(x_k^N) - m_k(x_k^N + t_k) >= \kappa_tmd \chi_k min[ \chi_k / \beta_k, \Delta_k ]   (2.16)

whenever TRQP(x_k, \Delta_k) is compatible, where \beta_k = 1 + ||H_k||. We know from Toint (1988) and Conn et al. (1993) that this condition holds if the model reduction exceeds that which would be obtained at the generalized Cauchy point, that is the point resulting from a backtracking curvilinear search along the projected-gradient path from x_k^N,

    x_k(\alpha) = P_k[ x_k^N - \alpha \nabla_x m_k(x_k^N) ].

This technique therefore provides an implementable algorithm for computing a step that satisfies (2.16) (see Gould, Hribar and Nocedal, 1998, for an example in the case where c(x) = c_E(x), or Toint, 1988, and More and Toraldo, 1991, for the case of bound constraints); but, of course, reduction of m_k beyond that imposed by (2.16) is often possible and desirable if fast convergence is sought. Also note that the minimization problem in the right-hand side of (2.14) reduces to a linear programming problem if we choose to use a polyhedral norm in its definition at iteration k. If TRQP(x_k, \Delta_k) is not compatible for \mu_k, that is when the feasible set determined by the constraints of QP(x_k) is empty, or the freedom left to reduce m_k within the trust region is too small in the sense that (2.13) fails, solving TRQP(x_k, \Delta_k) is most likely pointless, and we must consider an alternative. Observe that, if \theta(x_k) is sufficiently small and the true nonlinear constraints are locally compatible, the linearized constraints should also be compatible, since they approximate the nonlinear constraints (locally) correctly.
Furthermore, the feasible region for the linearized constraints should then be close enough to x_k for there to be some room to reduce m_k, at least if \Delta_k is large enough. If the nonlinear constraints are locally incompatible, we have to find a neighbourhood where this is not the case, since the problem (1.1) does not make sense in the current one. Fletcher et al. (1999) thus rely on a restoration procedure, whose aim is to produce a new point x_k + r_k for which TRQP(x_k + r_k, \Delta_{k+1}) is compatible for some \Delta_{k+1} > 0 (another condition will actually be needed, which we will discuss shortly). The idea of the restoration procedure is to (approximately) solve

    min_x \theta(x)   (2.17)

starting from x_k, the current iterate. This is a non-smooth problem, but there exist methods, possibly of trust-region type (such as that suggested by Yuan, 1994), which can be successfully applied to solve it. Thus we will not describe the restoration procedure in detail. Note that we have chosen here to reduce the infinity norm of the constraint violation, but we could equally well consider other norms, such as \ell_1 or \ell_2, in which case the methods of Fletcher and Leyffer (1998), or of El-Hallabi and Tapia (1995) and Dennis, El-Alem and Williamson (1999), can respectively be considered. Of course, this technique only guarantees convergence to a first-order critical point of the chosen measure of constraint violation, which means that, in fact, the restoration procedure may fail, as this critical point may not be feasible for the constraints of (1.1). However, even in this case, the result of the procedure is of interest, because it typically produces a local minimizer of \theta(x), or of whatever other measure of constraint violation we choose for the restoration, yielding a point of locally-least infeasibility.
There seems to be no easy way to circumvent this drawback, as it is known that finding a feasible point, or proving that no such point exists, is a global optimization problem and can be as difficult as the optimization problem (1.1) itself. One therefore has to accept two possible outcomes of the restoration procedure: either the procedure fails in that it does not produce a sequence of iterates converging to feasibility, or a point x_k + r_k is produced such that \theta(x_k + r_k) is as small as desired.

2.3 An alternative step

Is it possible to find a cheaper alternative to computing a normal step, finding a generalized Cauchy point and explicitly checking (2.16)? Suppose, for now, that it is possible to compute a point x_k + s_k directly, to satisfy the constraints of TRQP(x_k, \Delta_k) and for which

    m_k(x_k) - m_k(x_k + s_k) >= \kappa_\pi \pi_k min[ \pi_k, \Delta_k ],   (2.18)

for some constant \kappa_\pi > 0 and \pi_k = \pi(x_k), where \pi is a continuous function of its argument that is a criticality measure for TRQP(x_k, \Delta_k). Such an s_k could for instance be computed by applying any efficient method to this latter problem (we might think of interior-point methods of the type described in Conn, Gould, Orban and Toint, 2000, for instance), and its performance could be assessed by computing

    \pi(x) = min_{y : y_I >= 0} || g(x) - A(x)^T y ||.

Of course, nothing guarantees that such an s_k exists (depending on our choice of \pi(x)) or is cheaply computable for each k, which means that we may have to resort to the normal-tangential strategy of Fletcher et al. (1999) if such problems arise. However, if we can find s_k at a fraction of the cost of computing n_k and t_k, can we use it inside an SQP-filter algorithm and maintain the desirable convergence to first-order critical points? Obviously, the answer to that question depends on the manner in which the use of s_k is integrated into a complete algorithm.
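The criticality measure \pi(x) above is a nonnegatively constrained least-squares problem in the multipliers y. One simple (if not the fastest) way to evaluate it is projected gradient on y; this sketch is our illustration, with all names ours:

```python
# Sketch (our illustration): evaluating the criticality measure
#     pi(x) = min_{y : y >= 0} || g(x) - A(x)^T y ||
# by projected gradient on the multipliers y (here all multipliers are
# sign-constrained, i.e. all constraints are treated as inequalities).
import numpy as np

def pi_measure(g, A, iters=2000):
    y = np.zeros(A.shape[0])
    # step 1/L for f(y) = 0.5 || g - A^T y ||^2, whose gradient is -A (g - A^T y)
    step = 1.0 / max(np.linalg.norm(A, 2) ** 2, 1e-12)
    for _ in range(iters):
        r = g - A.T @ y                           # residual
        y = np.maximum(0.0, y + step * (A @ r))   # gradient step + projection
    return np.linalg.norm(g - A.T @ y)

# If g is a nonnegative combination of constraint gradients, pi(x) ~ 0:
A = np.eye(2)
g = np.array([2.0, 3.0])               # g = A^T (2, 3) with (2, 3) >= 0
print(round(pi_measure(g, A), 6))      # 0.0
```

A value of \pi near zero signals that the gradient is (approximately) a nonnegative combination of constraint gradients, i.e. a first-order critical point, which is exactly what (2.18) asks the alternative step to exploit.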
2.4 A hybrid SQP-filter Algorithm

We have now discussed the main ingredients of the class of algorithms we wish to consider, and we are now ready to define it formally as Algorithm 2.1.

Algorithm 2.1: Hybrid SQP-filter Algorithm

Step 0: Initialization. Let an initial point x_0, an initial trust-region radius \Delta_0 > 0 and an initial symmetric matrix H_0 be given, as well as constants 0 < \gamma_0 < \gamma_1 < 1 < \gamma_2, 0 < \eta_1 < \eta_2 < 1, \gamma_\theta \in (0, 1), \kappa_\theta \in (0, 1), \kappa_\Delta \in (0, 1], \kappa_\mu > 0, \mu \in (0, 1), \psi > 1/(1 + \mu), \kappa_tmd > 0 and \kappa_\pi \in (0, 1]. Compute f(x_0) and c(x_0). Set F = \emptyset and k = 0.

Step 1: Test for optimality. If \theta_k = 0 and either \chi_k = 0 or \pi_k = 0, stop.

Step 2: Alternative step. If

    \theta_k \le \kappa_\theta \Delta_k min[ 1, \kappa_\mu \Delta_k^\mu ]   (2.19)

fails, set \mu_k = \mu and go to Step 3. Otherwise, attempt to compute a step s_k that satisfies the constraints of TRQP(x_k, \Delta_k) and (2.18). If this succeeds, go to Step 4. Otherwise, set \mu_k = 0 and go to Step 3.

Step 3: Composite step.

Step 3a: Normal component. Attempt to compute a normal step n_k. If TRQP(x_k, \Delta_k) is compatible for \mu_k, go to Step 3b. Otherwise, include x_k in the filter and compute a restoration step r_k for which TRQP(x_k + r_k, \Delta_{k+1}) is compatible for some \Delta_{k+1} > 0, and x_k + r_k is acceptable for the filter. If this proves impossible, stop. Otherwise, define x_{k+1} = x_k + r_k and go to Step 7.

Step 3b: Tangential component. Compute a tangential step t_k and set s_k = n_k + t_k.

Step 4: Tests to accept the trial step.

• Evaluate c(x_k + s_k) and f(x_k + s_k).

• If x_k + s_k is not acceptable for the filter and x_k, set x_{k+1} = x_k, choose \Delta_{k+1} \in [\gamma_0 \Delta_k, \gamma_1 \Delta_k], set n_{k+1} = n_k if Step 3 was executed, and go to Step 7.

• If

    m_k(x_k) - m_k(x_k + s_k) >= \kappa_\theta \theta_k^\psi   (2.20)

and

    \rho_k := ( f(x_k) - f(x_k + s_k) ) / ( m_k(x_k) - m_k(x_k + s_k) ) < \eta_1,   (2.21)

again set x_{k+1} = x_k, choose \Delta_{k+1} \in [\gamma_0 \Delta_k, \gamma_1 \Delta_k], set n_{k+1} = n_k if Step 3 was executed, and go to Step 7.

Step 5: Test to include the current iterate in the filter. If (2.20) fails, include x_k in the filter F.

Step 6: Move to the new iterate.
Set x_{k+1} = x_k + s_k and choose \Delta_{k+1} such that \Delta_{k+1} \in [\Delta_k, \gamma_2 \Delta_k] if \rho_k \ge \eta_2 and (2.20) holds.

Step 7: Update the Hessian approximation. Determine H_{k+1}. Increment k by one and go to Step 1.

This algorithm differs from that of Fletcher et al. (1999) in that it contains the alternative step strategy, but also because it allows the normal step to satisfy (2.13) with \mu_k = 0 whenever (2.19) holds, that is whenever the current iterate is sufficiently feasible. (As we will see later, (2.13) with \mu_k = \mu > 0 can be viewed as an implicit technique to impose (2.19).) As in Fletcher and Leyffer (1997) and Fletcher and Leyffer (1998), one may choose \psi = 2. (Note that the choice \psi = 1 is always possible because \mu > 0.) Reasonable values for the constants might then be

    \gamma_0 = 0.1, \gamma_1 = 0.5, \gamma_2 = 2, \eta_1 = 0.01, \eta_2 = 0.9, \gamma_\theta = 10^{-4}, \kappa_\Delta = 0.7, \kappa_\mu = 100, \mu = 0.01, \kappa_\theta = 10^{-4}, and \kappa_tmd = 0.01,

but it is too early to know if these are even close to the best possible choices. As in Fletcher et al. (1999), some comments on this algorithm are now in order. Observe first that, by construction, every iterate x_k must be acceptable for the filter at the beginning of iteration k, irrespective of the possibility that it is added to the filter later. Also note that the restoration step r_k cannot be zero, that is, restoration cannot simply entail enlarging the trust-region radius to ensure (2.13), even if n_k exists. This is because x_k is added to the filter before r_k is computed, and x_k + r_k must be acceptable for the filter which now contains x_k. Also note that the restoration procedure cannot be applied on two successive iterations, since the iterate x_k + r_k produced by the first of these iterations is both compatible and acceptable for the filter. For the restoration procedure in Step 3a to succeed, we have to evaluate whether TRQP(x_k + r_k, \Delta_{k+1}) is compatible for a suitable value of \Delta_{k+1}. This requires that a suitable normal step be computed which successfully passes the test (2.13).
Of course, once this is achieved, this normal step may be reused at iteration k + 1 if the composite step strategy is used.

As it stands, the algorithm is not specific about how to choose \Delta_{k+1} during a restoration iteration. On the one hand, there is an advantage to choosing a large \Delta_{k+1}, since this allows a large step and, one hopes, good progress. It also makes it easier to satisfy (2.13). On the other hand, it may be unwise to choose it too large, as this may possibly result in a large number of unsuccessful iterations, during which the radius is reduced, before the algorithm can make any progress. A possible choice might be to restart from the radius obtained during the restoration iteration itself, if it uses a trust-region method. Reasonable alternatives would be to use the average radius observed during past successful iterations, or to apply the internal doubling strategy of Byrd et al. (1987) to increase the new radius, or even to consider the technique described by Sartenaer (1997). However, we recognize that extensive numerical experience will remain the ultimate measure of any suggestion at this level.

The role of condition (2.20) may be interpreted as follows. If this condition fails, then one may think that the constraint violation is significant and that one should aim to improve on this situation in the future, by inserting the current point in the filter. Fletcher and Leyffer (1997) use the term "\theta-step" in such circumstances, to indicate that the main preoccupation is to improve feasibility. On the other hand, if condition (2.20) holds, then the reduction in the objective function predicted by the model is more significant than the current constraint violation, and it is thus appealing to let the algorithm behave as if it were unconstrained. Fletcher and Leyffer (1997) use the term "f-step" to denote the step generated, in order to reflect the dominant role of the objective function f in this case.
In this case, it is important that the predicted decrease in the model is realized by the actual decrease in the function, which is why we then also require that the ratio $\rho_k$ of (2.21) be sufficiently large. In particular, if the iterate $x_k$ is feasible, then (2.19) and (2.11) imply that $x_k = x_k^n$, and we obtain that

$\kappa_\theta \theta_k^\psi = 0 \le m_k(x_k^n) - m_k(x_k + s_k) = m_k(x_k) - m_k(x_k + s_k)$. (2.22)

As a consequence, the filter mechanism is irrelevant if all iterates are feasible, and the algorithm reduces to a classical unconstrained trust-region method. Another consequence of (2.22) is that no feasible iterate is ever included in the filter, which is crucial in allowing finite termination of the restoration procedure. Indeed, if the restoration procedure is required at iteration $k$ of the filter algorithm and produces a sequence of points $\{x_{k,j}\}$ converging to feasibility, there must be an iterate for which

$\theta(x_{k,j}) \le \min\bigl[(1 - \gamma_\theta)\,\theta_k^{\min},\ \kappa_\Delta \Delta_{k+1} \min[1, \kappa_\mu \Delta_{k+1}^\mu]\bigr]$

for any given $\Delta_{k+1} > 0$, where

$\theta_k^{\min} = \min_{i \in \mathcal{Z},\, i \le k} \theta_i > 0$

and $\mathcal{Z} = \{k \mid x_k$ is added to the filter$\}$.

Global Convergence of a Hybrid Trust-Region SQP-Filter Algorithm

Moreover, $\Delta_{k+1}$ must eventually be small enough to ensure, using our assumption on the normal step, the existence of a normal step from $x_{k,j}$. In other words, the restoration iteration must eventually find an iterate $x_{k,j}$ which is acceptable for the filter and for which the normal step exists and satisfies (2.13), i.e. an iterate which is both acceptable and compatible. As a consequence, the restoration procedure will terminate in a finite number of steps, and the filter algorithm may then proceed. Note that the restoration step may not terminate in a finite number of iterations if we do not assume the existence of the normal step when the constraint violation is small enough, even if this violation converges to zero (see Fletcher, Leyffer and Toint, 1998, for an example). Notice also that (2.20) ensures that the denominator of $\rho_k$ in (2.21) will be strictly positive whenever $\theta_k$ is.
If $\theta_k = 0$, then $x_k^n = x_k$, and the denominator of (2.21) will be strictly positive unless $x_k$ is a first-order critical point, because of (2.16). The attentive reader will have observed that we have defined $n_{k+1}$ in Step 4 in the cases where iteration $k$ is unsuccessful (just before branching back to Step 2), while we may not use it if the alternative step of Step 2 is then used at iteration $k+1$. This is to keep the expression of the algorithm as general as possible: a more restrictive version would impose a branch back to Step 3b from Step 4 if iteration $k$ is unsuccessful, but this would then prevent the use of an alternative step at iteration $k+1$. We have chosen not to impose that restriction, but we obviously require that $n_{k+1}$ is used in Step 3a whenever it has been set at iteration $k$, instead of recomputing it from scratch.

Finally, note that Step 6 allows a relatively wide choice of the new trust-region radius $\Delta_{k+1}$. While the stated conditions are sufficient for the theory developed below, one must obviously be more specific in practice. For instance, one may wish to distinguish, at this point in the algorithm, the cases where (2.20) fails or holds. If (2.20) fails, the main effect of the current iteration is not to reduce the model (which makes the value of $\rho_k$ essentially irrelevant), but rather to reduce the constraint violation (which is taken care of by inserting the current iterate in the filter at Step 5). In this case, Step 6 imposes no further restriction on $\Delta_{k+1}$. In practice, it may be reasonable not to reduce the trust-region radius, because this might cause too small steps towards feasibility or an unnecessary restoration phase. However, there is no compelling reason to increase the radius either, given the compatibility of TRQP($x_k$, $\Delta_k$). A reasonable strategy might then be to choose $\Delta_{k+1} = \Delta_k$. If, on the other hand, (2.20) holds, the emphasis of the iteration is then on reducing the objective function, a case akin to unconstrained minimization.
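The case distinction just discussed for Step 6 might be implemented as follows. This is a sketch under our own naming, combining the keep-the-radius choice for $\theta$-steps with an unconstrained-style update driven by $\rho_k$; it is not the authors' code, and the defaults are the tentative constants suggested in Section 2.

```python
def update_radius(delta, rho, cond_220_holds,
                  gamma1=0.5, gamma2=2.0, eta1=0.01, eta2=0.9):
    # Sketch of a Step-6-style trust-region radius update.
    if not cond_220_holds:
        # theta-step: the iterate went into the filter; neither shrink
        # nor grow the radius, as argued in the text.
        return delta
    if rho < eta1:
        return gamma1 * delta   # unsuccessful f-step: shrink
    if rho < eta2:
        return delta            # successful: keep the radius
    return gamma2 * delta       # very successful: allow expansion
```

Picking the interval endpoints ($\gamma_1 \Delta_k$ rather than anything in $[\gamma_1\Delta_k, \Delta_k]$, and so on) is one concrete choice among those the theory permits.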
Thus a more detailed rule of the type

$\Delta_{k+1} \in [\gamma_1 \Delta_k, \Delta_k]$ if $\rho_k \in [\eta_1, \eta_2)$, and $\Delta_{k+1} \in [\Delta_k, \gamma_2 \Delta_k]$ if $\rho_k \ge \eta_2$,

seems reasonable in these circumstances.

3. Convergence to First-Order Critical Points

We now prove that our hybrid algorithm generates a globally convergent sequence of iterates, at least if the restoration iteration always succeeds. For the purpose of our analysis, we shall consider

$\mathcal{S} = \{k \mid x_{k+1} = x_k + s_k\}$, the set of (indices of) successful iterations,

$\mathcal{R} = \{k \mid$ Step 3 is executed and either TRQP($x_k$, $\Delta_k$) has no feasible point or $\|n_k\| > \kappa_\Delta \Delta_k \min[1, \kappa_\mu \Delta_k^\mu]\}$, the set of restoration iterations,

$\mathcal{A} = \{k \mid$ the alternative step is used at iteration $k\}$, and

$\mathcal{C} = \{k \mid s_k = n_k + t_k\}$, the set of iterations where a composite step is used.

Note that (2.19) implies that

$\theta_k \le \kappa_{\rm u} \Delta_k \min[1, \kappa_\mu \Delta_k^\mu]$ (3.1)

for every $k \in \mathcal{A}$. Also note that $\{1, 2, \ldots\} = \mathcal{A} \cup \mathcal{C} \cup \mathcal{R}$ and that $\mathcal{R} \subseteq \mathcal{Z}$.

In order to obtain our global convergence result, we will use the assumptions:

AS1: $f$ and the constraint functions $c_{\mathcal{E}}$ and $c_{\mathcal{I}}$ are twice continuously differentiable;

AS2: there exists $\kappa_{\rm umh} > 1$ such that $\|H_k\| \le \kappa_{\rm umh} - 1 < \kappa_{\rm umh}$ for all $k$;

AS3: the iterates $\{x_k\}$ remain in a closed, bounded domain $X \subset \mathbb{R}^n$.

If, for example, $H_k$ is chosen as the Hessian of the Lagrangian function

$\ell(x, y) = f(x) + \langle y_{\mathcal{E}}, c_{\mathcal{E}}(x) \rangle + \langle y_{\mathcal{I}}, c_{\mathcal{I}}(x) \rangle$

at $x_k$, in that

$H_k = \nabla_{xx} f(x_k) + \sum_{i \in \mathcal{E} \cup \mathcal{I}} [y_k]_i \nabla_{xx} c_i(x_k)$, (3.2)

where $[y_k]_i$ denotes the $i$-th component of the vector of Lagrange multipliers $y_k = (y_{\mathcal{E},k}, y_{\mathcal{I},k})$, then we see from AS1 and AS3 that AS2 is satisfied when these multipliers remain bounded. The same is true if the Hessian matrices in (3.2) are replaced by bounded approximations. A first immediate consequence of AS1-AS3 is that there exists a constant $\kappa_{\rm ubh} > 1$ such that, for all $k$,

$|f(x_k + s_k) - m_k(x_k + s_k)| \le \kappa_{\rm ubh} \Delta_k^2$. (3.3)

A proof of this property, based on Taylor expansion, may be found, for instance, in Toint (1988).
A second important consequence of our assumptions is that AS1 and AS3 together directly ensure that, for all $k$,

$f^{\min} \le f(x_k) \le f^{\max}$ and $0 \le \theta_k \le \theta^{\max}$ (3.4)

for some constants $f^{\min}$, $f^{\max}$ and $\theta^{\max} > 0$. Thus the part of the $(\theta, f)$-space in which the $(\theta, f)$-pairs associated with the filter iterates lie is restricted to the rectangle

$\mathcal{M}_0 = [0, \theta^{\max}] \times [f^{\min}, f^{\max}]$,

whose area, surf($\mathcal{M}_0$), is clearly finite. We also note the following simple consequence of (2.11) and AS3.

Lemma 1  Suppose that Algorithm 2.1 is applied to problem (1.1). Suppose also that (2.11) and AS3 hold, that $k \in \mathcal{C}$, and that $\theta_k > 0$. Then there exists a constant $\kappa_{\rm lsc} > 0$ independent of $k$ such that

$\kappa_{\rm lsc}\, \theta_k \le \|n_k\|$. (3.5)

Proof. Since $k \in \mathcal{C}$, we first obtain that $n_k$ exists (as a consequence of (2.11)), and define

$\mathcal{V}_k = \{j \in \mathcal{E} \mid \theta_k = |c_j(x_k)|\} \cup \{j \in \mathcal{I} \mid \theta_k = -c_j(x_k)\}$,

that is, the subset of most-violated constraints. From the definitions of $\theta_k$ in (2.2) and of the normal step in (2.7) we obtain, using the Cauchy-Schwarz inequality, that

$\theta_k \le \|\nabla_x c_j(x_k)\| \, \|n_k\|$ (3.6)

for all $j \in \mathcal{V}_k$. But AS3 ensures that there exists a constant $\kappa_{\rm lsc} > 0$ such that

$\max_{j \in \mathcal{E} \cup \mathcal{I}} \max_{x \in X} \|\nabla_x c_j(x)\| \le \kappa_{\rm lsc}^{-1}$.

We then obtain the desired conclusion by substituting this bound in (3.6). □

Our assumptions and the definition of $\chi_k$ in (2.14) ensure that $\theta_k$ and $\chi_k$ can be used (together) to measure criticality for problem (1.1).

Lemma 2  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1 and AS3 hold, and that there exists a subsequence $\{k_i\}$ with $k_i \notin \mathcal{R}$ such that

$\lim_{i \to \infty} \theta_{k_i} = 0$, $\lim_{i \to \infty,\, k_i \in \mathcal{C}} \chi_{k_i} = 0$ and $\lim_{i \to \infty,\, k_i \in \mathcal{A}} \pi_{k_i} = 0$. (3.7)

Then every limit point of the sequence $\{x_{k_i}\}$ is a first-order critical point for problem (1.1).

Proof. Consider $x_*$, a limit point of the sequence $\{x_{k_i}\}$, whose existence is ensured by AS3, and let $\{k_\ell\} \subseteq \{k_i\}$ be the index set of a subsequence such that $\{x_{k_\ell}\}$ converges to $x_*$. If $\{k_\ell\}$ contains infinitely many indices of $\mathcal{A}$, the definition of $\pi_k$ implies that $x_*$ is a first-order critical point for problem (1.1).
If this is not the case, the fact that $k_\ell \notin \mathcal{R}$ implies that $n_{k_\ell}$ satisfies (2.11) for sufficiently large $\ell$ and converges to zero, because $\{\theta_{k_\ell}\}$ converges to zero and because of the second part of this condition. As a consequence, we deduce from (2.12) that $\{x_{k_\ell}^n\}$ also converges to $x_*$. Since the minimization problem occurring in the definition of $\chi_{k_\ell}$ (in (2.14)) is convex, we then obtain from classical perturbation theory (see, for instance, Fiacco, 1983, pp. 14-17), AS1 and the first part of (3.7) that

$\Bigl|\min_{\substack{A_{\mathcal{E}}(x_*) t = 0 \\ c_{\mathcal{I}}(x_*) + A_{\mathcal{I}}(x_*) t \ge 0 \\ \|t\| \le 1}} \langle g_*, t \rangle \Bigr| = 0$.

This in turn guarantees that $x_*$ is first-order critical for problem (1.1). □

We start our analysis by examining what happens when an infinite number of iterates (that is, their $(\theta, f)$-pairs) are added to the filter.

Lemma 3  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1 and AS3 hold and that $|\mathcal{Z}| = \infty$. Then

$\lim_{k \to \infty,\, k \in \mathcal{Z}} \theta_k = 0$.

Proof. Suppose, for the purpose of obtaining a contradiction, that there exists an infinite subsequence $\{k_i\} \subseteq \mathcal{Z}$ such that

$\theta_{k_i} \ge \epsilon$ (3.8)

for all $i$ and for some $\epsilon > 0$. At each iteration $k_i$, the $(\theta, f)$-pair associated with $x_{k_i}$, that is $(\theta_{k_i}, f_{k_i})$, is added to the filter. This means that no other $(\theta, f)$-pair can be added to the filter at a later stage within the square $[\theta_{k_i} - \gamma_\theta \epsilon, \theta_{k_i}] \times [f_{k_i} - \gamma_\theta \epsilon, f_{k_i}]$, or within the intersection of this square with $\mathcal{M}_0$. But the area of each of these squares is $\gamma_\theta^2 \epsilon^2$. Thus the set $\mathcal{M}_0$ is completely covered by at most a finite number of such squares. This puts a finite upper bound on the number of iterations in $\{k_i\}$, and the conclusion follows. □

We next examine the size of the constraint violation before and after a "composite iteration" where restoration did not occur.

Lemma 4  Suppose that Algorithm 2.1 is applied to problem (1.1). Suppose also that AS1 and AS3 hold and that $n_k$ satisfies (3.5) for $k \in \mathcal{C}$.
Then there exists a constant $\kappa_{\rm ubt} > 0$ such that, for all $k \notin \mathcal{R}$,

$\theta_k \le \kappa_{\rm ubt} \Delta_k \min[1, \kappa_\mu \Delta_k^\mu]$ (3.9)

and

$\theta(x_k + s_k) \le \kappa_{\rm ubt} \Delta_k^2$. (3.10)

Proof. Assume first that $k \in \mathcal{C}$ with $\theta_k > 0$. Since $k \notin \mathcal{R}$, we have from (3.5) and (2.13) that

$\theta_k \le \kappa_{\rm lsc}^{-1} \|n_k\| \le \kappa_{\rm lsc}^{-1} \kappa_\Delta \Delta_k \min[1, \kappa_\mu \Delta_k^\mu]$, (3.11)

which gives (3.9). On the other hand, (3.1) implies that an inequality of the form (3.9) holds for $k \in \mathcal{A}$, or for $k \in \mathcal{C}$ with $\theta_k = 0$. Now, for any $k$, the $i$-th constraint function at $x_k + s_k$ can be expressed as

$c_i(x_k + s_k) = c_i(x_k) + \langle \nabla_x c_i(x_k), s_k \rangle + \tfrac{1}{2} \langle s_k, \nabla_{xx} c_i(\xi_k)\, s_k \rangle$

for $i \in \mathcal{E} \cup \mathcal{I}$, where we have used AS1 and the mean-value theorem, and where $\xi_k$ belongs to the segment $[x_k, x_k + s_k]$. Using AS3, we may bound the Hessian of the constraint functions, and we obtain from (2.9), the Cauchy-Schwarz inequality and (2.8) that

$|c_i(x_k + s_k)| \le \tfrac{1}{2} \max_{x \in X} \|\nabla_{xx} c_i(x)\| \, \|s_k\|^2 \le \kappa_1 \Delta_k^2$ if $i \in \mathcal{E}$,

or

$-c_i(x_k + s_k) \le \tfrac{1}{2} \max_{x \in X} \|\nabla_{xx} c_i(x)\| \, \|s_k\|^2 \le \kappa_1 \Delta_k^2$ if $i \in \mathcal{I}$,

where we have defined $\kappa_1$ to be half the uniform upper bound on these Hessian norms. This gives the desired bound for any $\kappa_{\rm ubt} \ge \max[\kappa_1, \kappa_\Delta / \kappa_{\rm lsc}]$. □

We next assess the model decrease when the trust-region radius is sufficiently small.

Lemma 5  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1-AS3 and (2.16) hold, that $k \in \mathcal{C}$, and that, for some $\epsilon > 0$,

$\chi_k \ge \epsilon$. (3.12)

Suppose furthermore that

$\Delta_k \le \min\Bigl[1, \frac{\epsilon}{\kappa_{\rm umh}}, \Bigl(\frac{\kappa_{\rm tmd}\, \epsilon}{2 \kappa_\Delta \kappa_\mu (\kappa_{\rm ubg} + \kappa_{\rm umh})}\Bigr)^{1/\mu}\Bigr] \stackrel{\rm def}{=} \delta_m$, (3.13)

where $\kappa_{\rm ubg} \stackrel{\rm def}{=} \max_{x \in X} \|\nabla_x f(x)\|$. Then

$m_k(x_k) - m_k(x_k + s_k) \ge \tfrac{1}{2} \kappa_{\rm tmd}\, \epsilon\, \Delta_k$.

This last inequality also holds if $k \in \mathcal{A}$, if (3.13) holds and

$\pi_k \ge \epsilon$. (3.14)

Proof. Assume first that $k \in \mathcal{C}$. We note that, by (2.16), AS2, (3.12) and (3.13),

$m_k(x_k^n) - m_k(x_k + s_k) \ge \kappa_{\rm tmd}\, \chi_k \min\Bigl[\frac{\chi_k}{\beta_k}, \Delta_k\Bigr] \ge \kappa_{\rm tmd}\, \epsilon\, \Delta_k$. (3.15)

Now

$m_k(x_k^n) = m_k(x_k) + \langle g_k, n_k \rangle + \tfrac{1}{2} \langle n_k, H_k n_k \rangle$,

and therefore, using the Cauchy-Schwarz inequality, AS2, (2.13) and (3.13),

$|m_k(x_k) - m_k(x_k^n)| \le \kappa_{\rm ubg} \|n_k\| + \tfrac{1}{2} \kappa_{\rm umh} \|n_k\|^2 \le \tfrac{1}{2} \kappa_{\rm tmd}\, \epsilon\, \Delta_k$.

We thus conclude from this last inequality and (3.15) that the desired conclusion holds for $k \in \mathcal{C}$.
If we now assume that $k \in \mathcal{A}$ (that is, iteration $k$ uses an alternative step), then (2.18), (3.13), (3.14) and the inequality $\beta_k \ge 1$ directly yield that

$m_k(x_k) - m_k(x_k + s_k) \ge \kappa_{\rm tmd}\, \epsilon \min\Bigl[\frac{\epsilon}{\beta_k}, \Delta_k\Bigr] \ge \tfrac{1}{2} \kappa_{\rm tmd}\, \epsilon\, \Delta_k$,

as desired. □

We continue our analysis by showing, as the reader has grown to expect, that iterations have to be very successful when the trust-region radius is sufficiently small.

Lemma 6  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1-AS3, (2.16) and (3.12) hold, that $k \notin \mathcal{R}$, and that

$\Delta_k \le \min\Bigl[\delta_m, \frac{(1 - \eta_2)\, \kappa_{\rm tmd}\, \epsilon}{2 \kappa_{\rm ubh}}\Bigr]$. (3.16)

Then $\rho_k \ge \eta_2$.

Proof. Using (2.21), (3.3), Lemma 5 and (3.16), we find that

$|\rho_k - 1| = \frac{|f(x_k + s_k) - m_k(x_k + s_k)|}{|m_k(x_k) - m_k(x_k + s_k)|} \le \frac{\kappa_{\rm ubh} \Delta_k^2}{\tfrac{1}{2} \kappa_{\rm tmd}\, \epsilon\, \Delta_k} \le 1 - \eta_2$,

from which the conclusion immediately follows. □

Note that this proof could easily be extended if the definition of $\rho_k$ in (2.21) were altered to be of the form

$\rho_k \stackrel{\rm def}{=} \frac{f(x_k) - f(x_k + s_k) + \Theta_k}{m_k(x_k) - m_k(x_k + s_k)}$, (3.17)

provided $\Theta_k$ is bounded above by a multiple of $\Delta_k^2$. We will comment in Section 4 on why such a modification might be of interest.

Now, we also show that the test (2.20) will always be satisfied when the trust-region radius is sufficiently small.

Lemma 7  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1-AS3, (2.16) and (3.12) hold, that $k \notin \mathcal{R}$, that $n_k$ satisfies (3.5) if $k \in \mathcal{C}$, and that

$\Delta_k \le \min\Bigl[\delta_m, \Bigl(\frac{\kappa_{\rm tmd}\, \epsilon}{2 \kappa_\theta (\kappa_{\rm ubt} \kappa_\mu)^\psi}\Bigr)^{1/(\psi(1+\mu)-1)}\Bigr]$. (3.18)

Then

$m_k(x_k) - m_k(x_k + s_k) \ge \kappa_\theta \theta_k^\psi$.

Proof. This directly results from the inequalities

$\kappa_\theta \theta_k^\psi \le \kappa_\theta (\kappa_{\rm ubt} \kappa_\mu)^\psi \Delta_k^{\psi(1+\mu)} \le \tfrac{1}{2} \kappa_{\rm tmd}\, \epsilon\, \Delta_k \le m_k(x_k) - m_k(x_k + s_k)$,

where we successively used Lemma 4, (3.18) and Lemma 5. □

We may also guarantee a decrease in the objective function, large enough to ensure that the trial point is acceptable with respect to the $(\theta, f)$-pair associated with $x_k$, so long as the constraint violation is itself sufficiently small.
Lemma 8  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1-AS3, (2.16), (3.12) and (3.16) hold, that $k \notin \mathcal{R}$, that $n_k$ satisfies (3.5) if $k \in \mathcal{C}$, and that

$\theta_k \le (\kappa_{\rm ubt} \kappa_\mu)^{-1/\mu} \Bigl[\frac{\eta_2\, \kappa_{\rm tmd}\, \epsilon}{2 \gamma_\theta}\Bigr]^{(1+\mu)/\mu} \stackrel{\rm def}{=} \delta_\theta$. (3.19)

Then

$f(x_k + s_k) \le f(x_k) - \gamma_\theta \theta_k$.

Proof. Applying Lemmas 4-6 (which is possible because of (3.12), (3.16), $k \notin \mathcal{R}$, and the fact that $n_k$ satisfies (3.5) for $k \in \mathcal{C}$) and (3.19), we obtain that

$f(x_k) - f(x_k + s_k) \ge \eta_2 [m_k(x_k) - m_k(x_k + s_k)] \ge \tfrac{1}{2} \eta_2\, \kappa_{\rm tmd}\, \epsilon\, \Delta_k \ge \gamma_\theta \theta_k$,

and the desired inequality follows. □

We now establish that, if the trust-region radius and the constraint violation are both small at a non-critical iterate $x_k$, then TRQP($x_k$, $\Delta_k$) must be compatible.

Lemma 9  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose that AS1-AS3, (2.11) and (3.12) hold, that (2.16) holds for $k \notin \mathcal{R}$, and that

$\Delta_k \le \min\Bigl[\delta_m, \kappa_\mu^{-1/\mu}, \Bigl(\frac{\gamma_0^2 (1 - \gamma_\theta)\, \kappa_\Delta \kappa_\mu}{\kappa_{\rm usc} \kappa_{\rm ubt}}\Bigr)^{1/(1-\mu)}\Bigr]$. (3.20)

Suppose furthermore that

$\theta_k \le \min[\delta_\theta, \delta_n]$. (3.21)

Then $k \notin \mathcal{R}$.

Proof. If an alternative step is used at iteration $k$, then $k \notin \mathcal{R}$. Assume therefore that $k \notin \mathcal{A}$. Because $\theta_k \le \delta_n$, we know from (2.11) and Lemma 1 that (2.11) and (3.5) hold. Moreover, since $\theta_k \le \delta_\theta$, we have that (3.19) also holds. Assume, for the purpose of deriving a contradiction, that $k \in \mathcal{R}$, which implies that

$\|n_k\| > \kappa_\Delta \Delta_k \min[1, \kappa_\mu \Delta_k^\mu] = \kappa_\Delta \kappa_\mu \Delta_k^{1+\mu}$, (3.22)

where we have used (2.13) and the fact that $\kappa_\mu \Delta_k^\mu \le 1$ because of (3.20). In this case, the mechanism of the algorithm then ensures that $k - 1 \notin \mathcal{R}$. Now assume that iteration $k - 1$ is unsuccessful.
Because of Lemmas 6 and 8, which hold at iteration $k - 1 \notin \mathcal{R}$ because of (3.20), the fact that $\theta_k = \theta_{k-1}$, (2.11), and (3.19), we obtain that

$\rho_{k-1} \ge \eta_2$ and $f(x_{k-1} + s_{k-1}) \le f(x_{k-1}) - \gamma_\theta \theta_{k-1}$.

Hence, given that $x_{k-1}$ is acceptable for the filter at the beginning of iteration $k - 1$, if this iteration is unsuccessful, it must be because

$\theta(x_{k-1} + s_{k-1}) > (1 - \gamma_\theta)\, \theta_{k-1} = (1 - \gamma_\theta)\, \theta_k$.

But Lemma 4 and the mechanism of the algorithm then imply that

$(1 - \gamma_\theta)\, \theta_k \le \kappa_{\rm ubt} \Delta_{k-1}^2 \le \frac{\kappa_{\rm ubt}}{\gamma_0^2} \Delta_k^2$.

Combining this last bound with (3.22) and (2.11), we deduce that

$\kappa_\Delta \kappa_\mu \Delta_k^{1+\mu} < \|n_k\| \le \kappa_{\rm usc}\, \theta_k \le \frac{\kappa_{\rm usc} \kappa_{\rm ubt}}{(1 - \gamma_\theta)\, \gamma_0^2} \Delta_k^2$,

and hence that

$\Delta_k^{1-\mu} \ge \frac{\gamma_0^2 (1 - \gamma_\theta)\, \kappa_\Delta \kappa_\mu}{\kappa_{\rm usc} \kappa_{\rm ubt}}$.

Since this last inequality contradicts (3.20), our assumption that iteration $k - 1$ is unsuccessful must be false. Thus iteration $k - 1$ is successful and $\theta_k = \theta(x_{k-1} + s_{k-1})$. We then obtain from (3.22), (2.11) and (3.10) that

$\kappa_\Delta \kappa_\mu \Delta_k^{1+\mu} < \|n_k\| \le \kappa_{\rm usc}\, \theta_k \le \kappa_{\rm usc} \kappa_{\rm ubt} \Delta_{k-1}^2 \le \frac{\kappa_{\rm usc} \kappa_{\rm ubt}}{\gamma_0^2} \Delta_k^2$,

which is again impossible because of (3.20) and because $1 - \gamma_\theta < 1$. Hence our initial assumption (3.22) must be false, which yields the desired conclusion. □

We now distinguish two mutually exclusive cases. For the first, we consider what happens if there is an infinite subsequence of iterates belonging to the filter.

Lemma 10  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1-AS3 and (2.11) hold and that (2.16) holds for $k \notin \mathcal{R}$. Suppose furthermore that $|\mathcal{Z}| = \infty$. Then there exists an infinite subsequence $\{k_j\} \subseteq \mathcal{Z}$ such that

$\lim_{j \to \infty} \theta_{k_j} = 0$ (3.23)

and

$\lim_{j \to \infty,\, k_j \in \mathcal{C}} \chi_{k_j} = 0$ and $\lim_{j \to \infty,\, k_j \in \mathcal{A}} \pi_{k_j} = 0$. (3.24)

Proof. Let $\{k_i\}$ be any infinite subsequence of $\mathcal{Z}$. We observe that (3.23) follows from Lemma 3. Suppose now that, for some $\epsilon_2 > 0$,

$\chi_{k_i} \ge \epsilon_2$ (3.25)

for all $i$ such that $k_i \in \mathcal{C}$, and

$\pi_{k_i} \ge \epsilon_2$ (3.26)

for all $i$ such that $k_i \in \mathcal{A}$. Suppose furthermore that there exists $\epsilon_3 > 0$ such that, for all $i \ge i_0$,

$\Delta_{k_i} \ge \epsilon_3$. (3.27)

If $k_i \notin \mathcal{A}$, (3.23) and (2.11) ensure that $n_{k_i}$
exists for $i \ge i_1$, say, and also that

$\lim_{i \to \infty} \|n_{k_i}\| = 0$. (3.28)

Thus (3.27) ensures that (2.13) holds for sufficiently large $i$, and hence that $k_i \notin \mathcal{R}$. We may then decompose the model decrease into its normal and tangential components, that is,

$m_{k_i}(x_{k_i}) - m_{k_i}(x_{k_i} + s_{k_i}) = m_{k_i}(x_{k_i}) - m_{k_i}(x_{k_i}^n) + m_{k_i}(x_{k_i}^n) - m_{k_i}(x_{k_i} + s_{k_i})$. (3.29)

Consider the normal component first. As we noted in the proof of Lemma 5,

$|m_{k_i}(x_{k_i}) - m_{k_i}(x_{k_i}^n)| \le \kappa_{\rm ubg} \|n_{k_i}\| + \kappa_{\rm umh} \|n_{k_i}\|^2$,

which in turn, with (3.28), yields that

$\lim_{i \to \infty} [m_{k_i}(x_{k_i}) - m_{k_i}(x_{k_i}^n)] = 0$. (3.30)

If we now consider the tangential component, (3.25), (3.27), (2.16) and AS2 yield that

$m_{k_i}(x_{k_i}^n) - m_{k_i}(x_{k_i} + s_{k_i}) \ge \kappa_{\rm tmd}\, \epsilon_2 \min\Bigl[\frac{\epsilon_2}{\kappa_{\rm umh}}, \epsilon_3\Bigr] > 0$. (3.31)

Substituting (3.30) and (3.31) into (3.29), we find that, for $k_i \in \mathcal{C}$ and $i$ sufficiently large,

$m_{k_i}(x_{k_i}) - m_{k_i}(x_{k_i} + s_{k_i}) \ge \delta_1 > 0$.

If, on the other hand, $k_i \in \mathcal{A}$, then (3.26), (3.27) and (2.18) give that

$m_{k_i}(x_{k_i}) - m_{k_i}(x_{k_i} + s_{k_i}) \ge \kappa_{\rm tmd}\, \epsilon_2 \min[\epsilon_2, \epsilon_3] \stackrel{\rm def}{=} \delta_2 > 0$.

Thus

$\liminf_{i \to \infty} [m_{k_i}(x_{k_i}) - m_{k_i}(x_{k_i} + s_{k_i})] \ge \min[\delta_1, \delta_2] \stackrel{\rm def}{=} \delta > 0$. (3.32)

We now observe that, because $x_{k_i}$ is added to the filter at iteration $k_i$, the mechanism of the algorithm imposes that either $k_i \in \mathcal{R}$ or (2.20) must fail. Since we already verified that $k_i \notin \mathcal{R}$ for $i$ sufficiently large, we obtain that (2.20) must fail for such $i$, that is,

$m_{k_i}(x_{k_i}) - m_{k_i}(x_{k_i} + s_{k_i}) < \kappa_\theta \theta_{k_i}^\psi$. (3.33)

Combining this bound with (3.32), we find that $\theta_{k_i}$ is bounded away from zero for $i$ sufficiently large, which is impossible in view of (3.23). We therefore deduce that (3.27) cannot hold, and obtain that there is a subsequence $\{k_\ell\} \subseteq \{k_i\}$ for which

$\lim_{\ell \to \infty} \Delta_{k_\ell} = 0$.

We now restrict our attention to the tail of this subsequence, that is, to the set of indices $k_\ell$ that are large enough to ensure that (3.18), (3.19) and (3.20) hold, which is possible by definition of the subsequence and because of (3.23). For these indices, we may therefore apply Lemma 9 and deduce that iteration $k_\ell \notin \mathcal{R}$ for $\ell$ sufficiently large.
Hence, as above, (3.33) must hold for $\ell$ sufficiently large. However, we may also apply Lemma 7, which contradicts (3.33); therefore (3.25) and (3.26) cannot hold together, yielding the desired result. □

Thus, if an infinite subsequence of iterates is added to the filter, Lemma 2 ensures that it converges to a first-order critical point. Our remaining analysis then naturally concentrates on the possibility that there is no such infinite subsequence. In this case, no further iterates are added to the filter for $k$ sufficiently large. In particular, this means that the number of restoration iterations, $|\mathcal{R}|$, must be finite. In what follows, we assume that $k_0 > 0$ is the last iteration for which $x_{k_0 - 1}$ is added to the filter.

Lemma 11  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1-AS3 and (2.11) hold and that (2.16) holds for $k \notin \mathcal{R}$. Then we have that

$\lim_{k \to \infty} \theta_k = 0$. (3.34)

Furthermore, $n_k$ exists and satisfies (3.5) for all $k \ge k_0$ sufficiently large.

Proof. Consider any successful iteration with $k \ge k_0$. Then we have that

$f(x_k) - f(x_{k+1}) \ge \eta_1 [m_k(x_k) - m_k(x_k + s_k)] \ge \eta_1 \kappa_\theta \theta_k^\psi \ge 0$. (3.35)

Thus the objective function does not increase at successful iterations with $k \ge k_0$. But AS1 and AS3 imply (3.4), and therefore we must have, from the first part of this statement, that

$\lim_{k \to \infty,\, k \in \mathcal{S}} [f(x_k) - f(x_{k+1})] = 0$. (3.36)

Then (3.34) immediately follows from (3.35) and the fact that $\theta_j = \theta_k$ for all unsuccessful iterations $j$ that immediately follow the successful iteration $k$, if any. The last conclusion then results from (2.11) and Lemma 1. □

We now show that the trust-region radius cannot become arbitrarily small if the (asymptotically feasible) iterates stay away from first-order critical points.

Lemma 12  Suppose that Algorithm 2.1 is applied to problem (1.1) and that finite termination does not occur. Suppose also that AS1-AS3 hold and that (2.16) holds for $k \notin \mathcal{R}$.
Suppose furthermore that (3.12) holds for all $k \ge k_0$. Then there exists a $\Delta_{\min} > 0$ such that $\Delta_k \ge \Delta_{\min}$ for all $k$.

Proof. Suppose that $k_1 \ge k_0$ is chosen sufficiently large to ensure that (3.21), and thus also (2.11), holds for all $k \ge k_1$, which is possible because of Lemma 11. Suppose also, for the purpose of obtaining a contradiction, that iteration $j$ is the first iteration following iteration $k_1$ for which

$\Delta_j \le \gamma_0 \delta_s$, (3.37)

where $\delta_s > 0$ denotes the smallest of the trust-region thresholds appearing in (3.16), (3.18) and (3.20), together with $\sqrt{(1 - \gamma_\theta)\, \bar\theta / \kappa_{\rm ubt}}$, where

$\bar\theta = \min_{i \in \mathcal{Z}} \theta_i$

is the smallest constraint violation appearing in the filter. Note also that the inequality $\Delta_j \le \gamma_0 \Delta_{k_1}$, which is implied by (3.37), ensures that $j \ge k_1 + 1$, hence that $j - 1 \ge k_1$, and thus that $j - 1 \notin \mathcal{R}$. Then the mechanism of the algorithm and (3.37) imply that

$\Delta_{j-1} \le \frac{\Delta_j}{\gamma_0} \le \delta_s$, (3.38)

and Lemma 6, which is applicable because (3.37) and (3.38) together imply (3.16) with $k$ replaced by $j - 1$, then ensures that

$\rho_{j-1} \ge \eta_2$. (3.39)

Furthermore, since $n_{j-1}$ satisfies (2.11), Lemma 1 implies that we can apply Lemma 4. This, together with (3.37) and (3.38), gives that

$\theta(x_{j-1} + s_{j-1}) \le \kappa_{\rm ubt} \Delta_{j-1}^2 \le (1 - \gamma_\theta)\, \bar\theta$. (3.40)

We may also apply Lemma 8, because (3.37) and (3.38) ensure that (3.16) holds, and because (3.19) also holds for $j - 1 \ge k_1$. Hence we deduce that

$f(x_{j-1} + s_{j-1}) \le f(x_{j-1}) - \gamma_\theta \theta_{j-1}$.

This last relation and (3.40) ensure that $x_{j-1} + s_{j-1}$ is acceptable for the filter and for $x_{j-1}$. Combining this conclusion with (3.39) and the mechanism of the algorithm, we obtain that $\Delta_j \ge \Delta_{j-1}$. As a consequence, and since (2.20) also holds at iteration $j - 1$, iteration $j$ cannot be the first iteration following $k_1$ for which (3.37) holds. This contradiction shows that $\Delta_k \ge \gamma_0 \delta_s$ for all $k \ge k_1$, and the desired result follows if we define

$\Delta_{\min} \stackrel{\rm def}{=} \min[\Delta_0, \ldots, \Delta_{k_1}, \gamma_0 \delta_s]$. □

We may now analyze the convergence of $\chi_k$ itself.
Suppose also that ASl- AS3, (2.11) hold, and that (2.16) holds for k ^TZ. Then there exists a subsequence {kj} such that liminfx)^. —0 and limmink- == 0. (3-41) i-)-oo ^ j-^OO ^ kjec kjeA Proof. We start by observing that Lemma 11 implies that the second conclusion of (2.11) holds for k sufficiently large. Moreover, as in Lemma 11, we obtain (3.35) and therefore (3.36) for each A; G k > ko. Suppose now, for the purpose of obtaining a contradiction, that (3.12) and (3.14) hold. Assume first that k E C. In this case, and notice that mk{xk) - mk{xk + Sk) = mk{xk) - mk{xl) + mk{xl) - mk{xk + Sfc). (3.42) Moreover, note, as in Lemma 5, that \mk{xk) - mk{x%)\ < + «umh||nA;|P, which in turn yields that lim [rrikixk) - mk{x^)] = 0 AC— >-00 because of Lemma 11 and the second conclusion of (2.11). This limit, together with (3.35), (3.36) and (3.42), then gives that lim [mA;(a;^) - mk{xk + Sk)] = 0. (3.43) k-^oo kes But (2.16), (3.12), AS2 and Lemma 12 together imply that, for all k > ko mkixl)-mk{xk+Sk) > ^tmd Xk min > > K.„aemin ‘ 6 ■ 7 ^min ipk J . ^umh (3.44) immediately giving a contradiction with (3.43). On the other hand, if A; G w4, then (3.14) and (2.18) immediately imply that 5 mk{xk) - rukixk + Sk) > «tmd min[e, Amin] > 0, Global Convergence of a Hybrid Trust-Region SQP-Filter Algorithm 51 which, together with ( 2 . 21 ) and the fact that k e contradicts the boundedness of /. Hence (3.12) and (3.14) cannot hold together and the desired result follows. □ We may summarize all of the above in our main global convergence result. Theorem 14 Suppose that Algorithm 2.1 is applied to prob- lem ( 1 . 1 ) and that finite termination does not occur. Suppose also that AS1-AS3 and ( 2 . 11 ) hold, and that (2.16) holds for k ^TZ. Let {xk} be the sequence of iterates produced by the algorithm. 
Then either the restoration procedure terminates unsuccessfully by converging to an infeasible first-order critical point of problem (2.17), or there is a subsequence $\{k_j\}$ for which

$\lim_{j \to \infty} x_{k_j} = x_*$

and $x_*$ is a first-order critical point for problem (1.1).

Proof. Suppose that the restoration iteration always terminates successfully. From AS3 and Lemmas 10, 11 and 13, we obtain that, for some subsequence $\{k_j\}$,

$\lim_{j \to \infty} \theta_{k_j} = \lim_{j \to \infty,\, k_j \in \mathcal{C}} \chi_{k_j} = \lim_{j \to \infty,\, k_j \in \mathcal{A}} \pi_{k_j} = 0$. (3.45)

The conclusion then follows from Lemma 2. □

Can we dispense with AS3 to obtain this result? Firstly, this assumption ensures that the objective and constraint functions remain bounded above and below (see (3.4)). This is crucial for the rest of the analysis because the convergence of the iterates to feasibility depends on the fact that the area of the filter is finite. Thus, if AS3 does not hold, we have to verify that (3.4) holds for other reasons. The second part of this statement may be ensured quite simply by initializing the filter to $(\theta^{\max}, -\infty)$, for some $\theta^{\max} > 0$, in Step 0 of the algorithm. This has the effect of putting an upper bound on the infeasibility of all iterates, which may be useful in practice. However, this does not prevent the objective function from being unbounded below on the set $C(\theta^{\max})$ of points whose constraint violation is at most $\theta^{\max}$, and we cannot exclude the possibility that a sequence of infeasible iterates might both continue to improve the value of the objective function and satisfy (2.20). If $C(\theta^{\max})$ is bounded, AS3 is most certainly satisfied. If this is not the case, we could assume that

$f^{\min} \le f(x) \le f^{\max}$ and $0 \le \theta(x) \le \theta^{\max}$ for $x \in C(\theta^{\max})$ (3.46)

for some values of $f^{\min}$ and $f^{\max}$, and simply monitor that the values $f(x_k)$ are reasonable, in view of the problem being solved, as the algorithm proceeds. To summarize, we may replace AS1 and AS3 by the following assumption.

AS4: the functions $f$ and $c$ are twice continuously differentiable on an open set containing $C(\theta^{\max})$, their first and second derivatives are uniformly bounded on $C(\theta^{\max})$, and (3.46) holds.
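For concreteness, the infeasibility measure $\theta$ and the suggested filter initialization can be sketched as follows. This assumes a max-type violation measure, consistent with the most-violated-constraint set used in Lemma 1; the function and variable names are our own.

```python
import math

def theta(c_eq, c_in):
    # Infeasibility measure: largest equality violation |c_i(x)| or
    # inequality shortfall max(0, -c_i(x)); zero at feasible points.
    viol = [abs(v) for v in c_eq] + [max(0.0, -v) for v in c_in]
    return max(viol, default=0.0)

# Initializing the filter with the single pair (theta_max, -inf)
# bounds the infeasibility of every later iterate, as suggested above.
theta_max = 10.0
initial_filter = [(theta_max, -math.inf)]

print(theta([0.3, -0.7], [2.0, -1.5]))
```

With this initialization, any trial point with $\theta > (1 - \gamma_\theta)\theta^{\max}$ is rejected by the filter regardless of its objective value, which is exactly the upper bound on infeasibility discussed in the text.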
The reader should note that AS4 no longer ensures the existence of a limit point, but only that (3.45) holds for some subsequence $\{k_j\}$. Furthermore, the comments following the statement of (2.11) no longer apply if limit points at infinity are allowed.

4. Conclusion and Perspectives

We have introduced a hybrid trust-region SQP-filter algorithm for general nonlinear programming that mixes composite steps with potentially cheaper alternative steps, and we have shown this algorithm to be globally convergent to first-order critical points. This hybrid algorithm has the potential of being numerically more efficient than the version that only uses composite steps, as analyzed in Fletcher et al. (1999). However, the authors are well aware that this potential must be confirmed by numerical experiments.

References

L. T. Biegler, J. Nocedal, and C. Schmid. A reduced Hessian method for large-scale constrained optimization. SIAM Journal on Optimization, 5(2), 314-347, 1995.

R. H. Bielschowsky and F. A. M. Gomes. Dynamical control of infeasibility in nonlinearly constrained optimization. Presentation at the Optimization 98 Conference, Coimbra, 1998.

R. H. Byrd, J. Ch. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming, Series A, 89(1), 149-186, 2000a.

R. H. Byrd, M. E. Hribar, and J. Nocedal. An interior point algorithm for large scale nonlinear programming. SIAM Journal on Optimization, 9(4), 877-900, 2000b.

R. H. Byrd, R. B. Schnabel, and G. A. Shultz. A trust region algorithm for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis, 24, 1152-1170, 1987.

A. R. Conn, N. I. M. Gould, D. Orban, and Ph. L. Toint. A primal-dual trust-region algorithm for minimizing a non-convex function subject to bound and linear equality constraints. Mathematical Programming, 87(2), 215-249, 2000.

A. R. Conn, N. I. M.
Gould, A. Sartenaer, and Ph. L. Toint. Global convergence of a class of trust region algorithms for optimization using inexact projections on convex constraints. SIAM Journal on Optimization, 3(1), 164-221, 1993.

R. S. Dembo and U. Tulowitzki. On the minimization of quadratic functions subject to box constraints. School of Organization and Management Working Paper Series B no. 71, Yale University, New Haven, USA, 1983.

J. E. Dennis and L. N. Vicente. On the convergence theory of trust-region based algorithms for equality-constrained optimization. SIAM Journal on Optimization, 7(4), 927-950, 1997.

J. E. Dennis, M. El-Alem, and M. C. Maciel. A global convergence theory for general trust-region based algorithms for equality constrained optimization. SIAM Journal on Optimization, 7(1), 177-207, 1997.

J. E. Dennis, M. El-Alem, and K. A. Williamson. A trust-region approach to nonlinear systems of equalities and inequalities. SIAM Journal on Optimization, 9(2), 291-315, 1999.

M. El-Alem. Global convergence without the assumption of linear independence for a trust-region algorithm for constrained optimization. Journal of Optimization Theory and Applications, 87(3), 563-577, 1995.

M. El-Alem. A global convergence theory for a general class of trust-region-based algorithms for constrained optimization without assuming regularity. SIAM Journal on Optimization, 9(4), 965-990, 1999.

M. El-Hallabi and R. A. Tapia. An inexact trust-region feasible-point algorithm for nonlinear systems of equalities and inequalities. Technical Report TR95-09, Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA, 1995.

A. V. Fiacco. Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Academic Press, London, 1983.

R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Numerical Analysis Report NA/171, Department of Mathematics, University of Dundee, Dundee, Scotland, 1997.

R. Fletcher and S. Leyffer.
User manual for filterSQP. Numerical Analysis Report NA/181, Department of Mathematics, University of Dundee, Dundee, Scotland, 1998.

R. Fletcher, N. I. M. Gould, S. Leyffer, and Ph. L. Toint. Global convergence of trust-region SQP-filter algorithms for nonlinear programming. Technical Report 99/03, Department of Mathematics, University of Namur, Namur, Belgium, 1999.

R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of an SLP-filter algorithm. Technical Report 98/13, Department of Mathematics, University of Namur, Namur, Belgium, 1998.

R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of an SQP-filter algorithm. Technical Report 00/??, Department of Mathematics, University of Namur, Namur, Belgium, 2000.

N. I. M. Gould, M. E. Hribar, and J. Nocedal. On the solution of equality constrained quadratic problems arising in optimization. Technical Report RAL-TR-98-069, Rutherford Appleton Laboratory, Chilton, Oxfordshire, England, 1998.

M. Lalee, J. Nocedal, and T. D. Plantenga. On the implementation of an algorithm for large-scale equality constrained optimization. SIAM Journal on Optimization, 8(3), 682-706, 1998.

X. Liu and Y. Yuan. A robust trust-region algorithm for solving general nonlinear programming problems. Presentation at the International Conference on Nonlinear Programming and Variational Inequalities, Hong Kong, 1998.

J. J. Moré and G. Toraldo. On the solution of large quadratic programming problems with bound constraints. SIAM Journal on Optimization, 1(1), 93-113, 1991.

W. Murray and F. J. Prieto. A sequential quadratic programming algorithm using an incomplete solution of the subproblem. SIAM Journal on Optimization, 5(3), 590-640, 1995.

E. O. Omojokun. Trust region algorithms for optimization with nonlinear equality and inequality constraints. PhD thesis, University of Colorado, Boulder, Colorado, USA, 1989.

A. Sartenaer. Automatic determination of an initial trust region in nonlinear programming.
SOME ASPECTS OF NONLINEAR SEMIDEFINITE PROGRAMMING

Florian Jarre
Institut für Mathematik
Universität Düsseldorf
Universitätsstraße 1
D-40225 Düsseldorf, Germany
jarre@opt.uni-duesseldorf.de

Abstract: This paper is an extended abstract of a survey talk given at the IFIP TC7 Conference in Trier, July 2001. We consider linear and nonlinear semidefinite programming problems and concentrate on selected aspects that are relevant for understanding dual barrier methods. The paper is aimed at graduate students and highlights some issues regarding smoothness, regularity, and computational complexity without going into details.

Keywords: Semidefinite programming, smoothness, regularity, interior method, local minimization.

1. Introduction

In this paper we consider nonlinear semidefinite programming problems (NLSDPs) and concentrate on some aspects relevant to a dual barrier method. Other approaches for solving NLSDPs are the program package LOQO of Vanderbei (1997), based on a primal-dual approach, and recent work of Vanderbei et al. (2000). The work of Kocvara and Stingl (2001) on solving large scale semidefinite programs with a modified barrier approach also seems very promising. The modified barrier approach does not require the barrier parameter to converge to zero and may thus overcome some of the problems related to ill-conditioning in traditional interior methods.
Optimality conditions for NLSDPs are considered in Forsgren (2000) and in Shapiro and Scheinberg (2000). Some problems considered in this paper do not satisfy any constraint qualification. For such problems primal-dual methods do not appear to be suitable. Another question addressed in this paper is how to avoid "poor" local minimizers, a question that may be even more difficult to investigate for primal-dual methods than it is for barrier methods.

1.1 Notation

The following notation has become standard in the literature on linear semidefinite programs. The space of symmetric n × n matrices is denoted by S^n. The inequality X ⪰ 0 (X ≻ 0) indicates that X is a symmetric positive semidefinite (positive definite) n × n matrix. By

    ⟨C, X⟩ = C • X := trace(C^T X) = Σ_{i,j} C_ij X_ij

we denote the standard scalar product on the space of n × n matrices, inducing the Frobenius norm, X • X = ‖X‖_F². For given symmetric matrices A^(1), ..., A^(m) ∈ S^n we define a linear map A from S^n to R^m by

    A(X) = (A^(1) • X, ..., A^(m) • X)^T.

The adjoint operator A* satisfying

    ⟨A*(y), X⟩ = y^T A(X)   for all X ∈ S^n, y ∈ R^m

is given by

    A*(y) = Σ_{i=1}^m y_i A^(i).

2. Linear semidefinite programs

In this section we consider a pair of primal and dual (linear) semidefinite programs in standard form,

    (P)  minimize C • X   s.t.  A(X) = b,  X ⪰ 0

and

    (D)  maximize b^T y   s.t.  A*(y) + S = C,  S ⪰ 0.

When using the notion "semidefinite program" (SDP) we always refer to a linear semidefinite program; nonlinear SDPs will be denoted by NLSDP. The data for (P) and (D) are a linear map A, a vector b ∈ R^m, and a matrix C ∈ S^n. We use the convention that the infimum of (P) is +∞ whenever (P) does not have any feasible solution X, and the supremum of (D) is −∞ if the feasible set of (D) is empty. If there exists a matrix X ≻ 0 (not just X ⪰ 0) that is feasible for (P), then we call X "strictly" feasible for (P) and say that (P) satisfies Slater's condition.
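The map A and its adjoint are easy to check numerically. The following sketch (the random data matrices A^(i) are illustrative, not taken from the text) verifies the adjoint identity ⟨A*(y), X⟩ = y^T A(X):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

# Symmetric data matrices A^(1), ..., A^(m) in S^n.
A = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]

def A_op(X):
    """A(X) = (A^(1) . X, ..., A^(m) . X)^T with C . X = trace(C^T X)."""
    return np.array([np.trace(Ai.T @ X) for Ai in A])

def A_adj(y):
    """A*(y) = sum_i y_i A^(i)."""
    return sum(yi * Ai for yi, Ai in zip(y, A))

# Verify the adjoint identity <A*(y), X> = y^T A(X).
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
y = rng.standard_normal(m)
lhs = np.trace(A_adj(y).T @ X)
rhs = y @ A_op(X)
assert abs(lhs - rhs) < 1e-10
```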
Likewise, if there exists a matrix S ≻ 0 that is feasible for (D), we call (D) strictly feasible. If Slater's condition is satisfied by (P) or by (D), then the optimal values of (P) and (D) coincide, and if both problems satisfy Slater's condition, then optimal solutions X^opt and y^opt, S^opt of both problems exist and satisfy

    X^opt S^opt = 0.   (1)

Conversely, any pair X and y, S of feasible points for (P) and (D) satisfying (1) is optimal for both problems; see e.g. Shapiro and Scheinberg (2000). Condition (1) implies that there exists a unitary matrix U that simultaneously diagonalizes X^opt and S^opt. Moreover, the eigenvalues of X^opt and S^opt belonging to the same eigenvector are complementary.

The two main applications of semidefinite programs are relaxations of combinatorial optimization problems, see e.g. Alizadeh (1991); Helmberg et al. (1996); Goemans and Williamson (1994), and semidefinite programs arising from Lyapunov functions or from the positive real lemma in control theory, see e.g. Boyd et al. (1994); Leibfritz (2001); Scherer (1999). Next, we give two simple examples of such applications.

2.1 A first simple example

In our first example we consider the differential equation ẋ(t) = A x(t) for some vector function x : R → R^n. By definition, this system is called stable if for all initial values x^(0) = x(0) the solutions x(t) converge to zero as t → ∞. It is well known, see e.g. Hirsch and Smale (1974), that this is the case if and only if the real parts of all eigenvalues of A are negative, Re(λ_i(A)) < 0 for 1 ≤ i ≤ n. By Lyapunov's theorem, this is the case if and only if

    ∃ P ≻ 0 :  −A^T P − P A ≻ 0.

Let us now assume that the system matrix A is subject to uncertainties that can be "confined" to a polyhedron with m given vertices A^(1), ..., A^(m), i.e.

    A = A(t) ∈ conv{A^(1), ..., A^(m)}   for t ≥ 0.
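Before turning to the uncertain case, Lyapunov's theorem for a fixed stable system matrix can be checked numerically. A sketch using scipy's Lyapunov solver (the example matrix is hypothetical; scipy's `solve_continuous_lyapunov(a, q)` solves a x + x a^H = q, so passing A^T and −Q yields A^T P + P A = −Q):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical stable system matrix (all eigenvalues have negative real part).
A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])
assert all(np.linalg.eigvals(A).real < 0)

# Solve the Lyapunov equation A^T P + P A = -Q with Q = I.
Q = np.eye(2)
P = solve_continuous_lyapunov(A.T, -Q)

# Lyapunov's theorem: P is symmetric positive definite and
# -A^T P - P A = Q is positive definite, certifying stability.
assert np.allclose(P, P.T)
assert all(np.linalg.eigvalsh(P) > 0)
```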
In this case the existence of a Lyapunov matrix P ≻ 0 with

    −(A^(i))^T P − P A^(i) ≻ 0   for 1 ≤ i ≤ m   (2)

implies that −A(t)^T P − P A(t) ≻ 0 for all t ≥ 0, and hence

    0 > x(t)^T (A(t)^T P + P A(t)) x(t)
      = (A(t) x(t))^T P x(t) + x(t)^T P (A(t) x(t))
      = d/dt (x(t)^T P x(t)) = d/dt ‖x(t)‖_P²

whenever x(t) ≠ 0. This implies that ‖x(t)‖_P is decreasing, and hence the existence of a matrix P ≻ 0 satisfying (2) is a sufficient condition for stability of the uncertain system. (The above argument only shows that ‖x(t)‖_P is monotonically decreasing. In order to show that ‖x(t)‖_P converges to zero, one can find a strictly negative bound for d/dt ‖x(t)‖_P² using the largest real parts of the eigenvalues of (A^(i))^T P + P A^(i).)

There are straightforward ways to formulate the problem of finding a matrix P ≻ 0 satisfying (2) as a linear semidefinite program, see e.g. Boyd et al. (1994). While this simple example results in a linear semidefinite program, other problems from controller design often result in bilinear semidefinite programs that are no longer convex, see e.g. Leibfritz (2001); Scherer (1999); Freund and Jarre (2000).

2.2 A second simple example

Binary quadratic programs (also known as max-cut problems) have applications in VLSI layout and in spin glass models from physics. Their most important property, however, appears to be the fact that these problems are NP-complete (and hence, there is no known polynomial time method for solving them). What makes these problems so appealing is that they appear to be quite easy. Let

    MC = conv{ x x^T | x_i ∈ {±1} for 1 ≤ i ≤ n }

be the max-cut polytope. Hence, MC is the convex hull of all rank-1 matrices generated by ±1-vectors x. Any binary quadratic program or any max-cut problem can be written in the following form:

    minimize C • X   s.t.  X ∈ MC.
(3)

This is a standard linear program with the drawback that the feasible set MC is defined as the convex hull of exponentially many points x x^T, rather than by (a polynomial number of) linear inequalities. Let e = (1, ..., 1)^T be the vector of all ones. It is straightforward to see that MC can be written in the form

    MC = conv({ X ⪰ 0 | diag(X) = e, rank(X) = 1 }).

Due to the condition diag(X) = e, the set MC lies in an affine subspace of S^n of dimension n(n − 1)/2. MC has 2^(n−1) vertices that are pairwise adjacent, i.e. connected by an edge (a 1-dimensional extreme set of MC). Note that the constraints of this second definition of MC appear to be smooth constraints: a semidefiniteness constraint, a linear constraint, and a rank condition. These conditions, however, imply that there are only finitely many "discrete" elements of which the convex hull is taken. In some sense the constraints contain a hidden binary constraint allowing only certain matrices with entries ±1. When the rank constraint is omitted, we obtain the standard SDP relaxation of the max-cut problem,

    SDP := { X ⪰ 0 | diag(X) = e },

satisfying MC ⊂ SDP. A relaxed version of (3) is thus given by

    minimize C • X   s.t.  X ∈ SDP.   (4)

This problem is a linear SDP of the form (P) and can be solved efficiently using, for example, interior point methods, see e.g. Helmberg et al. (1996). Goemans and Williamson (1994) have shown how to obtain an excellent approximation of the max-cut problem (3) using the solution X of (4). A quite interesting inner approximation of MC leading to a nonlinear semidefinite program is described in Section 3.3.

2.3 Smoothness of the semidefiniteness constraint

To understand the complexity of nonlinear semidefinite programs we briefly address the question of smoothness and regularity of the semidefinite cone. The set of positive semidefinite matrices can be characterized in several different forms:
    { X | X ⪰ 0 } = { X | λ_min(X) ≥ 0 }
                  = { X | λ_i(X) ≥ 0 for 1 ≤ i ≤ n }
                  = { X | u^T X u ≥ 0 for all u ∈ R^n with ‖u‖ = 1 }
                  = { X | det(X_{E,E}) ≥ 0 for all E ⊆ {1, ..., n} }
                  = { X | ∃ Z ∈ S^n : X = Z² }.

The first characterization uses the smallest eigenvalue λ_min(X) of X. This is a nonsmooth representation. When ordering the eigenvalues in a suitable way, the eigenvalues λ_i(X) used in the second representation have directional derivatives, but are not totally differentiable. The third representation is based on a semi-infinite constraint. From this representation one can easily deduce, for example, that { X | X ⪰ 0 } is convex. The fourth representation is based on a finite (but exponential) number of smooth constraints, requiring all principal subdeterminants to be nonnegative. This representation certainly justifies the claim that { X | X ⪰ 0 } is bounded by smooth constraints. As shown in Pataki (2000), the tangent plane to { X | X ⪰ 0 } at a point X is given as follows. Let X = U D U^T with a diagonal matrix D and a unitary matrix U. If X is a boundary point of { X | X ⪰ 0 } we may assume without loss of generality that the first k diagonal entries of D satisfy d_1 = ... = d_k = 0 and d_{k+1}, ..., d_n > 0. Let ΔX be given by

    ΔX = U [ 0  * ; *  * ] U^T,

where the 0-block in the matrix on the right-hand side is of size k × k and the entries * are any entries of suitable dimension. All matrices ΔX of the above form belong to the tangent space of { X | X ⪰ 0 } at X.

The fourth representation also leads to the convex barrier function Φ(X) = −log det(X) for the positive semidefinite cone. For this barrier function it is sufficient to consider E = {1, ..., n}, and to set Φ(X) = ∞ whenever X is not positive definite. The last representation is a projection of a quadratic equality constraint.

Most, if not all, of the above representations have been used numerically to enforce semidefiniteness of some unknown matrix X.
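The equivalence of these characterizations can be sampled numerically for a small random instance; a sketch (tolerances and test data are illustrative):

```python
import numpy as np
from itertools import chain, combinations

rng = np.random.default_rng(1)
n = 3
Z = rng.standard_normal((n, n)); Z = (Z + Z.T) / 2
X = Z @ Z   # fifth characterization: X = Z^2 with symmetric Z, hence X >= 0

# First/second characterization: all eigenvalues nonnegative.
assert np.min(np.linalg.eigvalsh(X)) >= -1e-12

# Third characterization (sampled): u^T X u >= 0 for random unit vectors u.
for _ in range(100):
    u = rng.standard_normal(n); u /= np.linalg.norm(u)
    assert u @ X @ u >= -1e-12

# Fourth characterization: all principal subdeterminants nonnegative.
for E in chain.from_iterable(combinations(range(n), k) for k in range(1, n + 1)):
    assert np.linalg.det(X[np.ix_(E, E)]) >= -1e-10
```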
The set { X | X ⪰ 0 } certainly satisfies Slater's condition, or, in the context of nonconvex minimization, any point X ∈ { X | X ⪰ 0 } trivially satisfies the constraint qualification of Robinson. However, the fourth representation above does not satisfy LICQ. (LICQ is a common regularity condition requiring that the gradients of the active constraints at any point are linearly independent, see e.g. Wright and Nocedal (1999).) In fact, for n > 1 there does not exist any representation of the positive semidefinite cone by nonlinear inequalities that satisfies LICQ. Nevertheless the positive semidefinite cone and its surface are numerically tractable, and may be considered as a regular set with smooth constraints.

2.4 A dual barrier method

We consider problem (D) and eliminate the slack variable S to obtain the problem

    maximize b^T y   s.t.  C − A*(y) ⪰ 0.

For y ∈ R^m with C − A*(y) ≻ 0 we then define a convex barrier function

    Φ(y) = −log(det(C − A*(y))).

A plain dual barrier method can be stated as follows:

Dual barrier method
Start: Find y^(0) with C − A*(y^(0)) ≻ 0.
For k = 1, 2, 3, ...
  ■ Set μ_k = 10^(−k) and find
        y^(k) ≈ y(μ_k) := argmin_y ( −b^T y / μ_k + Φ(y) )
    by Newton's method with line search starting at y^(k−1).

Of course, this conceptual method needs many refinements, such as an appropriate choice of the starting point and a somewhat more sophisticated update of μ_k. With such minor modifications, however, the above algorithm solves the semidefinite programming problem in polynomial time. (The notion of polynomiality in the context of nonlinear programming is to be taken with care; the solution of a linear semidefinite program can have an exponential "size", such as an optimal value of order 2^(2^n) for a semidefinite program with encoding length O(n). Our reference to "polynomial time" means that the method reduces some primal-dual gap function in polynomial time, see e.g. Nesterov and Nemirovski (1994).)
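The plain dual barrier method above can be run with dense linear algebra on a toy instance, using ∇Φ(y)_i = trace(S^{-1} A^(i)) and ∇²Φ(y)_ij = trace(S^{-1} A^(i) S^{-1} A^(j)) for S = C − A*(y). The problem data below is chosen purely for illustration (the constraint is diag(1 − y₁, 1 − y₂) ⪰ 0, so the optimum is y = (1, 1)):

```python
import numpy as np

# Toy dual SDP: maximize b^T y  s.t.  C - A*(y) >= 0, with hypothetical data
# C = I, A^(1) = e1 e1^T, A^(2) = e2 e2^T, b = (1, 1); optimum y = (1, 1).
C = np.eye(2)
A = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
b = np.array([1.0, 1.0])

def S(y):
    return C - sum(yi * Ai for yi, Ai in zip(y, A))

def grad_hess(y, mu):
    """Gradient/Hessian of -b^T y / mu + Phi(y), Phi(y) = -log det(C - A*(y))."""
    Sinv = np.linalg.inv(S(y))
    g = -b / mu + np.array([np.trace(Sinv @ Ai) for Ai in A])
    H = np.array([[np.trace(Sinv @ Ai @ Sinv @ Aj) for Aj in A] for Ai in A])
    return g, H

y = np.zeros(2)                      # strictly feasible start: C - A*(0) = I > 0
for k in range(1, 8):
    mu = 10.0 ** (-k)
    for _ in range(50):              # damped Newton steps on the barrier problem
        g, H = grad_hess(y, mu)
        d = np.linalg.solve(H, -g)
        t = 1.0
        while np.min(np.linalg.eigvalsh(S(y + t * d))) <= 0:
            t /= 2                   # backtrack to stay strictly feasible
        y = y + t * d
        if np.linalg.norm(g) < 1e-8 / mu:
            break

assert np.allclose(y, [1.0, 1.0], atol=1e-3)
```

For this separable instance the barrier minimizer is y(μ) = (1 − μ, 1 − μ), so after μ = 10⁻⁷ the iterate is within 10⁻³ of the optimum.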
The key elements in guaranteeing the theoretical efficiency of the barrier method rest on two facts:

■ The duality gap (or some linear measure of closeness to optimality) is of order μ.
■ The Hessian of the barrier function satisfies a local relative Lipschitz condition.

Both facts were shown by Nesterov and Nemirovski (1994) and rest on two conditions introduced there. The first fact is implied by a local Lipschitz condition of Φ with respect to the norm induced by ∇²Φ(y); the second fact is called self-concordance and implies that Newton's method converges globally at a fixed rate. More details can be found in Nesterov and Nemirovski (1994); Jarre (1996).

The guaranteed convergence results in these references are much slower than what is observed in implementations of related methods. In fact, these theoretical results are much too slow to be relevant for practical applications. However, these results guarantee a certain independence of the method from the data of the problem. Even with exact arithmetic, the performance of the steepest descent method for unconstrained minimization, for example, depends on the condition number of the Hessian matrix at the optimal solution. Unlike the steepest descent method, the worst case bound for the barrier method depends only on the dimension n of the problem (D), but not on any condition numbers or any other parts of the data of the problem. In this respect, the theoretical analysis is relevant for practical applications.

The above barrier method is not suitable for practical implementations. The following simple acceleration scheme is essential for obtaining a more practical algorithm. Observe that the points y(μ) that are approximated at each iteration of the barrier method satisfy

    −b/μ + ∇Φ(y(μ)) = 0.

Differentiating this equation with respect to μ yields

    b/μ² + ∇²Φ(y(μ)) ẏ(μ) = 0.

For given values of μ and y(μ) this is a linear equation that can be solved for ẏ(μ).
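The predictor step can be checked on a separable toy problem with C = I, A^(i) = e_i e_i^T, and b = (1, 1) (illustrative data, not from the text); there the central path is y(μ) = (1 − μ)e, so the computed ẏ(μ) should equal d/dμ (1 − μ) = −1 in each component:

```python
import numpy as np

b = np.array([1.0, 1.0])
mu = 1e-2
y = 1.0 - mu * np.ones(2)            # y(mu) on the central path, known here
s = 1.0 - y                          # eigenvalues of S(y) = C - A*(y)

# Hessian of Phi at y(mu); diagonal for this separable toy problem.
H = np.diag(1.0 / s**2)

# Differentiating -b/mu + grad Phi(y(mu)) = 0 with respect to mu gives
#   b/mu^2 + H ydot = 0   =>   ydot = -H^{-1} b / mu^2.
ydot = np.linalg.solve(H, -b / mu**2)
assert np.allclose(ydot, [-1.0, -1.0])
```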
(The matrix ∇²Φ(y(μ)) is the same as the one that is used in the Newton step for finding y(μ).) Given this observation we can state a more efficient predictor-corrector method.

Dual predictor-corrector method
Start: Find y^(1) and μ_1 > 0 with y^(1) ≈ y(μ_1).
For k = 2, 3, ...
  ■ Choose Δμ_k ∈ (0, μ_{k−1}) such that ŷ^(k) := y^(k−1) − Δμ_k ẏ(μ_{k−1}) satisfies C − A*(ŷ^(k)) ≻ 0.
  ■ Set μ_k = μ_{k−1} − Δμ_k and find y^(k) ≈ y(μ_k) by Newton's method with line search starting at ŷ^(k).

It turns out that ẏ(μ_{k−1}) can be computed fairly accurately even if only an approximate point y^(k−1) ≈ y(μ_{k−1}) is known. For details see e.g. Jarre and Saunders (1995). This predictor-corrector method is "reasonably efficient", but primal-dual approaches are in general more efficient. We will generalize this method to nonlinear semidefinite programs in the next section.

3. Nonlinear Semidefinite Programs

In this section we consider nonlinear semidefinite programs of the form

    maximize b^T y   s.t.  A(y) ⪰ 0,  f_i(y) ≤ 0 for 1 ≤ i ≤ m,   (5)

where A : R^n → S^N is a smooth map and the f_i : R^n → R are smooth functions. Note a slight change of notation: in this section A is a nonlinear operator, A : R^n → S^N. We define a (possibly nonconvex) barrier function Φ̃,

    Φ̃(y) = −log det(A(y)) − Σ_{i=1}^m log(−f_i(y)),

and local minimizers

    y(μ) = local minimizer of −b^T y / μ + Φ̃(y).   (6)

In slight abuse of notation we denote any local minimizer by y(μ); this definition therefore does not characterize y(μ) uniquely. Replacing Φ by Φ̃, both the barrier method and the predictor-corrector method of Section 2.4 can be applied to solve problem (5). There are two questions regarding the efficiency of the predictor-corrector method for solving (5). (The barrier method is certainly impractical!)

■ Does ȳ = lim_{k→∞} y^(k) exist, and if so, is ȳ a "good" locally optimal solution of (5)?
■ How quickly can y^(k) be computed?

3.1 Issues of global convergence

As to the first question, one can show (see e.g.
Jarre (2001)) that any accumulation point ȳ of the sequence y^(k) satisfies the Fritz John condition (for a definition see e.g. Borgwardt (2001)),

    ∃ u ≥ 0, u ≠ 0 :  −u_0 b + Σ_{i=1}^m u_i ∇f_i(ȳ) + u_{m+1} ∇det(A(ȳ)) = 0.

While this condition is reasonable in the absence of a constraint qualification, it is not suitable for semidefinite programs. Indeed, whenever A(ȳ) has the eigenvalue zero with multiplicity more than one, then ∇det(A(ȳ)) = 0, so that one can choose u_{m+1} = 1 and u_i = 0 for all other i. A more appropriate convergence result therefore is

    ∄ Δy :  b^T Δy > 0,  ∇f_i(ȳ)^T Δy < 0 for all i with f_i(ȳ) = 0,  A(ȳ) + ε DA(ȳ)[Δy] ⪰ 0 for small ε > 0.

This result states that there does not exist any direction Δy starting at ȳ that is strictly linearized feasible and increases the objective function.

Neither of the statements guarantees that ȳ is a local minimizer. Indeed there are simple degenerate examples for which ȳ is the global maximizer of (5). As shown in Jongen and Ruiz Jhones (1999), for nonlinear programs satisfying an LICQ condition and not containing "degenerate" critical points, the limit point of y^(k) is a local minimizer. For such problems one can still construct examples such that ȳ is a very "poor" local minimizer. Nevertheless we believe that in many cases ȳ is a minimizer whose objective value is "close" to the global minimum of (5). This intuition is motivated by the work of Nesterov (1997). Nesterov considered the problem of minimizing a quadratic function over the ∞-norm unit cube. This problem may have very poor local minimizers (whose objective value is much closer to the global maximum value than it is to the global minimum). Nesterov shows that any local minimizer over a p-norm cube with a suitable value of p = O(log n) has much better global properties, in the sense that it is at least as good as the result guaranteed by the semidefinite relaxation.
Intuitively, this result is due to the fact that the p-norm cube "rounds" the vertices and edges of the ∞-norm cube. By this rounding procedure, the poor local minimizers disappear. In two dimensions the level sets of the logarithmic barrier function are almost indistinguishable from suitably scaled p-norm cubes. This leads us to believe that, at least for quadratic minimization problems over the ∞-norm unit cube, a suitably implemented barrier method will also generate "good" local minimizers.

3.2 Efficiency of local minimization

Note that by definition, y(μ_k) is a local minimizer of (6), and hence ∇²Φ̃(y(μ_k)) ⪰ 0. In all of our test problems the iterates y^(k) ≈ y(μ_k) satisfied the stronger condition ∇²Φ̃(y^(k)) ≻ 0. If this relation is satisfied, the extrapolation step for computing ŷ^(k) in the predictor-corrector method can be carried out in the same way as in the convex case. However, the iterates "on the way" from y^(k) to y^(k+1) often do not satisfy ∇²Φ̃ ≻ 0. This implies that the concept of self-concordance that formed the basis of the dual barrier method and of the predictor-corrector method for solving (D) is no longer applicable. While it is not yet possible to generalize the theory of self-concordance to nonconvex functions, it seems possible that the known Lipschitz continuity properties of ∇²Φ carry over in some form to ∇²Φ̃.

The tool that was used for minimizing the barrier function involving Φ in Section 2.4 is Newton's method. When ∇²Φ̃ is not positive definite, Newton's method with line search for approximating y(μ_k) is no longer applicable. We need to find a suitable generalization of Newton's method to the nonconvex case involving Φ̃. For this generalization we keep the following properties in mind: the barrier subproblems that need to be solved at each step of the barrier method (or of the predictor-corrector method) are systematically ill-conditioned. The condition number typically is O(1/μ), and the constant in the order notation is typically large.
In addition, the computation of the Hessian matrices is often very expensive. Possible minimization methods for approximating y(μ_k) include trust region methods with quasi-Newton updates of an approximate Hessian, see e.g. Conn et al. (2000), continuation methods, or expensive plane search strategies as proposed in Jarre (2001).

In numerical examples it turned out that the minimization problems tend to be quite difficult and none of the minimization methods converge quickly. In particular, the barrier subproblems appear to be substantially more difficult to solve than in the convex case. We therefore address the complexity of smooth nonconvex local minimization. The next section shows that local minimization is NP-hard in a certain sense.

3.3 Returning to the max-cut problem

We return to the example in Section 2.2. As shown in Nesterov (1998), an inner approximation of the polytope MC is given by

    NA = { X ∈ SDP | sin[(π/2) X] ⪰ 0 }.

Here, the square brackets in sin[(π/2) X] are used to indicate that the sin function is applied componentwise to each of the matrix entries of (π/2) X. The set NA is formed from SDP using the function c : [−1, 1] → [−1, 1] with c(t) = (2/π) arcsin(t). This function is a nonlinear "contraction" in the sense that |c(t)| ≤ |t|. It is somewhat surprising to find out that conv(NA) = MC, i.e.

    NA ⊂ conv(NA) = MC ⊂ SDP,

see Nesterov (1998). A simple picture can explain the relationship of MC, SDP, and NA. The set MC is a polytope whose precise description is not known in spite of its simple structure. (More precisely, there does not exist any known polynomial time algorithm which, given a point X, either returns a certificate proving that X ∈ MC or returns a separating hyperplane.) The set SDP is obtained by "inflating" the set MC while keeping all faces of dimension ≤ n − 2 fixed. Like a balloon we "pump up" the hull of MC while keeping certain low-dimensional boundary manifolds fixed. (Note that MC has dimension n(n − 1)/2.)
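Membership in NA, under the definition above, can be tested numerically; a sketch (the helper `in_NA` and the test points are illustrative). In particular, every vertex xx^T of MC lies in NA, since sin((π/2)·(±1)) = ±1 reproduces xx^T itself:

```python
import numpy as np

def in_NA(X, tol=1e-10):
    """Test X in NA = {X in SDP : sin[(pi/2) X] >= 0}, sin applied entrywise."""
    sdp_ok = (np.allclose(np.diag(X), 1.0) and
              np.min(np.linalg.eigvalsh(X)) >= -tol)
    Y = np.sin(np.pi / 2 * X)                 # componentwise sine
    return sdp_ok and np.min(np.linalg.eigvalsh(Y)) >= -tol

rng = np.random.default_rng(2)
x = rng.choice([-1.0, 1.0], size=5)
assert in_NA(np.outer(x, x))   # a vertex xx^T of MC lies in NA

assert in_NA(np.eye(5))        # the identity matrix also lies in NA
```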
The set SDP is convex and "efficiently representable", i.e. there exist efficient numerical algorithms for minimizing convex functions over SDP. The set NA is obtained by shrinking SDP in a certain nonlinear fashion. This shrinkage is done in a certain optimal way such that all boundary manifolds of dimensions 1 and 2 of MC are contained in NA. In particular, for n = 3 we have MC = NA, see Hirschfeld and Jarre (2001). The set NA is bounded by two smooth constraints, is star-shaped, contains a ball of radius 1, and is contained in a ball of radius n. By our previous considerations, any locally optimal vertex of

    minimize C • X   s.t.  X ∈ NA   (7)

solves the max-cut problem (3). Hence, in spite of the nice properties of NA, it must be very difficult to find a locally optimal vertex of (7) or to check whether a given vertex is a local minimum. Note that (7) is a nonlinear semidefinite program. The difficulty of the local minimization of (7) is due to the fact that problem (7) suffers from a systematic violation of any constraint qualification. It contains many "peaks" similar to the one in { x ∈ R² | x ≥ 0, … }. In higher dimensions such peaks become intractable.

3.4 Finding an ε-KKT point

In a second example, see Hirschfeld and Jarre (2001), the so-called chained Rosenbrock function f : R^n → R,

    f(x) = (x₁ − 1)² + 100 Σ_{i=2}^n (x_i − x_{i−1}²)²

(see also Toint (1978)), has been tested. This function has only one local minimizer, which is also the global minimizer, x̄ = (1, ..., 1)^T. Applying various trust region methods for minimizing f starting at x^(0) = (−1, 1, ..., 1)^T results in running times that appear to be exponential in n. (These running times are purely experimental and, due to time limitations, could only be tested for small values of n.) At first sight this result seems to contradict a statement by Vavasis. In the paper Vavasis (1993) the following result is shown. Consider the problem

    minimize f(x)   s.t.  −1 ≤ x_i ≤ 1 for 1 ≤ i ≤ n.
(8)

Vavasis assumes that the gradient ∇f is Lipschitz continuous with Lipschitz constant M and considers the problem of finding an ε-KKT point for (8). He presents an algorithm whose number of gradient evaluations for finding an ε-KKT point is exponential with respect to the number of digits of the required accuracy, i.e. with respect to log ε⁻¹, but linear with respect to n. He also presents a class of functions of two variables for which a matching worst-case lower bound holds: any algorithm needs at least that many gradient evaluations to find an ε-KKT point.

The conditions of Vavasis' paper apply to the Rosenbrock example as well. All points at which this function is evaluated by the trust region algorithms lie in the box −1 ≤ x_i ≤ 1, and moreover, Rosenbrock's function possesses moderately bounded norms of ∇²f at these points, implying that M is consistently small. The reason for the observed exponential growth of the number of iterations lies in the fact that the norms of the gradients do become small very quickly (as predicted by Vavasis, even for a steepest descent method), but for large n, the norm of ∇f needs to be extremely small to guarantee that the iterate is close to a local minimizer. Thus the exponential growth with respect to the number of variables is due to the fact that the ε-KKT condition is a poor condition for large n. (We do not know of any better condition, though!) More results on local minimization issues are discussed in the forthcoming paper Hirschfeld and Jarre (2001).

4. Conclusion

We have highlighted some issues of nonlinear semidefinite programming related to a dual barrier method. In particular, we have raised questions of smoothness, regularity, and computational complexity related to semidefinite programs. As preliminary numerical results in Jarre (2001) indicate, variants of the predictor-corrector method of the present paper are reasonably fast for medium size problems (up to 500 unknowns).
The numerical results were also compared with the ones in Fukuda and Kojima (2001). In all examples it turned out that the method proposed in this paper converged to the global minimizer. This gives some further weak evidence that the method is indeed unlikely to be "trapped" near poor local minimizers. We also indicated that the local convergence when solving the barrier subproblems in the predictor-corrector method is slow; improvements of this convergence behavior are the subject of future research.

References

F. Alizadeh, "Combinatorial Optimization with Interior Point Methods and Semidefinite Matrices", PhD thesis, University of Minnesota, 1991.
K.H. Borgwardt, Optimierung, Operations Research, Spieltheorie, Birkhäuser Verlag, 2001.
S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, Vol. 15 of Studies in Applied Mathematics, SIAM, Philadelphia, PA, 1994.
A.R. Conn, N.I.M. Gould, and Ph.L. Toint, Trust-Region Methods, MPS/SIAM Series on Optimization, SIAM, Philadelphia, 2000.
A. Forsgren, "Optimality conditions for nonconvex semidefinite programming", Mathematical Programming 88, pp. 105-128, 2000.
R.W. Freund and F. Jarre, "An Extension of the Positive Real Lemma to Descriptor Systems", Report 00/3-09, Scientific Computing Interest Group, Bell Labs, Lucent Technologies, 2000.
M. Fukuda and M. Kojima, "Branch-and-Cut Algorithms for the Bilinear Matrix Inequality Eigenvalue Problem", Computational Optimization and Applications, Vol. 19, No. 1, pp. 79-105, 2001.
M.X. Goemans and D.P. Williamson, ".878-Approximation Algorithms for MAX CUT and MAX 2SAT", in ACM Symposium on Theory of Computing (STOC), 1994.
C. Helmberg, F. Rendl, and R.J. Vanderbei, "An Interior-Point Method for Semidefinite Programming", SIAM J. Optim. 6(2), pp. 342-361, 1996.
M.W. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, New York, 1974.
B.W. Hirschfeld and F.
Jarre, "Complexity Issues of Smooth Local Minimization", Technical Report, Universität Düsseldorf, in preparation, 2001.
F. Jarre, "Interior-Point Algorithms for Classes of Convex Programs", in T. Terlaky, ed., Interior Point Methods of Mathematical Programming, Kluwer, 1996.
F. Jarre, "A QQP-Minimization Method for Semidefinite and Smooth Nonconvex Programs", Technical Report, University of Düsseldorf, Germany; to appear in revised form in Optimization and Engineering, 2001.
F. Jarre and M.A. Saunders, "A Practical Interior-Point Method for Convex Programming", SIAM J. Optim. 5(1), pp. 149-171, 1995.
H.T. Jongen and A. Ruiz Jhones, "Nonlinear Optimization: On the Min-Max Digraph and Global Smoothing", in A. Ioffe, S. Reich, and I. Shafrir, eds., Calculus of Variations and Differential Equations, Chapman & Hall / CRC Research Notes in Mathematics Series, Vol. 410, CRC Press, pp. 119-135, 1999.
M. Kocvara and M. Stingl, "Augmented Lagrangian Method for Semidefinite Programming", forthcoming report, Institute of Applied Mathematics, University of Erlangen-Nuremberg, 2001.
F. Leibfritz, "An LMI-based algorithm for designing suboptimal static H2/H∞ output feedback controllers", SIAM Journal on Control and Optimization, Vol. 39, No. 6, pp. 1711-1735, 2001.
Y.E. Nesterov, talk given at the Conference on Semidefinite Optimization, ZIB Berlin, 1997.
Y.E. Nesterov, "Semidefinite Relaxation and Nonconvex Quadratic Optimization", Optimization Methods and Software 9, pp. 141-160, 1998.
Y.E. Nesterov and A.S. Nemirovski, Interior Point Polynomial Algorithms in Convex Programming, SIAM, Philadelphia, PA, 1994.
G. Pataki, "The Geometry of Semidefinite Programming", in H. Wolkowicz, R. Saigal, and L. Vandenberghe, eds., Handbook of Semidefinite Programming: Theory, Algorithms and Applications, Kluwer, 2000.
C. Scherer, "Lower bounds in multi-objective H2/H∞ problems", Proc. 38th IEEE Conf.
Decision and Control, Phoenix, Arizona, 1999.
A. Shapiro and K. Scheinberg, "Duality and Optimality Conditions", in H. Wolkowicz, R. Saigal, and L. Vandenberghe, eds., Handbook of Semidefinite Programming: Theory, Algorithms and Applications, Kluwer, 2000.
Ph.L. Toint, "Some Numerical Results Using a Sparse Matrix Updating Formula in Unconstrained Optimization", Mathematics of Computation, Vol. 32(143), pp. 839-851, 1978.
R.J. Vanderbei, "LOQO User's Manual - Version 3.10", Report SOR 97-08, Princeton University, Princeton, NJ 08544, 1997 (revised 10/06/98).
R.J. Vanderbei, H. Benson, and D. Shanno, "Interior-Point Methods for Nonconvex Nonlinear Programming: Filter Methods and Merit Functions", Report ORFE 00-06, Princeton University, Princeton, NJ 08544, 2000.
S.A. Vavasis, "Black-Box Complexity of Local Minimization", SIAM J. Optim. 3(1), pp. 60-80, 1993.
S.J. Wright and J. Nocedal, Numerical Optimization, Springer Verlag, 1999.

IMPLICIT FILTERING AND NONLINEAR LEAST SQUARES PROBLEMS

C. T. Kelley
North Carolina State University
Center for Research in Scientific Computation
Department of Mathematics
Box 8205, Raleigh, N.C. 27695-8205, USA
Tim.Kelley@ncsu.edu

Abstract: In this paper we motivate and analyze a version of the implicit filtering algorithm by viewing it as an extension of coordinate search. We then show how implicit filtering can be combined with the damped Gauss-Newton method to solve noisy nonlinear least squares problems.

Keywords: noisy optimization, implicit filtering, damped Gauss-Newton iteration, nonlinear least squares problems

1. Introduction

The purposes of this paper are to show how a version of the implicit filtering algorithm [24, 17, 16] can be motivated and analyzed by viewing it as an elaboration of coordinate search, and to describe and analyze an implicit filtering Gauss-Newton method for nonlinear least squares problems.
Our approach to nonlinear least squares problems is based on a finite-difference form of the damped Gauss-Newton method [11, 24, 32], but differs from that in the MINPACK [30] routine lmdif.f. That code uses forward difference Jacobians with a user-defined difference increment, but that increment is set only once. Implicit filtering uses a central difference not only to compute more accurate Jacobians, but more importantly to avoid local minima and to decide when to reduce the difference increment.

Implicit filtering, which we describe in § 2, is a deterministic stencil-based sampling method. In general terms, implicit filtering is a finite-difference quasi-Newton method in which the size of the difference stencil decreases as the optimization progresses. In this way one hopes to "filter" low-amplitude, high-frequency noise in the objective function.

Sampling methods do not use derivatives, but rather sample the objective function on a stencil or pattern to determine the progress of the iteration and whether or not to change the size, but not the shape, of the stencil. Many of these methods, like implicit filtering, the Hooke-Jeeves [20] method, and multidirectional search [38, 39], reduce the size of the stencil in the course of the optimization. The stencil-size reduction policy leads to a convergence theory [24, 5, 39]. The best-known sampling method is the Nelder-Mead [31] algorithm. This method uses an irregular pattern that changes as the optimization progresses, and hence is not stencil-based in the sense of this paper. Analytical results for the Nelder-Mead algorithm are limited [24, 5, 26]. Theoretical developments are also at a very early stage for more aggressive sampling methods, like the DIRECT algorithm [22], [14, 15]. Sampling methods, for the most part, need many iterations to obtain a high-precision result.
Therefore, when gradient information is available and the optimization landscape is relatively smooth, conventional gradient-based algorithms usually perform far better. Sampling methods do well for problems with complex optimization landscapes like the ones in Figure 1, where nonsmoothness and nonconvexity can defeat most gradient-based methods.

Figure 1. Optimization Landscapes

We caution the reader that sampling methods are not designed to be true global optimization algorithms. Problems with violently oscillatory optimization landscapes are candidates for genetic algorithms [19, 35], simulated annealing [25, 41], or the DIRECT algorithm [22, 21].

The paper is organized as follows. In § 2 we briefly describe the implicit filtering method and some of the convergence results. We describe the new algorithm in § 3 and prove a local convergence result. In § 4 we illustrate the ideas with a parameter identification problem.

2. Implicit Filtering

In this section we introduce implicit filtering. We show how the method can be viewed as an enhanced form of a simple coordinate search method. Convergence analysis for methods of this type is typically done in a setting far simpler than one sees in practice. Many results require smooth objective functions [28, 26, 12, 39, 8, 9] or objective functions that are small perturbations of smooth functions [29, 17, 23, 5, 24, 7, 44]. The main results in this paper make the latter assumption. We will also assume that the noise decays near an optimal point. Such decay has been observed in practice [36, 10, 42, 43, 37, 4], and methods designed with this decay in mind can perform well even when the noise does not decay to zero as optimality is approached.

2.1 Coordinate Search

We begin with a discussion of a coordinate search algorithm, the simplest of all sampling methods, and consider the unconstrained problem

$$\min_{x \in R^N} f(x). \qquad (1)$$
From a current point $x_c$ and stencil radius, or scale, $h_c$ we sample $f$ at the $2N$ points

$$S(x_c, h_c) = \{x_c \pm h_c e_j\}, \qquad (2)$$

where $e_j$ is the unit vector in the $j$th coordinate direction. Then either $x_c$ or $h_c$ is changed.

- If

$$f(x_c) < \min_{x \in S(x_c, h_c)} f(x) \qquad (3)$$

then we replace $h_c$ by $h_+ = h_c/2$ and set $x_+ = x_c$.

- Otherwise, we replace $x_c$ by any point $x_+ \in S(x_c, h_c)$ such that

$$f(x_+) = \min_{x \in S(x_c, h_c)} f(x)$$

and let $h_+ = h_c$.

We refer to (3) as stencil failure. If $f$ is Lipschitz continuously differentiable, then [24, 5] stencil failure implies that

$$\|\nabla f(x_c)\| = O(h_c). \qquad (4)$$

Now, if $f$ has bounded level sets, $h$ will be reduced infinitely many times, because there are only finitely many points on the grid with function values smaller than $f(x_c)$ [38]. Hence, by (4), the gradient of $f$ will be driven to zero, giving subsequential convergence to a point that satisfies the necessary conditions for optimality.

One-sided stencils [24, 36] and more general stencils with fewer than $2N$ directions have also been used [1, 2, 27] and have similar theoretical properties. Our experience has been that a full centered-difference stencil is better in practice.

Sampling methods do more than solve smooth problems. Consider an objective which is the sum of a smooth function $f_s$ and a non-smooth function $\phi$, which we will refer to as the noise:

$$f(x) = f_s(x) + \phi(x). \qquad (5)$$

We assume that $\phi$ is uniformly bounded and small relative to $f_s$, but make no smoothness or even continuity assumptions beyond that. High-frequency oscillations in $\phi$ could result in local minima of $f$ which would trap a conventional gradient-based method far from a minimizer of $f_s$. If $\phi$ decays sufficiently rapidly near a minimizer of $f$, then the coordinate search method responds to $f_s$ and, in a sense, "does not see" $\phi$.

To quantify the claim above, we return to the concept of stencil failure. Define

$$\|\phi\|_{S(x,h)} = \max_{z \in S(x,h)} |\phi(z)|.$$

If (3) holds and $f$ satisfies (5), then [24, 5]

$$\|\nabla f_s(x_c)\| = O\left(h_c + h_c^{-1}\|\phi\|_{S(x_c,h_c)}\right). \qquad (6)$$
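The coordinate search loop just described (sample on the stencil (2), halve the scale on stencil failure (3), otherwise move to the best stencil point) can be sketched as follows. The function and parameter names are ours, not the paper's:

```python
import numpy as np

def coordinate_search(f, x0, h0, hmin=1e-6, max_evals=5000):
    """Coordinate search: sample f on the centered stencil S(x, h) = {x +/- h e_j}.

    Stencil failure (the center beats every stencil point) halves h;
    otherwise the best stencil point becomes the new center.
    """
    x, h = np.asarray(x0, dtype=float), float(h0)
    n, evals = x.size, 1
    fx = f(x)
    while h > hmin and evals < max_evals:
        pts = [x + s * h * e for e in np.eye(n) for s in (1.0, -1.0)]
        vals = [f(p) for p in pts]
        evals += len(pts)
        if fx < min(vals):        # stencil failure (3): refine the scale
            h /= 2.0
        else:                     # move to the best stencil point
            i = int(np.argmin(vals))
            x, fx = pts[i], vals[i]
    return x, fx
```

On a smooth convex quadratic this drives the iterate to the minimizer; on a noisy objective the large early scales step over small oscillations, as the text describes.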
Now, let $\{x_n\}$ be the sequence of coordinate search iterations and $\{h_n\}$ be the sequence of stencil radii, which we will refer to as scales. If $f$ has bounded level sets, then the set of possible iterations for a given scale $h$ is finite, as they lie on a grid [39]; hence $h_n \to 0$. If, moreover, the noise decays rapidly enough so that

$$\lim_{n \to \infty} \frac{\|\phi\|_{S(x_n,h_n)}}{h_n} = 0, \qquad (7)$$

then $\nabla f_s(x_n) \to 0$, by (6).

This asymptotic result does not address an important practical issue. The number of times that $h$ will be reduced during the optimization needs to be specified when the optimization begins, or a limit on the number of calls to $f$ must be imposed. Most implementations of sampling methods use one or both of these as termination criteria.

In the simple case where $f_s$ is a convex quadratic, for example, coordinate search "jumps over" oscillations in $\phi$ early in the iteration, when $h$ is large, and, after finding a neighborhood of the minimizer, increases the resolution (i.e. decreases the scale) and converges.

2.2 Implicit Filtering

The version of implicit filtering which we discuss in this paper accelerates coordinate search with a quasi-Newton method. We use the sample values to construct a centered difference gradient $\nabla_h f(x_c)$. We then try to take a quasi-Newton step

$$x_+ = x_c - H_c^{-1} \nabla_h f(x_c), \qquad (8)$$

where $H_c$ is a quasi-Newton model Hessian. We find that the BFGS update [6, 18, 13, 34] works well for unconstrained problems. We reduce the scale when either the norm of the difference gradient is sufficiently small or stencil failure occurs.

We formally describe implicit filtering below as a sequence of calls to a finite-difference quasi-Newton algorithm (fdquasi) followed by a reduction in the difference increment. The quasi-Newton iteration is terminated on entry if stencil failure is detected. The other termination criteria of the quasi-Newton iteration reflect the truncation error in the difference gradient.
The tolerance for the gradient,

$$\|\nabla_h f(x)\| \le \tau h, \qquad (9)$$

is motivated both by the heuristic that the step should be at least of the same order as the scale, by the implication (6) of stencil failure, and by the error estimate [24]

$$\|\nabla f_s(x) - \nabla_h f(x)\| = O\left(h^2 + h^{-1}\|\phi\|_{S(x,h)}\right). \qquad (10)$$

The performance of implicit filtering can be sensitive to the choice of the parameter $\tau$ if, as was the case for the earliest implementations of implicit filtering [36, 17, 10], the test for stencil failure is not incorporated into the algorithm.

The line search is not guaranteed to succeed because the gradient is not exact; therefore we allow only a few reductions in the step length before exiting the quasi-Newton iteration. If the line search fails, then the sufficient decrease condition

$$f(x_c - \lambda \nabla_h f(x_c)) - f(x_c) \le -\alpha\lambda \|\nabla_h f(x_c)\|^2 \qquad (11)$$

has been violated. Here, as is standard [11, 24], $\alpha$ is a small parameter, typically $10^{-4}$. In some special cases [17], failure of the line search can be related to the size of the noise, motivating termination of the entire optimization because the assumption that $\|\phi\|$ is much smaller than $h$ is no longer valid. This leads to the question of the selection of the smallest scale, which is open.

Algorithm 1 fdquasi(x, f, pmax, τ, h, amax)

  while p ≤ pmax and ||∇_h f(x)|| > τh do
    compute f and ∇_h f
    if (3) holds then
      terminate and report stencil failure
    end if
    update the model Hessian H if appropriate; solve Hd = −∇_h f(x)
    use a backtracking line search, with at most amax backtracks, to find a step length λ
    if amax backtracks have been taken then
      terminate and report line search failure
    end if
    x ← x + λd
    p ← p + 1
  end while
  if p > pmax report iteration count failure

Implicit filtering is a sequence of calls to fdquasi with the difference increments, or scales, reduced after each return from fdquasi.
Algorithm 2 imfilter(x, f, pmax, τ, {h_k}, amax)

  for k = 0, ... do
    fdquasi(x, f, pmax, τ, h_k, amax)
  end for

Our analysis of coordinate search depended on the fact that

$$\|\nabla f_s(x_n)\| = O\left(h_n + h_n^{-1}\|\phi\|_{S(x_n,h_n)}\right) \qquad (12)$$

when stencil failure occurred and that $h$ was reduced when that happened. Since stencil failure directly implies (12), as do (9) and (10) together, the convergence result for coordinate search will hold for implicit filtering provided the line search only fails finitely often and the quasi-Newton iteration terminates because of stencil failure or satisfaction of (9). We summarize these observations in a theorem from [24].

Theorem 1. Let $f$ satisfy (5) and let $\nabla f_s$ be Lipschitz continuous. Let $h_n \to 0$ and let $\{x_n\}$ be the implicit filtering sequence. Assume that either (3) or (9) holds after each call to fdquasi (i.e. there is no line search failure or iteration count failure) for all but finitely many $k$. Then if

$$\lim_{n \to \infty}\left(h_n + h_n^{-1}\|\phi\|_{S(x_n,h_n)}\right) = 0, \qquad (13)$$

then any limit point of the sequence $\{x_n\}$ is a critical point of $f_s$.

Theorem 1 does not explain the performance of implicit filtering in practice. In fact, other methods, such as coordinate search, Hooke-Jeeves, and MDS, also satisfy the conclusion of Theorem 1 if (13) holds [24, 40]. Implicit filtering performs well only if a quasi-Newton model Hessian is used. The reasons for the efficacy of the quasi-Newton methods are not fully understood. A step toward such an understanding is in [7], where a superlinear convergence result is presented. That result is somewhat like the one we give in § 3, and we will summarize it here.

Assumptions on the rate of decrease of $\{h_n\}$ and of the size of $\phi$ must be made to prove convergence rates. Landscapes like those in Figure 1 motivated the qualitative decay assumption (13). To obtain superlinear convergence one must ask for much more and demand that $h$ and $\phi$ satisfy

$$\|\nabla_h f(x) - \nabla f_s(x)\| = O\left(\|x - x^*\|^{1+p}\right) \qquad (14)$$

for some $p > 0$. Here $x^*$ is a local minimizer of $f_s$.
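Algorithms fdquasi and imfilter can be sketched as follows. To keep the sketch short the model Hessian is frozen at the identity (a pure gradient step) instead of the BFGS update the text recommends, the stencil-failure test reuses the values that build the centered-difference gradient, and all names are illustrative:

```python
import numpy as np

def grad_h(f, x, h):
    """Centered-difference gradient from the stencil values f(x +/- h e_j)."""
    n = x.size
    g = np.zeros(n)
    best = np.inf                       # best stencil value, for the failure test
    for j in range(n):
        e = np.zeros(n); e[j] = h
        fp, fm = f(x + e), f(x - e)
        g[j] = (fp - fm) / (2.0 * h)
        best = min(best, fp, fm)
    return g, best

def fdquasi(f, x, h, tau=1.0, pmax=50, amax=8, alpha=1e-4):
    """One inner sweep at a fixed scale h; model Hessian H = I for brevity."""
    for _ in range(pmax):
        fx = f(x)
        g, best = grad_h(f, x, h)
        if fx < best:                   # stencil failure (3): give the scale back
            return x
        if np.linalg.norm(g) <= tau * h:
            return x                    # gradient tolerance (9)
        lam, ok = 1.0, False
        for _ in range(amax):           # backtracking line search, condition (11)
            if f(x - lam * g) - fx <= -alpha * lam * (g @ g):
                ok = True
                break
            lam /= 2.0
        if not ok:
            return x                    # line search failure
        x = x - lam * g
    return x

def imfilter(f, x0, scales):
    """Implicit filtering: one call to fdquasi per scale, scales decreasing."""
    x = np.asarray(x0, dtype=float)
    for h in scales:
        x = fdquasi(f, x, h)
    return x
```

A full implementation would carry a BFGS model Hessian across the inner iterations, as the convergence discussion above requires for good performance.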
Satisfaction of (14) is possible in practice if both $\phi$ and the scales $h$ decrease near $x^*$. As an example, suppose that $f_s$ has a local minimizer $x^*$, $\nabla^2 f_s$ is Lipschitz continuous in a neighborhood of $x^*$, $\nabla^2 f_s(x^*)$ is positive definite, and, for $x$ sufficiently near $x^*$,

$$|\phi(x)| = O\left(\|x - x^*\|^{2+2p}\right) \qquad (15)$$

for some $p > 0$. In that case, if one sets

$$h_{n+1} = \|\nabla_{h_n} f(x_{n+1})\|^{1+p} \qquad (16)$$

and other technical assumptions hold, then one can show that the implicit filtering iteration, with the BFGS update, is locally superlinearly convergent to $x^*$.

3. Gauss-Newton Iteration and Implicit Filtering

For the remainder of this paper we focus on nonlinear least squares objective functions

$$f(x) = \frac{1}{2} \sum_{i=1}^{M} R_i(x)^2, \qquad (17)$$

where $R = (R_1, \dots, R_M)^T$. We assume that

$$R(x) = R_s(x) + \Phi(x), \qquad (18)$$

where $R_s : R^N \to R^M$ is Lipschitz continuously differentiable. Here the noise $\Phi$ in the residual does not correspond to noise in any data in the problem, but rather noise in the computation of $R$. As an example, if one is doing a nonlinear fit to data, $R$ might have the form $R = M(x) - d$, where $d$ is a vector of data and $x$ are the model parameters. The noise we have in mind is in the computation of $M$, not in $d$. The noise $\Phi$ in $R$ can be related to the noise $\phi$ in $f$ by

$$\phi(x) = R_s(x)^T \Phi(x) + \Phi(x)^T \Phi(x)/2. \qquad (19)$$

3.1 Implicit Filtering Gauss-Newton (IFGN) Algorithm

Our implementation of implicit filtering for nonlinear least squares differs from the one described in § 2 in two ways:

- The Jacobian of the residual, not the gradient of the objective function, is approximated by finite differences.
- The Gauss-Newton model Hessian is used instead of a quasi-Newton model Hessian.

We let $\nabla_h R(x)$ be the centered difference Jacobian of $R$ based on the stencil $S(x, h)$. Our finite difference Gauss-Newton iteration, Algorithm fdgauss, must be prepared for stencil failure and failure of the line search. The sufficient decrease condition is now

$$f(x_c + \lambda d) - f(x_c) \le \alpha\lambda\left((\nabla_h R(x_c))^T R(x_c)\right)^T d, \qquad (20)$$

where

$$d = -\left((\nabla_h R(x_c))^T \nabla_h R(x_c)\right)^{-1} (\nabla_h R(x_c))^T R(x_c)$$

is the IFGN direction.
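This finite-difference Gauss-Newton sweep, with the IFGN direction and the sufficient decrease condition (20), can be sketched as follows. The stencil-failure test is omitted for brevity (a full implementation would terminate the sweep on stencil failure as in the algorithm listing), and all names are illustrative:

```python
import numpy as np

def jac_h(R, x, h):
    """Centered-difference Jacobian of the residual R on the stencil S(x, h)."""
    n = x.size
    cols = []
    for j in range(n):
        e = np.zeros(n); e[j] = h
        cols.append((R(x + e) - R(x - e)) / (2.0 * h))
    return np.column_stack(cols)

def fdgauss(R, x, h, tau=1.0, pmax=50, amax=8, alpha=1e-4):
    """One damped Gauss-Newton sweep at a fixed scale h."""
    for _ in range(pmax):
        r = R(x)
        J = jac_h(R, x, h)
        g = J.T @ r                          # (grad_h R)^T R
        if np.linalg.norm(g) <= tau * h:
            return x
        d = np.linalg.solve(J.T @ J, -g)     # IFGN direction
        lam, ok = 1.0, False
        for _ in range(amax):                # sufficient decrease condition (20)
            rt = R(x + lam * d)
            if 0.5 * (rt @ rt) - 0.5 * (r @ r) <= alpha * lam * (g @ d):
                ok = True
                break
            lam /= 2.0
        if not ok:
            return x                         # line search failure
        x = x + lam * d
    return x

def ifgn(R, x0, scales):
    """IFGN: repeated calls to fdgauss with decreasing scales."""
    x = np.asarray(x0, dtype=float)
    for h in scales:
        x = fdgauss(R, x, h)
    return x
```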
Algorithm 3 fdgauss(x, R, pmax, τ, h, amax)

  p = 1
  while p ≤ pmax and ||(∇_h R(x))^T R(x)|| > τh do
    compute f = R(x)^T R(x)/2 and ∇_h R
    if (3) holds then
      terminate and report stencil failure
    end if
    set H = (∇_h R(x))^T (∇_h R(x)); solve Hd = −(∇_h R(x))^T R(x)
    use a backtracking line search, with at most amax backtracks, to find a step length λ
    if amax backtracks have been taken then
      terminate and report line search failure
    end if
    x ← x + λd
    p ← p + 1
  end while
  if p > pmax report iteration count failure

The implicit filtering form of the damped Gauss-Newton method (Algorithm IFGN) calls fdgauss repeatedly, reducing the scale with each iteration.

Algorithm 4 IFGN(x, R, pmax, τ, amax)

  for k = 0, ... do
    fdgauss(x, R, pmax, τ, h_k, amax)
  end for

3.2 Convergence Analysis

We will make a distinction between the central difference gradient of $f = R^T R/2$ and the difference gradient computed via $(\nabla_h R)^T R$, since the two approximate gradients have different errors, especially in the small residual case. For any function $\psi : R^N \to R^L$ (here $L = 1$ or $L = M$), define

$$\|\psi\|_{S(x,h)} = \max_{z \in S(x,h)} \|\psi(z)\|$$

and

$$E(x, h, \psi) = h + h^{-1}\|\psi\|_{S(x,h)}.$$

We can rewrite (10) as

$$\|\nabla f_s(x) - \nabla_h f(x)\| = O(E(x, h, \phi)). \qquad (21)$$

Lemma 3.1 gives the analog of (21) for nonlinear least squares problems in (24) and refines (21) in (23). The error in $(\nabla_h R(x))^T R(x)$ is scaled by the residual norm, a fact we exploit for zero residual problems in Lemma 3.3.

Lemma 3.1. Let $R$ be given by (18). Assume that there is $K > 0$ such that

$$\|\Phi\|_{S(x,h)} \le K \|R_s(x)\|. \qquad (22)$$

Then

$$\|\nabla f_s(x) - \nabla_h f(x)\| = O\left(h^2 + \|R_s(x)\| E(x, h, \Phi)\right), \qquad (23)$$

$$\|\nabla f_s(x) - (\nabla_h R(x))^T R(x)\| = O\left(\|R_s(x)\| E(x, h, \Phi)\right), \qquad (24)$$

and

$$\|R_s'(x)^T R_s'(x) - (\nabla_h R(x))^T \nabla_h R(x)\| = O(E(x, h, \Phi)). \qquad (25)$$

The constants in the $O$-terms depend on the norm and the Lipschitz constant of $R_s'$.

Proof. The estimate (23) follows from (10) and (19). We now prove (24). By definition,

$$(\nabla_h R(x))^T R(x) = \left(\nabla_h (R_s + \Phi)(x)\right)^T (R_s(x) + \Phi(x)) = \nabla f_s(x) + O\left(\|R_s(x)\| E(x, h, \Phi) + h^{-1}\|\Phi\|^2_{S(x,h)}\right) = \nabla f_s(x) + O\left(\|R_s(x)\| E(x, h, \Phi)\right),$$

where the last equality uses (22), as asserted.
The proof of (25) is similar. □

Lemma 3.1 leads directly to a simple convergence result which, for zero residual problems with only a few stencil failures, requires only that $E(x_n, h_n, \Phi)$ be bounded, a weaker condition than (7).

Theorem 2. Let $R$ satisfy (18) and assume that $R_s'$ is Lipschitz continuous. Let $h_n \to 0$ and let $\{x_n\}$ be the implicit filtering sequence. Assume that all but finitely many calls to fdgauss return with stencil failure or with

$$\|(\nabla_{h_n} R(x_n))^T R(x_n)\| \le \tau h_n, \qquad (26)$$

that the model Hessians $(\nabla_{h_n} R(x_n))^T \nabla_{h_n} R(x_n)$ are nonsingular, and that the model Hessians and their inverses are uniformly bounded. Then if

$$\lim_{n \to \infty} E(x_n, h_n, \Phi) = 0, \qquad (27)$$

then any limit point of $\{x_n\}$ is a critical point of $f_s$. If, moreover, all but finitely many calls to fdgauss return with (26), then (27) can be replaced by

$$\lim_{n \to \infty} \|R_s(x_n)\| E(x_n, h_n, \Phi) = 0. \qquad (28)$$

Proof. The convergence assumption (27) requires that $\|\Phi\|_{S(x_n,h_n)}/h_n \to 0$. In view of (19), this is equivalent to (7) if (22) holds. Hence the first assertion of the theorem is simply a restatement of Theorem 1. If the finite-difference Gauss-Newton iteration terminates all but finitely many times with (26), then

$$\|\nabla f_s(x_n)\| \le \tau h_n + O\left(\|R_s(x_n)\| E(x_n, h_n, \Phi)\right)$$

by (24). This completes the proof. □

3.3 Local Convergence

To analyze the local convergence behavior of the IFGN iteration, we must assume that the model Hessians are well conditioned and bounded. Let $x^*$ be a local minimizer of $f_s(x) = R_s(x)^T R_s(x)/2$ for which the standard assumptions for convergence of the Gauss-Newton iteration

$$x_+^{GN} = x_c - \left(R_s'(x_c)^T R_s'(x_c)\right)^{-1} R_s'(x_c)^T R_s(x_c)$$

hold (smoothness, nonsingularity of the model Hessian, sufficiently small residual). To quantify this we will assume:

Assumption 3.1. There is $\rho_0 > 0$ such that

- $R_s$ is Lipschitz continuously differentiable in the set $\mathcal{V} = \{x : \|x - x^*\| \le \rho_0\}$;
- the Gauss-Newton model Hessian $R_s'(x)^T R_s'(x)$ and its inverse are uniformly bounded in $\mathcal{V}$; and
- there are $r_{GN} \in (0, 1)$ and $C_{GN} > 1$ such that, for all $x_c \in \mathcal{V}$,

$$\|e_+^{GN}\| \le C_{GN}\left(\|e_c\|^2 + \|R_s(x^*)\| \|e_c\|\right) \le r_{GN}\|e_c\|. \qquad (29)$$
As is standard, we let $e = x - x^*$ for $x \in \mathcal{V}$, with the iteration index for $e$ being inherited from the one for $x$.

Lemma 3.2. Let $R$ be given by (18). Let (22) and Assumption 3.1 hold and let $x_c \in \mathcal{V}$. Then if

$$\sup_{x \in \mathcal{V}} E(x, h, \Phi)$$

is sufficiently small, the IFGN model Hessian $(\nabla_h R(x_c))^T \nabla_h R(x_c)$ is nonsingular. Moreover, if

$$x_+ = x_c - \left((\nabla_h R(x_c))^T \nabla_h R(x_c)\right)^{-1} (\nabla_h R(x_c))^T R(x_c),$$

then

$$\|e_+\| = \|e_+^{GN}\| + O\left(\|R_s(x_c)\| E(x_c, h, \Phi)\right). \qquad (30)$$

Proof. Let $x_c \in \mathcal{V}$. Assumption 3.1 and (25) imply that

$$\left\|\left(R_s'(x_c)^T R_s'(x_c)\right)^{-1} - \left((\nabla_h R(x_c))^T \nabla_h R(x_c)\right)^{-1}\right\| = O(E(x_c, h, \Phi)). \qquad (31)$$

Now,

$$x_+ = x_+^{GN} + E_H \nabla f_s(x_c) + \left(R_s'(x_c)^T R_s'(x_c)\right)^{-1} E_g,$$

where

$$E_H = \left(R_s'(x_c)^T R_s'(x_c)\right)^{-1} - \left((\nabla_h R(x_c))^T \nabla_h R(x_c)\right)^{-1}$$

and

$$E_g = \nabla f_s(x_c) - (\nabla_h R(x_c))^T R(x_c).$$

Since $\nabla f_s(x_c) = O(\|e_c\|)$, we apply (31) to obtain $E_H \nabla f_s(x_c) = O(\|R_s(x_c)\| E(x_c, h, \Phi))$. The conclusion now follows from (22) and (24). □

Theorem 3. Let $R$ be given by (18). Let (22) and Assumption 3.1 hold. Let $x_0 \in \mathcal{V}$ and let $h_n \to 0$. Assume that the implicit filtering sequence $\{x_n\} \subset \mathcal{V}$ and that the line search fails only finitely many times. Then if (27) holds, $x_n \to x^*$.

3.4 Rates of Convergence

To obtain rates of convergence we must make stronger assumptions on the noise, on the scales, and on the convergence rates of the Gauss-Newton iteration for the smooth problem. We must augment (29) with a lower bound that states that the Gauss-Newton iteration for $R_s$ converges no faster than the standard Gauss-Newton convergence rate. This latter assumption is a nondegeneracy condition on $R_s$ and is needed for the superlinear convergence results.

Assumption 3.2. There are $p \in (0, 1]$ and $C_p > 0$ such that

$$\|\Phi\|_{S(x_c,h_c)} \le C_p \|e_c\|^{2+2p} \qquad (32)$$

for all $x_c \in \mathcal{V}$. In addition to (29),

$$C_{GN}^{-1}\left(\|e_c\|^2 + \|R_s(x^*)\| \|e_c\|\right) \le \|e_+^{GN}\| \qquad (33)$$

for all $x_c \in \mathcal{V}$.

Lemma 3.3. Let Assumptions 3.1 and 3.2 hold. Then if $x_c$ is sufficiently near $x^*$ and

$$C_h^{-1}\|e_c\|^{1+p} \le h_c \le C_h\|e_c\|^{1+p}, \qquad (34)$$

then there are $0 < r < 1$ and $C > 1$ such that

$$C^{-1}\|e_+^{GN}\| \le \|e_+\| \le C\|e_+^{GN}\| \le r\|e_c\|. \qquad (35)$$

Proof.
We will show that

$$\|R_s(x_c)\| E(x_c, h_c, \Phi) = o\left(\|e_+^{GN}\|\right) \qquad (36)$$

for $x_c$ near $x^*$. The result will then follow from Lemma 3.2 for $x_c$ sufficiently near $x^*$. The bounds (34) and (32) imply that

$$E(x_c, h_c, \Phi) = O\left(\|e_c\|^{1+p}\right).$$

We consider two cases. If the smooth problem is a zero residual problem ($R_s(x^*) = 0$), then

$$\|R_s(x_c)\| E(x_c, h_c, \Phi) = O\left(\|e_c\|^{2+p}\right).$$

In this case, (33) implies (36). If $R_s(x^*) \ne 0$, then

$$\|R_s(x_c)\| E(x_c, h_c, \Phi) = O\left(\|e_c\|^{1+p}\right).$$

However, in that case (33) implies that $\|e_+^{GN}\| \ge C_{GN}^{-1}\|R_s(x^*)\| \|e_c\|$, and (36) holds. This completes the proof. □

In order to apply Lemma 3.3 we need to make sure that (34) holds throughout the iteration. The most direct way to do this is to update $h_n$ with an analog of (16),

$$h_{n+1} = \left\|(\nabla_{h_n} R(x_{n+1}))^T R(x_{n+1})\right\|^{1+p}. \qquad (37)$$

Theorem 4. Let Assumptions 3.1 and 3.2 hold. Then if $x_0$ is sufficiently near $x^*$,

$$\|\nabla f_s(x_0)\|^{1+p}/2 \le h_0 \le 2\|\nabla f_s(x_0)\|^{1+p}, \qquad (38)$$

and the implicit filtering sequence is defined by Algorithm IFGN and (37), then $x_n \to x^*$ and

$$C^{-1}\|e_n^{GN}\| \le \|e_{n+1}\| \le C\|e_n^{GN}\| \le r\|e_n\| \qquad (39)$$

for all $n \ge 0$.

Proof. Our assumptions imply that (38) is equivalent to (34) with, for example, $C_h = \sup_{x \in \mathcal{V}} \|\nabla^2 f_s(x)\|$. Hence, proceeding by induction, we need only show that

$$\|\nabla f_s(x_n)\|^{1+p}/2 \le h_n \le 2\|\nabla f_s(x_n)\|^{1+p} \qquad (40)$$

for $n \ge 0$. By (24), if $h_n$ satisfies (40), then

$$h_{n+1} = \left(\|\nabla f_s(x_{n+1})\| + \|R_s(x_{n+1})\| E(x_{n+1}, h_n, \Phi)\right)^{1+p} = \left(\|\nabla f_s(x_{n+1})\| + o(\|e_n^{GN}\|)\right)^{1+p} = \|\nabla f_s(x_{n+1})\|^{1+p} + o\left(\|\nabla f_s(x_{n+1})\|^{1+p}\right).$$

Hence $h_{n+1}$ satisfies (40) for $x_0$ sufficiently near $x^*$. □

Remark: Theorem 4 says that the local convergence of IFGN is asymptotically as good as that of Gauss-Newton, if one counts only nonlinear iterations. For zero residual problems, one need not reduce the scales as rapidly. If we replace (34) by the weaker condition (41), then (35) becomes

$$\|e_+\| \le C\|e_+^{GN}\| + O\left(\|R_s(x_c)\|\left(h_c^2 + \|e_c\|^{1+p}\right)\right). \qquad (42)$$

This will imply superlinear convergence for zero residual problems for which (22) and (32) hold if $h_n \to 0$. The computations in § 4 illustrate this.
4. Numerical Example

We report on the performance of IFGN on a parameter identification problem taken from [24, 7, 3]. Here $N = 2$ and $M = 100$. The problem is to identify the stiffness $k$ and damping $c$ in a harmonic oscillator so that the numerical solution of

$$u'' + cu' + ku = 0, \quad u(0) = u_0, \quad u'(0) = 0$$

best fits the data in the least squares sense. For this example the data are values of the exact solution at $t_i = i/100$ for $1 \le i \le 100$. The numerical solution was computed with the MATLAB ODE15s integrator [33].

We compare three variations of implicit filtering: IFGN with a fixed sequence of scales, IFGN with an adaptive sequence that attempts to satisfy (37), and a version of the implicit filtering/BFGS algorithm from [24, 7] that has been modified to use adaptive scales. In all three we limit the optimization to a budget of 100 calls to the function. This does not mean that an iteration is terminated before completion; rather, we monitor the number of function evaluations after a call to the finite difference optimizer returns and stop the optimization if the number of function evaluations has exceeded the budget after the completion of the iteration.

For all the computations the initial iterate is $(c, k) = (2, 3)$. The sequence of scales used in the examples is

$$h_n^{(1)} = 2^{-n}, \quad n = 4, \dots, 13. \qquad (43)$$

Following [7], we implement adaptive scales based on a scaled and safeguarded form of (37),

$$h_{n+1}^{(2)} = \max\left(h_{min}, \min\left(h_{n+1}^{(1)}, \frac{\left\|(\nabla_{h_n} R(x_{n+1}))^T R(x_{n+1})\right\|^{1+p}}{\left\|(\nabla_{h_0} R(x_0))^T R(x_0)\right\|}\right)\right), \qquad (44)$$

where $p = 1/2$ and $h_{min} = 10^{-6}$; $h_{min}$ is roughly the cube root of machine roundoff and is the optimal choice of $h$ for a central difference.

In the examples the line search strategy is to reduce the step by half if the sufficient decrease condition (either (11) for implicit filtering or (20) for IFGN) fails. Within both algorithms fdquasi and fdgauss, $amax = 10$ and $pmax = 100$.

Figure 2. Parameter Identification Example
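One plausible reading of the scaled and safeguarded update (44) is a one-line clipping rule: take the adaptive value of (37), normalized by the initial gradient norm, but never above the fixed schedule (43) and never below $h_{min}$. The sketch below assumes that reading; the function and argument names are ours:

```python
def next_scale(grad_norm, grad0_norm, h_sched, p=0.5, hmin=1e-6):
    """Safeguarded adaptive scale: the normalized adaptive value
    (grad_norm / grad0_norm)**(1 + p), clipped from above by the fixed
    schedule h_sched and from below by hmin."""
    return max(hmin, min(h_sched, (grad_norm / grad0_norm) ** (1.0 + p)))
```

For example, with grad_norm = 0.25, grad0_norm = 1.0, and h_sched = 0.5, the adaptive value 0.25 ** 1.5 = 0.125 is the one selected.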
In Figure 2 we plot the norm of the difference gradient and the size of the function for the three variations of implicit filtering and two values of the tolerance given to ODE15s. One can see that the two variations of IFGN did substantially better than an implementation of implicit filtering that did not exploit the least squares structure. A more subtle difference, explained by the remark at the end of § 3, is that while the use of adaptive scales made no visible difference in IFGN's ability to reduce the residual (the curves overlap, indicating that the rate of convergence for both methods is equally fast, i.e. superlinear), it did make the difference gradient a much better indicator of the progress of the optimization (the scales that are reduced most rapidly produce more accurate gradients).

We see similar behavior for a small, but non-zero, residual problem. In Figure 3 we show the results from the parameter ID problem with small uniformly distributed random numbers added to the data. The gradients behave in the same way as in the experiment with exact data, while the limiting function values reflect the non-zero residual in the high-accuracy simulation. In the low-accuracy simulation, the tolerances given to the integrator are larger than the noise in the data, so the figures are almost identical to the ones for the noise-free case.

Figure 3. Parameter Identification Example; Random Noise in Data

References

[1] C. Audet and J. E. Dennis, Analysis of generalized pattern searches, submitted for publication, 2000.
[2] ———, A pattern search filter method for nonlinear programming without derivatives, submitted for publication, 2000.
[3] H. T. Banks and H. T. Tran, Mathematical and experimental modeling of physical processes, Department of Mathematics, North Carolina State University, unpublished lecture notes for Mathematics 573-4, 1997.
[4] A. Battermann, J. M.
Gablonsky, A. Patrick, C. T. Kelley, T. Coffey, K. Kavanagh, and C. T. Miller, Solution of a groundwater control problem with implicit filtering, Optimization and Engineering, 3 (2002), pp. 189-199.
[5] D. M. Bortz and C. T. Kelley, The simplex gradient and noisy optimization problems, in Computational Methods in Optimal Design and Control, J. T. Borggaard, J. Burns, E. Cliff, and S. Schreck, eds., vol. 24 of Progress in Systems and Control Theory, Birkhäuser, Boston, 1998, pp. 77-90.
[6] C. G. Broyden, A new double-rank minimization algorithm, AMS Notices, 16 (1969), p. 670.
[7] T. D. Choi and C. T. Kelley, Superlinear convergence and implicit filtering, SIAM J. Optim., 10 (2000), pp. 1149-1162.
[8] A. R. Conn, K. Scheinberg, and P. L. Toint, On the convergence of derivative-free methods for unconstrained optimization, in Approximation Theory and Optimization: Tributes to M. J. D. Powell, A. Iserles and M. Buhmann, eds., Cambridge University Press, Cambridge, U.K., 1997, pp. 83-108.
[9] ———, Recent progress in unconstrained optimization without derivatives, Math. Prog. Ser. B, 79 (1997), pp. 397-414.
[10] J. W. David, C. T. Kelley, and C. Y. Cheng, Use of an implicit filtering algorithm for mechanical system parameter identification, SAE Paper 960358, 1996 SAE International Congress and Exposition Conference Proceedings, Modeling of CI and SI Engines, pp. 189-194, Society of Automotive Engineers, Washington, DC, 1996.
[11] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, no. 16 in Classics in Applied Mathematics, SIAM, Philadelphia, 1996.
[12] J. E. Dennis and V. Torczon, Direct search methods on parallel machines, SIAM J. Optim., 1 (1991), pp. 448-474.
[13] R. Fletcher, A new approach to variable metric methods, Comput. J., 13 (1970), pp. 317-322.
[14] J. M. Gablonsky, Modifications of the DIRECT Algorithm, PhD thesis, North Carolina State University, Raleigh, North Carolina, 2001.
[15] J. M.
Gablonsky and C. T. Kelley, A locally-biased form of the DIRECT algorithm, Journal of Global Optimization, 21 (2001), pp. 27-37.
[16] P. Gilmore, An Algorithm for Optimizing Functions with Multiple Minima, PhD thesis, North Carolina State University, Raleigh, North Carolina, 1993.
[17] P. Gilmore and C. T. Kelley, An implicit filtering algorithm for optimization of functions with many local minima, SIAM J. Optim., 5 (1995), pp. 269-285.
[18] D. Goldfarb, A family of variable metric methods derived by variational means, Math. Comp., 24 (1970), pp. 23-26.
[19] J. H. Holland, Genetic algorithms and the optimal allocation of trials, SIAM J. Comput., 2 (1973).
[20] R. Hooke and T. A. Jeeves, 'Direct search' solution of numerical and statistical problems, Journal of the Association for Computing Machinery, 8 (1961), pp. 212-229.
[21] D. R. Jones, The DIRECT global optimization algorithm, to appear in the Encyclopedia of Optimization, 1999.
[22] D. R. Jones, C. C. Perttunen, and B. E. Stuckman, Lipschitzian optimization without the Lipschitz constant, J. Optim. Theory Appl., 79 (1993), pp. 157-181.
[23] C. T. Kelley, Detection and remediation of stagnation in the Nelder-Mead algorithm using a sufficient decrease condition, SIAM J. Optim., 10 (1999), pp. 43-55.
[24] ———, Iterative Methods for Optimization, no. 18 in Frontiers in Applied Mathematics, SIAM, Philadelphia, 1999.
[25] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimization by simulated annealing, Science, 220 (1983), pp. 671-680.
[26] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright, Convergence properties of the Nelder-Mead simplex algorithm in low dimensions, SIAM J. Optim., 9 (1998), pp. 112-147.
[27] R. M. Lewis and V. Torczon, Rank ordering and positive bases in pattern search algorithms, Tech. Rep. 96-71, Institute for Computer Applications in Science and Engineering, December 1996.
[28] S. Lucidi and M.
Sciandrone, On the global convergence of derivative free methods for unconstrained optimization, Preprint, Università di Roma "La Sapienza", Dipartimento di Informatica e Sistemistica, 1997.
[29] ———, A derivative-free algorithm for bound constrained optimization, Preprint, Istituto di Analisi dei Sistemi ed Informatica, Consiglio Nazionale delle Ricerche, 1999.
[30] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, User guide for MINPACK-1, Tech. Rep. ANL-80-74, Argonne National Laboratory, 1980.
[31] J. A. Nelder and R. Mead, A simplex method for function minimization, Comput. J., 7 (1965), pp. 308-313.
[32] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, New York, 1999.
[33] L. F. Shampine and M. W. Reichelt, The MATLAB ODE suite, SIAM J. Sci. Comput., 18 (1997), pp. 1-22.
[34] D. F. Shanno, Conditioning of quasi-Newton methods for function minimization, Math. Comp., 24 (1970), pp. 647-657.
[35] M. Srinivas and L. M. Patnaik, Genetic algorithms: a survey, Computer, 27 (1994), pp. 17-27.
[36] D. Stoneking, G. Bilbro, R. Trew, P. Gilmore, and C. T. Kelley, Yield optimization using a GaAs process simulator coupled to a physical device model, IEEE Transactions on Microwave Theory and Techniques, 40 (1992), pp. 1353-1363.
[37] D. E. Stoneking, G. L. Bilbro, R. J. Trew, P. Gilmore, and C. T. Kelley, Yield optimization using a GaAs process simulator coupled to a physical device model, in Proceedings IEEE/Cornell Conference on Advanced Concepts in High Speed Devices and Circuits, IEEE, 1991, pp. 374-383.
[38] V. Torczon, Multidirectional Search, PhD thesis, Rice University, Houston, Texas, 1989.
[39] ———, On the convergence of the multidirectional search algorithm, SIAM J. Optim., 1 (1991), pp. 123-145.
[40] ———, On the convergence of pattern search algorithms, SIAM J. Optim., 7 (1997), pp. 1-25.
[41] P. van Laarhoven and E. Aarts, Simulated annealing, theory and practice, Kluwer, Dordrecht, 1987.
[42] T. A. Winslow, R. J. Trew, P. Gilmore, and C. T.
Kelley, Doping profiles for optimum class B performance of GaAs MESFET amplifiers, in Proceedings IEEE/Cornell Conference on Advanced Concepts in High Speed Devices and Circuits, IEEE, 1991, pp. 188-197.
[43] ———, Simulated performance optimization of GaAs MESFET amplifiers, in Proceedings IEEE/Cornell Conference on Advanced Concepts in High Speed Devices and Circuits, IEEE, 1991, pp. 393-402.
[44] S. K. Zavriev, On the global optimization properties of finite-difference local descent algorithms, J. Global Optimization, 3 (1993), pp. 67-78.

DATA MINING VIA SUPPORT VECTOR MACHINES

O. L. Mangasarian
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53706*
olvi@cs.wisc.edu

Abstract: Support vector machines (SVMs) have played a key role in broad classes of problems arising in various fields. Much more recently, SVMs have become the tool of choice for problems arising in data classification and mining. This paper emphasizes some recent developments that the author and his colleagues have contributed to, such as: generalized SVMs (a very general mathematical programming framework for SVMs), smooth SVMs (a smooth nonlinear equation representation of SVMs solvable by a fast Newton method), Lagrangian SVMs (an unconstrained Lagrangian representation of SVMs leading to an extremely simple iterative scheme capable of solving classification problems with millions of points) and reduced SVMs (a rectangular kernel classifier that utilizes as little as 1% of the data).

1. Introduction

This paper describes four recent developments, one theoretical, three algorithmic, all centered on support vector machines (SVMs). SVMs have become the tool of choice for the fundamental classification problem of machine learning and data mining. We briefly outline these four developments now.

In Section 2 new formulations for SVMs are given as convex mathematical programs which are often quadratic or linear programs.
*Work supported by grant CICYT TAP99-1075-C02-02.

By setting apart the two functions of a support vector machine: separation of points by a nonlinear surface in the original space of patterns, and maximizing the distance between separating planes in a higher dimensional space, we are able to define indefinite, possibly discontinuous, kernels, not necessarily inner product ones, that generate highly nonlinear separating surfaces. Maximizing the distance between the separating planes in the higher dimensional space is surrogated by support vector suppression, which is achieved by minimizing any desired norm of support vector multipliers. The norm may be one induced by the separation kernel if it happens to be positive definite, or a Euclidean or a polyhedral norm. The latter norm leads to a linear program whereas the former norms lead to convex quadratic programs, all with an arbitrary separation kernel. A standard support vector machine can be recovered by using the same kernel for separation and support vector suppression.

In Section 3 we apply smoothing methods, extensively used for solving important mathematical programming problems and applications, to generate and solve an unconstrained smooth reformulation of the support vector machine for pattern classification using a completely arbitrary kernel. We term such a reformulation a smooth support vector machine (SSVM). A fast Newton-Armijo algorithm for solving the SSVM converges globally and quadratically. Numerical results and comparisons demonstrate the effectiveness and speed of the algorithm. For example, on six publicly available datasets, the tenfold cross validation correctness of SSVM was the highest compared with four other methods, and SSVM was also the fastest.

In Section 4 an implicit Lagrangian for the dual of a simple reformulation of the standard quadratic program of a linear support vector machine is proposed.
This leads to the minimization of an unconstrained differentiable convex function in a space of dimensionality equal to the number of classified points. This problem is solvable by an extremely simple linearly convergent Lagrangian support vector machine (LSVM) algorithm. LSVM requires the inversion at the outset of a single matrix of the order of the much smaller dimensionality of the original input space plus one. The full algorithm is given in this paper in 11 lines of MATLAB code without any special optimization tools such as linear or quadratic programming solvers. This LSVM code can be used "as is" to solve classification problems with millions of points.

In Section 5 an algorithm is proposed which generates a nonlinear kernel-based separating surface that requires as little as 1% of a large dataset for its explicit evaluation. To generate this nonlinear surface, the entire dataset is used as a constraint in an optimization problem with very few variables corresponding to the 1% of the data kept. The remainder of the data can be thrown away after solving the optimization problem. This is achieved by making use of a rectangular m × m̄ kernel K(A, Ā′) that greatly reduces the size of the quadratic program to be solved and simplifies the characterization of the nonlinear separating surface. Here, the m rows of A represent the original m data points while the m̄ rows of Ā represent a greatly reduced m̄ data points. Computational results indicate that test set correctness for the reduced support vector machine (RSVM), with a nonlinear separating surface that depends on a small randomly selected portion of the dataset, is better than that of a conventional support vector machine (SVM) with a nonlinear surface that explicitly depends on the entire dataset, and much better than a conventional SVM using a small random sample of the data.
Computational times, as well as memory usage, are much smaller for RSVM than for a conventional SVM using the entire dataset.

A word about our notation. All vectors will be column vectors unless transposed to a row vector by a prime superscript ′. For a vector x in the n-dimensional real space R^n, the plus function x_+ is defined as (x_+)_i = max{0, x_i}, i = 1, ..., n, while x_* denotes the step function defined as (x_*)_i = 1 if x_i > 0 and (x_*)_i = 0 if x_i ≤ 0, i = 1, ..., n. The scalar (inner) product of two vectors x and y in the n-dimensional real space R^n will be denoted by x′y and the p-norm of x will be denoted by ‖x‖_p. If x′y = 0, we then write x ⊥ y. For a matrix A ∈ R^{m×n}, A_i is the ith row of A, which is a row vector in R^n. A column vector of ones of arbitrary dimension will be denoted by e. For A ∈ R^{m×n} and B ∈ R^{n×l}, the kernel K(A, B) maps R^{m×n} × R^{n×l} into R^{m×l}. In particular, if x and y are column vectors in R^n, then K(x′, y) is a real number, K(x′, A′) is a row vector in R^m and K(A, A′) is an m × m matrix. If f is a real valued function defined on the n-dimensional real space R^n, the gradient of f at x is denoted by ∇f(x), which is a row vector in R^n, and the n × n Hessian matrix of second partial derivatives of f at x is denoted by ∇²f(x). The base of the natural logarithm will be denoted by ε.

2. The Generalized Support Vector Machine (GSVM) [25]

We consider the problem of classifying m points in the n-dimensional real space R^n, represented by the m × n matrix A, according to membership of each point A_i in the classes +1 or -1 as specified by a given m × m diagonal matrix D with ones or minus ones along its diagonal. For this problem the standard support vector machine with a linear kernel AA′ [38, 11] is given by the following for some ν > 0:

    min_{w,γ,y}  νe′y + ½w′w
    s.t.  D(Aw − eγ) + y ≥ e,  y ≥ 0.     (1)

Here w is the normal to the bounding planes:

    x′w − γ = +1,
    x′w − γ = −1,     (2)

and γ determines their location relative to the origin.
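The geometry behind (1)-(2), namely the bounding planes, the diagonal label matrix D, and the slack y, can be illustrated with a small NumPy sketch. The plane (w, γ) and the four data points below are invented for this sketch, not taken from the paper:

```python
import numpy as np

# Toy illustration of the linear-kernel setup (1)-(2); data and plane are invented.
w = np.array([1.0, 0.0])
gamma = 0.0
A_plus = np.array([[2.0, 1.0], [3.0, -1.0]])     # class +1 points
A_minus = np.array([[-2.0, 0.5], [-1.5, -2.0]])  # class -1 points

# strictly separable: class +1 satisfies x'w - gamma >= +1,
# class -1 satisfies x'w - gamma <= -1
assert np.all(A_plus @ w - gamma >= 1)
assert np.all(A_minus @ w - gamma <= -1)

# the slack y = (e - D(Aw - e*gamma))_+ of (1) then vanishes
A = np.vstack([A_plus, A_minus])
d = np.array([1.0, 1.0, -1.0, -1.0])  # diagonal of D
y = np.maximum(0.0, 1.0 - d * (A @ w - gamma))
print(y)  # -> [0. 0. 0. 0.]
```

For strictly separable data the slack is identically zero, which is exactly the condition under which the two planes in (2) bound the two classes.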
The first plane above bounds the class +1 points and the second plane bounds the class -1 points when the two classes are strictly linearly separable, that is when the slack variable y = 0. The linear separating surface is the plane

    x′w = γ,     (3)

midway between the bounding planes (2). See Figure 1. If the classes are linearly inseparable then the two planes bound the two classes with a "soft margin" determined by a nonnegative slack variable y, that is:

    x′w − γ + y_i ≥ +1, for x′ = A_i and D_ii = +1,
    x′w − γ − y_i ≤ −1, for x′ = A_i and D_ii = −1.     (4)

The 1-norm of the slack variable y is minimized with weight ν in (1). The quadratic term in (1), which is twice the reciprocal of the square of the 2-norm distance 2/‖w‖_2 between the two bounding planes of (2) in the n-dimensional space of w ∈ R^n for a fixed γ, maximizes that distance, often called the "margin". Figure 1 depicts the points represented by A, the bounding planes (2) with margin 2/‖w‖_2, and the separating plane (3) which separates A+, the points represented by rows of A with D_ii = +1, from A−, the points represented by rows of A with D_ii = −1.

Figure 1. The bounding planes (2) with margin 2/‖w‖_2 and the plane (3) separating A+, the points represented by rows of A with D_ii = +1, from A−, the points represented by rows of A with D_ii = −1.

In the GSVM formulation we attempt to discriminate between the classes +1 and -1 by a nonlinear separating surface which subsumes the linear separating surface (3), and is induced by some kernel K(A, A′), as follows:

    K(x′, A′)Du = γ,     (5)

where K(x′, A′) ∈ R^m, e.g. K(x′, A′) = x′A′ for the linear separating surface (3) with w = A′Du. The parameters u ∈ R^m and γ ∈ R are determined by solving a mathematical program, typically quadratic or linear. In special cases, such as the standard SVM (13) below, u can be interpreted as a dual variable. A point x ∈ R^n is classified in class +1 or -1 according to whether the decision function

    (K(x′, A′)Du − γ)_*     (6)

yields 1 or 0 respectively.
Here (·)_* denotes the step function defined in the Introduction. The kernel function K(x′, A′) implicitly defines a nonlinear map from x ∈ R^n to some other space z ∈ R^k, where k may be much larger than n. In particular, if the kernel K is an inner product kernel under Mercer's condition [13, pp 138-140], [38, 11, 5] (an assumption that we will not make in this paper), then for x and y in R^n:

    K(x, y) = h(x)′h(y),     (7)

and the separating surface (5) becomes:

    h(x)′h(A′)Du = γ,     (8)

where h is a function, not easily computable, from R^n to R^k, and h(A′) ∈ R^{k×m} results from applying h to the m columns of A′. The difficulty in computing h and the possible high dimensionality of R^k have been important factors in using a kernel K as a generator of an implicit nonlinear separating surface in the original feature space, which is linear in the high dimensional space R^k. Our separating surface (5) written in terms of a kernel function retains this advantage and is linear in its parameters u, γ. We now state a mathematical program that generates such a surface for a general kernel K as follows:

    min_{u,γ,y}  νe′y + f(u)
    s.t.  D(K(A, A′)Du − eγ) + y ≥ e,  y ≥ 0.     (9)

Here f is some convex function on R^m, typically some norm or seminorm, and ν is some positive parameter that weights the separation error e′y versus suppression of the separating surface parameter u. Suppression of u can be interpreted in one of two ways. We interpret it here as minimizing the number of support vectors, i.e. constraints of (9) with positive multipliers. A more conventional interpretation is that of maximizing some measure of the distance or margin between the bounding parallel planes in R^k, under appropriate assumptions, such as f being a quadratic function induced by a positive definite kernel K as in (13) below. As is well known, this leads to improved generalization by minimizing an upper bound on the VC dimension [38, 35].
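As a concrete instance of Mercer's condition (7): for the homogeneous quadratic kernel K(x, y) = (x′y)² on R², an explicit feature map h into R³ reproduces the kernel as an inner product. This particular kernel and map are a standard textbook example chosen for illustration, not a kernel used in the paper:

```python
import numpy as np

def K(x, y):
    # homogeneous quadratic polynomial kernel (x'y)^2, a standard Mercer kernel
    return (x @ y) ** 2

def h(x):
    # explicit feature map R^2 -> R^3 with h(x)'h(y) = (x'y)^2
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(K(x, y), h(x) @ h(y))  # both equal (x'y)^2 = 1, up to rounding
```

For such kernels the surface (8) is literally a plane in the feature space R³; the point of the GSVM is that (5) and (9) remain meaningful even when no such h exists.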
We term a solution of the mathematical program (9) and the resulting decision function (6) a generalized support vector machine, GSVM. In what follows we derive a number of special cases, including the standard support vector machine.

We consider first support vector machines that include the standard ones [38, 11, 5] and which are obtained by setting f of (9) to be the convex quadratic function f(u) = ½u′Hu, where H ∈ R^{m×m} is some symmetric positive definite matrix. The mathematical program (9) becomes the following convex quadratic program:

    min_{u,γ,y}  νe′y + ½u′Hu
    s.t.  D(K(A, A′)Du − eγ) + y ≥ e,  y ≥ 0.     (10)

The Wolfe dual [39, 22] of this convex quadratic program is:

    min_{r∈R^m}  ½r′DK(A, A′)DH⁻¹DK(A, A′)′Dr − e′r
    s.t.  e′Dr = 0,  0 ≤ r ≤ νe.     (11)

Furthermore, the primal variable u is related to the dual variable r by:

    u = H⁻¹DK(A, A′)′Dr.     (12)

If we assume that the kernel K(A, A′) is symmetric positive definite and let H = DK(A, A′)D, then our dual problem (11) degenerates to the dual problem of the standard support vector machine [38, 11, 5] with u = r:

    min_{u∈R^m}  ½u′DK(A, A′)Du − e′u
    s.t.  e′Du = 0,  0 ≤ u ≤ νe.     (13)

The positive definiteness assumption on K(A, A′) in (13) can be relaxed to positive semidefiniteness while maintaining the convex quadratic program (10), with H = DK(A, A′)D, as the direct dual of (13) without utilizing (11) and (12). The symmetry and positive semidefiniteness of the kernel K(A, A′) for this version of a support vector machine is consistent with the support vector machine literature. The fact that r = u in the dual formulation (13) shows that the variable u appearing in the original formulation (10) is also the dual multiplier vector for the first set of constraints of (10). Hence the quadratic term in the objective function of (10) can be thought of as suppressing as many multipliers of support vectors as possible and thus minimizing the number of such support vectors.
This is another (nonstandard) interpretation of the standard support vector machine, which is usually interpreted as maximizing the margin or distance between parallel separating planes. It leads to the idea of using values for the matrix H other than DK(A, A′)D that will also suppress u. One particular choice is interesting because it puts no restrictions on K: no symmetry, no positive definiteness or semidefiniteness and not even continuity. This is the choice H = I in (10), which leads to a dual problem (11) with H = I and u = DK(A, A′)′Dr as follows:

    min_{r∈R^m}  ½r′DK(A, A′)K(A, A′)′Dr − e′r
    s.t.  e′Dr = 0,  0 ≤ r ≤ νe.     (14)

We note immediately that K(A, A′)K(A, A′)′ is positive semidefinite with no assumptions on K(A, A′), and hence the above problem is an always solvable convex quadratic program for any kernel K(A, A′). In fact, by the Frank-Wolfe existence theorem [15], the quadratic program (10) is solvable for any symmetric positive definite matrix H because its objective function is bounded below by zero. Hence by quadratic programming duality its dual problem (11) is also solvable. Any solution of (10) can be used to generate a nonlinear decision function (6). Thus we are free to choose any symmetric positive definite matrix H to generate a support vector machine. Experimentation will be needed to determine the most appropriate choices for H.

By using the 1-norm instead of the 2-norm a linear programming formulation for the GSVM can be obtained. We refer the interested reader to [25].

We turn our attention now to an efficient method for generating SVMs based on smoothing ideas that have already been effectively used to solve various mathematical programs [7, 8, 6, 9, 10, 16, 37, 12].

3. SSVM: Smooth Support Vector Machines [21]

In our smooth approach, the square of the 2-norm of the slack variable y is minimized with weight ν/2 instead of the 1-norm of y as in (1).
In addition, the distance between the planes (2) is measured in the (n + 1)-dimensional space of (w, γ) ∈ R^{n+1}, that is 2/‖(w, γ)‖_2. Measuring the margin in this (n + 1)-dimensional space instead of R^n induces strong convexity and has little or no effect on the problem, as was shown in [26, 27, 21, 20]. Thus, using twice the reciprocal squared of this margin instead, yields our modified SVM problem as follows:

    min_{w,γ,y}  (ν/2)y′y + ½(w′w + γ²)
    s.t.  D(Aw − eγ) + y ≥ e,  y ≥ 0.     (15)

At the solution of problem (15), y is given by

    y = (e − D(Aw − eγ))_+,     (16)

where, as defined in the Introduction, (·)_+ replaces negative components of a vector by zeros. Thus, we can replace y in (15) by (e − D(Aw − eγ))_+ and convert the SVM problem (15) into an equivalent SVM which is an unconstrained optimization problem as follows:

    min_{w,γ}  (ν/2)‖(e − D(Aw − eγ))_+‖₂² + ½(w′w + γ²).     (17)

This problem is a strongly convex minimization problem without any constraints. It is easy to show that it has a unique solution. However, the objective function in (17) is not twice differentiable, which precludes the use of a fast Newton method. We thus apply the smoothing techniques of [7, 8] and replace x_+ by a very accurate smooth approximation [21, Lemma 2.1] that is given by p(x, α), the integral of the sigmoid function of neural networks [23], that is

    p(x, α) = x + (1/α) log(1 + ε^{−αx}),  α > 0.     (18)

This p function with a smoothing parameter α is used here to replace the plus function of (17) to obtain a smooth support vector machine (SSVM):

    min_{w,γ}  Φ_α(w, γ) := (ν/2)‖p(e − D(Aw − eγ), α)‖₂² + ½(w′w + γ²).     (19)

It can be shown [21, Theorem 2.2] that the solution of problem (15) is obtained by solving problem (19) with α approaching infinity. Advantage can be taken of the twice differentiable property of the objective function of (19) to utilize a quadratically convergent algorithm for solving the smooth support vector machine (19) as follows.
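As an aside before stating the algorithm, the accuracy of the smooth approximation (18) to the plus function can be checked numerically. The sample points and α values in this sketch are my own choices:

```python
import numpy as np

def p(x, alpha):
    # p(x, alpha) = x + (1/alpha) log(1 + exp(-alpha x)), the sigmoid integral of (18);
    # np.logaddexp(0, -alpha*x) computes log(1 + exp(-alpha*x)) without overflow.
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-2.0, 2.0, 201)
plus = np.maximum(0.0, x)
errs = [np.max(np.abs(p(x, a) - plus)) for a in (1.0, 10.0, 100.0)]
print(errs)  # worst-case gap is log(2)/alpha, attained at x = 0, so it shrinks as alpha grows
```

The worst-case gap log(2)/α at x = 0 is consistent with the statement that the solution of (19) approaches that of (15) as α approaches infinity.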
Algorithm 3.1 Newton-Armijo Algorithm for SSVM (19)

Start with any (w⁰, γ⁰) ∈ R^{n+1}. Having (wⁱ, γⁱ), stop if the gradient of the objective function of (19) is zero, that is ∇Φ_α(wⁱ, γⁱ) = 0. Else compute (w^{i+1}, γ^{i+1}) as follows:

(i) Newton Direction: Determine direction dⁱ ∈ R^{n+1} by setting equal to zero the linearization of ∇Φ_α(w, γ) around (wⁱ, γⁱ), which gives n + 1 linear equations in n + 1 variables:

    ∇²Φ_α(wⁱ, γⁱ)dⁱ = −∇Φ_α(wⁱ, γⁱ)′.     (20)

(ii) Armijo Stepsize [1]: Choose a stepsize λᵢ ∈ R such that:

    (w^{i+1}, γ^{i+1}) = (wⁱ, γⁱ) + λᵢdⁱ,     (21)

where λᵢ = max{1, ½, ¼, ...} such that:

    Φ_α(wⁱ, γⁱ) − Φ_α((wⁱ, γⁱ) + λᵢdⁱ) ≥ −δλᵢ∇Φ_α(wⁱ, γⁱ)dⁱ,     (22)

where δ ∈ (0, ½).

Note that a key difference between our smoothing approach and that of the classical SVM [38, 11] is that we are solving here a linear system of equations (20) instead of solving a quadratic program, as is the case with the classical SVM. Furthermore, it can be shown [21, Theorem 3.2] that the smoothing algorithm above converges quadratically from any starting point.

To obtain a nonlinear SSVM we consider the GSVM formulation (9) with a 2-norm squared error term on y instead of the 1-norm, and instead of the convex term f(u) that suppresses u we use the 2-norm squared of (u, γ) to suppress both u and γ. We obtain then:

    min_{u,γ,y}  (ν/2)y′y + ½(u′u + γ²)
    s.t.  D(K(A, A′)Du − eγ) + y ≥ e,  y ≥ 0.     (23)

We repeat the same arguments as above, in going from (15) to (19), to obtain the SSVM with a nonlinear kernel K(A, A′):

    min_{u,γ}  (ν/2)‖p(e − D(K(A, A′)Du − eγ), α)‖₂² + ½(u′u + γ²),     (24)

where K(A, A′) is a kernel map from R^{m×n} × R^{n×m} to R^{m×m}. We note that this problem, which is capable of generating highly nonlinear separating surfaces, still retains the strong convexity and differentiability properties for any arbitrary kernel. All of the convergence results for a linear kernel hold here for a nonlinear kernel [21].

The effectiveness and speed of the smooth support vector machine (SSVM) approach can be demonstrated by comparing it numerically with other methods.
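The shape of the Newton-Armijo iteration (20)-(22) can be sketched generically. The sketch below applies the scheme to a toy strongly convex quadratic of my own choosing, not to the SSVM objective Φ_α itself, whose gradient and Hessian involve the smoothed plus function:

```python
import numpy as np

def newton_armijo(f, grad, hess, x0, delta=0.25, tol=1e-10, itmax=50):
    """Generic Newton-Armijo iteration in the shape of Algorithm 3.1."""
    x = x0
    for _ in range(itmax):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # stop when the gradient vanishes
            break
        d = np.linalg.solve(hess(x), -g)  # Newton direction, as in (20)
        lam = 1.0                         # Armijo stepsize, as in (21)-(22):
        while f(x) - f(x + lam * d) < -delta * lam * (g @ d) and lam > 1e-8:
            lam /= 2.0                    # halve until sufficient decrease
        x = x + lam * d
    return x

# toy strongly convex objective f(x) = 1/2 x'Qx - b'x (invented example)
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
xstar = newton_armijo(lambda x: 0.5 * x @ Q @ x - b @ x,
                      lambda x: Q @ x - b,
                      lambda x: Q,
                      np.zeros(2))
print(xstar)  # converges to the unique minimizer Q^{-1} b
```

For a quadratic the full Newton step already satisfies the Armijo test (22) with δ = 0.25, so the method terminates in one step; on Φ_α the same scheme is what [21, Theorem 3.2] shows to be globally and quadratically convergent.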
In order to evaluate how well each algorithm generalizes to future data, tenfold cross-validation is performed on each dataset [36]. To evaluate the efficacy of SSVM, computational times of SSVM were compared with the robust linear program (RLP) algorithm [2], the feature selection concave minimization (FSV) algorithm, the support vector machine using the 1-norm approach (SVM‖·‖₁) and the classical support vector machine (SVM‖·‖₂) [3, 38, 11]. All tests were run on six publicly available datasets: the Wisconsin Prognostic Breast Cancer Database [31] and four datasets from the Irvine Machine Learning Database Repository [34]. It turned out that the tenfold testing correctness of the SSVM was the highest of these five methods on all datasets tested, as was its computational speed. Detailed numerical results are given in [21].

As a test of the effectiveness of the SSVM in generating a highly nonlinear separating surface, we tested it on the 1000-point checkerboard dataset of [19] depicted in Figure 2. We used the following Gaussian kernel in the SSVM formulation (24):

    Gaussian Kernel:  (K(A, A′))_{ij} = ε^{−μ‖A_i′ − A_j′‖₂²},  i, j = 1, 2, 3, ..., m.

The value of the parameter μ used, as well as the values of the parameters ν and α of the nonlinear SSVM (24), are all given in Figure 3, which depicts the separation obtained. Note that the boundaries of the checkerboard are as sharp as those of [26], obtained by a linear programming solution, and considerably sharper than those of [19], obtained by a Newton approach applied to a quadratic programming formulation.

We turn now to an extremely simple iterative algorithm for SVMs that requires neither a quadratic program nor a linear program to be solved.

4. LSVM: Lagrangian Support Vector Machines [28]

We propose here an algorithm based on an implicit Lagrangian of the dual of a simple reformulation of the standard quadratic program of a linear support vector machine.
This leads to the minimization of an unconstrained differentiable convex function in a space of dimensionality equal to the number of classified points. This problem is solvable by an extremely simple linearly convergent Lagrangian support vector machine (LSVM) algorithm. LSVM requires the inversion at the outset of a single matrix of the order of the much smaller dimensionality of the original input space plus one. The full algorithm is given in this paper in 11 lines of MATLAB code without any special optimization tools such as linear or quadratic programming solvers. This LSVM code can be used "as is" to solve classification problems with millions of points. For example, 2 million points in 10-dimensional input space were classified by a linear surface in 6.7 minutes on a 250-MHz UltraSPARC II [28].

The starting point for LSVM is the primal quadratic formulation (15) of the SVM problem. Taking the dual [24] of this problem gives:

    min_{0≤u∈R^m}  ½u′(I/ν + D(AA′ + ee′)D)u − e′u.     (25)

The variables (w, γ) of the primal problem which determine the separating surface x′w = γ are recovered directly from the solution u of the dual (25) above by the relations:

    w = A′Du,  y = u/ν,  γ = −e′Du.     (26)

We immediately note that the matrix appearing in the dual objective function is positive definite and that there is no equality constraint and no upper bound on the dual variable u. The only constraint present is a nonnegativity one. These facts lead us to our simple iterative Lagrangian SVM Algorithm, which requires the inversion of a positive definite (n + 1) × (n + 1) matrix at the beginning of the algorithm, followed by a straightforward linearly convergent iterative scheme that requires no optimization package.

Before stating our algorithm we define two matrices to simplify notation as follows:

    H = D[A  −e],  Q = I/ν + HH′.     (27)

With these definitions the dual problem (25) becomes

    min_{0≤u∈R^m}  f(u) := ½u′Qu − e′u.     (28)

It will be understood that within the LSVM Algorithm, the single time that Q⁻¹ is computed at the outset of the algorithm, the SMW identity [17] will be used:

    (I/ν + HH′)⁻¹ = ν(I − H(I/ν + H′H)⁻¹H′),     (29)

where ν is a positive number and H is an arbitrary m × k matrix. Hence only an (n + 1) × (n + 1) matrix is inverted.

The LSVM Algorithm is based directly on the Karush-Kuhn-Tucker necessary and sufficient optimality conditions [24, KTP 7.2.4, page 94] for the dual problem (28), which are the following:

    0 ≤ u ⊥ Qu − e ≥ 0.     (30)

By using the easily established identity between any two real numbers (or vectors) a and b:

    0 ≤ a ⊥ b ≥ 0  ⟺  a = (a − αb)_+,  α > 0,     (31)

the optimality condition (30) can be written in the following equivalent form for any positive α:

    Qu − e = ((Qu − e) − αu)_+.     (32)

These optimality conditions lead to the following very simple iterative scheme which constitutes our LSVM Algorithm:

    u^{i+1} = Q⁻¹(e + ((Qu^i − e) − αu^i)_+),  i = 0, 1, ...,     (33)

for which we will establish global linear convergence from any starting point under the easily satisfiable condition:

    0 < α < 2/ν.     (34)

We implement this condition as α = 1.9/ν in all our experiments, where ν is the parameter of our SVM formulation (25). It turns out, and this is the way that led us to this iterative scheme, that the optimality condition (32) is also the necessary and sufficient condition for the unconstrained minimum of the implicit Lagrangian [30] associated with the dual problem (28):

    min_{u∈R^m} L(u, α) = min_{u∈R^m} ½u′Qu − e′u + (1/2α)(‖(−αu + Qu − e)_+‖² − ‖Qu − e‖²).     (35)

Setting the gradient with respect to u of this convex and differentiable Lagrangian to zero gives

    (Qu − e) + (1/α)(Q − αI)((Q − αI)u − e)_+ − (1/α)Q(Qu − e) = 0,     (36)

or equivalently:

    (αI − Q)((Qu − e) − ((Q − αI)u − e)_+) = 0,     (37)

which is equivalent to the optimality condition (32) under the assumption that α is positive and not an eigenvalue of Q.
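The SMW identity (29), which lets LSVM invert only the small (n + 1) × (n + 1) matrix I/ν + H′H instead of the large m × m matrix Q, can be verified numerically. The sizes and random data in this check are arbitrary choices of mine:

```python
import numpy as np

# Numerical check of the SMW identity (29):
# (I/nu + H H')^{-1} = nu (I - H (I/nu + H'H)^{-1} H')
rng = np.random.default_rng(0)
m, n1, nu = 300, 6, 0.1  # m >> n + 1, the regime LSVM targets
H = rng.standard_normal((m, n1))

lhs = np.linalg.inv(np.eye(m) / nu + H @ H.T)        # direct m x m inverse
rhs = nu * (np.eye(m) - H @ np.linalg.inv(np.eye(n1) / nu + H.T @ H) @ H.T)
gap = np.max(np.abs(lhs - rhs))
print(gap)  # agreement to roughly machine precision
```

The right-hand side is what the LSVM code actually computes, which is why the per-iteration cost stays linear in m.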
In [28] global linear convergence of the iteration (33) under condition (34) is established as follows.

Algorithm 4.1 LSVM Algorithm and Its Global Convergence [28]
Let Q ∈ R^{m×m} be the symmetric positive definite matrix defined by (27) and let (34) hold. Starting with an arbitrary u⁰ ∈ R^m, the iterates uⁱ of (33) converge to the unique solution ū of (28) at the linear rate:

    ‖Qu^{i+1} − Qū‖ ≤ ‖I − αQ⁻¹‖ · ‖Quⁱ − Qū‖.     (38)

A complete MATLAB [32] code of LSVM, which is capable of solving problems with millions of points using only native MATLAB commands, is given below as Code 4.2. The input parameters, besides A, D and ν of (27) which define the problem, are: itmax, the maximum number of iterations, and tol, the tolerated nonzero error in ‖uⁱ − u^{i−1}‖ at termination, which can be shown [28] to constitute a bound on the distance to the unique solution of the problem from the current iterate.

Code 4.2 LSVM MATLAB Code

    function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
    % lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
    % Q=I/nu+H*H', H=D[A -e]
    % Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
    % [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
    [m,n]=size(A); alpha=1.9/nu; e=ones(m,1); H=D*[A -e]; it=0;
    S=H*inv((speye(n+1)/nu+H'*H));
    u=nu*(1-S*(H'*e)); oldu=u+1;
    while it<itmax & norm(oldu-u)>tol
      z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));
      oldu=u;
      u=nu*(z-S*(H'*z));
      it=it+1;
    end;
    opt=norm(u-oldu); w=A'*D*u; gamma=-e'*D*u;
    function pl = pl(x); pl = (abs(x)+x)/2;

Using this MATLAB code, 2 million random points in 10-dimensional space were classified in 6.7 minutes in 6 iterations to an accuracy of 10⁻⁵, using a 250-MHz UltraSPARC II with 2 gigabytes of memory. In contrast, a linear programming formulation using CPLEX [14] ran out of memory. Other favorable numerical comparisons with other methods are contained in [28].
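Code 4.2 translates almost line for line into NumPy. The sketch below is my own transcription of the iteration (33) with the SMW factor of (29); the toy data at the bottom is invented for illustration:

```python
import numpy as np

def lsvm(A, d, nu, itmax=1000, tol=1e-5):
    # NumPy transcription of Code 4.2: min 1/2 u'Qu - e'u s.t. u >= 0,
    # with Q = I/nu + H H' and H = D[A -e], via iteration (33) and SMW (29).
    m, n = A.shape
    alpha = 1.9 / nu                                     # satisfies (34)
    e = np.ones(m)
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])    # H = D [A  -e]
    S = H @ np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)  # small SMW inverse
    u = nu * (e - S @ (H.T @ e))                         # u = Q^{-1} e
    oldu = u + 1
    it = 0
    while it < itmax and np.linalg.norm(oldu - u) > tol:
        z = 1 + np.maximum(0.0, (u / nu + H @ (H.T @ u) - alpha * u) - 1)
        oldu = u
        u = nu * (z - S @ (H.T @ z))                     # u <- Q^{-1} z
        it += 1
    w = A.T @ (d * u)          # w = A'Du, as in (26)
    gamma = -e @ (d * u)       # gamma = -e'Du, as in (26)
    return it, w, gamma

# invented toy problem: labels determined by the sign of the first coordinate
rng = np.random.default_rng(0)
labels = rng.choice([1.0, -1.0], size=200)
A = rng.standard_normal((200, 2)) + np.outer(labels, [4.0, 0.0])
it, w, gamma = lsvm(A, labels, nu=1.0)
acc = np.mean(np.sign(A @ w - gamma) == labels)
print(it, acc)  # well-separated toy data is classified essentially perfectly
```

Note that, as in the MATLAB original, no optimization package is called: each iteration is a handful of matrix-vector products plus one multiplication by the precomputed SMW factor.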
We turn now to our final topic of extracting very effective classifiers from a minimal portion of a large dataset.

5. RSVM: Reduced Support Vector Machines [20]

In this section we describe an algorithm that generates a nonlinear kernel-based separating surface which requires as little as 1% of a large dataset for its explicit evaluation. To generate this nonlinear surface, the entire dataset is used as a constraint in an optimization problem with very few variables corresponding to the 1% of the data kept. The remainder of the data can be thrown away after solving the optimization problem. This is achieved by making use of a rectangular m × m̄ kernel K(A, Ā′) that greatly reduces the size of the quadratic program to be solved and simplifies the characterization of the nonlinear separating surface. Here as before, the m rows of A represent the original m data points while the m̄ rows of Ā represent a greatly reduced m̄ data points. Computational results indicate that test set correctness for the reduced support vector machine (RSVM), with a nonlinear separating surface that depends on a small randomly selected portion of the dataset, is better than that of a conventional support vector machine (SVM) with a nonlinear surface that explicitly depends on the entire dataset, and much better than a conventional SVM using a small random sample of the data. Computational times, as well as memory usage, are much smaller for RSVM than for a conventional SVM using the entire dataset.

The motivation for RSVM comes from the practical objective of generating a nonlinear separating surface (5) for a large dataset which uses only a small portion of the dataset for its characterization. The difficulty in using nonlinear kernels on large datasets is twofold.
First, there is the computational difficulty in solving the potentially huge unconstrained optimization problem (24), which involves the kernel function K(A, A′) and typically leads to the computer running out of memory even before beginning the solution process. For example, for the Adult dataset with 32562 points, which is actually solved with RSVM [20], this would mean a matrix with over one billion entries for a conventional SVM. The second difficulty comes from utilizing the formula (5) for the separating surface on a new unseen point x. The formula dictates that we store and utilize the entire data set represented by the 32562 × 123 matrix A, which may be prohibitively expensive storage-wise and computing-time-wise. For example, for the Adult dataset just mentioned, which has an input space of 123 dimensions, this would mean that the nonlinear surface (5) requires a storage capacity for 4,005,126 numbers. To avoid all these difficulties, and based on experience with chunking methods [4, 29], we hit upon the idea of using a very small random subset of the dataset given by m̄ points of the original m data points with m̄ ≪ m, that we call Ā, and using Ā′ in place of A′ both in the unconstrained optimization problem (24), to cut problem size and computation time, and, for the same purposes, in evaluating the nonlinear surface (5). Note that the matrix A is left intact in K(A, Ā′), whereas Ā′ has replaced A′. Computational testing results show a standard deviation of 0.002 or less of test set correctness over 50 random choices for Ā. By contrast, if both A and A′ are replaced by Ā and Ā′ respectively, then test set correctness declines substantially compared to RSVM, while the standard deviation of test set correctness over 50 cases increases more than tenfold over that of RSVM. The justification for our proposed approach is this.
We use a small random sample Ā of our dataset as a representative sample with respect to the entire dataset A, both in solving the optimization problem (24) and in evaluating the nonlinear separating surface (5). We interpret this as a possible instance-based learning [33, Chapter 8], where the small sample Ā is learning from the much larger training set A by forming the appropriate rectangular kernel relationship K(A, Ā′) between the original and reduced sets. This formulation works extremely well computationally, as evidenced by the computational results of [20].

By using the formulations described in Section 3 for the full dataset A ∈ R^{m×n} with a square kernel K(A, A′) ∈ R^{m×m}, and modifying these formulations for the reduced dataset Ā ∈ R^{m̄×n} with corresponding diagonal matrix D̄ and rectangular kernel K(A, Ā′) ∈ R^{m×m̄}, we obtain our RSVM Algorithm below. This algorithm solves, by smoothing, the RSVM quadratic program obtained from (23) by replacing A′ with Ā′ as follows:

    min_{u,γ,y}  (ν/2)y′y + ½(u′u + γ²)
    s.t.  D(K(A, Ā′)D̄u − eγ) + y ≥ e,  y ≥ 0.     (39)

Algorithm 5.1 RSVM Algorithm

(i) Choose a random subset matrix Ā ∈ R^{m̄×n} of the original data matrix A ∈ R^{m×n}. Typically m̄ is 1% to 10% of m.

(ii) Solve the following modified version of the SSVM (24), where A′ only is replaced by Ā′ with corresponding D̄ ⊂ D:

    min_{u,γ}  (ν/2)‖p(e − D(K(A, Ā′)D̄u − eγ), α)‖₂² + ½(u′u + γ²),     (40)

which is equivalent to solving (23) with A′ only replaced by Ā′.

(iii) The separating surface is given by (5) with A′ replaced by Ā′ as follows:

    K(x′, Ā′)D̄u = γ,     (41)

where (u, γ) ∈ R^{m̄+1} is the unique solution of (40), and x ∈ R^n is a free input space variable of a new point.

(iv) A new input point x ∈ R^n is classified into class +1 or -1 depending on whether the step function:

    (K(x′, Ā′)D̄u − γ)_*     (42)

is +1 or zero, respectively.

As stated earlier, this algorithm is quite insensitive as to which submatrix Ā is chosen for (40)-(41), as far as tenfold cross-validation correctness is concerned.
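The reduced-kernel construction in steps (i) and (ii) can be sketched as follows. This is an illustrative fragment only: the sizes, the Gaussian-kernel width mu and the variable names are my own choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mbar, mu = 1000, 2, 50, 2.0   # mbar is 5% of m, as in the checkerboard test
A = rng.standard_normal((m, n))     # stand-in for the full data matrix
idx = rng.choice(m, size=mbar, replace=False)
Abar = A[idx]                       # step (i): random row subset of A

# rectangular Gaussian kernel K(A, Abar'): K_ij = exp(-mu * ||A_i - Abar_j||^2).
# Only this m x mbar matrix enters (39)-(40); the m x m kernel is never formed.
sq = ((A[:, None, :] - Abar[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-mu * sq)
print(K.shape)  # (1000, 50)
```

Once (40) is solved, only Abar and the m̄ + 1 numbers (u, γ) need to be retained to evaluate the surface (41) on new points.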
In fact, another choice for Ā is to choose it randomly but only keep rows that are more than a certain minimal distance apart. This leads to a slight improvement in testing correctness but increases computational time somewhat. Replacing both A and A′ in a conventional SVM by a randomly chosen reduced matrix Ā and its transpose Ā′ gives poor testing set results that vary significantly with the choice of Ā. This fact can be demonstrated graphically as follows.

The checkerboard dataset [18, 19], already used earlier, consists of 1000 points in R² of black and white points taken from sixteen black and white squares of a checkerboard. This dataset is chosen in order to depict graphically the effectiveness of RSVM, using a random 5% of the given 1000-point training dataset, compared to the very poor performance of a conventional SVM on the same 5% randomly chosen subset. Figure 4 shows the poor pattern approximating a checkerboard obtained by a conventional SVM using a Gaussian kernel, that is solving (23) with both A and A′ replaced by the randomly chosen Ā and its transpose Ā′ respectively. Test set correctness of this conventional SVM using the reduced Ā and Ā′ averaged, over 15 cases, 43.60% for the 50-point dataset, on a test set of 39601 points. In contrast, using our RSVM Algorithm 5.1 on the same randomly chosen submatrices Ā yields the much more accurate representation of the checkerboard depicted in Figure 5, with corresponding average test set correctness of 96.70% on the same test set.

6. Conclusion and Extensions

We have described the important role of support vector machines in solving the key problem of classification that arises in data mining and machine learning. In particular, we have described a general framework for support vector machines and given three highly effective algorithms for generating linear and nonlinear classifiers. In all our results mathematical programming plays key theoretical and algorithmic roles.
Some extensions of these ideas include multicategory classification, classification based on criteria other than belonging to a halfspace, incremental classification of massive streaming datasets, concurrent feature and data selection for optimal classification, classification based on minimal data subsets, and multiple instance classification.

Figure 4. SVM: Checkerboard resulting from a randomly selected 50 points, out of a 1000-point dataset, and used in a conventional Gaussian kernel SVM (23). The resulting nonlinear surface, separating white and black areas, generated using the 50 random points only, depends explicitly on those points only. Correctness on a 39601-point test set averaged 43.60% on 15 randomly chosen 50-point sets, with a standard deviation of 0.0895 and best correctness of 61.03% depicted above.

Figure 5. RSVM: Checkerboard resulting from randomly selected 50 points and used in a reduced Gaussian kernel SVM (39). The resulting nonlinear surface, separating white and black areas, generated using the entire 1000-point dataset, depends explicitly on the 50 points only. The remaining 950 points can be thrown away once the separating surface has been generated. Correctness on a 39601-point test set averaged 96.7% on 15 randomly chosen 50-point sets, with a standard deviation of 0.0082 and best correctness of 98.04% depicted above.

Acknowledgements

The research described in this Data Mining Institute Report 01-05, May 2001, was supported by National Science Foundation Grants CCR-9729842 and CDA-9623632, by Air Force Office of Scientific Research Grant F49620-00-1-0085 and by the Microsoft Corporation.

References

[1] L. Armijo. Minimization of functions having Lipschitz-continuous first partial derivatives. Pacific Journal of Mathematics, 16:1-3, 1966.

[2] K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets.
Optimization Methods and Software, 1:23-34, 1992.

[3] P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In J. Shavlik, editor, Machine Learning Proceedings of the Fifteenth International Conference (ICML '98), pages 82-90, San Francisco, California, 1998. Morgan Kaufmann. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.

[4] P. S. Bradley and O. L. Mangasarian. Massive data discrimination via linear support vector machines. Optimization Methods and Software, 13:1-10, 2000. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.

[5] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.

[6] B. Chen and P. T. Harker. Smooth approximations to nonlinear complementarity problems. SIAM Journal on Optimization, 7:403-420, 1997.

[7] Chunhui Chen and O. L. Mangasarian. Smoothing methods for convex inequalities and linear complementarity problems. Mathematical Programming, 71(1):51-69, 1995.

[8] Chunhui Chen and O. L. Mangasarian. A class of smoothing functions for nonlinear and mixed complementarity problems. Computational Optimization and Applications, 5(2):97-138, 1996.

[9] X. Chen, L. Qi, and D. Sun. Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational inequalities. Mathematics of Computation, 67:519-540, 1998.

[10] X. Chen and Y. Ye. On homotopy-smoothing methods for variational inequalities. SIAM Journal on Control and Optimization, 37:589-616, 1999.

[11] V. Cherkassky and F. Mulier. Learning from Data - Concepts, Theory and Methods. John Wiley & Sons, New York, 1998.

[12] P. W. Christensen and J.-S. Pang. Frictional contact algorithms based on semismooth Newton methods. In Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, M. Fukushima and L. Qi, (editors), pages 81-116, Dordrecht, Netherlands, 1999.
Kluwer Academic Publishers.

[13] R. Courant and D. Hilbert. Methods of Mathematical Physics. Interscience Publishers, New York, 1953.

[14] CPLEX Optimization Inc., Incline Village, Nevada. Using the CPLEX(TM) Linear Optimizer and CPLEX(TM) Mixed Integer Optimizer (Version 2.0), 1992.

[15] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3:95-110, 1956.

[16] M. Fukushima and L. Qi. Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.

[17] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.

[18] T. K. Ho and E. M. Kleinberg. Building projectable classifiers of arbitrary complexity. In Proceedings of the 13th International Conference on Pattern Recognition, pages 880-885, Vienna, Austria, 1996. http://cm.bell-labs.com/who/tkh/pubs.html. Checker dataset at: ftp://ftp.cs.wisc.edu/math-prog/cpo-dataset/machine-learn/checker.

[19] L. Kaufman. Solving the quadratic programming problem arising in support vector classification. In B. Scholkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 147-167. MIT Press, 1999.

[20] Y.-J. Lee and O. L. Mangasarian. RSVM: Reduced support vector machines. Technical Report 00-07, Data Mining Institute, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, July 2000. Proceedings of the First SIAM International Conference on Data Mining, Chicago, April 5-7, 2001, CD-ROM Proceedings. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/00-07.ps.

[21] Yuh-Jye Lee and O. L. Mangasarian. SSVM: A smooth support vector machine. Computational Optimization and Applications, 20:5-22, 2001. Data Mining Institute, University of Wisconsin, Technical Report 99-03. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.ps.

[22] O. L. Mangasarian. Nonlinear Programming. McGraw-Hill, New York, 1969. Reprint: SIAM Classic in Applied Mathematics 10, 1994, Philadelphia.

[23] O. L. Mangasarian. Mathematical programming in neural networks. ORSA Journal on Computing, 5(4):349-360, 1993.

[24] O. L. Mangasarian. Nonlinear Programming. SIAM, Philadelphia, PA, 1994.

[25] O. L. Mangasarian. Generalized support vector machines. In A. Smola, P. Bartlett, B. Scholkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 135-146, Cambridge, MA, 2000. MIT Press. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps.

[26] O. L. Mangasarian and D. R. Musicant. Successive overrelaxation for support vector machines. IEEE Transactions on Neural Networks, 10:1032-1037, 1999. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-18.ps.

[27] O. L. Mangasarian and D. R. Musicant. Data discrimination via nonlinear generalized support vector machines. In M. C. Ferris, O. L. Mangasarian, and J.-S. Pang, editors, Complementarity: Applications, Algorithms and Extensions, pages 233-251, Dordrecht, January 2001. Kluwer Academic Publishers. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/99-03.ps.

[28] O. L. Mangasarian and D. R. Musicant. Lagrangian support vector machines. Journal of Machine Learning Research, 1:161-177, 2001. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/00-06.ps.

[29] O. L. Mangasarian and D. R. Musicant. Large scale kernel regression via linear programming. Machine Learning, 46:255-269, 2002. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-02.ps.

[30] O. L. Mangasarian and M. V. Solodov. Nonlinear complementarity as unconstrained and constrained minimization. Mathematical Programming, Series B, 62:277-297, 1993.

[31] O. L. Mangasarian, W. N. Street, and W. H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4):570-577, July-August 1995.

[32] MATLAB. User's Guide.
The MathWorks, Inc., Natick, MA 01760, 1994-2001. http://www.mathworks.com.

[33] T. M. Mitchell. Machine Learning. McGraw-Hill, Boston, 1997.

[34] P. M. Murphy and D. W. Aha. UCI machine learning repository, 1992. www.ics.uci.edu/~mlearn/MLRepository.html.

[35] B. Scholkopf. Support Vector Learning. R. Oldenbourg Verlag, Munich, 1997.

[36] M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, 36:111-147, 1974.

[37] P. Tseng. Analysis of a non-interior continuation method based on Chen-Mangasarian smoothing functions for complementarity problems. In Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, M. Fukushima and L. Qi, (editors), pages 381-404, Dordrecht, Netherlands, 1999. Kluwer Academic Publishers.

[38] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.

[39] P. Wolfe. A duality theorem for nonlinear programming. Quarterly of Applied Mathematics, 19:239-244, 1961.

PROPERTIES OF OLIGOPOLISTIC MARKET EQUILIBRIA IN LINEARIZED DC POWER NETWORKS WITH ARBITRAGE AND SUPPLY FUNCTION CONJECTURES

Jong-Shi Pang
Department of Mathematical Sciences
The Johns Hopkins University
Baltimore, Maryland 21218-2682, U.S.A.
jsp@vicpl.mts.jhu.edu

Benjamin F. Hobbs
Department of Geography and Environmental Engineering
The Johns Hopkins University
Baltimore, Maryland 21218-2682, U.S.A.
bhobbs@jhu.edu

Christopher J. Day
Enron Europe Limited
40 Grosvenor Place
London SW1X 7EN, United Kingdom
christopher.j.day@enron.com

Abstract We present mathematical models for a power market on a linearized DC network with affine demand. The models represent the conjecture that each power generating company may hold regarding how rival firms will change their outputs if prices change. The classic Cournot model is a special case of this conjecture.
The models differ in how arbitrage is handled, and their formulations give rise to nonlinear mixed complementarity problems. In the Stackelberg version, the generators anticipate how arbitrage would affect prices at different locations, and therefore treat the arbitrage amounts as decision variables in their profit maximization problems. In the other version, arbitrage is exogenous to the firms. We show that solutions to the latter model are also solutions to the Stackelberg model. We also demonstrate existence and uniqueness properties for the exogenous arbitrage model.

1. Introduction

In restructured power markets, electric power generators have been privatized or freed of regulatory constraints on prices. The intent of restructuring is to provide incentives for innovation and more efficient production and consumption of electricity [5]. However, because of market failures, these benefits may not be fully realized. A market failure that has been of particular concern to regulators and the public is market power [12]. Market power is defined as the ability of a market participant to unilaterally alter prices in its own favor, and to sustain those price changes. Transmission capacity limits that restrict power imports and exports are an important source of market power for generating companies, as they allow firms within an isolated region to raise prices above competitive levels [2]. The potential for market power to be exercised within a given power system can be studied through laboratory experiments, empirical analysis, and modeling. There are many models of strategic interaction in transmission constrained systems (for reviews, see [7] or [8]). Models can be used to unveil unanticipated ways in which market power might be exercised on networks, to identify locations where prices can be manipulated, to assess the effects of adding transmission capacity upon prices, and to examine the competitive effects of company mergers or divestments.
The most common oligopolistic modeling frameworks employed in power market analyses are based on the ideas of Cournot games and Supply Function Equilibria (SFE), defined below. The purpose of this paper is to analyze the existence and uniqueness properties of solutions of a new model of oligopolistic power generators. The model represents the power network using a linearized "DC" load flow model [13], and includes a flexible representation of interactions of competing generating firms. We term this representation the "conjectured supply functions" (CSFs) approach. A CSF is a function representing the beliefs of a firm concerning how total supply from rival firms will react to price. Two versions of a linear CSF have been proposed: one in which the slope of conjectured supply response is constant and the intercept is to be solved for, and another in which the intercept is given but the slope is to be determined. The former CSF yields a linear mixed complementarity problem (MCP) for the market equilibrium, while the latter gives a nonlinear MCP.

The CSF model can be viewed as a generalization of the Cournot models of [7] and [14] in that the amount by which rival firms are anticipated to adjust their supply in response to a price change is not restricted to zero (the Cournot assumption). Instead, each generating company is allowed to conjecture that rival firms will react to price increases or decreases. By making different assumptions about the assumed supply response, different degrees of competitive intensity can be modeled, ranging from pure competition (infinitely large positive response by rivals to price increases) to oligopolistic Cournot competition (no response). Positively sloped CSFs represent a competitive intensity between the Cournot and pure competition extremes.
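The spectrum from Cournot to pure competition can be made concrete with a single-node example (not from this paper; all numbers and the symmetric setup are illustrative). With affine demand p = P0(1 − Q/Q0), n identical firms with marginal cost c, and a conjectured rival supply response of slope r = dS_rivals/dp, each firm's first-order condition p − c = q_f/(Q0/P0 + r) gives a closed-form symmetric equilibrium price:

```python
def csf_price(P0=100.0, Q0=100.0, c=20.0, n=3, r=0.0):
    """Symmetric equilibrium price at one node under a linear conjectured
    supply response of slope r. r = 0 reproduces the Cournot price
    (n*c + P0)/(n + 1); r -> infinity drives price to marginal cost c."""
    a = Q0 / P0                      # absolute slope of the demand curve
    return (c * n * (a + r) + Q0) / (n * (a + r) + a)

# Price falls monotonically from Cournot toward marginal cost as the
# conjectured rival response grows.
prices = [csf_price(r=r) for r in (0.0, 1.0, 10.0, 1e6)]
```

With the defaults, r = 0 gives the Cournot price 40, and very large r gives a price arbitrarily close to c = 20, matching the competitive-intensity range described above.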
A detailed justification of the CSF approach to modeling competition on transmission networks is given in [4], along with an application to the United Kingdom power system. It should be noted that the CSF modeling approach is distinct from the widely used supply function equilibrium (SFE) approach to market modeling [1, 9]. The SFE is a Nash game in bid functions, in which suppliers provide a function to a central auctioneer that relates their willingness to supply to the price. The SFE approach also yields prices intermediate between the pure competition and Cournot extremes, but is plagued by computational challenges along with problems of nonuniqueness and, in some cases, nonexistence of solutions [2]. The fundamental difference between the SFE and CSF approaches is that the anticipated supply response of competitors is endogenous in SFE models and is consistent with the competitor's actual bid function, while in the CSF approach, the conjectured supply response of competing firms is instead based on an assumed parameter (slope or intercept). It is this difference that allows CSF models to be formulated as mixed complementarity problems that are relatively easy to solve and yield solutions whose existence and uniqueness properties can be demonstrated.

Questions concerning the existence and uniqueness of equilibrium solutions to market models are important for two reasons. First, public policy is in part based on policy analyses using market models; if unique solutions cannot be assured, then the question arises as to whether the conclusions of an analysis depend on which of several possible solutions is selected. Second, if a solution exists and is unique, then computational procedures do not need to check for multiple solutions, and are therefore simpler.
This paper focuses on the existence and uniqueness properties of the solution of the nonlinear MCP (fixed intercept model), as those properties for the linear MCP (fixed slope) are readily established using the results of [11]. The paper begins by defining notation and the profit maximizing problems that are common to all the models presented in this paper (Section 2). Those common problems include the profit maximization problems for the independent system operator (ISO) who allocates scarce transmission capacity, and the arbitrager who eliminates any noncost-based price differences among nodes in the network. Consumers are represented by downward sloping demand curves. The various models introduced in the paper differ in terms of their representation of the profit maximization problem for the oligopolistic power producer. The first model, Model I, is introduced in Section 3. There, the power producer makes production and sales decisions recognizing that demand responds to price, that rival producers will react to price changes (according to the assumed CSF), and that noncost-based price differences will be arbitraged away. Inclusion of arbitrage means that the arbitrager's equilibrium conditions are introduced as constraints in the producer's constraint set. After introducing the producer profit maximization problem, we obtain the nonlinear MCP that represents the market equilibrium. Section 4 presents Model II, which differs from Model I in that the arbitrager's equilibrium conditions are kept outside of the producer's problem, resulting in a model which can be analyzed more fully than Model I. In Section 5, relevant theory of monotone linear complementarity problems is introduced which will be the basis for the demonstrations of the model properties.
This theory is used in Section 6 to establish the existence of solutions to Models I and II, and the conditions under which certain of the variables of Model II (prices, total generation, sales, and profits) are unique.

2. The ISO and Arbitrage Models

In this and the next two sections, we present the mixed NCP formulations of the market equilibrium with conjectured supply functions. The resulting models become the respective linear complementarity models considered in [11] when the intercepts tend to minus infinity. In what follows, we present the NCP models, establish the existence of solutions and analyze their properties.

2.1 Notation

Before presenting the mathematical formulations for the models, we summarize the notation.

Parameters
𝒩 : set of nodes, excluding the hub
𝒜 : set of transmission elements in the full network
ℱ : set of firms
α_i : fixed intercept of the conjectured supply function at node i
c_fi : cost per unit generation at node i by firm f
P_i^0 : price intercept of the demand function at node i
Q_i^0 : quantity intercept of the demand function at node i
T_k^+ : capacity on transmission element k
T_k^- : capacity in the reverse direction of transmission element k
CAP_fi : production capacity at node i for firm f
PDF_ik : power distribution factor for node i on element k, describing the megawatt (MW) increase in flow resulting from 1 MW of power injection at i and 1 MW of withdrawal at a hub node.
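Power distribution factors of the kind just defined can be computed from the linearized DC load flow model: with the hub as the reference (slack) node, invert the reduced network susceptance matrix to obtain nodal angles for a unit injection at node i and withdrawal at the hub, then read off the induced flow on each element. The network below is a made-up symmetric 3-node triangle, not an example from the paper:

```python
import numpy as np

def pdf_matrix(n_nodes, lines, susceptance, hub=0):
    """PDF_ik: MW flow induced on line k by injecting 1 MW at node i
    and withdrawing 1 MW at the hub, in the linearized DC model."""
    B = np.zeros((n_nodes, n_nodes))              # susceptance Laplacian
    for (f, t), b in zip(lines, susceptance):
        B[f, f] += b; B[t, t] += b
        B[f, t] -= b; B[t, f] -= b
    keep = [i for i in range(n_nodes) if i != hub]
    Binv = np.linalg.inv(B[np.ix_(keep, keep)])   # hub row/col removed (slack)
    pdf = {}
    for idx, i in enumerate(keep):
        theta = np.zeros(n_nodes)                 # nodal angles, hub angle = 0
        theta[keep] = Binv[:, idx]                # unit injection at node i
        for (f, t), b in zip(lines, susceptance):
            pdf[(i, (f, t))] = b * (theta[f] - theta[t])
    return pdf

# Hub = node 0; equal susceptances on all three lines of a triangle.
pdfs = pdf_matrix(3, [(0, 1), (1, 2), (0, 2)], [1.0, 1.0, 1.0])
```

For this symmetric triangle, a 1 MW injection at node 1 sends 2/3 MW over the direct line to the hub and 1/3 MW around the longer path, the classic DC load flow split.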
Variables
s_fi : amount of sales at node i by firm f
g_fi : generation at node i by firm f
p_fi : price at node i anticipated by firm f
y_i : amount of transmission service from hub H to node i
w_i : transmission price from hub H to node i
p_f : price at the hub node, anticipated by firm f
a_fi : amount that arbitragers sell at node i, anticipated by firm f
λ_k^± : dual variables of transmission capacity constraints in ISO's problem
γ_fi : dual variable of production capacity constraint in firm f's problem
φ_f : dual variable of balance equation between supply and generation in firm f's problem
π_i : market price at node i

Vectors & Matrices
1 : vector of ones of appropriate size
I : identity matrix of appropriate order
E : square matrix of ones of appropriate order
Π : |𝒩| × |𝒜| matrix of PDF_ik, i ∈ 𝒩 and k ∈ 𝒜
s : (|𝒩| × |ℱ|)-vector of s_fi, i ∈ 𝒩 and f ∈ ℱ
g : (|𝒩| × |ℱ|)-vector of g_fi, i ∈ 𝒩 and f ∈ ℱ
π : |𝒩|-vector of equilibrium prices π_i, i ∈ 𝒩
γ : (|𝒩| × |ℱ|)-vector of γ_fi, i ∈ 𝒩 and f ∈ ℱ
λ^± : |𝒜|-vectors of λ_k^±, k ∈ 𝒜
c : (|𝒩| × |ℱ|)-vector of c_fi, i ∈ 𝒩 and f ∈ ℱ
CAP : (|𝒩| × |ℱ|)-vector of CAP_fi, i ∈ 𝒩 and f ∈ ℱ.

The components of the vectors s, g, c, and CAP are grouped by firms; that is, s = (s_1, ..., s_|ℱ|)ᵀ, where each s_f is the |𝒩|-vector with components s_fi, i ∈ 𝒩. The other three vectors g, c, and CAP are similarly arranged. Except for the supply intercepts and some power distribution factors, all parameters of the models are positive.

2.2 The ISO's problem

The ISO's problem is the following linear program (LP). Given the transmission prices w_i, i ∈ 𝒩, compute y_i, i ∈ 𝒩, in order to

maximize Σ_{i∈𝒩} w_i y_i
subject to Σ_{i∈𝒩} y_i = 0,   (v)
Σ_{i∈𝒩} PDF_ik y_i ≤ T_k^+,   ∀k ∈ 𝒜,   (λ_k^+)
−Σ_{i∈𝒩} PDF_ik y_i ≤ T_k^-,   ∀k ∈ 𝒜,   (λ_k^-)

where we write the dual variables in parentheses next to the corresponding constraints. Note that the variables y_i are unrestricted in sign. A positive (negative) y_i means that there is a net flow into (out of) node i.
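The ISO's LP above can be solved directly with an off-the-shelf LP solver. The sketch below uses SciPy's `linprog` on a made-up two-node, one-element instance (all numbers are illustrative, not from the paper); note the sign flip, since `linprog` minimizes:

```python
import numpy as np
from scipy.optimize import linprog

# Two non-hub nodes; one transmission element with PDFs (0.8, -0.4)
# and capacities T+ = T- = 5 (illustrative data).
w = np.array([3.0, 1.0])             # transmission prices given to the ISO
PDF = np.array([[0.8, -0.4]])        # one row per element k
Tplus, Tminus = np.array([5.0]), np.array([5.0])

res = linprog(
    c=-w,                                  # maximize w'y == minimize -w'y
    A_ub=np.vstack([PDF, -PDF]),           # PDF y <= T+ and -PDF y <= T-
    b_ub=np.concatenate([Tplus, Tminus]),
    A_eq=np.ones((1, 2)), b_eq=[0.0],      # sum_i y_i = 0
    bounds=[(None, None)] * 2,             # y free in sign
)
y = res.x
```

Because w_1 > w_2, the ISO pushes as much service toward node 1 as the element allows, so the forward capacity constraint binds at the optimum (the flow equals T+ = 5), which is exactly the case where its dual λ^+ becomes positive in the mixed LCP (2).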
It is trivial to note that y = 0 is always a feasible solution to the above LP, because the T_k^± are positive scalars. The optimality conditions of the LP can be written as a mixed LCP in the variables y_i for i ∈ 𝒩, λ_k^± for k ∈ 𝒜, and v, parameterized by the transmission fees w_i, i ∈ 𝒩:

0 ≤ λ_k^+ ⊥ T_k^+ − Σ_{i∈𝒩} PDF_ik y_i ≥ 0,   k ∈ 𝒜,
0 ≤ λ_k^- ⊥ T_k^- + Σ_{i∈𝒩} PDF_ik y_i ≥ 0,   k ∈ 𝒜,
0 = Σ_{i∈𝒩} y_i,
0 = w_i + Σ_{k∈𝒜} PDF_ik (λ_k^- − λ_k^+) − v,   i ∈ 𝒩.   (2)

2.3 The arbitrager's problem

The arbitrager maximizes its profit by buying and selling power in the market, given the prices at the nodes in the network. With a_i denoting the arbitrage amount sold at node i, the arbitrager's profit maximization problem is very simple: for fixed prices p_i and fees w_i, compute a_i, i ∈ 𝒩, in order to

maximize Σ_{i∈𝒩} (p_i − w_i) a_i
subject to Σ_{i∈𝒩} a_i = 0;   (p_H)   (3)

the transmission fee at node i is included in the objective function because the arbitrager must also pay this cost. The arbitrage amounts are measured as the net sales at a node; thus the sum of all the arbitrage amounts must equal zero. Note that a_i is unrestricted in sign. A positive a_i represents the amount sold by the arbitrager at node i; in this case, the arbitrager is receiving p_i for each unit sold but is paying w_i for the transmission. If a_i is negative, then |a_i| is the quantity that the arbitrager bought from node i; in this case, the arbitrager is paying p_i per unit and paying −w_i per unit to ship out of i. The problem (3) is trivially solvable. In particular, this problem is equivalent to the two equations:

p_i − w_i − p_H = 0,   ∀ i ∈ 𝒩,
Σ_{i∈𝒩} a_i = 0.   (4)

In turn, the first equation implies

p_i − p_j = w_i − w_j,   ∀ i, j ∈ 𝒩,

which says that the difference in prices at two distinct nodes is exactly the difference between the transmission fees at those two nodes.
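The dichotomy behind (4) can be checked numerically: when p_i − w_i is constant across nodes, every feasible arbitrage plan earns zero profit, and when it is not constant, the arbitrager's LP is unbounded (profit can be scaled up without limit). A small SciPy sketch with invented prices and fees:

```python
import numpy as np
from scipy.optimize import linprog

def arbitrage(p, w):
    """Arbitrager's LP (3): max sum_i (p_i - w_i) a_i  s.t.  sum_i a_i = 0,
    with a free in sign; expressed as a minimization for linprog."""
    n = len(p)
    return linprog(c=-(np.asarray(p) - np.asarray(w)),
                   A_eq=np.ones((1, n)), b_eq=[0.0],
                   bounds=[(None, None)] * n)

# p_i - w_i = 20 at every node: zero profit is optimal.
r1 = arbitrage([30.0, 32.0, 35.0], [10.0, 12.0, 15.0])
# p_i - w_i differs across nodes: the LP is unbounded.
r2 = arbitrage([30.0, 40.0, 35.0], [10.0, 12.0, 15.0])
```

In equilibrium the unbounded case cannot persist, which is precisely why the price differences must equal the fee differences, as stated above.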
3. Model I

In this model, each firm that produces power anticipates the arbitrage amounts by including the variables a_fi and a supply function conjecture with fixed intercept in its profit maximization problem. The constraints that these variables satisfy are basically (4), where p_i is determined by the price function:

p_fi = P_i^0 − (P_i^0/Q_i^0) ( Σ_{t∈ℱ} s_ti + a_fi ).

(Note the addition of the subscript i in p_fi, as this is now the price at i anticipated by f.) The supply function conjecture is expressed by the equation

S_{−fi} = ( Σ_{t≠f} s_ti ) (p_fi − α_i)/(π_i − α_i).

Note that π_i is a base price at which S_{−fi} = Σ_{t≠f} s_ti, and is exogenous to the firms. Substituting S_{−fi} for the rivals' sales Σ_{t≠f} s_ti in the former equation and simplifying, we obtain

p_fi = [ P_i^0 ( 1 − (s_fi + a_fi)/Q_i^0 ) + (P_i^0/Q_i^0) ŝ_{−fi} α_i/(π_i − α_i) ] / [ 1 + (P_i^0/Q_i^0) ŝ_{−fi}/(π_i − α_i) ],   (7)

where ŝ_{−fi} = Σ_{t≠f} s_ti. Letting p_f be the firm's anticipated price at the hub, firm f's problem is: with s_ti (t ≠ f), π_i, and w_i, i ∈ 𝒩 fixed, find s_fi, g_fi, a_fi, p_fi for i ∈ 𝒩, and p_f in order to

maximize Σ_{i∈𝒩} (p_fi − w_i) s_fi − Σ_{i∈𝒩} (c_fi − w_i) g_fi
subject to g_fi ≤ CAP_fi,   ∀ i ∈ 𝒩,
Σ_{i∈𝒩} (s_fi − g_fi) = 0,
p_fi given by (7),   ∀ i ∈ 𝒩,
p_fi = p_f + w_i,   ∀ i ∈ 𝒩,
Σ_{i∈𝒩} a_fi = 0,
s_fi, g_fi ≥ 0,   ∀ i ∈ 𝒩.

The three equations

p_fi given by (7),   ∀ i ∈ 𝒩,
p_fi = p_f + w_i,   ∀ i ∈ 𝒩,
Σ_{i∈𝒩} a_fi = 0,

uniquely determine p_fi and a_fi for i ∈ 𝒩 and p_f in terms of s_tj, w_j and π_j for all t ∈ ℱ and j ∈ 𝒩. For the purpose of restating firm f's maximization problem, it suffices to solve for p_f, obtaining

p_f = [ Σ_{i∈𝒩} (Q_i^0/P_i^0) ( P_i^0 − (P_i^0/Q_i^0) s_fi + (P_i^0/Q_i^0) ŝ_{−fi} α_i/(π_i − α_i) − ( 1 + (P_i^0/Q_i^0) ŝ_{−fi}/(π_i − α_i) ) w_i ) ] / [ Σ_{i∈𝒩} (Q_i^0/P_i^0) ( 1 + (P_i^0/Q_i^0) ŝ_{−fi}/(π_i − α_i) ) ].

Let p_f(s, π, w) denote the fraction on the right-hand side as a function of the vectors s, π, and w. This function depends on the intercepts α_i; but since these are parameters of the model, we do not write them as the arguments of p_f. The function p_f also depends on ŝ_{−fi} and π_i; at equilibrium, the latter variables will be equated with S_{−fi} and p_fi, respectively. Observe that p_f is a linear function of s_f, with the other arguments fixed.
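As a quick sanity check on the conjectured-supply substitution (all numbers below are illustrative, not from the paper), the anticipated price solved from the affine demand with the linear rival-supply conjecture can be plugged back into the fixed-point relation it came from:

```python
# Fixed point: p = P0 - (P0/Q0) * (s_f + a_f + s_riv * (p - alpha)/(pi - alpha)),
# where s_riv is the rivals' sales at the base price pi (hypothetical data).
P0, Q0, alpha, pi = 50.0, 200.0, 5.0, 30.0
s_f, a_f, s_riv = 40.0, 10.0, 60.0

kappa = (P0 / Q0) * s_riv / (pi - alpha)          # conjectured-response term
p = (P0 * (1.0 - (s_f + a_f) / Q0)
     + (P0 / Q0) * s_riv * alpha / (pi - alpha)) / (1.0 + kappa)

# Residual of the original fixed-point equation at the solved price.
residual = p - (P0 - (P0 / Q0) * (s_f + a_f
                                  + s_riv * (p - alpha) / (pi - alpha)))
```

The residual vanishes (up to floating-point error), confirming that the closed-form price is exactly the solution of the linear fixed point created by the conjecture.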
We can now restate firm f's problem in the simplified form: with s_ti (t ≠ f), π_i, and w_i, i ∈ 𝒩 fixed, find s_fi and g_fi for i ∈ 𝒩 in order to

maximize p_f(s, π, w) Σ_{i∈𝒩} s_fi − Σ_{i∈𝒩} (c_fi − w_i) g_fi
subject to g_fi ≤ CAP_fi,   ∀ i ∈ 𝒩,   (γ_fi)
Σ_{i∈𝒩} (s_fi − g_fi) = 0,   (φ_f)
s_fi, g_fi ≥ 0,   ∀ i ∈ 𝒩.

The above problem is a quadratic concave maximization problem in the variables s_fi and g_fi for i ∈ 𝒩, parameterized by s_ti for t ≠ f and π_i and w_i for i ∈ 𝒩. We can write the optimality conditions for the problem as follows:

0 ≤ s_fi ⊥ −p_f(s, π, w) − (∂p_f(s, π, w)/∂s_fi) Σ_{j∈𝒩} s_fj + φ_f ≥ 0,   i ∈ 𝒩,
0 ≤ g_fi ⊥ c_fi − w_i + γ_fi − φ_f ≥ 0,   i ∈ 𝒩,
0 ≤ γ_fi ⊥ CAP_fi − g_fi ≥ 0,   i ∈ 𝒩,
0 = Σ_{i∈𝒩} (s_fi − g_fi).

To complete the description of the model, we need to relate the ISO's problem to the firms' problems. This is accomplished via the market clearing condition, which is simply a flow balancing equation:

y_i = Σ_{t∈ℱ} (s_ti − g_ti) + a_fi,   ∀ (f, i) ∈ ℱ × 𝒩.   (5)

In addition, we stipulate that

p_f(s, π, w) + w_i = π_i,   ∀ (f, i) ∈ ℱ × 𝒩,   (6)

and ŝ_{−fi} = S_{−fi} for all (f, i) ∈ ℱ × 𝒩. From the definition of p_f(s, π, w), the last two conditions yield

S = Σ_{f∈ℱ} Σ_{i∈𝒩} s_fi = Σ_{i∈𝒩} Q_i^0 − Σ_{i∈𝒩} (Q_i^0/P_i^0) π_i,

which expresses the total sales S of all firms in all markets in terms of the market prices π_i. Substituting (6) into the last equation, we obtain

p_f(s, π, w) = [ Σ_{i∈𝒩} Q_i^0 − S − Σ_{i∈𝒩} (Q_i^0/P_i^0) w_i ] / [ Σ_{i∈𝒩} (Q_i^0/P_i^0) ],

which shows among other things that p_f(s, π, w) is the same for all firms and

p_fi = p_f(s, π, w) + w_i = π_i;

thus the price p_fi is independent of the firms. Substituting the equality p_fi = π_i into the expression (7) yields

a_fi = Q_i^0 − (Q_i^0/P_i^0) π_i − Σ_{t∈ℱ} s_ti,

which shows that the arbitrage amounts anticipated by the firms depend only on the market i.
Substituting the expression 7TV \jeM jeAf i / / \je^^ i = a +Yf, [5ij - pi)wj - (jj S jeM into the above expression for we obtain n/i = Ri — ^2 ^ti + Pi S — ^2 Ci: teT jeAf where 1 UJ = CT = ieM pO p9 ieAf * 91 pP‘ Pi ^ ^ Q?-$a, iEM, Oligopolistic Market Equilibria in Linearized DC Power Networks 123 and with 5ij denoting the Kronecker delta; i.e., Sij is equal to one Hi = j and equal to zero if i 7 ^ j, ^ij ~ ~n0 ( ~ Pj)^ ^ # ifi^J pO pO i 3 if i^j. Substituting the above expression for afi into (5), we obtain Vi = Ri 9ti + Pi s - Cy Wj. (8) jeN Proceeding as in [11], we can show that the resulting Model I is to compute {Afc : k e A}, {sfi,gfi,jfi : i e Af, f e T}, and {cpf : f e 0 < -L + ^2 ^2 Cij PDFjf (A^ ieA ijeN Y^PiPDFik] S-^Y.^BFik9fi > 0 , ykeA, ieA/' / 0 < A+ ± 9 + + ^ PDFifc Cv PDF,-, ( A+ - A 7 i£A i,j^N ^PiPDFifc > 0, VkeA, ieN / ieMfeP J 2 Vi 0 ^fi D — (T + ca S' H 7~r^~ ^ ~F P'fk E S + ^ V n V - «i ^ ^PjFDFjk (A+-Aj^) > 0, V(/,i)€J^xV, keA jeN 124 0 < 5/j J- Cfi + ^PDF,,(A^-A+)+ keA 7if -v'f > 0, € T X J\f, 0 < 7/i -L CAP fi - Qfi > 0, \/{f,i)eTxAf, 0=^ i^fi-9fi), V/ e :r, ieM where VA:€A iGA/" In vector form, we have, q: T n^sp°. We observe that -Ki = a -ujS+^'^{5ij-pj) PDFjfc {\t -\) jeAfkeA po = ''-"S+74EEfePDf',t(A+-At), jeAfkeA which expresses the regional prices tt^ in terms of the total sales S and the dual variables of the transmission capacity constraints. Subsequently, we show that TTi is uniquely determined by the total sales only. Let hi : 5jl-^|x|*^l be defined by E VI jeAf V Pj V - L|AT|, V/ e where tt^ is given by (9). We can now write Model I in vector-matrix form. First we assemble the variables of the system in two vectors: X (A s e Sft( 2 MI+ 3 |AA|x|^|) ^ g Ui Oligopolistic Market Equilibria in Linearized DC Power Networks 125 (For notational simplification, we drop the ' in the variable (p). 
Next, we define the \A\ x \A\ symmetric positive semidefinite matrix A = n^sn; also define two matrices in partitioned form: oli — where Ma = Mxs = ■ Ma Mxs Ma, 0 ■ 0 Ms 0 0 , N = J -{MxgV 0 0 I -J 0 0 -I 0 _ 0 A -A -A A lA^i nVi^i ... -n^pi|5}| g S)f{2|^|x(|A/|x|;r|)^ Ma„ = -n^p -n^ . . E E E E Me = uj 1 ^ -n^ .. E .. E g SR2|^|x(K|x|^|) e J{(|AA|X|^|)X(|A/|X|^|) E E ... E (with each E being the |A/”| x |A/"| matrix of all ones) 1|A/-| 0 ... 0 0 L|AA| 0 L|A/-| gj(|AT|x|^|)x|^|_ 126 The matrix Af oii is square and of order (2|.4.| + 3|A/’| x | J^|); the matrix N is rectangular and of order {2\A\ + 3|A/*| x \ J^\) by \T\. Define a constant vector partitioned in accordance with Mqu: ( r \ 9oli = 1|AA|x|:f| e Sft(2m+3|Ar|x|^|)_ CAP / With the above vectors and matrices, Model I can now be stated as the following mixed NCP: 0 < X -L goii + MoiiX + N(p + 0 = N^x. ( 0 \ 0 hi{s,X^) 0 V 0 ) > 0 ( 10 ) If not for the nonlinear function hi{s, A^), Model I would be a linear complementarity problem (LCP), which is exactly the one treated in [11]. The existence of a solution to (10) relies on bounding the components hij(s, A^) for all f E T. In turn, this relies on bounding the prices for i E Af. In Section 6, we show how to obtain the necessary bounds via LCP theory. 4. Model II In this model, each firm takes the arbitrage amounts as input param- eters in its profit maximization problem. Specifically, with the price pi given by (7), firm /’s problem is: with 7Tj fixed for Oligopolistic Market Equilibria in Linearized DC Power Networks all i G A/", find Sfi and gji for alH G A/* in order to 127 Qi s fi di + E TXi — OL no ^5 ieM ^ M V Pf ■Ki- ai 5* ieN subject to Y^(sfi-gfi) = 0 ieN and Sfi > 0^ 0 < gfi < CAPj^, V(/, z) G TxM. 
Model II is complete with the inclusion of the ISO’s problem plus the arbitrage constraint: Qi Sfi di + H * s_ -Wi-PH = Q, V * € A/', ^ I J ^ PP 7t,: — ai ^ Oj = 0, i£N the flow balancing equation (5): Vi — ^ 9ti ) "b ^fii ^(/)0 ^ P ^ A/", and the price equation •TTj = -Wj +p/, yi e Af. Following a similar derivation as before, we can show that Model II can be formulated as the following NCP: ( ° 0 0 < a? _L M q\{X N( f /in(s,A^) 0 V 0 0 = N^x, 128 where with Tr* given by (9), hu : s)[j|:F|x|A/'| jg gjygj^ ] 3 y h„ji(s,A±) = I Pf TTi - at where tt^ is given by (9). The two NCPs (10) and (12) differ in the two closely related nonlinear functions hi and hu. 5. Complementarity Theory The key to the analysis of Models I and II is the theory of monotone LCPs. This theory in turn yields an existence result of a special vari- ational inequality that is the cornerstone for the existence of solutions to the supply-function based market models. In this section, we present the prerequisite LCP theory and its implications. We begin by recalling that the LCP range of a matrix M G denoted TZ{M)^ is the set of all vectors g G for which the LCP (qr, M) has a solution. Our first result pertains to the solutions of an LCP defined by a symmetric positive semidefinite matrix. Although part (a) of this result is known and parts (b) and (c) hold in more general contexts (see [6]) we give a full treatment of the following theorem because it is the basis for the entire subsequent development. Theorem 1 Let Af G be a symmetric positive semidefinite ma- trix. (a) For every q G TZ{M)^ the solutions of the LCP (g, M) are u;-unique; that is, if and are any two solutions of the LCP (g, JW), then Mz^ = Mz‘^. Let w{q) denote the common vector q + Mz for any solution of the LCP (q^M). (b) There exists a constant c > 0 such that \\w{q)\\ < c\\q\l \/q G 7^(M). (c) The function w : TZ{M) 3?^ is continuous. Proof. Statement (a) is a well-known result in LCP theory. We next prove (b) by contradiction. 
Suppose no constant c satisfying (b) exists. There exists a sequence of vectors {q^k} ⊂ R(M) satisfying

‖w(q^k)‖ > k ‖q^k‖

for every k. We have w(q^k) ≠ 0 for every k and

lim_{k→∞} q^k / ‖w(q^k)‖ = 0.

Without loss of generality, we may assume that

lim_{k→∞} w(q^k) / ‖w(q^k)‖ = v^∞

for some vector v^∞, which must be nonzero. We may further assume that

supp(w(q^k)) = { i : w_i(q^k) > 0 }

is the same for all k, which we denote α. With ᾱ denoting the complement of α in {1, ..., n}, we have, for every k,

0 = q^k_ᾱ + M_ᾱᾱ z^k_ᾱ,
w_α(q^k) = q^k_α + M_αᾱ z^k_ᾱ > 0

for some vector z^k_ᾱ ≥ 0. Dividing by ‖w(q^k)‖, we deduce the existence of a nonnegative vector z̄, which is not necessarily related to the sequence {z^k}, such that

0 = M_ᾱᾱ z̄_ᾱ,   v^∞_α = M_αᾱ z̄_ᾱ.

Since M is symmetric positive semidefinite, the above implies M z̄ = 0; thus v^∞ is equal to zero. This contradiction establishes part (b). To prove part (c), let {q^k} ⊂ R(M) converge to a limit vector q^∞, which must necessarily belong to R(M) because the LCP range is a closed cone. For each k, let z^k ∈ SOL(q^k, M) be such that w(q^k) = q^k + M z^k. The sequence {w(q^k)} is bounded; moreover, if w^∞ is any accumulation point of this w-sequence, then using the complementary cone argument, as done in the proof of part (b), we deduce the existence of a solution z^∞ ∈ SOL(q^∞, M) such that w^∞ = q^∞ + M z^∞. This is enough to show by part (a) that the sequence {w(q^k)} has a unique accumulation point which is equal to w(q^∞). Therefore the continuity of the map w(q) at every vector q ∈ R(M) follows. Q.E.D.

Our goal is to apply the above theorem to the matrix

[ Λ  −Λ ]
[ −Λ  Λ ].

For this purpose, we derive a corollary of Theorem 1 pertaining to a symmetric positive semidefinite matrix of the above form.

Corollary 1 Let M = AᵀEA, where E is a symmetric positive semidefinite m × m matrix and A is an arbitrary m × n matrix.
(a) For every $q \in \mathcal{R}(M)$, if $z^1$ and $z^2$ are any two solutions of the LCP $(q, M)$, then $E A z^1 = E A z^2$. Let $\tilde w(q)$ denote the common vector $E A z$ for any solution $z$ of the LCP $(q, M)$.

(b) There exists a constant $c' > 0$ such that $\|\tilde w(q)\| \le c' \|q\|$ for all $q \in \mathcal{R}(M)$.

(c) The function $\tilde w : \mathcal{R}(M) \to \Re^m$ is continuous.

Proof. We note that for any nonzero symmetric positive semidefinite matrix $M$,
\[ \frac{1}{\lambda^+_{\min}(M)}\, \|Mz\|^2 \;\ge\; z^T M z \;\ge\; \frac{1}{\lambda_{\max}(M)}\, \|Mz\|^2 \qquad \forall\, z, \]
where $\lambda^+_{\min}(M)$ is the smallest positive eigenvalue of $M$ and $\lambda_{\max}(M)$ is the largest eigenvalue of $M$. With $M = A^T E A$, it follows that
\[ Mz = 0 \iff EAz = 0. \]
Hence, for every $q \in \mathcal{R}(M)$, $EAz$ is a constant for all solutions $z$ of the LCP $(q, M)$. Moreover, by Theorem 1(b) there exists a scalar $c > 0$ such that, for every $q \in \mathcal{R}(M)$,
\[ \|Mz\| \le (1 + c)\, \|q\| \]
for every solution $z$ of the LCP $(q, M)$. Since
\[ \frac{1}{\lambda^+_{\min}(M)}\, \|Mz\|^2 \;\ge\; z^T M z \;=\; (Az)^T E (Az) \;\ge\; \frac{1}{\lambda_{\max}(E)}\, \|EAz\|^2 \]
for all $z \in \Re^n$, part (b) of the corollary follows readily. The proof of part (c) is very similar to that of the same part in Theorem 1. Q.E.D.

It can be shown, using the theory of piecewise affine functions, that both functions $w(q)$ and $\tilde w(q)$ are Lipschitz continuous on $\mathcal{R}(M)$. Since this Lipschitz continuity property is not needed in the subsequent analysis, we omit the details.

5.1 An existence result for a special VI

In what follows, we establish an existence result for a linearly constrained variational inequality (VI) of a special kind. This result will subsequently be applied to Models I and II of power market equilibria. The setup of the result is a VI $(K, F)$, where $K$ is the Cartesian product of two polyhedra $K_1 \subset \Re^{n_1}$ and $K_2 \subset \Re^{n_2}$, with $K_2$ being compact. The mapping $F$ is of the form: for $(x, y) \in K_1 \times K_2$,
\[ F(x, y) \;=\; \begin{pmatrix} q \\ r + h(y) \end{pmatrix} + \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \]
where $h$ is a continuous function and the matrix
\[ M \;=\; \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \qquad (13) \]
is positive semidefinite (not necessarily symmetric). In the following result, an AVI is a VI defined by an affine pair $(K, F)$, i.e., $K$ is a polyhedron and $F$ is an affine map.
(We refer the reader to the monograph [6] for a comprehensive treatment of finite-dimensional variational inequalities and complementarity problems.)

Proposition 1 In addition to the above setting, assume that for every $y \in K_2$, the AVI $(K, F^y)$ has a solution, where
\[ F^y(x', y') \;=\; \begin{pmatrix} q \\ r + h(y) \end{pmatrix} + \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \begin{pmatrix} x' \\ y' \end{pmatrix}, \qquad (x', y') \in K_1 \times K_2. \]
Then the VI $(K, F)$ has a solution.

Proof. We apply Kakutani's fixed-point theorem to the set-valued mapping $\Gamma : K_2 \to K_2$ defined as follows. For each $y \in K_2$, $\Gamma(y)$ consists of all vectors $\tilde y \in K_2$ for which there exists a vector $x \in K_1$ such that the pair $(x, \tilde y)$ solves the VI $(K, F^y)$. Clearly, $\Gamma(y)$ is a nonempty subset of $K_2$; $\Gamma(y)$ is convex because if $\tilde y^1$ and $\tilde y^2$ are any two elements of $\Gamma(y)$, and $x^1$ and $x^2$ are such that $(x^i, \tilde y^i) \in \mathrm{SOL}(K, F^y)$ for $i = 1, 2$, then $\tau (x^1, \tilde y^1) + (1 - \tau)(x^2, \tilde y^2)$ remains a solution of the VI $(K, F^y)$ for all scalars $\tau \in (0, 1)$, by the positive semidefiniteness of the matrix $M$. We next verify that $\Gamma$ is a closed map. For this purpose, let $\{y^k\}$ be a sequence of vectors in $K_2$ converging to a vector $y^\infty$ in $K_2$, and for each $k$ let $(x^k, \tilde y^k)$ be a solution of the VI $(K, F^{y^k})$ such that the sequence $\{\tilde y^k\}$ converges to a vector $\tilde y^\infty$. We need to show the existence of a vector $x^\infty$ such that the pair $(x^\infty, \tilde y^\infty)$ solves the VI $(K, F^{y^\infty})$. Write
\[ K_1 = \{\, x \in \Re^{n_1} : Ax \le b \,\} \quad \text{and} \quad K_2 = \{\, y \in \Re^{n_2} : Cy \le d \,\}. \]
For each $k$, there exist multipliers $\eta^k$ and $\mu^k$ such that
\[ 0 \;=\; \begin{pmatrix} q \\ r + h(y^k) \end{pmatrix} + \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \begin{pmatrix} x^k \\ \tilde y^k \end{pmatrix} + \begin{pmatrix} A^T \eta^k \\ C^T \mu^k \end{pmatrix}, \]
\[ 0 \le \eta^k \;\perp\; A x^k - b \le 0, \qquad 0 \le \mu^k \;\perp\; C \tilde y^k - d \le 0. \]
Again by a standard complementary cone argument, we can deduce the existence of multipliers $\eta^\infty$ and $\mu^\infty$, which are not necessarily the limits of the sequences $\{\eta^k\}$ and $\{\mu^k\}$, respectively, such that
\[ 0 \;=\; \begin{pmatrix} q \\ r + h(y^\infty) \end{pmatrix} + \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \begin{pmatrix} x^\infty \\ \tilde y^\infty \end{pmatrix} + \begin{pmatrix} A^T \eta^\infty \\ C^T \mu^\infty \end{pmatrix}, \]
\[ 0 \le \eta^\infty \;\perp\; A x^\infty - b \le 0, \qquad 0 \le \mu^\infty \;\perp\; C \tilde y^\infty - d \le 0. \]
This establishes that $\Gamma$ is a closed map.
In particular, it follows that F(y) is a closed subset of K 2 for all y m K 2 - Thus F satisfies all the assumptions required for the applicability of Kakutani’s fixed-point the- orem. By this theorem, F has a fixed point, which can easily be seen to be a solution of the VI {K, F). Q.E.D. 6. Properties of Models I and II Returning to the mixed NCPs (10) and (12), we consider the following LCP in the variables A^, parameterized by S and g: 0 < A- X g- + A( A- - A+ ) + ^ 5 / >0 V y (14) 0 < A+ X 9 + + A(A+-A-)-n^ ( p5- ^ 5 / 1 >0. We want to derive a sufficient condition under which the above LCP will have a solution for all “feasible” sales and generations. Specifically, let y ^ Ei i^f^9f) e : fer ( ^fi ~ 9fi ) ~ 0 ) 9fi ^ CAPyj, y ( f ,i) S JFxA/">, ieAf } Oligopolistic Market Equilibria in Linearized DC Power Networks 133 be the set of such sales and generations. The set T is a compact poly- hedron in We have for all pairs (s,p) € Y and every j G A/", Pj ^ ~ ^ ~ Pj ) 9fi- feP feN ieN Thus jeN jeN = -E E fePieN jeN = -E E EPD^jiCi, fePieN \jeN P? 0 9p- Therefore, the LCP (14) can be written as; 0 < A- A+ ± -T+ -f -n^ s [ n -n ; -n^ A- A+ > 0 , (15) where D is the |A/*| x |A/*| diagonal matrix with diagonal entries Pf /Q^. Proposition 2 If there exists a vector A G satisfying -T~ < n^SP^ + n^HHA < T+, (16) then the LCP (14) has a solution A^ for every pair {s^g) G Y , Proof. The LCP (15) is of the form: 0 < 2 : ± q + A^Er + A^EAz > 0, where E is a, symmetric positive semidefinite matrix. It follows from LCP theory that if there exists a vector z satisfying q + A^Ez > 0, then the LCP {q + A^Er^ M), where M = A^EA^ has a solution for all vectors r. Q.E.D. Throughout the following discussion, we assume that condition (16) holds. Thus the LCP (15) has a solution for all vectors g. Moreover, 134 specializing Corollary 1 to the matrix we deduce that for any vector g, if (A“’\ A+’^) for i = 1, 2 are any two solutions of (15), we have sn(A+’^ - A“’i) = sn(A+’2 - a"’2). 
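As a quick illustration of the solution invariance just used (Theorem 1(a) and Corollary 1(a)), the following Python sketch verifies on a small assumed LCP that two distinct solutions produce the same vector $Mz$; the matrix and data are illustrative and are not taken from the power market model.

```python
# Toy check of Theorem 1(a): for a symmetric positive semidefinite M, every
# solution z of the LCP (q, M) -- i.e. z >= 0, w = q + M z >= 0, z'w = 0 --
# yields the same vector M z (hence the same w(q) = q + M z).
# M and q below are illustrative choices, not data from the paper.

M = [[1.0, -1.0],
     [-1.0, 1.0]]          # symmetric PSD (eigenvalues 0 and 2)
q = [-1.0, 1.0]

def Mz(z):
    """Matrix-vector product M z."""
    return [sum(M[i][j] * z[j] for j in range(len(z))) for i in range(len(M))]

def is_lcp_solution(z, q, M, tol=1e-12):
    """Check z >= 0, w = q + M z >= 0 and complementarity z_i * w_i = 0."""
    w = [qi + mzi for qi, mzi in zip(q, Mz(z))]
    return (all(zi >= -tol for zi in z)
            and all(wi >= -tol for wi in w)
            and all(abs(zi * wi) <= tol for zi, wi in zip(z, w)))

z1 = [1.0, 0.0]            # one solution of the LCP (q, M)
z2 = [2.0, 1.0]            # a different solution of the same LCP

assert is_lcp_solution(z1, q, M) and is_lcp_solution(z2, q, M)
print(Mz(z1), Mz(z2))      # both equal [1.0, -1.0]: M z is solution-invariant
```

The solution set here is a whole ray, yet $Mz$ (and thus $w(q) = q + Mz$) is the same along it, which is exactly the $w$-uniqueness exploited in the text.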
Furthermore, if $\Phi(g)$ denotes this common vector, then $\Phi$ is a Lipschitz continuous function of $g$. In terms of this function, we have
\[ \pi_i \;=\; a - \omega S + \Phi_i(g), \qquad \forall\, i \in \mathcal{N}, \qquad (17) \]
which shows that $\pi_i$ is a function of the total sales $S$ and the generations $g$. Since each $\Phi_i$ is continuous and $Y$ is compact, it follows that for each $i \in \mathcal{N}$ the scalar
\[ \zeta_i \;=\; \min \{\, a - \omega S + \Phi_i(g) \;:\; (s, g) \in Y \,\} \]
is finite. Therefore, if the intercepts $\alpha_i$ satisfy $\alpha_i < \zeta_i$ for all $i \in \mathcal{N}$, then the denominators in $h_{\mathrm{I}}$ and $h_{\mathrm{II}}$ are positive for all $(s, g) \in Y$ and all $f \in \mathcal{F}$. Notice that, as a result of (17), we can replace the dependence on $\lambda^\pm$ in the two functions $h_{\mathrm{I}}$ and $h_{\mathrm{II}}$ by a dependence on $g$ instead. The computation of each scalar $\zeta_i$ requires the solution of a mathematical program with equilibrium constraints [10] that has a linear objective function and a parametric, monotone LCP constraint.

6.1 Existence of solutions

Both NCPs (10) and (12) are equivalent to a VI of the type considered in Proposition 1. More specifically, define the following principal submatrix of $M_{\mathrm{oli}}$ by removing the last row and column:
\[ \bar M_{\mathrm{oli}} \;=\; \begin{bmatrix} M_{\lambda} & M_{\lambda s} & M_{\lambda g} \\ -(M_{\lambda s})^T & M_{s} & 0 \\ -(M_{\lambda g})^T & 0 & 0 \end{bmatrix}; \]
the matrix $\bar M_{\mathrm{oli}}$ is of order $2|\mathcal{A}| + 2|\mathcal{N}| \times |\mathcal{F}|$. Define a reduced vector $\bar q_{\mathrm{oli}}$ accordingly. Let $n_1 = 2|\mathcal{A}|$ and $n_2 = 2|\mathcal{N}| \times |\mathcal{F}|$. Identify the vectors $x$ and $y$ with $\lambda^\pm$ and $(s, g)$, respectively, the constant vector $(q, r)$ with $\bar q_{\mathrm{oli}}$, the matrix $M$ with $\bar M_{\mathrm{oli}}$, and the function $h(y)$ with either
\[ \begin{pmatrix} h_{\mathrm{I}}(s, g) \\ 0 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} h_{\mathrm{II}}(s, g) \\ 0 \end{pmatrix}. \]
Furthermore, let $K_1$ be the nonnegative orthant of $\Re^{n_1}$ and let $K_2$ be the set $Y$:
\[ K_2 \;=\; \prod_{f \in \mathcal{F}} \Big\{\, (s_f, g_f) \;:\; \sum_{i \in \mathcal{N}} ( s_{fi} - g_{fi} ) = 0, \;\; 0 \le g_{fi} \le \mathrm{CAP}_{fi}, \;\; \forall\, (f, i) \in \mathcal{F} \times \mathcal{N} \,\Big\}. \]
Under the above identifications, Models I and II can therefore be formulated as the VI $(K_1 \times K_2, F)$, where $F$ is given by (13). We can readily apply Proposition 1 to establish the following existence result for the two models.

Theorem 2 Suppose that there exists a vector $\bar\lambda$ satisfying (16).
If ai < min {a - cu S + ^i{g) : ( 5, g ) G V }, Vi G A/", (18) then solutions exist to Models I and II. Proof. Under the assumption on the intercepts the function h{s^g) is well defined on the set K 2 . For every y = (s^g) G K 2 ^ the VI (i^, F^) is equivalent to the mixed LCP in the variable (a?, (p)\ 0 < aj _L + Moiiic + iV(^ + (0,0, h(s,g))^ > 0 0 = N^x. This mixed LCP is clearly feasible and monotone. It therefore has a solution. The existence of solutions to Models I and II follows readily from Proposition 1. Q.E.D. The next result identifies two important properties of the solutions to Model II. It shows in particular that Model I can be solved by solving Model II. 136 Theorem 3 Under the assumptions of Theorem 2, if (A^,5,g,y?) is a solution to Model II, then V i j and V / G ^ Therefore every solution to Model II is a solution to Model I. Proof. It suffices to show that for all i j and all / G .F, This is because by reversing the role of i and we obtain the reverse inequality and equality therefore must hold. The above inequality is clearly valid if Sfj = 0. So we may assume that Sfj > 0. By comple- mentarity, we have 9l + '^-/j Pj = cr - ujS - iff keA XI Pj'PDFj/fc j'eAf i4->^k) < 91 p? Sfi + ^-fi Oii This establishes the first assertion of the theorem. To prove the second assertion, we note that by what has just been proved, it follows that if (A^, 5,gf, cp) is a solution to Model II, then we must have Sfi '91^ pp 5_ fi TTi CXi jeAf / jeAf 91 + S- fj TTj aj This shows that (A"^, s^g) is also a solution to Model I. Q.E.D. 6.2 Uniqueness in Model II In this subsection, we show that if each price intercept ai is suitably restricted, then the firms’ sales in the market model II are unique. The cornerstone to this uniqueness property of the model solutions is the Oligopolistic Market Equilibria in Linearized DC Power Networks 137 expression (9). 
Based on this expression, we show that the mapping Fn{x,^) = f j + Moii N -N 0 X ‘P + ( 0 \ 0 /mi(s,a±) 0 V 0 J is monotone. Throughout the following analysis, we restrict the pair (5, A^) so that a-uS+Y. ^(<5.._p.)PDF,-fc(A+-A^) > V* e N. jeJ\f keA To establish the desired monotonicity of f ip we first compute the Jaco- bian matrix of the function hn(s, A^). We begin by noting the following partial derivatives: diTj dsfi> — -cj, V / G G A7 and rinr- T = ± 7® E fePDFjt ViiM,kiA. Next, recalling that huji{s,X^) = Sfi Qj I ^-fi PP TTi - ai we have for all f E. P, 1 dhuji ^ I dsfi' Qj^ _| ^ -fi ^fi 0 . - 91 , ^zZi. TTi - ai [ po + {%i - aiY TV j Oij ^fi U fi Q! + S-fi (tTz - ai) .\2 if i = i' \ii ^ i' Pi 7Ti - ai 138 and for all f f'^ dhuji ds j'i' = < Q S- fi ^i- ai ^ fi Ql + -fi 2 \TTi- tti ( TTj - aj )2 {■Ki - a ,)2 if i = i' Hi ^ i'. Pf TTi-ai J Moreover, for all / G i G A/" and k E ^ S E C«pdf,* 9>'t Qf s^fi yi^i-o^i^Q^i Pf TTi- ai jeAf Therefore, the Jacobian matrix of /in ( 5 , A^) has the following partitioned form: A{, ^1\T\ jdX dA+ as ... 45 JD\n~\ -D\'Tr\ -^ l^li ^ VI 1*^1 l-^ll mi^l where each Ajj, is an \Af \ x \Af \ matrix with entries ( A^ff , ) ... = Vi.i'eAT ds fi' and each is an |A/*| x \A\ matrix with entries = yieAf,keA. dXi Consequently, the Jacobian matrix of Fu{x^(p) can be written as the sum of two matrices L\ and L 2 , where Li - A -A 0 0 0 0 -A A 0 0 0 0 Br 0 0 ^\T\i 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Oligopolistic Market Equilibria in Linearized DC Power Networks 139 and L2 = 0 Mxs Mxg 0 0 -{Mxs)'^ Ms 0 0 J -{Mxg)^ 0 0 / -J 0 0-100 0 -J J 0 0 The matrix L 2 is skew-symmetric, thus positive semidefinite. To show that L\ is also positive semidefinite, recall that A == n^ETI. Further- more, we have V/ G for some |A/"| x \Af \ diagonal matrix Df with ( Df )ii — ^fi 91 pP + -fi TTj — aq S - fi pP -A n {■Ki-aif Qf Vi e V. Let r Ml ■ AS -] M\t\ ■ Di ■ A = 1 J and D = . . Notice that each block is a function of s and tt; so is each matrix Df. 
Consider the matrix Li [ n n ] 0 DH [ n n ] A Clearly, Li is positive semidefinite if and only if Li is so. The next lemma shows that the latter matrix is positive semidefinite. Lemma 1 For every compact set C there exists a such that if ai < a, Vi G V, the matrix Li is positive semidefinite for all (s, tt) G fi. 140 Proof. The symmetric part of the matrix L\ is equal to - yiT ■ s [ n -n ] i ■ ■ 1 [I] -n^ . L J 2 . -n^ . ^DE [n -n] A which we can write as the sum of two matrices: - - 1 o o 0 1 -n^ e[u -n + 0 0 0 - 5^ - _ 0 0 A-\DED^ _ The first summand is clearly positive semidefinite. Provided that the pair (5,7 t) is bounded, the matrix A — is positive definite for all ai with \ai\ sufficiently large. Q.E.D. Each pair [s^g) in the compact set Y induces a price vector tt via the expression (9), where is a solution of the LCP (14). The induced prices are bounded by the continuity of the function $ and the bounded- ness of Y] cf. (17). Let be a compact convex subset of containing all such pairs (5,7 t). Corresponding to this set we may choose a such that the Jacobian matrix of is positive semidef- inite for all pairs {x^ip) belonging to a convex set that contains all so- lutions of Model II. Thus Fn is monotone on this set. Based on this monotonicity property, we can establish the desired uniqueness of the sales and other variables in Model II. Theorem 4 Under the assumptions of Theorem 2, there exists a such that if ai < yi E A/*, the following variables are unique in the solutions of Model II: (a) the sales s fi for all / G F and i G A/*; (b) the prices tt^ for all i G A/*; (c) the total generations for all / G F; and (d) the profits for each firm. Proof. Let and {x‘^^pP‘) be two solutions of Model II. Let tt^ and 7T^ be the induced prices. By the monotonicity of Fn, it follows that ^ ( *2 ) ( ) = 0 - Oligopolistic Market Equilibria in Linearized DC Power Networks 141 Let - A“’* and i = 1,2. 
We have (Ai-A2)^n^Sn(Ai-A2)+o;(Si-52)2+ ( 5^ — 5^ ) ^( hi{s^^ A=^’^) — A^’^) ) = 0. By the mean- value theorem, it follows that for some triple (5,7 t) on the line segment joining (5^,7 t^) and (s^,7t^), A+’^ - A+’2 \ T - • H [ n n ] 0 / A+’i - A+>2 \ A->i - A->2 -n^ A--1 - A-’2 J DE [ n n ] A _ 1 ; cu{S^ -S^)^ - 0, where the matrices D and A are evaluated at (s,^). By the proof of Lemma 1, it follows that = 5^ and 5BA^ = SBA^ This yields Since E w = E */i. V/ € F, ieAf ieN it follows that each firm’s total generation is unique. Finally, to show that the profit for each firm is unique, note that the profit of firm / is equal to Pf{s, 7T, w) J2 ^fi- ~'^i)9fi = XI ( TTj - Cfi ) Qfi, ieAf ieM ieN because tt^ = pf{s^7r^w) + wi (by (6)) and the sum of sji over all i in M is equal to the sum of gfi over all i in J\f. Let (5 j, (p) be an arbitrary solution of Model II. Consider the linear program in the variable gf = {gfi -.ieX): maximize E ( TTf Cfi ) gfi ieAf subject to 0 < gfi < CAP fi^ e Af and 9fi — ^fi- ieN ieN (19) Since for i e M and Yli^j\f s fi are constants of Model II, it follows that the above linear program depends only on the firm / and does not 142 depend on the pair (®, cp) of solution to Model II. The optimal objective value of the linear program gives the profit of firm /. Q.E.D. Acknowledgments. The work of this research was partially supported by the National Science Foundation under grant ECS-0080577. References [1] R. Baldick, R. Grant, and E. Kahn, “Linear Supply Function Equilibrium: Generalizations, Application, and Limitations,” PWP-078, University of Cali- fornia Energy Institute, Berkeley, CA (August 2000). [2] C.A. Berry, B.F. Hobbs, W.A. Meroney, R.P. O’Neill, and W.R. Stewart, Jr., “Analyzing Strategic Bidding Behavior in Transmission Networks,” Utilities Policy 8 (1999) 139-158. [3] R.W. Cottle, J.S. Pang, and R.E. Stone, The Linear Complementarity Prob- lem, Academic Press (Boston 1992). [4] C. Day, B.F. Hobbs, and J.S. 
Pang, "Oligopolistic Competition in Power Networks: A Conjectured Supply Function Approach," IEEE Transactions on Power Systems 17 (2002) 97-107.
[5] R. Gilbert and E. Kahn, editors, International Comparisons of Electricity Regulation, Cambridge University Press (New York 1996).
[6] F. Facchinei and J.S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems, Springer-Verlag (New York 2003).
[7] B.F. Hobbs, "Linear Complementarity Models of Nash-Cournot Competition in Bilateral and POOLCO Power Markets," IEEE Transactions on Power Systems 16 (2001) 194-202.
[8] E. Kahn, "Numerical Techniques for Analyzing Market Power in Electricity," Electricity Journal 11 (July 1998) 34-43.
[9] P.D. Klemperer and M.A. Meyer, "Supply Function Equilibria," Econometrica 57 (1989) 1243-1277.
[10] Z.Q. Luo, J.S. Pang, and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press (Cambridge 1996).
[11] C. Metzler, B.F. Hobbs, and J.S. Pang, "Nash-Cournot Equilibria in Power Markets on a Linearized DC Network with Arbitrage: Formulations and Properties," Networks and Spatial Economics 3 (2003) 123-150.
[12] R.E. Schuler, "Analytic and Experimentally Derived Estimates of Market Power in Deregulated Electricity Systems: Policy Implications for the Management and Institutional Evolution of the Industry," Decision Support Systems 30 (January 2001) 341-355.
[13] F.C. Schweppe, M.C. Caramanis, R.E. Tabors, and R.E. Bohn, Spot Pricing of Electricity, Kluwer Academic Publishers (Norwell 1988).
[14] Y. Smeers and J.Y. Wei, "Spatially Oligopolistic Model with Opportunity Cost Pricing for Transmission Capacity Reservations: A Variational Inequality Approach," CORE Discussion Paper 9717, Universite Catholique de Louvain (February 1997).

RISK CONTROL AND OPTIMIZATION FOR STRUCTURAL FACILITIES

Rudiger Rackwitz
Technische Universitat Munchen, Munich, Germany
Arcisstr.
21, D-80290 Munchen
rackwitz@mb.bv.tum.de

Abstract: Optimization techniques are essential ingredients of reliability-oriented optimal design of technical facilities. Although many technical aspects are not yet solved and the available spectrum of models and methods in structural reliability is still limited, many practical problems can be solved. A special one-level optimization is proposed for general cost-benefit analysis, and some technical aspects are discussed. However, the focus is on some more critical issues, for example: "What is a reasonable replacement strategy for structural facilities?", "How safe is safe enough?" and "How to discount losses of material, opportunity and human lives?". An attempt has been made to give at least partial answers.

Keywords: Structural reliability, optimization, risk acceptability, discount rates

1. Introduction

The theory of structural reliability has been developed to fair maturity within the last 30 years. The inverse problem, i.e. how to determine certain parameters in the function describing the boundary between safe and failure states for given reliability, has been addressed only recently. It is a typical optimization problem. Designing, erecting and maintaining structural facilities may be viewed as a decision problem where maximum benefit and least cost are sought and the reliability requirements are fulfilled simultaneously. In what follows, the basic formulations of the various aspects of the decision problem are outlined, making use of some more recent results in the engineering literature. The structure of a suitable objective function is first discussed. A renewal model, proposed as early as 1971 by Rosenblueth and Mendoza [42], further developed in [17], [40] and extended in [36], [18], is presented in some detail. Theory and methods of structural reliability are reviewed next, where it is pointed out that the calculation of suitable reliability measures is essentially an optimization problem.
Focus is on the concepts of modern first- and second-order reliability methods [20]. The problem of the value of human life is then discussed in the context of modern health-related economic theories. Some remarks are made about appropriate discount rates. Finally, details of a special version of modern reliability-oriented optimization techniques based on work in [26] are outlined, followed by an illustrative example.

2. Optimal Structures

A structure is optimal if the following objective is maximized:
\[ Z(p) \;=\; B(p) - C(p) - D(p) \qquad (1) \]
Without loss of generality it is assumed that all quantities in eq. (1) can be measured in monetary units. $B(p)$ is the benefit derived from the existence of the structure, $C(p)$ is the cost of design and construction, and $D(p)$ is the cost in case of failure; $p$ is the vector of all safety-relevant parameters. Statistical decision theory dictates that expected values are to be taken. In the following it is assumed that $B(p)$, $C(p)$ and $D(p)$ are differentiable in each component of $p$. The cost may differ for the different parties involved, e.g. the owner, the builder, the user and society. A structural facility makes sense only if $Z(p)$ is positive for all parties involved within certain parameter ranges. The intersection of these ranges defines reasonable structures. The structure, which eventually will fail after a long time, has to be optimized at the decision point, i.e. at time $t = 0$. Therefore, all cost needs to be discounted. We assume a continuous discounting function $\delta(t) = \exp[-\gamma t]$, which is accurate enough for all practical purposes and where $\gamma$ is the interest rate. It is useful to distinguish between two replacement strategies: one where the facility is given up after failure, and one where the facility is systematically replaced after failure.
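Since all costs are referred to the decision point $t = 0$, the discounting function $\delta(t) = \exp[-\gamma t]$ enters every expected cost term. A quick numerical sketch in Python (rate, benefit and horizon are illustrative values, not data from the text):

```python
import math

# Discounted benefit of a constant benefit stream b over a service time t_s:
#   B(t_s) = integral_0^{t_s} b * exp(-gamma * t) dt
#          = (b / gamma) * (1 - exp(-gamma * t_s)).
# gamma, b and t_s below are illustrative values.

gamma = 0.03        # interest rate per year
b = 1.0             # benefit per year (monetary units)
t_s = 50.0          # intended time of use in years

def discounted_benefit_numeric(b, gamma, t_s, n=200_000):
    """Trapezoidal approximation of integral_0^{t_s} b * exp(-gamma * t) dt."""
    h = t_s / n
    s = 0.5 * (b + b * math.exp(-gamma * t_s))
    s += sum(b * math.exp(-gamma * (k * h)) for k in range(1, n))
    return s * h

closed_form = (b / gamma) * (1.0 - math.exp(-gamma * t_s))
assert abs(discounted_benefit_numeric(b, gamma, t_s) - closed_form) < 1e-6
print(round(closed_form, 4))   # ~ 25.8957; as t_s -> infinity it tends to b/gamma
```

The check confirms that only the closed form $b/\gamma \cdot (1 - e^{-\gamma t_s})$ is ever needed for a constant benefit rate.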
Further, we distinguish between structures which fail upon completion or never, and structures which fail at a random point in time much later due to service loads, extreme external disturbances or deterioration. The first option implies that loads on the structure are time-invariant. Reconstruction times are assumed to be negligibly short. At first sight there is no particular preference for either of the replacement strategies. For infrastructure facilities the second strategy is the natural one. Structures only used once, e.g. special auxiliary construction structures or boosters for space transport vehicles, fall into the first category.

3. The Renewal Model

3.1 Failure upon completion due to time-invariant loads

The objective function for a structure given up after failure at completion due to time-invariant loads (essentially dead weight) is
\[ Z(p) \;=\; B^* R_f(p) - C(p) - H P_f(p) \;=\; B^* - C(p) - ( B^* + H )\, P_f(p) \qquad (2) \]
where $R_f(p)$ is the reliability and $P_f(p) = 1 - R_f(p)$ the failure probability, respectively. $H$ is the direct cost of failure, including demolition and debris removal cost. For failure at completion and systematic reconstruction we have
\[ Z(p) \;=\; B^* - C(p) - ( C(p) + H ) \sum_{i=1}^{\infty} i\, P_f(p)^i R_f(p) \;=\; B^* - C(p) - ( C(p) + H )\, \frac{P_f(p)}{R_f(p)} \qquad (3) \]
After a failure one, of course, investigates its causes and redesigns the structure. However, we will assume that the first design was already optimal, so that there is no reason to change the design rules, leading to the same $P_f(p)$. If each structural realization is independent of the others, formula (3) holds. A certain ambiguity exists when assessing the benefit $B^*$, taken here and in the following as independent of $p$. If the intended time of use of the facility is $t_s$, it is simply
\[ B^* = B(t_s) = \int_0^{t_s} b(t)\, \delta(t)\, dt \qquad (4) \]
For constant benefit per time unit, $b(t) = b$, one determines
\[ B^* = B(t_s) = \frac{b}{\gamma} \left[ 1 - \exp[-\gamma t_s] \right] \;\xrightarrow{\; t_s \to \infty \;}\; \frac{b}{\gamma} \qquad (5) \]

3.2 Random Failure in Time

Assume now random failure events in time.
The time to the first event has distribution function $F_1(t, p)$ with probability density $f_1(t, p)$. If the structure is given up after failure, it is obviously
\[ B(t_s) = \int_0^{t_s} b(t)\, \delta(t)\, R_1(t, p)\, dt \qquad (6) \]
\[ D(t_s) = \int_0^{t_s} f_1(t, p)\, \delta(t)\, H\, dt \qquad (7) \]
and therefore
\[ Z(p) = \int_0^{t_s} b(t)\, \delta(t)\, R_1(t, p)\, dt - C(p) - \int_0^{t_s} \delta(t)\, f_1(t, p)\, H\, dt \qquad (8) \]
For $t_s \to \infty$, constant benefit rate $b(t) = b$, and with
\[ f_1^*(\gamma, p) = \int_0^{\infty} e^{-\gamma t} f_1(t, p)\, dt \]
the Laplace transform of $f_1(t, p)$, it is instead
\[ Z(p) = \frac{b}{\gamma} \left[ 1 - f_1^*(\gamma, p) \right] - C(p) - H\, f_1^*(\gamma, p) \qquad (9) \]
For the more important case of systematic reconstruction, we generalize our model slightly. Assume that the time to first failure has density $f_1(t)$, while all other times between failures are independent of each other and have density $f(t)$; i.e. failures and subsequent renewals follow a modified renewal process [11]. This makes sense because extreme loading events usually are not controllable, i.e. the time origin lies somewhere between the zeroth and the first event. The independence assumption is more critical. It implies that the structures are realized with independent resistances at each renewal according to the same design rules, and that the loads on the structures are independent, at least asymptotically. For constant benefit per time unit, $b(t) = b$, we now derive, by making use of the convolution theorem for Laplace transforms,
\[ Z(p) = \int_0^{\infty} b\, e^{-\gamma t}\, dt - C(p) - ( C(p) + H ) \sum_{n=1}^{\infty} \int_0^{\infty} e^{-\gamma t} f_n(t, p)\, dt \]
\[ = \frac{b}{\gamma} - C(p) - ( C(p) + H )\, \frac{f_1^*(\gamma, p)}{1 - f^*(\gamma, p)} \;=\; \frac{b}{\gamma} - C(p) - ( C(p) + H )\, h_1^*(\gamma, p) \qquad (10) \]
where $h_1^*(\gamma, p)$ is the Laplace transform of the renewal intensity $h_1(t, p)$. For regular renewal processes one replaces $f_1^*(\gamma, p)$ by $f^*(\gamma, p)$. For the renewal intensity and its Laplace transform there is an important asymptotic result [11]:
\[ \lim_{t \to \infty} h(t, p) \;=\; \lim_{\gamma \to 0} \gamma\, h^*(\gamma, p) \;=\; \frac{1}{m(p)} \qquad (11) \]
where $m(p)$ is the mean of the renewal times.
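For exponentially distributed renewal times the transforms in (10) and (11) are available in closed form, which gives a cheap sanity check of a numerical Laplace transform. The rates and cost figures below are illustrative assumptions, not values from the text:

```python
import math

# For interarrival density f(t) = lam * exp(-lam * t), the Laplace transform is
#   f*(gamma) = lam / (gamma + lam),
# so the renewal-intensity transform is h*(gamma) = f*/(1 - f*) = lam / gamma,
# and gamma * h*(gamma) -> lam = 1/m as gamma -> 0 (m = mean renewal time).
# lam, gamma, b, C and H are illustrative values.

lam, gamma = 0.01, 0.03

def laplace_exp_density(gamma, lam, T=3000.0, n=300_000):
    """Trapezoidal approximation of integral_0^T exp(-gamma t) lam exp(-lam t) dt."""
    h = T / n
    f = lambda t: math.exp(-gamma * t) * lam * math.exp(-lam * t)
    return h * (0.5 * (f(0.0) + f(T)) + sum(f(k * h) for k in range(1, n)))

f_star = laplace_exp_density(gamma, lam)
assert abs(f_star - lam / (gamma + lam)) < 1e-6       # closed-form transform
h_star = f_star / (1.0 - f_star)                      # renewal-intensity transform
assert abs(h_star - lam / gamma) < 1e-3               # exponential special case

# Objective (10) for systematic reconstruction with constant benefit rate b:
b, C, H = 1.0, 10.0, 30.0
Z = b / gamma - C - (C + H) * h_star
print(round(Z, 2))   # ~ 10.0: the facility is worthwhile for these numbers
```

The same recipe works for any interarrival density that can only be integrated numerically, while the inverse transform is avoided entirely, in line with the ill-posedness remark that follows.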
If, in particular, the events follow a stationary Poisson process with intensity $\lambda$, we have
\[ f_1^*(\gamma) = f^*(\gamma) = \int_0^{\infty} \exp[-\gamma t]\, \lambda \exp[-\lambda t]\, dt = \frac{\lambda}{\gamma + \lambda} \qquad (12) \]
and
\[ h_1^*(\gamma) = \frac{\lambda}{\gamma} \qquad (13) \]
This result is of great importance because structural failures should, in fact, be rare, independent events. Then the Poisson intensity $\lambda$ can be replaced by the so-called outcrossing rate to be described below, even in the locally non-stationary case. Finally, if at an extreme loading event (e.g. flood, wind storm, earthquake, explosion) failure occurs with probability $P_f(p)$, and $f_1(t)$ and $f(t)$, respectively, denote the densities of the times between the loading events, one obtains by similar considerations
\[ g_1^*(\gamma, p) = \frac{P_f(p)\, f_1^*(\gamma)}{1 - ( 1 - P_f(p) )\, f^*(\gamma)}, \qquad g^*(\gamma, p) = \frac{P_f(p)\, f^*(\gamma)}{1 - ( 1 - P_f(p) )\, f^*(\gamma)} \qquad (14) \]
For the case treated in eq. (13) we have, for stationary Poissonian load occurrences,
\[ h_1^*(\gamma, p) = \frac{g_1^*(\gamma, p)}{1 - g^*(\gamma, p)} = \frac{P_f(p)\, \lambda}{\gamma} \qquad (15) \]
Unfortunately, Laplace transforms are rarely analytic. Taking Laplace transforms numerically requires some effort, but taking the inverse Laplace transform must simply be considered a numerically ill-posed problem. Then, however, one can always resort to the asymptotic result, which can be shown to be accurate enough for all practical purposes. The foregoing results can be generalized to cover multiple-mode failure, loss of serviceability, obsolescence of the facility, and inspection and maintenance. Also, the case of non-constant benefit, a case of obsolescence, or non-constant damage has been addressed. Further developments are under way.

4. Computation of Failure Probabilities and Failure Rates

4.1 Time-invariant Reliabilities

The simplest problem of computing failure probabilities is given as a volume integral
\[ P_f(p) = \int_{F(p)} dF_{\mathbf X}(\mathbf x) \qquad (16) \]
Since n usually is large and P/(p) is small serious numerical difficulties oc- cur if standard methods of numerical integration are applied. However, if it is assumed that the density /x(x, p) of Fx(x) exists everywhere and /i(x,p) is twice differentiable, then, the problem of computing fail- ure probabilities can be converted into a problem of optimization and some simple algebra. For convenience, a probability preserving dis- tribution transformation U — T“^(X) is first applied [19]. Making use of Laplace integration methods [4] one can then show that with /i(x,p) =h{T{u),p) =g{u,p) [5], [20] Pf{p)= / /x(x,p)c/x= / (pxj{u,p)du Jh{x, p)<0 Jg(u,p)<0 n—1 -$(-/?) (17) i=l for 1 < (X) with 13 = ||u*|| = min{u} for {u : g(u,p) < 0} , (18) (^u(u) the multinormal density, $(.) the one-dimensional normal in- tegral, 5(0, p) > 0 and i^i the main curvatures of the failure surface dF = {^(u, p) = 0} . Of course, it is assumed that a unique ’’critical” point u* exists but methods have been devised to also locate and con- sider appropriately multiple critical points. In line two the asymptotic result is given denoted by second order since the Hessian of p(u, p) = 0 is involved. The last result represents a first-order result correspond- ing to a linearization of ^(u, p) in u* already pointed out by [16]. Very frequently this is sufficiently accurate in practical applications. 4.2 Time- variant Reliabilities Much more difficult is the computation of time- variant reliabilities. Here, the question is not that the system is in an adverse state at some arbitrary point in time but that it enters it for the first time given that it was initially at time t = 0 in a safe state. The problem is denoted by first passage problem in the engineering literature. But exact results for distributions of first passage times are almost inexistent. However, good approximations can be obtained by the so-called outcrossing approach [13]. 
The outcrossing rate is defined by
\[ \nu^+(\tau) = \lim_{\Delta \to 0} \frac{1}{\Delta}\, P( N(\tau + \Delta) - N(\tau) = 1 ) \qquad (19) \]
or, for the original vector process,
\[ \nu^+(\tau) = \lim_{\Delta \to 0} \frac{1}{\Delta}\, P( \{ \mathbf X(\tau) \in \bar F \} \cap \{ \mathbf X(\tau + \Delta) \in F \} ) \qquad (20) \]
One easily sees that the definition of the outcrossing rate coincides formally with the definition of the renewal intensity. The counting process $N(\cdot)$ of outcrossings must be a regular process [12], so that the mean number of outcrossings in $[0, t]$ is given by
\[ E[N(t)] = \int_0^t \nu^+(\tau)\, d\tau \qquad (21) \]
One can derive an important upper bound. Failure occurs either if $\mathbf X(0) \in F$ or if $N(t) > 0$. Therefore [28]
\[ P_f(t) = 1 - P( \mathbf X(\tau) \in \bar F \ \text{for all}\ \tau \in [0, t] ) = P( \{ \mathbf X(0) \in F \} \cup \{ N(t) > 0 \} ) \]
\[ = P(\mathbf X(0) \in F) + P(N(t) > 0) - P( \{ \mathbf X(0) \in F \} \cap \{ N(t) > 0 \} ) \]
\[ \le P(\mathbf X(0) \in F) + P(N(t) > 0) \le P(\mathbf X(0) \in F) + E[N(t)] \qquad (22) \]
If the original process is sufficiently mixing, one can derive the asymptotic result [13]
\[ P_f(t) \sim 1 - \exp[ -E[N(t)] ] \qquad (23) \]
justifying the remarks below eq. (13). A lower bound can also be given; it is less useful.

Consider a stationary vectorial rectangular wave renewal process, each component having renewal rate $\lambda_i$ and amplitude distribution function $F_i(x)$. The amplitudes $X_i$ are independent. Regularity assures that only one component has a renewal in a small time interval, with probability $\lambda_i \Delta$. Then [9]
\[ \nu^+(F)\, \Delta = P\Big( \bigcup_{i=1}^{n} \{ \text{renewal of component } i \text{ in } [0, \Delta] \} \cap \{ \mathbf X^-_i \in \bar F \} \cap \{ \mathbf X^+_i \in F \} \Big) \]
\[ = \sum_{i=1}^{n} \Delta\, \lambda_i\, P( \{ \mathbf X^-_i \in \bar F \} \cap \{ \mathbf X^+_i \in F \} ) = \Delta \sum_{i=1}^{n} \lambda_i \big[ P(\mathbf X^+_i \in F) - P( \{ \mathbf X^-_i \in F \} \cap \{ \mathbf X^+_i \in F \} ) \big] \qquad (24) \]
where $\mathbf X^-_i$ denotes the process $\mathbf X$ before and $\mathbf X^+_i$ the process after a jump of the $i$-th component.

Figure 1. Outcrossings of a vectorial rectangular wave renewal process

If the components are standard normally distributed and the failure domain is a half-space $F = \{ \mathbf u : \mathbf a^T \mathbf u + \beta \le 0 \}$, one determines
\[ \nu^+(F) = \sum_{i=1}^{n} \lambda_i \big[ \Phi_2( \beta, -\beta; -\rho_i ) \big] \le \sum_{i=1}^{n} \lambda_i\, \Phi(-\beta) \qquad (25) \]
where $\rho_i = 1 - a_i^2$ is the correlation coefficient of the process before and after a jump and $\Phi_2(\cdot, \cdot; \cdot)$ the bivariate normal integral. For general non-linear failure surfaces one can show that asymptotically [8]
\[ \nu^+(F) = \sum_{i=1}^{n} \lambda_i\, \Phi(-\beta) \prod_{j=1}^{n-1} ( 1 - \beta \kappa_j )^{-1/2}, \qquad \beta \to \infty \qquad (26) \]
with $\beta = \|\mathbf u^*\| = \min\{ \|\mathbf u\| \}$ for $g(\mathbf u) \le 0$ and $\kappa_j$ the main curvatures in the solution point $\mathbf u^*$. This corresponds to the result in eq. (17). The same optimization problem as in the time-invariant case has to be solved. Rectangular wave renewal processes are used to model live loads, sea states, traffic loads, etc.

For stationary vector processes with differentiable sample paths it is useful to standardize the original process $\mathbf X(t)$ and its derivative (in
For general non-linear failure surfaces one can show that asymptotically [8] n n—1 u+{F) = E Ai^(-iS) 11(1 - ^ Ki)-'/2; i< /3 ^ oo (26) i=l i=l with /3 = ||u*|| — min{||u||} for ^(u) < 0 and Ki the main curvatures in the solution point u*. This corresponds to the result in eq. (17). The same optimization problem as in the time-invariant case has to be solved. Rectangular wave renewal processes are used to model life loads, sea states, traffic loads, etc.. For stationary vector processes with differentiable sample paths it is useful to standardize the original process X(t) and its derivative (in Risk Control and Optimization for Structural Facilities 151 mean square) process X(i) = ^X(i) such that £J[U(t)] = E |^U(t) = 0,R(0) = I where R(r) = E j^U(0)U(r)^j is the matrix of correlation functions and t = \t\ — t 2 \. A matrix of cross correlation functions be- tween U(t) and U(t), R(t) = E ^U(0)U(r)^ , as well as of the deriva- tive process R(r) = E ^U(0)U(r)^j also exists. The general outcrossing rate is defined by [38], [3] v+{t) = lim At — )-0 P({U(t)G A(aP(t))}n {f/ivW > 5P(t)} in [r < t < T -F Arj) At (27) where = n^(u,t)U(t) the projection of U(t) on the normal n(u,t) = — Qf(u,t) of dF{t) in (u,t). A{dF(t)) is a thin layer around dF{t) with thickness ('Uw(^) ~ dF{t))Ar. Hence, it is: P({U(t) € A{dF{t))} Pi |f/Ar(t) > ap(t)| in [r < t < T -f Arj) (Pn+l{u,UN ,t)dudUN -I / JA{dF{t)) Ju / JdF{t) Ju In the stationary case one finds with dF = g'(u) = 0 'A(dF{t)) JUN(t)>dF{t) {iiN - dF{t))(pn+i{u,UM,t)ds{u)duN (28) dF{t) JUN{t)>dF(t) r POO -{dF)= / UN(Pn+l{u,UM)duNds{u) JdF Jo ^ = u)(^y^(u)duA7^^«5(u) JdF Jo = J^^EfT [Piv|U = u] <Pniu)ds{u) = [ P~[Piv|U = u (pn-i{u,p{u))T{u)du (29) where Un = p(u) = g~^{ui^U 2 ^ ...•,Un-i) a parameterization of the sur- face and T(u) the corresponding transformation determinant. Explicit results are available only for special forms of the failure sur- face. For example, if it is a hyperplane 152 Figure 2. 
Figure 2. Outcrossing of a vectorial differentiable process

∂F = {Σᵢ₌₁ⁿ αᵢuᵢ + β = 0} (30)

the outcrossing rate of a stationary standardized Gaussian process is [51]

ν⁺(∂F) = E[max(0, U̇_N)] φ(β) = (ω₀/√(2π)) φ(β) (31)

with ω₀² = αᵀR̈(0)α. An asymptotic result for general non-linear surfaces has been derived in [7]:

ν⁺(∂F) = (ω*/√(2π)) φ(β) ∏ᵢ₌₁ⁿ⁻¹ (1 − βκᵢ)^{−1/2}, β → ∞ (32)

with ω*² = n(u*)ᵀ[R̈(0) + Ṙ(0)ᵀG(u*)Ṙ(0)] n(u*), provided that g(0) > 0 and with R̈(0) = E[U̇(0)U̇(0)ᵀ] and G(u*) = {∂²g(u*)/(∂uᵢ∂uⱼ) ||∇g(u*)||⁻¹; i, j = 1, ..., n}. Here again β = ||u*|| = min{||u||} for g(u) ≤ 0, and the κᵢ are the main curvatures of ∂F at the solution point u*. Differentiable processes are used to model the turbulent natural wind, wind waves and earthquake excitations, but also the output of dynamical systems.

Exact or approximate results have also been obtained for non-Gaussian rectangular wave processes with or without correlated components [34], certain non-Gaussian differentiable processes [14] and a variety of non-stationarities of the processes or the failure surfaces [35]. If one is not satisfied with the (asymptotic) approximations one can apply importance sampling methods in order to arrive at an arbitrarily exact result. Due to regularity of the crossings one can combine rectangular wave and differentiable processes. The processes can be intermittent [46], [22]. This allows the modelling of disturbances of short to very short duration (earthquakes, explosions). Such models have also been extended to deal with occurrence clustering [55], [45]. It is remarkable that the "critical" point u*, i.e. the magnitude of β, plays an important role in all cases, as in the time-invariant case. It must be found by a suitable algorithm. Sequential quadratic programming algorithms tuned to the special problem of interest turned out to solve the optimization problem reliably and efficiently in practical applications [1]. However, it must be mentioned that in time-variant reliability more general models, e.g.
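The Rice-type rate in eq. (31) rests on the identity E[max(0, Z)] = σ/√(2π) for Z ~ N(0, σ²). A brief sketch, assuming illustrative inputs (function name and matrix values are not from the paper), evaluates eq. (31) and cross-checks that identity by quadrature:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def upcrossing_rate_hyperplane(beta, alpha, Rddot0):
    """Eq. (31): nu+(dF) = E[max(0, U_N_dot)] phi(beta) = omega0/sqrt(2 pi) * phi(beta)
    for the hyperplane {alpha^T u + beta = 0}, with omega0^2 = alpha^T Rddot(0) alpha."""
    alpha = np.asarray(alpha, dtype=float)
    alpha = alpha / np.linalg.norm(alpha)       # unit normal of the hyperplane
    omega0 = np.sqrt(alpha @ Rddot0 @ alpha)    # std. dev. of the normal velocity
    return omega0 / np.sqrt(2.0 * np.pi) * norm.pdf(beta)

# cross-check: E[max(0, Z)] for Z ~ N(0, omega0^2) equals omega0/sqrt(2 pi)
omega0 = 1.5
mean_pos_part = quad(lambda z: z * norm.pdf(z, scale=omega0), 0.0, np.inf)[0]
```

The check confirms that φ(β) carries the probability of being on the surface while ω₀/√(2π) is the mean positive normal velocity.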
renewal models with non-rectangular wave shapes, filtered Poisson process models, etc., can easily be formulated but are hardly made practical from a computational point of view.

5. The Value of Human Life and Limb in the Public Interest

Two questions remain: (a) Is it admissible to optimize benefits and cost if human lives are endangered, and (b) can we discount the "cost of human lives"? First of all, modern approaches to these questions do not speak of a monetary value of the human life but rather of the cost to save lives. Secondly, any further argumentation must be within the framework of our moral and ethical principles as laid down in our constitutions and elsewhere. We quote as an example a few articles from the Basic Law of the Federal Republic of Germany:

■ Article 2: (1) Everyone has the right to the free development of his personality ... (2) Everyone has the right to life and to inviolability of his person

■ Article 3: (1) All persons are equal before the law. (2) Men and women have equal rights. (3) No one may be prejudiced or favored because of his sex, his parentage, his race, his language, his homeland and origin, his faith or his religious or political opinions.

Similar principles are found in all modern, democratic constitutions. But H. D. Thoreau (1817-1862) realistically says about the value of human life: "The cost of a thing is the amount of what I will call life which is required to be exchanged for it, immediately or in the long run. ..." [29].

Can these value fixings be transferred to engineering acceptability criteria? This is possible when starting from certain social indicators such as life expectancy, gross national product (GNP), state of health care, etc. Life expectancy e is the area under the survivor curve S(a) as a function of age a, i.e. e = ∫₀^{a_u} S(a) da. A suitable measure for the quality of life is the GNP per capita, despite some moral indignation at first sight.
The GNP is created by labor and capital (stored labor). It provides the infrastructure of a country, its social structure, its cultural and educational offers and its ecological conditions, among others, but also the means for the individual enjoyment of life by consumption. Most importantly in our context, it creates the possibilities to "buy" additional life years through better medical care, improved safety in road traffic, more safety in or around building facilities or from hazardous technical activities, etc. Safety of buildings via building codes is an investment into saving lives. The investments into structural safety must be efficient, however; otherwise investments into other life-saving activities are preferable. In all further considerations only about 60% of the GNP, i.e. g ≈ 0.6 GNP, which is the part available for private use, is taken into account.

Denote by c(τ) > 0 the consumption rate at age τ and by u(c(τ)) the utility derived from consumption. Individuals tend to undervalue a prospect of future consumption as compared to that of present consumption. This is taken into account by some discounting. The lifetime utility for a person at age a until she/he attains age t > a then is

U(a, t) = ∫_a^t u[c(τ)] exp[−∫_a^τ ρ(θ) dθ] dτ = ∫_a^t u[c(τ)] exp[−ρ(τ − a)] dτ (33)

for ρ(θ) = ρ. It is assumed that consumption is not delayed, i.e. incomes are not transformed into bequests. ρ should be conceptually distinguished from a financial interest rate and is referred to as the rate of time preference of consumption. A rate ρ > 0 has been interpreted as the effect of human impatience, myopia, egoism, lack of telescopic faculty, etc. Exponential population growth with rate n can be considered by replacing ρ by ρ − n, taking into account that families are by a factor exp[nt] larger at a later time t > 0.
The correction ρ > n appears always necessary, simply because future generations are expected to be larger and wealthier. ρ is reported to be between 1 and 3% for health-related investments, with a tendency to lower values [53]. Empirical estimates reflecting pure consumption behavior vary considerably but are in part significantly larger [25]. The expected remaining present-value lifetime utility at age a (conditional on having survived until a) then is (see [2], [43], [39], [15])

L(a) = E[U(a, ·)] = ∫_a^{a_u} U(a, t) f(t) dt = (1/ℓ(a)) ∫_a^{a_u} u[c(t)] exp[−(ρ − n)(t − a)] ℓ(t) dt = u[c] e_d(a, ρ, n) (34)

where f(t) dt = μ(t) exp[−∫₀ᵗ μ(τ) dτ] dt is the probability of dying between age t and t + dt, computed from life tables, and ℓ(t) the survivor function. The expression in the third line is obtained upon integration by parts. Also, a constant consumption rate c independent of t has been introduced, which can be shown to be optimal under perfect market conditions [43]. The "discounted" life expectancy e_d(a, ρ, n) at age a can be computed from

e_d(a, ρ, n) = (exp[(ρ − n)a]/ℓ(a)) ∫_a^{a_u} exp[−∫₀ᵗ (μ(τ) + (ρ − n)) dτ] dt (35)

"Discounting" affects e_d(a, ρ, n) primarily when μ(τ) is small (i.e. at young age), while it has little effect for larger μ(τ) at higher ages. It is important to recognize that "discounting" by ρ is initially with respect to u[c(τ)] but is formally included in the life expectancy term. For u[c] we select a power function

u[c] = (c^q − 1)/q (36)

with 0 < q < 1, implying constant relative risk aversion according to Arrow-Pratt. The form of eq. (36) reflects the reasonable assumption that marginal utility decays with consumption c. u[c] is a concave function since u'[c] = c^{q−1} > 0 for q > 0 and u''[c] = (q − 1)c^{q−2} < 0 for q < 1. The numerical value of q has been chosen to be about 0.2 (see [43], [15] and elsewhere, as well as Table 2 below).
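Eq. (35) is straightforward to evaluate numerically once a mortality intensity μ(τ) is given. The sketch below assumes a hypothetical Gompertz-Makeham-type mortality curve (parameters purely illustrative, not the paper's life-table data) and a simple rectangle-rule quadrature:

```python
import numpy as np

def discounted_life_expectancy(mu, a, rho, n, a_u=110.0, dt=0.02):
    """Eq. (35): e_d(a,rho,n) = exp((rho-n)a)/S(a) *
    int_a^{a_u} exp(-int_0^t (mu(tau) + rho - n) dtau) dt,
    with survivor function S(t) = exp(-int_0^t mu)."""
    t = np.arange(0.0, a_u, dt)
    H = np.cumsum(mu(t)) * dt                 # cumulative hazard int_0^t mu
    S = np.exp(-H)                            # survivor function
    Sa = S[np.searchsorted(t, a)]
    mask = t >= a
    integral = np.sum(np.exp(-H[mask] - (rho - n) * t[mask])) * dt
    return np.exp((rho - n) * a) / Sa * integral

# illustrative mortality intensity (assumption, not from the paper)
mu = lambda t: 2e-4 + 1e-5 * np.exp(0.1 * t)
e   = discounted_life_expectancy(mu, 0.0, 0.0, 0.0)    # ordinary life expectancy
e_d = discounted_life_expectancy(mu, 0.0, 0.03, 0.0)   # "discounted" at rho = 3%
```

With ρ = n = 0 the formula reduces to the ordinary life expectancy at age a; any ρ > n shrinks it, most strongly at young ages where μ is small.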
It may also be derived from the work-leisure optimization principle as outlined in [29], where q = w/(1 − w) with w the average fraction of e devoted to (paid) work (see [37] for estimates derived from this principle). This magnitude has also been verified empirically (see, for example, [25]). For simplicity, we also take c = g. Shepard/Zeckhauser [43] now define the "value of a statistical life" at age a by converting eq. (34) into monetary units, dividing it by the marginal utility u'[c] = c^{q−1}:

VSL(a) = L(a)/u'[c] ≈ (g/q) e_d(a, ρ, n) (37)

because u[c]/u'[c] = (c^q − 1)/(q c^{q−1}) ≈ c/q. The "willingness-to-pay" has been defined as

WTP(a) = VSL(a) dm (38)

In analogy to Pandey/Nathwani [31], and here we differ from the related economics literature, these quantities are averaged over the age distribution h(a, n) in a stable population in order to take proper account of the composition of the population exposed to hazards in and from technical objects. One obtains the "societal value of a statistical life"

SVSL = (g/q) ē_d (39)

with

ē_d = ∫₀^{a_u} e_d(a, ρ, n) h(a, n) da (40)

and the "societal willingness-to-pay"

SWTP = SVSL dm (41)

For ρ = 0 the averaged "discounted" life expectancy ē_d is a quantity which is about 60% of e, and considerably less than that for larger ρ. In this purely economic consideration it appears appropriate to define also the undiscounted average lost earnings in case of death, i.e. the so-called "human capital"

HC = ∫₀^{a_u} g(e − a) h(a, n) da (42)

Table 1 shows the SVSL for some selected countries as a function of ρ, indicating the importance of a realistic assessment of ρ.
              France   Germany   Japan   Russia   USA
  e           78       78        80      66       77
  n           0.37%    0.27%     0.17%   -0.35%   0.90%
  g           14660    14460     15960   5440     22030
  q           0.174    0.167     0.208   0.188    0.222
  ρ = 0%      4.05     3.96      3.46    0.93     5.83
      1%      3.05     3.00      2.62    0.74     4.28
      2%      2.38     2.36      2.06    0.61     3.27
      3%      1.92     1.92      1.67    0.51     2.59
      4%      1.59     1.59      1.39    0.54     2.11

Table 1: SVSL [10⁶ PPP US$] for some countries for various ρ (from recent complete life tables provided by national statistical offices)

It can reasonably be assumed that the life risk in and from technical facilities is uniformly distributed over the age and sex of those affected. Also, it is assumed that everybody uses such facilities and, therefore, is exposed to possible fatal accidents. The total cost of a safety-related regulation per member of the group and year is SWTP = −dC_Y(p) = −(1/N) Σᵢ₌₁ʳ dC_{Y,i}(p), where r is the total number of objects under discussion, each with incremental cost dC_{Y,i}, and N is the group size. For simplicity, the design parameter is temporarily assumed to be a scalar. This gives:

−dC_Y(p) + SVSL dm = 0 (43)

Let dm be proportional to the mean failure rate dh(p), i.e. it is assumed that the process of failures and renewals is already in a stationary state, that is for t → ∞. Rearrangement yields

dC_Y(p)/dh(p) = −k SVSL (44)

where

dm = k dh(p), 0 < k < 1 (45)

with the proportionality constant k relating the changes in mortality to the changes in the failure rate. Note that for any reasonable risk-reducing intervention there is necessarily dh(p) < 0. The criterion eq. (44) is derived for safety-related regulations for a larger group in a society or the entire society. Can it also be applied to individual technical projects? SVSL as well as HC were related to one anonymous person. For a specific project it makes sense to apply criterion (44) to the whole group exposed. Therefore, the "life saving cost" of a technical project with N_F potential fatalities is

H_F = HC k N_F (46)

The monetary losses in case of failure are decomposed into H = H_M + H_F in formulations of the type of eq.
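The order of magnitude of SVSL = (g/q) ē_d in Table 1 can be reproduced with a deliberately crude sketch: assuming a constant mortality rate μ = 1/e (an assumption of this sketch, not of the paper, which uses complete life tables), eq. (35) gives e_d(a, ρ, n) = 1/(μ + ρ − n) at every age, so the averaging in eq. (40) is trivial:

```python
def svsl(g, q, e, rho, n):
    """Rough sketch of eqs. (39)/(40) under a constant-hazard assumption:
    mu = 1/e, hence e_d_bar = 1/(mu + rho - n) and SVSL = (g/q) * e_d_bar."""
    mu = 1.0 / e                      # constant mortality reproducing life expectancy e
    ed_bar = 1.0 / (mu + rho - n)     # age-averaged "discounted" life expectancy
    return g / q * ed_bar

# USA-like row of Table 1: e = 77, n = 0.90%, g = 22030, q = 0.222, rho = 3%
svsl_usa = svsl(22030.0, 0.222, 77.0, 0.03, 0.009)
```

This yields roughly 2.9·10⁶ PPP US$ against the table's 2.59·10⁶ for the USA at ρ = 3%; the discrepancy reflects the constant-hazard simplification.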
(10), with H_M the losses not related to human life and limb. Criterion (44) changes accordingly into

dC_Y(p)/dh(p) = −SVSL k N_F (47)

All quantities in eq. (47) are related to one year. For a particular technical project all design and construction cost, denoted by dC(p), must be raised at the decision point t = 0. The yearly cost must be replaced by the erection cost dC(p) at t = 0 on the left-hand side of eq. (47), and discounting is necessary. The method of discounting is the same as for discharging an annuity. If the public is involved, dC_Y(p) may be interpreted as the cost of societal financing of dC(p). The interest rate to be used must then be a societal interest rate, to be discussed below. Otherwise the interest rate is the market rate. g in SVSL also grows approximately exponentially with rate ζ, the rate of economic growth in a country. It can be taken into account by discounting. The acceptability criterion for individual technical projects then is (with the discount factor for the discounted erection cost moved to the right-hand side):

dC(p)/dh(p) ≤ −SVSL k N_F (exp[γt] − 1)/(γ exp[γt]) (48)

It must be mentioned that a similar, very convincing consideration about the necessary effort to reduce the risk for human life from technical objects has been given by Nathwani et al. [29] and in [31], producing estimates for the equivalent of the constant SVSL very close to those given in Table 1. The estimates for SVSL are in good agreement with several other estimates in the literature (see, for example, [49], [43], [52], [24] and many others), which are between 1,000,000 and 10,000,000 PPP US$ with a clustering around 5,000,000 PPP US$.

6. Remarks about Interest Rates

A cost-benefit optimization must use interest rates. Considering the time horizon of some 20 to more than 100 years for most structural facilities, but also for many risky industrial installations, it is clear that average interest rates net of in-/deflation must be chosen.
If the option with systematic reconstruction is chosen one immediately sees from eq. (14) that the interest rate must be non-zero. From the same equation we see that there is a maximum interest rate γ_max for which Z(p) becomes negative for any p,

γ_max = [m(p)b − (C(p) + H)h(p)]/C(p) (49)

and, therefore, 0 < γ < γ_max. Also, m(p)b > C(p) + H must be valid for any reasonable project, which further implies that b/γ > 1. Very small interest rates, on the other hand, cause benefit and damage cost to dominate over the erection cost. Then, in the limit,

Z(p) → b − (C(p) + H)/m(p) (50)

where the interest rate vanishes. Erection cost is normally weakly increasing in the components of p, but m(p) grows significantly with p. Consequently, the optimum is reached for m(p) → ∞, that is for perfect safety, which is not attainable in practice. In other words, the interest rate must be distinctly different from zero. Otherwise, the different parties involved in the project may use interest rates taken from the financial market at the decision point t = 0.

The cost for saving life years also enters into the objective function, and with it the question of discounting those cost also arises. At first sight this is not in agreement with our moral value system. However, a number of studies summarized in [32] and [23] express a rather clear opinion based on ethical and economic arguments. The cost for saving life years must be discounted at the same rate as other investments, especially in view of the fact that our present value system should be maintained also for future generations. Otherwise serious inconsistencies cannot be avoided. What should then the discount rate for public investments into life-saving projects be? A first estimate could be based on the long-term growth rate of the GNP. In most developed, industrial countries this was a little more than 2% over the last 50 years.
The United Nations Human Development Report 2000 gives values between 1.2 and 1.9% for industrialized countries during 1975-1998. If one extends the consideration to the last 120 years one finds an average growth rate ζ of about 1.8% (see Table 2). Using data in [47], [27] and the UN Human Development Report 2000 [50] the following table has been compiled from a more detailed table.

  Ctry.   GNP(1850)  GNP(1998)  g      e    n%     q     ρ%    ζ%    γ%    SVSL
  UK      3109       23500      15140  77   0.23   0.19  0.5   1.3   1.3   3.1·10⁶
  US      1886       34260      22030  78   0.90   0.22  1.3   1.8   2.3   3.9·10⁶
  F       1840       24470      14660  78   0.37   0.17  0.7   1.9   1.9   3.3·10⁶
  S       1394       23770      12620  79   0.02   0.18  0.3   1.9   1.6   2.7·10⁶
  D       1400       25010      14460  77   0.27   0.17  0.6   1.9   1.9   3.3·10⁶
  AUS     4027       25370      15750  80   0.99   0.21  0.7   1.2   1.9   3.3·10⁶
  J       969        26460      15960  80   0.17   0.20  1.2   2.7   2.3   2.8·10⁶

Table 2: Social indices for some developed industrial countries (all monetary values in US$, 1998)

It is noted that economic growth in the first half of the last century was substantially below average while the second half was well above average. The above considerations can at least define the range of interest rates to be used in long-term public investments into life-saving operations. For the discount rates to be used in long-term public investments the growth theory established by Solow [48] is applied, i.e.

n + ζ(1 − ε) < ρ < γ < γ_max < n + εζ (51)

where ε = 1 − q is the so-called elasticity of marginal consumption (income). There is much debate about interest rates for long-term public investments, especially if sustainability aspects are concerned. But there is an important mathematical result which may guide our choice. Weitzman [54] and others showed that the far-distant future should be discounted at the lowest possible rate > 0 if there are different possible scenarios each with a given probability of being true.

7. A One-Level Optimization for Structural Components

Let us now turn to the technical aspects of optimization. Cost-benefit optimization according to eq.
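The bounds in eq. (51) are simple arithmetic once n, ζ and q are fixed. A sketch with hypothetical inputs (the numbers below are illustrative, not a row of Table 2):

```python
def discount_rate_band(n, zeta, q):
    """Bounds of eq. (51): n + zeta*(1 - eps) < rho < gamma < n + eps*zeta,
    with elasticity of marginal consumption eps = 1 - q."""
    eps = 1.0 - q
    lower = n + zeta * (1.0 - eps)   # = n + zeta*q
    upper = n + eps * zeta
    return lower, upper

# hypothetical country: n = 0.3%, zeta = 1.8%, q = 0.2
lo, hi = discount_rate_band(n=0.003, zeta=0.018, q=0.2)
```

With q ≈ 0.2 the band is fairly wide (here about 0.7% to 1.7%), consistent with the single-digit-percent rates in Table 2.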
(3) or (10) in principle requires two levels of optimization, one to minimize cost and the other to solve the reliability problem. However, it is possible to reduce it to one level by adding the Kuhn-Tucker conditions of the reliability problem to the cost optimization task, provided that the reliability task is formulated in the transformed standard space. For the task in eq. (3) we have

Maximize: Z(p) = B* − C(p) − (C(p) + H_M + H_F) Φ(−||u||)
Subject to: g(u, p) = 0
            uᵢ ||∇_u g(u, p)|| + ∂g(u, p)/∂uᵢ ||u|| = 0; i = 1, ..., n − 1
            h_k(p) ≤ 0, k = 1, ..., q
            ∇_p C(p) ≥ −k SVSL N_F ∇_p P_f(p) (52)

where the first and second conditions represent the Kuhn-Tucker conditions for a valid "critical" point, the third condition some restrictions on the parameter vector p, and the fourth condition the human life criterion in eq. (48). Frequently, the term Φ(−||u||) in the objective can be replaced by P_f(p). The failure probability is

P_f(p) ≈ Φ(−β(p)) C_SORM (53)

and we have to require that ||u|| ≠ 0 and ||∇_u g(u, p)|| ≠ 0. It is assumed that the second-order correction C_SORM is nearly independent of p. In fact, at the expense of some more numerical effort, one can use any update of the first-order result Φ(−β(p)), for example an update by importance sampling, provided that the result of importance sampling is formulated as a correction factor to the first-order result. ∇_p C(p) usually must be determined numerically. For time-variant problems as in eq. (10) one finds the outcrossing rate for a combination of rectangular wave and differentiable processes as

ν⁺(p) ≈ [Σⱼ λⱼ Φ(−β) + (ω₀/√(2π)) φ(β)] C_SORM (54)

The optimization task is

Maximize: Z(p) = b/γ − C(p) − (C(p) + H_M + H_F) ν⁺(p)/γ
Subject to: g(u, p) = 0
            uᵢ ||∇_u g(u, p)|| + ∂g(u, p)/∂uᵢ ||u|| = 0; i = 1, ..., n − 1
            h_k(p) ≤ 0, k = 1, ..., q
            ∇_p C(p) ≥ −k SVSL N_F ∇_p ν⁺(p) (55a)

For the case in eq. (15) one replaces ν⁺(p) by λP_f(p) and ∇_p ν⁺(p) by ∇_p λP_f(p). The optimization tasks in eq.
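The inner reliability problem embedded in (52) — finding the "critical" point u* with β = min ||u|| on g(u, p) = 0 — can be sketched with a general-purpose SQP solver; note that the paper's one-level scheme instead appends the Kuhn-Tucker conditions of this problem to the cost optimization, so the code below (SciPy's SLSQP, illustrative names and a linear limit state) only demonstrates the subproblem:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def form_beta(g, u0):
    """First-order reliability index: beta = min ||u|| subject to g(u) = 0,
    solved as min u^T u with an equality constraint (SLSQP)."""
    res = minimize(lambda u: u @ u, u0, method="SLSQP",
                   constraints={"type": "eq", "fun": g})
    return np.sqrt(res.fun), res.x

# linear limit state g(u) = beta_true + alpha^T u with ||alpha|| = 1 (illustrative)
alpha = np.array([0.6, 0.8]); beta_true = 3.0
beta, u_star = form_beta(lambda u: beta_true + alpha @ u, np.zeros(2))
pf = norm.cdf(-beta)     # first-order failure probability, eq. (53) with C_SORM = 1
```

For the linear case the solution is known in closed form, u* = −β α, which makes the sketch easy to verify.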
(52) or (55a) are conveniently performed by suitable SQP algorithms (for example, [44], [33]). For both formulations, eq. (52) and (55a), respectively, gradient-based optimizers require the gradients of the objective as well as the gradients of all constraints. This means that second derivatives are required in order to calculate the gradient of the second condition as well as of the human value criterion; in particular, the entries of the Hessian of g(u, p). This is also the most serious objection against this form of a one-level approach. One can, however, proceed iteratively for well-behaved failure surfaces. Initially, one assumes a linear or linearized failure surface and sets C_SORM = 1. Then, all entries of the Hessian are zero. After a first solution of problem (52) or (55a) one determines the Hessian once at the solution point (u*⁽¹⁾, p⁽¹⁾) and with it also calculates C_SORM. Problems (52) or (55a) are then solved a second time with fixed Hessian G(u*⁽¹⁾, p⁽¹⁾), and so forth. This scheme is repeated until convergence is reached, which usually occurs after a few steps. From a practical point of view it is frequently sufficient to use first-order reliability results, and then no iteration is necessary.

In closing this section it is important to note that the optimization tasks as formulated in eq. (52) and (55a) are among the easiest one can think of. In practice, safety-related design decisions additionally include changes in the lay-out, in the structural system or in the maintenance strategy. Optimization is then over discrete sets of design alternatives. Clearly, this is more difficult, and very little is known about how to do it formally except in a heuristic, empirical manner in small dimensions.

8. Example

As an example we take a rather simple case of a system where failure is defined if the random resistance or capacity is exceeded by the random demand, i.e. the failure event is defined as F = {R − S(t) ≤ 0}.
The demand is modelled as a one-dimensional, stationary marked Poissonian renewal process of disturbances (earthquakes, wind storms, explosions, etc.) with stationary renewal rate λ and random, independent sizes of the disturbances Sᵢ, i = 1, 2, .... The random resistance is log-normally distributed with mean p and coefficient of variation V_R. The disturbances are also independently log-normally distributed with mean equal to unity and coefficient of variation V_S. A disturbance causes failure with probability

P_f(p) = Φ(−ln(p √((1 + V_S²)/(1 + V_R²))) / √(ln((1 + V_R²)(1 + V_S²)))) (56)

Thus, the failure rate is λP_f(p), and the Laplace transform of the renewal density is

h*(γ) = λP_f(p)/γ (57)

An appropriate objective function given systematic reconstruction then is

Z(p) = b/γ − C(p) − (C(p) + H) λP_f(p)/γ, C(p) = C₀ + C₁ p^a (58)

which is to be maximized. The acceptability criterion (48) takes the corresponding form. Some more or less realistic, typical parameter assumptions are: C₀ = 10⁶, C₁ = 10⁴, a = 1.25, H_M = 3 C₀, V_R = 0.2, V_S = 0.3, and λ = 1 [1/year]. The socio-economic and demographic data are e = 77, GDP = 25000, g = 15000, w = 0.15, N_F = 100, k = 0.1, so that H_F = HC k N_F = 5.8·10⁶ and SVSL k N_F = 3.3·10⁷. The value of N_F is chosen relatively large for demonstration purposes. Monetary values are in US$. Optimization is performed for the public and for the owner separately. For the public, b_S = 0.02 C₀ and γ_S = 0.0185 are chosen. Also, we take C_SORM = 1 for simplicity. In particular, benefit and discount rate are chosen such that the public does not make direct profit from an economic activity of its members. Optimization including the cost H_F gives p*_S = 4.35; the corresponding failure rate is 1.2·10⁻⁵. Criterion (48) is already fulfilled for p₁ = 3.48, corresponding to a yearly failure rate of 1.6·10⁻⁴, but with Z_S(p₁)/C₀ already negative. It is interesting to see that in this case the public can do better in adopting the optimal solution rather than just realizing the facility at its acceptability limit, as pointed out already earlier.
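Eq. (56) is easy to check numerically; a short sketch (function name illustrative) with the paper's parameters V_R = 0.2, V_S = 0.3 and λ = 1/year reproduces the failure rates reported for the example:

```python
import numpy as np
from scipy.stats import norm

def pf_disturbance(p, VR=0.2, VS=0.3):
    """Failure probability per disturbance, eq. (56): lognormal resistance
    (mean p, c.o.v. VR) versus lognormal disturbance size (mean 1, c.o.v. VS)."""
    num = np.log(p * np.sqrt((1.0 + VS**2) / (1.0 + VR**2)))
    den = np.sqrt(np.log((1.0 + VR**2) * (1.0 + VS**2)))
    return norm.cdf(-num / den)

lam = 1.0                              # renewal rate of disturbances [1/year]
rate_376 = lam * pf_disturbance(3.76)  # owner's optimum without life-saving cost
rate_403 = lam * pf_disturbance(4.03)  # owner's optimum including life-saving cost
```

The computed rates are about 7.0·10⁻⁵ and 3.1·10⁻⁵ per year, matching the rounded values quoted in the text for p* = 3.76 and p* = 4.03.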
The owner uses typical values of b_O = 0.07 C₀ and γ_O = 0.05 and does or does not include the life-saving cost. If he includes the life-saving cost, the objective function is shifted to the right (dashed line). The calculations yield p*_O = 3.76 and p*_O = 4.03, respectively, and the corresponding failure rates are 7.1·10⁻⁵ and 3.2·10⁻⁵. The acceptability criterion limits the owner's region for reasonable designs. Inclusion of the life-saving cost has relatively little influence on the position of the optimum.

Figure 3. Objective function of owner and society

It is noted that the stochastic model and the variability of capacity and demand also play an important role for the magnitude and location of the optimum as well as for the acceptability limit. The specific marginal cost (rate of change) of a safety measure and its effect on a reduction of the failure rate are equally important.

9. Conclusions

Optimization techniques are essential ingredients of reliability-oriented optimal design of technical facilities. Although many technical aspects are not yet solved and the available spectrum of models and methods in structural reliability is still limited, many practical problems can be solved. A special one-level optimization is proposed for general cost-benefit analysis. In this paper, however, the focus is on some more critical issues, for example: "what is a reasonable replacement strategy for structural facilities?", "how safe is safe enough?" and "how to discount losses of material, opportunity and human lives?". An attempt has been made to give at least partial answers. Only if those issues have an answer does overall optimization of technical facilities with respect to cost make sense.

References

[1] Abdo, T., Rackwitz, R., A new beta-point algorithm for large time-invariant and time-variant reliability problems, Proc. 3rd IFIP WG 7.5 Working Conference, Berkeley, 1990, pp.
1-12, Springer, Berlin, 1990
[2] Arthur, W.B., The Economics of Risks to Life, American Economic Review, 71, pp. 54-64, 1981
[3] Belyaev, Y.K., On the Number of Exits across the Boundary of a Region by a Vector Stochastic Process, Theor. Prob. Appl., 13, 1968, pp. 320-324
[4] Bleistein, N., Handelsman, R.A., Asymptotic Expansions of Integrals, Holt, Rinehart and Winston, New York, 1975
[5] Breitung, K., Asymptotic Approximations for Multinormal Integrals, Journ. of the Eng. Mech. Div., 110, 3, 1984, pp. 357-366
[6] Breitung, K., Asymptotic Approximations for Probability Integrals, Prob. Eng. Mech., 4, 4, 1989, pp. 187-190
[7] Breitung, K., Asymptotic Crossing Rates for Stationary Gaussian Vector Processes, Stochastic Processes and their Applications, 29, 1988, pp. 195-207
[8] Breitung, K., Asymptotic Approximations for the Crossing Rates of Poisson Square Waves, Proc. of the Conf. on Extreme Value Theory and Applications, Gaithersburg/Maryland, NIST Special Publication 866, 3, 1993, pp. 75-80
[9] Breitung, K., Rackwitz, R., Nonlinear Combination of Load Processes, Journ. of Struct. Mech., 10, 2, 1982, pp. 145-166
[10] Cantril, H., The Pattern of Human Concerns, Rutgers University Press, New Brunswick, N.J., 1965
[11] Cox, D.R., Renewal Theory, Methuen, 1962
[12] Cox, D.R., Isham, V., Point Processes, Chapman & Hall, London, 1980
[13] Cramer, H., Leadbetter, M.R., Stationary and Related Stochastic Processes, Wiley, New York, 1967
[14] Grigoriu, M., Crossings of Non-Gaussian Translation Processes, Journ. of the Eng. Mech. Div., ASCE, 110, EM4, 1984, pp. 610-620
[15] Cropper, M.L., Sussman, F.G., Valuing Future Risks to Life, Journ. Environmental Economics and Management, 19, pp. 160-174, 1990
[16] Hasofer, A.M., Lind, N.C., An Exact and Invariant First Order Reliability Format, Journ. of the Eng. Mech. Div., ASCE, 100, EM1, 1974, pp. 111-121
[17] Hasofer, A.M., Design for Infrequent Overloads, Earthquake Eng. and Struct.
Dynamics, 2, 4, 1974, pp. 387-388
[18] Hasofer, A.M., Rackwitz, R., Time-dependent models for code optimization, Proc. ICASP'99 (eds. R.E. Melchers & M.G. Stewart), Balkema, Rotterdam, 2000, 1, pp. 151-158
[19] Hohenbichler, M., Rackwitz, R., Non-Normal Dependent Vectors in Structural Safety, Journ. of the Eng. Mech. Div., ASCE, 107, 6, 1981, pp. 1227-1249
[20] Hohenbichler, M., Gollwitzer, S., Kruse, W., Rackwitz, R., New Light on First- and Second-Order Reliability Methods, Structural Safety, 4, pp. 267-284, 1987
[21] Hohenbichler, M., Rackwitz, R., Sensitivity and Importance Measures in Structural Reliability, Civil Engineering Systems, 3, 4, 1986, pp. 203-209
[22] Iwankiewicz, R., Rackwitz, R., Non-stationary and stationary coincidence probabilities for intermittent pulse load processes, Probabilistic Engineering Mechanics, 15, 2000, pp. 155-167
[23] Lind, N.C., Target Reliabilities from Social Indicators, Proc. ICOSSAR'93, Balkema, 1994, pp. 1897-1904
[24] Lutter, R., Morrall, J.F., Health-Health Analysis, A New Way to Evaluate Health and Safety Regulation, Journ. Risk and Uncertainty, 8, pp. 43-66, 1994
[25] Kapteyn, A., Teppa, F., Hypothetical Intertemporal Consumption Choices, Working paper, CentER, Tilburg University, Netherlands, 2002
[26] Kuschel, N., Rackwitz, R., Two Basic Problems in Reliability-Based Structural Optimization, Mathematical Methods of Operations Research, 46, 1997, pp. 309-333
[27] Maddison, A., Monitoring the World Economy 1820-1992, OECD, Paris, 1995
[28] Madsen, H.O., Lind, N., Krenk, S., Methods of Structural Safety, Prentice-Hall, Englewood Cliffs, 1987
[29] Nathwani, J.S., Lind, N.C., Pandey, M.D., Affordable Safety by Choice: The Life Quality Method, Institute for Risk Research, University of Waterloo, Waterloo, Canada, 1997
[30] Paez, A., Torroja, E., La determinación del coeficiente de seguridad en las distintas obras,
Instituto Técnico de la Construcción y del Cemento, Madrid, 1952
[31] Pandey, M.D., Nathwani, J.S., Canada Wide Standard for Particulate Matter and Ozone: Cost-Benefit Analysis using a Life-Quality Index, to be published in Journ. Risk Analysis, 2002
[32] Pate-Cornell, M.E., Discounting in Risk Analysis: Capital vs. Human Safety, Proc. Symp. Structural Technology and Risk, University of Waterloo Press, Waterloo, ON, 1984
[33] Pshenichnyj, B.N., The Linearization Method for Constrained Optimization, Springer, Berlin, 1994
[34] Rackwitz, R., Reliability of Systems under Renewal Pulse Loading, Journ. of Eng. Mech., ASCE, 111, 9, 1985, pp. 1175-1184
[35] Rackwitz, R., On the Combination of Non-stationary Rectangular Wave Renewal Processes, Structural Safety, 13, 1+2, 1993, pp. 21-28
[36] Rackwitz, R., Optimization - The Basis of Code Making and Reliability Verification, Structural Safety, 22, 1, 2000, pp. 27-60
[37] Rackwitz, R., Optimization and Risk Acceptability based on the Life Quality Index, Structural Safety, 24, pp. 297-331, 2002
[38] Rice, S.O., Mathematical Analysis of Random Noise, Bell System Tech. Journ., 23, 1944, pp. 282-332 and 24, 1945, pp. 46-156
[39] Rosen, S., The Value of Changes in Life Expectancy, Journ. Risk and Uncertainty, 1, pp. 285-304, 1988
[40] Rosenblueth, E., Optimum Design for Infrequent Disturbances, Journ. of the Struct. Div., ASCE, 102, ST9, 1976, pp. 1807-1825
[41] Rosenblueth, E., Esteva, L., Reliability Basis for some Mexican Codes, in: ACI Spec. Publ. SP-31, Detroit, 1972
[42] Rosenblueth, E., Mendoza, E., Reliability Optimization in Isostatic Structures, Journ. of the Eng. Mech. Div., ASCE, 97, EM6, 1971, pp. 1625-1642
[43] Shepard, D.S., Zeckhauser, R.J., Survival versus Consumption, Management Science, 30, 4, pp. 423-439, 1984
[44] Schittkowski, K., Theory, Implementation, and Test of a Nonlinear Programming Algorithm, in: Eschenauer, H., Olhoff, N. (eds.),
Optimization Methods in Structural Design, Proc. Euromech Colloquium 164, Universität Siegen, Oct. 12-14, 1982, Zürich, 1983
[45] Schrupp, K., Rackwitz, R., Outcrossing Rates of Marked Poisson Cluster Processes in Structural Reliability, Appl. Math. Modelling, 12, 1988, pp. 482-490
[46] Shinozuka, M., Stochastic Characterization of Loads and Load Combinations, Proc. 3rd ICOSSAR, Trondheim, 23-25 June 1981, in: Structural Safety and Reliability, T. Moan and M. Shinozuka (eds.), Elsevier, Amsterdam, 1981
[47] Steckel, R.H., Floud, R., Health and Welfare during Industrialization, University of Chicago Press, Chicago, 1997
[48] Solow, R.M., Growth Theory, Clarendon Press, Oxford, 1970
[49] Tengs, T.O., Adams, M.E., Pliskin, J.S., Safran, D.G., Siegel, J.E., Weinstein, M.C., Graham, J.D., Five-Hundred Life-Saving Interventions and Their Cost-Effectiveness, Risk Analysis, 15, 3, pp. 369-390, 1995
[50] United Nations, Human Development Report 2000, http://www.undp.org/hdr2000/english/HDR2000.html
[51] Veneziano, D., Grigoriu, M., Cornell, C.A., Vector-Process Models for System Reliability, Journ. of the Eng. Mech. Div., ASCE, 103, EM3, 1977, pp. 441-460
[52] Viscusi, W.K., The Valuation of Risks to Life and Health, Journ. Economic Literature, XXXI, pp. 1912-1946, 1993
[53] Viscusi, W.K., Discounting health effects for medical decisions, in: Valuing Health Care: Costs, Benefits and Effectiveness of Pharmaceuticals and Other Medical Technologies, F.A. Sloan (ed.), Cambridge University Press, pp. 125-147, 1996
[54] Weitzman, M.L., Why the Far-Distant Future Should Be Discounted at Its Lowest Possible Rate, Journal of Environmental Economics and Management, 36, pp. 201-208, 1998
[55] Wen, Y.K., A Clustering Model for Correlated Load Processes, Journ. of the Struct. Div., ASCE, 107, ST5, 1981, pp. 965-983

PROBABILITY OBJECTIVES IN STOCHASTIC PROGRAMS WITH RECOURSE

Rüdiger Schultz
Institute of Mathematics
University Duisburg-Essen
Lotharstr.
65, D-47048 Duisburg, Germany
schultz@math.uni-duisburg.de

Abstract: Traditional models in multistage stochastic programming are directed to minimizing the expected value of the random optimal costs arising in a multistage, non-anticipative decision process under uncertainty. Motivated by risk aversion, we consider minimization of the probability that the random optimal costs exceed some preselected threshold value. For the two-stage case, we analyse structural properties and propose algorithms, both for models with integer decisions and for those without. An extension of the modeling to the multistage situation concludes the paper.

Keywords: Stochastic programming, mixed-integer optimization.

1. Introduction

Stochastic programs with recourse arise as deterministic equivalents to random optimization problems. In the present paper the main accent is placed on the two-stage situation, and the most general random optimization problems to be considered are random mixed-integer linear programs. These are accompanied by a two-stage scheme of alternating decision and observation: after parts of the variables have been fixed in a first stage, the random data affecting the problem are observed, and in turn the remaining (second-stage or recourse) variables are fixed. Two basic assumptions underlie this scheme. First, and naturally, the first-stage decision has to be taken on a "here-and-now" basis, i.e., it must not depend on (or anticipate) the outcome of the random data. Secondly, and providing some modeling restriction, the first-stage decision does not influence the probability distribution of the random data. In multistage stochastic programs the above two-stage scheme is extended into a finite-horizon sequential decision process under uncertainty.
Again we have to maintain nonanticipativity of decisions, and, so far, almost all results concern problems where the decisions do not influence the probability distribution of the random data. In the final section of the present paper we will return to multistage stochastic programs.

After having sketched the rules for how to make decisions, let us now discuss criteria for how to select a "best" decision. In this respect, the existing literature on stochastic programs with recourse (cf. the textbooks [5, 15, 20] and the references therein) almost unanimously suggests starting out from expectations of objective function values of the random optimization problem. For two-stage models (in a cost minimization framework) this implies that the deterministic first-stage decision is selected such that the expectation of the sum of the deterministic first-stage costs and the random second-stage costs (induced by the random data and an optimal second-stage decision) becomes minimal. Such a criterion has proven useful in many applications. In case the random optimization problem is a linear program without integer requirements, the resulting stochastic program with recourse enjoys convexity in the first-stage variables. This has enabled the application of powerful tools from convex analysis, both for structural investigations and for algorithm design (cf. [4, 5, 15, 20, 32]).

In the present paper, we discuss recourse stochastic programs where the optimization is based on minimizing the probability that the above sum of deterministic and random costs exceeds a given threshold value. Such models provide an opportunity to address risk aversion in the framework of recourse stochastic programming. The proposal to replace the usual expectation-based objective function in recourse stochastic programming by a probability objective seemingly dates back to Bereanu [2] and, hitherto, has not been elaborated in much detail.
Reformulating the stochastic program by adding another variable and including level sets of the objective into the constraints leads to a chance-constrained stochastic program which is nonconvex in general. We will see that, along this line, some structural knowledge on chance constraints (cf. [5, 15, 16, 20, 29]) reappears in the structural analysis of our models. Algorithmically, we will view several well-established techniques from a fresh perspective, among them cutting planes from convex subgradient optimization, Lagrangian relaxation of mixed-integer programs, and decomposition techniques for block-angular stochastic programs.

The paper is organized as follows. In Section 2 we formalize the modeling outlined above, collect some prerequisites, and compare with the usual expectation-based modeling in recourse stochastic programming. Section 3 is devoted to structural results. In Section 4 we present some first algorithmic approaches; separate attention is paid to models without integer decisions since they allow for an algorithmic shortcut. As already announced, the final section discusses the extension of our modeling to multistage stochastic programs.

2. Modeling

Consider the following random mixed-integer linear program

  $\min \{ c^T x + q^T y + q'^T y' : Tx + Wy + W'y' = h(\omega),\ x \in X,\ y \in \mathbb{Z}_+^{\bar m},\ y' \in \mathbb{R}_+^{m'} \}.$  (1)

We assume that all ingredients above have conformal dimensions, that $W, W'$ are rational matrices, and that $X \subseteq \mathbb{R}^n$ is a nonempty closed polyhedron. Integer requirements on components of $x$ are formally possible but will not be imposed, for ease of exposition. For the same reason, randomness is kept as simple as possible by claiming that only the right-hand side $h(\omega) \in \mathbb{R}^s$ is random, i.e., a random vector on some probability space $(\Omega, \mathcal{A}, \mathbb{P})$. Decision variables are divided into two groups: first-stage variables $x$, to be fixed before, and second-stage variables $(y, y')$, to be fixed after observation of $h(\omega)$.
Let us denote

  $\Phi(t) := \min \{ q^T y + q'^T y' : Wy + W'y' = t,\ y \in \mathbb{Z}_+^{\bar m},\ y' \in \mathbb{R}_+^{m'} \}.$  (2)

According to integer programming theory ([19]), this function is real-valued on $\mathbb{R}^s$ provided that $W(\mathbb{Z}_+^{\bar m}) + W'(\mathbb{R}_+^{m'}) = \mathbb{R}^s$ and $\{ u \in \mathbb{R}^s : W^T u \le q,\ W'^T u \le q' \} \ne \emptyset$, which, therefore, will be assumed throughout. The classical expectation-based stochastic program with recourse now is the optimization problem

  $\min \{ \int_\Omega (c^T x + \Phi(h(\omega) - Tx))\, \mathbb{P}(d\omega) : x \in X \}.$  (3)

The recourse stochastic program with probability objective reads

  $\min \{ \mathbb{P}(\{ \omega \in \Omega : c^T x + \Phi(h(\omega) - Tx) > \varphi_0 \}) : x \in X \}$  (4)

where $\varphi_0 \in \mathbb{R}$ denotes some preselected threshold (some ruin level in a cost framework, for instance). For convenience, we will call (3) the expectation-based and (4) the probability-based recourse model. In doing so, we are well aware of the fact that, of course, (4) is expectation-based too, if probabilities are understood as expectations of indicator functions.

We will see in a moment that both (3) and (4) are well-defined nonlinear optimization problems. Their objective functions are denoted by $Q_E(x)$ and $Q_P(x)$, respectively. To detect their structure, the function $\Phi$ is crucial; it arises as the value function of a mixed-integer linear program. From parametric optimization ([1, 6]) the following is known.

Proposition 2.1 Assume that $W(\mathbb{Z}_+^{\bar m}) + W'(\mathbb{R}_+^{m'}) = \mathbb{R}^s$ and $\{ u \in \mathbb{R}^s : W^T u \le q,\ W'^T u \le q' \} \ne \emptyset$. Then it holds:
(i) $\Phi$ is real-valued and lower semicontinuous on $\mathbb{R}^s$;
(ii) there exists a countable partition $\mathbb{R}^s = \bigcup_{i=1}^\infty T_i$ such that the restrictions of $\Phi$ to the $T_i$ are piecewise linear and Lipschitz continuous with a uniform constant $L > 0$ not depending on $i$;
(iii) each of the sets $T_i$ has a representation $T_i = \{ t_i + K \} \setminus \bigcup_{j=1}^N \{ t_{ij} + K \}$ where $K$ denotes the polyhedral cone $W'(\mathbb{R}_+^{m'})$ and $t_i, t_{ij}$ are suitable points from $\mathbb{R}^s$; moreover, $N$ does not depend on $i$;
(iv) there exist positive constants $\beta, \gamma$ such that $|\Phi(t_1) - \Phi(t_2)| \le \beta \|t_1 - t_2\| + \gamma$ whenever $t_1, t_2 \in \mathbb{R}^s$.

In case $\bar m = 0$, i.e., if there are no integer requirements in the second stage, $\Phi$ becomes the value function of a linear program.
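A small numerical illustration may help at this point. The sketch below is not from the paper: all data are hypothetical, and the equality-constrained second stage of (1) is replaced by a scalar inequality-form recourse (as used later in Section 4) so that the value function has a closed form. It evaluates the objectives of (3) and (4) for a discrete distribution by plain enumeration over a finite candidate grid.

```python
import math

# Hypothetical toy data: scalar first stage x, pure integer recourse
# Phi(t) = min{ q*y : W*y >= t, y in Z_+ }, which has the closed form below.
c, q, T, W = 1.0, 1.5, 1.0, 2.0
scenarios = [(2.0, 0.3), (5.0, 0.5), (9.0, 0.2)]   # (h_j, pi_j)
phi0 = 7.0                                         # ruin threshold

def Phi(t):
    """Value function of the integer second stage (closed form for q, W > 0)."""
    return q * max(0, math.ceil(t / W))

def Q_E(x):
    """Expectation-based objective (3) under the discrete measure."""
    return sum(pi * (c * x + Phi(h - T * x)) for h, pi in scenarios)

def Q_P(x):
    """Probability-based objective (4): probability that costs exceed phi0."""
    return sum(pi for h, pi in scenarios if c * x + Phi(h - T * x) > phi0)

# Brute force over a finite candidate grid standing in for X.
X = [0.5 * k for k in range(13)]
xE = min(X, key=Q_E)
xP = min(X, key=Q_P)
print(xE, Q_E(xE), xP, Q_P(xP))
```

On this instance both criteria happen to pick $x = 1$, but the probability objective is a step function of $x$ (values in $\{0, 0.2, \dots\}$), which already hints at the nonconvexity and discontinuity issues studied in Section 3.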
Under the assumptions of Proposition 2.1, $\Phi$ is real-valued on $\mathbb{R}^s$. By linear programming duality it is convex and piecewise linear, and it admits the representation $\Phi(t) = \max_{j=1,\dots,J} d_j^T t$, where $d_1, \dots, d_J$ are the vertices of $\{ u \in \mathbb{R}^s : W'^T u \le q' \}$, which is a compact set in this case. As an immediate conclusion we obtain that, without integer requirements in the second stage, $1 - Q_P(x)$ coincides with the probability of a closed polyhedron, providing a direct link to chance-constrained stochastic programming ([5, 15, 20]).

Before we turn our attention to $Q_P(x)$, we review some properties of $Q_E(x)$. For convenience we denote by $\mu$ the image measure $\mathbb{P} \circ h^{-1}$ on $\mathbb{R}^s$. Without integer requirements ($\bar m = 0$), convexity of $\Phi$ extends to $Q_E$ under mild conditions. A standard result of stochastic linear programming reads:

Proposition 2.2 Assume $\bar m = 0$, $W'(\mathbb{R}_+^{m'}) = \mathbb{R}^s$, $\{ u \in \mathbb{R}^s : W'^T u \le q' \} \ne \emptyset$, and $\int_{\mathbb{R}^s} \|h\|\, \mu(dh) < \infty$. Then $Q_E : \mathbb{R}^n \to \mathbb{R}$ is a real-valued convex function.

As already mentioned in the introduction, convexity has been exploited extensively in stochastic linear programming. For further reading we refer to the textbooks [5, 15, 20]. The remaining models, both expectation- and probability-based, to be discussed in the present paper enjoy convexity merely in exceptional situations. Straightforward examples (cf. e.g. [35]) confirm that convexity in (3) is lost already for very simple models as soon as integer requirements enter the second stage. In [33] the following is shown.

Proposition 2.3 Assume that $W(\mathbb{Z}_+^{\bar m}) + W'(\mathbb{R}_+^{m'}) = \mathbb{R}^s$, $\{ u \in \mathbb{R}^s : W^T u \le q,\ W'^T u \le q' \} \ne \emptyset$, and $\int_{\mathbb{R}^s} \|h\|\, \mu(dh) < \infty$. Then it holds:
(i) $Q_E : \mathbb{R}^n \to \mathbb{R}$ is a real-valued lower semicontinuous function;
(ii) if $\mu$ has a density, then $Q_E$ is continuous on $\mathbb{R}^n$.

3.
Structure

To analyse the structure of $Q_P$ we introduce the notation

  $M(x) := \{ h \in \mathbb{R}^s : c^T x + \Phi(h - Tx) > \varphi_0 \}$, $x \in \mathbb{R}^n$.

By $\liminf_{x_n \to x} M(x_n)$ and $\limsup_{x_n \to x} M(x_n)$ we denote the (set-theoretic) limes inferior and limes superior, i.e., the set of all points belonging to all but a finite number of the sets $M(x_n)$, $n \in \mathbb{N}$, and to infinitely many of the sets $M(x_n)$, respectively. Moreover, we denote

  $M_e(x) := \{ h \in \mathbb{R}^s : c^T x + \Phi(h - Tx) = \varphi_0 \}$,
  $M_d(x) := \{ h \in \mathbb{R}^s : \Phi \text{ is discontinuous at } h - Tx \}$.

Note that, by Proposition 2.1, both $M_e(x)$ and $M_d(x)$ are measurable sets for all $x \in \mathbb{R}^n$.

Lemma 3.1 For all $x \in \mathbb{R}^n$ there holds

  $M(x) \subseteq \liminf_{x_n \to x} M(x_n) \subseteq \limsup_{x_n \to x} M(x_n) \subseteq M(x) \cup M_e(x) \cup M_d(x).$

Proof: Let $h \in M(x)$. The lower semicontinuity of $\Phi$ (Proposition 2.1) yields

  $\liminf_{x_n \to x} (c^T x_n + \Phi(h - T x_n)) \ge c^T x + \Phi(h - Tx) > \varphi_0.$

Therefore, there exists an $n_0 \in \mathbb{N}$ such that $c^T x_n + \Phi(h - T x_n) > \varphi_0$ for all $n \ge n_0$, implying $h \in M(x_n)$ for all $n \ge n_0$. Hence, $M(x) \subseteq \liminf_{x_n \to x} M(x_n)$.

Let $h \in \limsup_{x_n \to x} M(x_n) \setminus M(x)$. Then there exists an infinite subset $\mathbb{N}'$ of $\mathbb{N}$ such that $c^T x_n + \Phi(h - T x_n) > \varphi_0$, $n \in \mathbb{N}'$, and $c^T x + \Phi(h - Tx) \le \varphi_0$. Now two cases are possible. First, $\Phi$ is continuous at $h - Tx$. Passing to the limit in the first inequality then yields that $c^T x + \Phi(h - Tx) \ge \varphi_0$, and $h \in M_e(x)$. Secondly, $\Phi$ is discontinuous at $h - Tx$; in other words, $h \in M_d(x)$. □

Proposition 3.2 Assume that $W(\mathbb{Z}_+^{\bar m}) + W'(\mathbb{R}_+^{m'}) = \mathbb{R}^s$ and $\{ u \in \mathbb{R}^s : W^T u \le q,\ W'^T u \le q' \} \ne \emptyset$. Then $Q_P : \mathbb{R}^n \to \mathbb{R}$ is a real-valued lower semicontinuous function. If, in addition, $\mu(M_e(x) \cup M_d(x)) = 0$, then $Q_P$ is continuous at $x$.

Proof: The lower semicontinuity of $\Phi$ ensures that $M(x)$ is measurable for all $x \in \mathbb{R}^n$, and hence $Q_P$ is real-valued on $\mathbb{R}^n$. By Lemma 3.1 and the (semi-)continuity of the probability measure on sequences of sets we have for all $x \in \mathbb{R}^n$

  $Q_P(x) = \mu(M(x)) \le \mu(\liminf_{x_n \to x} M(x_n)) \le \liminf_{x_n \to x} \mu(M(x_n)) = \liminf_{x_n \to x} Q_P(x_n),$

establishing the asserted lower semicontinuity.
In case $\mu(M_e(x) \cup M_d(x)) = 0$ this argument extends as follows:

  $Q_P(x) = \mu(M(x)) = \mu(M(x) \cup M_e(x) \cup M_d(x)) \ge \mu(\limsup_{x_n \to x} M(x_n)) \ge \limsup_{x_n \to x} \mu(M(x_n)) = \limsup_{x_n \to x} Q_P(x_n),$

and $Q_P$ is continuous at $x$. □

Proposition 2.1 now reveals that, for given $x \in \mathbb{R}^n$, both $M_e(x)$ and $M_d(x)$ are contained in a countable union of hyperplanes. The latter being of Lebesgue measure zero, we obtain that $\mu(M_e(x) \cup M_d(x)) = 0$ is valid for all $x \in \mathbb{R}^n$ provided that $\mu$ has a density. This proves:

Conclusion 3.3 Assume that $W(\mathbb{Z}_+^{\bar m}) + W'(\mathbb{R}_+^{m'}) = \mathbb{R}^s$, $\{ u \in \mathbb{R}^s : W^T u \le q,\ W'^T u \le q' \} \ne \emptyset$, and that $\mu$ has a density. Then $Q_P$ is continuous on $\mathbb{R}^n$.

This analysis can be extended towards Lipschitz continuity of $Q_P$. In [36], Tiedemann has shown:

Proposition 3.4 Assume that $q, q'$ are rational vectors, $W(\mathbb{Z}_+^{\bar m}) + W'(\mathbb{R}_+^{m'}) = \mathbb{R}^s$, $\{ u \in \mathbb{R}^s : W^T u \le q,\ W'^T u \le q' \} \ne \emptyset$, and that for any nonsingular linear transformation $B \in L(\mathbb{R}^s, \mathbb{R}^s)$ all one-dimensional marginal distributions of $\mu \circ B$ have bounded densities which, outside some bounded interval, are monotonically decreasing with growing absolute value of the argument. Then $Q_P$ is Lipschitz continuous on any bounded subset of $\mathbb{R}^n$.

From the numerical viewpoint, the optimization problems (3) and (4) pose the major difficulty that their objective functions are given by multidimensional integrals with implicit integrands. If $h(\omega)$ follows a continuous probability distribution, the computation of $Q_E$ and $Q_P$ has to rely on approximations. Here, it is quite common to approximate the probability distribution of $h(\omega)$ by discrete distributions, turning the integrals in (3), (4) into sums this way. In the next section we will see that discrete distributions, despite the poor analytical properties they imply for $Q_E$ and $Q_P$, are quite attractive algorithmically, since they allow for integer programming techniques.
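As a toy illustration of this discretization (hypothetical data, not from the paper): replacing a uniformly distributed right-hand side by $J$ equiprobable midpoint atoms turns the probability objective into a finite sum. For the scalar integer-recourse example used earlier, the exceedance event is $\{h > 9\}$, so the discretized value stabilizes at the exact probability $0.1$.

```python
import math

# Hypothetical toy: h uniform on [0, 10], scalar first stage fixed at x = 1,
# integer recourse value function Phi(t) = q * max(0, ceil(t / W)).
c, q, T, W, phi0, x = 1.0, 1.5, 1.0, 2.0, 7.0, 1.0

def Q_P_discrete(J):
    """Probability objective (4) with h quantized to J midpoint atoms of mass 1/J."""
    atoms = [10.0 * (j + 0.5) / J for j in range(J)]
    Phi = lambda t: q * max(0, math.ceil(t / W))
    return sum(1.0 / J for h in atoms if c * x + Phi(h - T * x) > phi0)

print([Q_P_discrete(J) for J in (10, 100, 1000)])   # approaches 0.1
```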
Approximating the underlying probability measures in (3) and (4) raises the question whether "small" perturbations in the measures result in only "small" perturbations of optimal values and optimal solutions. Subjective assumptions and incomplete knowledge of $\mu = \mathbb{P} \circ h^{-1}$ in many practical modeling situations provide further motivation for asking this question. Therefore, stability analysis has gained some interest in stochastic programming (for surveys see [9, 35]). For the models (3) and (4), qualitative and quantitative continuity of $Q_E$, $Q_P$ jointly in the decision variable $x$ and the probability measure $\mu$ then becomes a key issue. Once established, this continuity, together with well-known techniques from parametric optimization, leads to stability in the spirit sketched above. In the present paper, we will not pursue stability analysis, but show how to arrive at qualitative joint continuity of $Q_P$. For continuity results on $Q_E$ we refer to [14, 24, 30, 33, 34]; for extensions towards stability, to [35] and the references therein.

For the rest of this section, we consider $Q_P$ as a function mapping from $\mathbb{R}^n \times \mathcal{P}(\mathbb{R}^s)$ to $\mathbb{R}$. By $\mathcal{P}(\mathbb{R}^s)$ we denote the set of all Borel probability measures on $\mathbb{R}^s$. While $\mathbb{R}^n$ is equipped with the usual topology, the set $\mathcal{P}(\mathbb{R}^s)$ is endowed with weak convergence of probability measures. This has proven both sufficiently general to cover relevant applications and sufficiently specific to enable substantial statements. A sequence $\{\mu_n\}$ in $\mathcal{P}(\mathbb{R}^s)$ is said to converge weakly to $\mu \in \mathcal{P}(\mathbb{R}^s)$, written $\mu_n \Rightarrow \mu$, if for any bounded continuous function $g : \mathbb{R}^s \to \mathbb{R}$ we have

  $\int_{\mathbb{R}^s} g\, d\mu_n \to \int_{\mathbb{R}^s} g\, d\mu$ as $n \to \infty$.  (5)

A basic reference for weak convergence of probability measures is Billingsley's book [3].

Proposition 3.5 Assume that $W(\mathbb{Z}_+^{\bar m}) + W'(\mathbb{R}_+^{m'}) = \mathbb{R}^s$ and $\{ u \in \mathbb{R}^s : W^T u \le q,\ W'^T u \le q' \} \ne \emptyset$. Let $(x, \mu) \in \mathbb{R}^n \times \mathcal{P}(\mathbb{R}^s)$ be such that $\mu(M_e(x) \cup M_d(x)) = 0$. Then $Q_P : \mathbb{R}^n \times \mathcal{P}(\mathbb{R}^s) \to \mathbb{R}$ is continuous at $(x, \mu)$.

Proof: Let $x_n \to x$ and $\mu_n \Rightarrow \mu$ be arbitrary sequences.
By $\chi_n, \chi : \mathbb{R}^s \to \{0, 1\}$ we denote the indicator functions of the sets $M(x_n)$, $M(x)$, $n \in \mathbb{N}$. In addition, we introduce the exceptional set

  $E := \{ h \in \mathbb{R}^s : \exists\, h_n \to h \text{ such that } \chi_n(h_n) \not\to \chi(h) \}.$

Now we have $E \subseteq M_e(x) \cup M_d(x)$. To see this, assume that $h \in (M_e(x) \cup M_d(x))^c = (M_e(x))^c \cap (M_d(x))^c$, where the superscript $c$ denotes the set-theoretic complement. Then $\Phi$ is continuous at $h - Tx$, and either $c^T x + \Phi(h - Tx) > \varphi_0$ or $c^T x + \Phi(h - Tx) < \varphi_0$. Thus, for any sequence $h_n \to h$ there exists an $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$ either $c^T x_n + \Phi(h_n - T x_n) > \varphi_0$ or $c^T x_n + \Phi(h_n - T x_n) < \varphi_0$. Hence, $\chi_n(h_n) \to \chi(h)$ as $h_n \to h$, implying $h \in E^c$. In view of $E \subseteq M_e(x) \cup M_d(x)$ and $\mu(M_e(x) \cup M_d(x)) = 0$ we obtain that $\mu(E) = 0$. A theorem on weak convergence of image measures, attributed to Rubin in [3], p. 34, now yields that the weak convergence $\mu_n \Rightarrow \mu$ implies the weak convergence $\mu_n \circ \chi_n^{-1} \Rightarrow \mu \circ \chi^{-1}$. Note that $\mu_n \circ \chi_n^{-1}$, $\mu \circ \chi^{-1}$ are probability measures on $\{0, 1\}$. Their weak convergence then particularly implies that $\mu_n \circ \chi_n^{-1}(\{1\}) \to \mu \circ \chi^{-1}(\{1\})$. In other words, $\mu_n(M(x_n)) \to \mu(M(x))$, or $Q_P(x_n, \mu_n) \to Q_P(x, \mu)$. □

As done for the expectation-based model (3) in [33], continuity of optimal values and upper semicontinuity of optimal solution sets of the probability-based model (4) can be derived from Proposition 3.5.

Remark 3.6 (probability-based model without integer decisions) Without integer second-stage variables the set $M_d(x)$ is always empty, and Propositions 3.2 and 3.5 readily specialize. A direct approach to these models, including stability analysis and algorithmic techniques, has been carried out in [23]. Lower semicontinuity of $Q_P$ in the absence of integer variables can already be derived from Proposition 3.1 in [29], a statement concerning chance-constrained stochastic programs. Some early work on continuity properties of general probability functionals has been done by Raik ([21, 22]; see also [16, 20]).

4.
Algorithms

In the present section we review two algorithms for solving the probability-based recourse problem (4), provided the underlying measure $\mu$ is discrete, say with realizations $h_j$ and probabilities $\pi_j$, $j = 1, \dots, J$. The algorithms were first proposed in [23] and [36], respectively, where further details can be found.

4.1 Linear Recourse

We assume that there are no integer requirements on second-stage variables, which is usually referred to as linear recourse in the literature. Suppose that $\mu$ is the above discrete measure and consider problem (4) with

  $\Phi(t) := \min \{ q^T y : Wy \ge t,\ y \in \mathbb{R}_+^m \}.$  (6)

For ease of exposition let $X \subseteq \mathbb{R}^n$ be a nonempty compact polyhedron. Let $e \in \mathbb{R}^s$ denote the vector of all ones and consider the set

  $D := \{ (u, u_0) \in \mathbb{R}^{s+1} : 0 \le u \le e,\ 0 \le u_0 \le 1,\ W^T u - u_0 q \le 0 \}$

together with its extreme points $(d_k, d_{k0})$, $k = 1, \dots, K$. Furthermore, consider the indicator function

  $\chi(x, h) := 1$ if $h \in M(x)$, and $\chi(x, h) := 0$ otherwise.  (7)

The key idea of the subsequent algorithm is to represent $\chi$ by binary variables and a number of optimality cuts, which enables exploitation of cutting-plane techniques from convex subgradient optimization. The latter have proven very useful in classical two-stage linear stochastic programming, see e.g. [4, 32].

Lemma 4.1 There exists a sufficiently large constant $M_0 > 0$ such that problem (4) can be equivalently restated as

  $\min \{ \sum_{j=1}^J \pi_j \theta_j : (h_j - Tx)^T d_k + (c^T x - \varphi_0) d_{k0} \le M_0 \theta_j,\ x \in X,\ \theta_j \in \{0, 1\},\ k = 1, \dots, K,\ j = 1, \dots, J \}.$  (8)

Proof: For any $x \in X$ and any $j \in \{1, \dots, J\}$ consider the feasibility problem

  $\min \{ e^T t + t_0 : Wy + t \ge h_j - Tx,\ q^T y - t_0 \le \varphi_0 - c^T x,\ y, t, t_0 \ge 0 \}$  (9)

and its linear programming dual

  $\max \{ (h_j - Tx)^T u + (c^T x - \varphi_0) u_0 : 0 \le u \le e,\ 0 \le u_0 \le 1,\ W^T u - u_0 q \le 0 \}.$

Clearly, both programs are always solvable. Their optimal value is equal to zero if and only if $\chi(x, h_j) = 0$. In addition, $D$ coincides with the feasible set of the dual. If $M_0$ is selected as

  $M_0 \ge \max \{ (h_j - Tx)^T d_k + (c^T x - \varphi_0) d_{k0} : x \in X,\ k = 1, \dots, K,\ j = 1, \dots, J \},$

then, for any $x \in X$, the vector $(x, \theta)$ with $\theta_j = 1$, $j = 1, \dots, J$, is feasible for (8).
If $\chi(x, h_j) = 1$ for some $x \in X$ and $j \in \{1, \dots, J\}$, then there has to exist some $k \in \{1, \dots, K\}$ such that $(h_j - Tx)^T d_k + (c^T x - \varphi_0) d_{k0} > 0$. Hence, given $x \in X$, $\theta_j = 0$ is feasible in (8) if and only if $\chi(x, h_j) = 0$. Therefore, (8) is equivalent to $\min \{ \sum_{j=1}^J \pi_j \chi(x, h_j) : x \in X \}$. □

The algorithm progresses by sequentially solving a master problem and adding violated optimality cuts generated through the solution of subproblems (9). These cuts correspond to constraints in (8). Assuming that the cuts generated before iteration $\nu$ correspond to subsets $\mathcal{K}_\nu \subseteq \{1, \dots, K\}$, the current master problem reads

  $\min \{ \sum_{j=1}^J \pi_j \theta_j : (h_j - Tx)^T d_k + (c^T x - \varphi_0) d_{k0} \le M_0 \theta_j,\ x \in X,\ \theta_j \in \{0, 1\},\ k \in \mathcal{K}_\nu,\ j = 1, \dots, J \}.$  (10)

The full algorithm proceeds as follows.

Algorithm 4.2
Step 1 (Initialization): Set $\nu = 0$ and $\mathcal{K}_0 = \emptyset$.
Step 2 (Solving the master problem): Solve the current master problem (10) and let $(x^\nu, \theta^\nu)$ be an optimal solution.
Step 3 (Solving subproblems): Solve the feasibility problem (9) for $x = x^\nu$ and all $j \in \{1, \dots, J\}$ such that $\theta_j^\nu = 0$. Consider the following situations:
1. If all these problems have optimal value equal to zero, then the current $x^\nu$ is optimal for (8).
2. If some of these problems have optimal value strictly greater than zero, then, via the dual solutions, a subset $(d_k, d_{k0})$, $k \in \mathcal{K} \subseteq \{1, \dots, K\}$, of extreme points of $D$ is identified. The corresponding cuts are added to the master. Set $\mathcal{K}_{\nu+1} := \mathcal{K}_\nu \cup \mathcal{K}$ and $\nu := \nu + 1$; go to Step 2.

The algorithm terminates since $D$ has a finite number of extreme points. For further details on the correctness of the algorithm and first computational experiments we refer to [23].

4.2 Linear Mixed-Integer Recourse

In the present subsection we allow for integer requirements on second-stage variables. Again we assume that $X \subseteq \mathbb{R}^n$ is a nonempty compact polyhedron and that $\mu$ is the discrete measure introduced at the beginning of the present section. We consider problem (4) with

  $\Phi(t) := \min \{ q^T y : Wy \ge t,\ y \in Y \}.$
(11)

For notational convenience we have integrated the former vector $(y, y')$ into one vector $y$, now varying in $Y := \mathbb{Z}_+^{\bar m} \times \mathbb{R}_+^{m'}$. Accordingly, the former $(q, q')$ and $(W, W')$ are integrated into $q$ and $W$. To be consistent with Subsection 4.1 we have inequality constraints in (11).

Lemma 4.3 There exists a sufficiently large constant $M_1 > 0$ such that problem (4) can be equivalently restated as

  $\min_{x, y, \theta} \{ \sum_{j=1}^J \pi_j \theta_j : W y_j \ge h_j - Tx,\ q^T y_j + c^T x - \varphi_0 \le M_1 \theta_j,\ x \in X,\ y_j \in Y,\ \theta_j \in \{0, 1\},\ j = 1, \dots, J \}.$  (12)

Proof: We choose $M_1$ by

  $M_1 := \sup \{ c^T x + \Phi(h_j - Tx) : x \in X,\ j = 1, \dots, J \}.$

To see that this supremum is finite, recall the compactness of $X$ and the general assumptions on $\Phi$ in the paragraph following formula (2). Part (iv) of Proposition 2.1 then confirms that $\Phi(h_j - Tx)$ remains bounded as $x$ and $j$ vary over $X$ and $\{1, \dots, J\}$, respectively. The selection of $M_1$ guarantees that, for any $x \in X$ and $y_j \in Y$ such that $W y_j \ge h_j - Tx$, the selection $\theta_j = 1$ is feasible. Given $x$, the selection $\theta_j = 0$ is feasible if and only if there exists a $y_j \in Y$ fulfilling $W y_j \ge h_j - Tx$ and $c^T x + q^T y_j \le \varphi_0$. The latter holds if and only if $c^T x + \Phi(h_j - Tx) \le \varphi_0$, which is equivalent to $\chi(x, h_j) = 0$. This proves that (12) is equivalent to $\min \{ \sum_{j=1}^J \pi_j \chi(x, h_j) : x \in X \}$. □

Compared with problem (8), problem (12) again arises by representing the indicator function $\chi$ from (7) by a binary variable. Lacking duality, however, prevents the use of optimality cuts, so that the minimization with respect to $y$ has to be carried out explicitly in (12). Hence, (8) is a variant of (12) where the linear programming nature of the second stage enables an algorithmic shortcut.

Problem (12) is a mixed-integer linear program that quickly becomes large-scale in practical applications. General-purpose mixed-integer linear programming algorithms and software fail in such situations. As an alternative, we present a decomposition method based on Lagrangian relaxation of nonanticipativity.
This decomposition method for block-angular stochastic integer programs was first elaborated in [7] for the expectation-based model (3). Introduce in (12) copies $x_j$, $j = 1, \dots, J$, according to the number of scenarios, and add the nonanticipativity constraints $x_1 = \dots = x_J$ (or an equivalent system), for which we use the notation $\sum_{j=1}^J H_j x_j = 0$ with proper $(l, n)$-matrices $H_j$, $j = 1, \dots, J$. Problem (12) then becomes

  $\min_{x, y, \theta} \{ \sum_{j=1}^J \pi_j \theta_j : T x_j + W y_j \ge h_j,\ c^T x_j + q^T y_j - M_1 \theta_j \le \varphi_0,\ x_j \in X,\ y_j \in Y,\ \theta_j \in \{0, 1\},\ j = 1, \dots, J,\ \sum_{j=1}^J H_j x_j = 0 \}.$  (13)

This formulation suggests Lagrangian relaxation of the interlinking constraints $\sum_{j=1}^J H_j x_j = 0$. For $\lambda \in \mathbb{R}^l$ we consider the functions

  $L_j(x_j, y_j, \theta_j, \lambda) := \pi_j \theta_j + \lambda^T H_j x_j$, $j = 1, \dots, J$,

and form the Lagrangian

  $L(x, y, \theta, \lambda) := \sum_{j=1}^J L_j(x_j, y_j, \theta_j, \lambda).$

The Lagrangian dual of (13) then is the optimization problem

  $\max \{ D(\lambda) : \lambda \in \mathbb{R}^l \}$  (14)

where

  $D(\lambda) = \min \{ \sum_{j=1}^J L_j(x_j, y_j, \theta_j, \lambda) : T x_j + W y_j \ge h_j,\ c^T x_j + q^T y_j - M_1 \theta_j \le \varphi_0,\ x_j \in X,\ y_j \in Y,\ \theta_j \in \{0, 1\},\ j = 1, \dots, J \}.$

For separability reasons we have

  $D(\lambda) = \sum_{j=1}^J D_j(\lambda)$  (15)

where

  $D_j(\lambda) = \min \{ L_j(x_j, y_j, \theta_j, \lambda) : T x_j + W y_j \ge h_j,\ c^T x_j + q^T y_j - M_1 \theta_j \le \varphi_0,\ x_j \in X,\ y_j \in Y,\ \theta_j \in \{0, 1\} \}.$  (16)

$D(\lambda)$, being the pointwise minimum of affine functions in $\lambda$, is piecewise affine and concave. Hence, (14) is a non-smooth concave maximization (or convex minimization) problem. Such problems can be tackled with advanced bundle methods, for instance with Kiwiel's proximal bundle method NOA 3.0, [17, 18]. At each iteration, these methods require the objective value and one subgradient of $D$. The structure of $D$, cf. (15), enables substantial decomposition, since the single-scenario problems (16) can be tackled separately. Their moderate size often allows application of general-purpose mixed-integer linear programming codes. Altogether, the optimal value $z_{LD}$ of (14) provides a lower bound on the optimal value $z$ of problem (12).
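The dual machinery can be sketched in a few lines. The toy below is hypothetical (scalar first stage, two scenarios, $H_1 x_1 + H_2 x_2 = x_1 - x_2$); a diminishing-step subgradient ascent stands in for the proximal bundle method, and the scenario subproblems (16) are solved by enumeration over a finite grid instead of a MILP code.

```python
import math

# Hypothetical toy data: integer recourse Phi(t) = q * max(0, ceil(t / W)).
c, q, T, W = 1.0, 1.5, 1.0, 2.0
phi0 = 6.0
X = [0.5 * k for k in range(13)]
scen = [(2.0, 0.5, +1.0), (9.0, 0.5, -1.0)]       # (h_j, pi_j, sign of H_j)

def theta(x, h):
    """Minimal feasible theta_j: 0 iff scenario-j costs can stay <= phi0."""
    Phi = q * max(0, math.ceil((h - T * x) / W))
    return 0.0 if c * x + Phi <= phi0 else 1.0

def D(lam):
    """Dual function value D(lam) and a subgradient H_1 x_1* + H_2 x_2*."""
    val, g = 0.0, 0.0
    for h, pi, s in scen:
        xj = min(X, key=lambda x: pi * theta(x, h) + lam * s * x)
        val += pi * theta(xj, h) + lam * s * xj
        g += s * xj
    return val, g

lam, best = 0.0, -float("inf")
for k in range(1, 30):
    val, g = D(lam)
    best = max(best, val)                          # best lower bound so far
    lam += g / k                                   # diminishing-step ascent

z = min(sum(pi * theta(x, h) for h, pi, _ in scen) for x in X)  # direct (4)
print(best, z)                                     # dual bound <= optimal value
```

On this instance the bound is tight ($z_{LD} = z = 0.5$, since scenario $h = 9$ can never be kept below the threshold); in general a positive duality gap remains, as discussed next.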
From integer programming ([19]) it is well known that, in general, one has to live with a positive duality gap. On the other hand, it holds that $z_{LD} \ge z_{LP}$, where $z_{LP}$ denotes the optimal value of the LP relaxation of (12). The lower bound obtained by the above procedure, hence, is never worse than the bound obtained by eliminating the integer requirements.

In Lagrangian relaxation, the results of the dual optimization often provide starting points for heuristics to find promising feasible points. Our relaxed constraints being very simple ($x_1 = \dots = x_J$), ideas for such heuristics come up straightforwardly. For example, examine the $x_j$-components, $j = 1, \dots, J$, of solutions to (16) for optimal or nearly optimal $\lambda$, and decide for the most frequent value arising, or average and round if necessary. If the heuristic yields a feasible solution to (12), then the objective value of the latter provides an upper bound $\bar z$ for $z$. Together with the lower bound $z_{LD}$ this gives the quality certificate (gap) $\bar z - z_{LD}$.

The full algorithm improves this certificate by embedding the procedure described so far into a branch-and-bound scheme in the spirit of global optimization. Let $\mathcal{P}$ denote the list of current problems and $z_{LD} = z_{LD}(P)$ the Lagrangian lower bound for $P \in \mathcal{P}$. The algorithm then proceeds as follows.

Algorithm 4.4
Step 1 (Initialization): Set $\bar z = +\infty$ and let $\mathcal{P}$ consist of problem (13).
Step 2 (Termination): If $\mathcal{P} = \emptyset$, then the solution $\hat x$ that yielded $\bar z = Q_P(\hat x)$, cf. (4), is optimal.
Step 3 (Node selection): Select and delete a problem $P$ from $\mathcal{P}$ and solve its Lagrangian dual. If the optimal value $z_{LD}(P)$ hereof equals $+\infty$ (infeasibility of a subproblem), then go to Step 2.
Step 4 (Bounding): If $z_{LD}(P) \ge \bar z$, go to Step 2 (this step can be carried out as soon as the value of the Lagrangian dual rises above $\bar z$). Consider the following situations:
1. The scenario solutions $x_j$, $j = 1, \dots$
, $J$, are identical: If $Q_P(x_j) < \bar z$, then let $\bar z = Q_P(x_j)$ and delete from $\mathcal{P}$ all problems $P'$ with $z_{LD}(P') \ge \bar z$. Go to Step 2.
2. The scenario solutions $x_j$, $j = 1, \dots, J$, differ: Compute the average $\bar x = \sum_{j=1}^J \pi_j x_j$ and round it by some heuristic to obtain $x^R$. If $Q_P(x^R) < \bar z$, then let $\bar z = Q_P(x^R)$ and delete from $\mathcal{P}$ all problems $P'$ with $z_{LD}(P') \ge \bar z$. Go to Step 5.
Step 5 (Branching): Select a component $x_{(k)}$ of $x$ and add two new problems to $\mathcal{P}$, obtained from $P$ by adding the constraints $x_{(k)} \le \lfloor \bar x_{(k)} \rfloor$ and $x_{(k)} \ge \lfloor \bar x_{(k)} \rfloor + 1$, respectively (if $x_{(k)}$ is an integer component), or $x_{(k)} \le \bar x_{(k)}$ and $x_{(k)} \ge \bar x_{(k)} + \varepsilon$, respectively, where $\varepsilon > 0$ is a tolerance parameter to have disjoint subdomains. Go to Step 3.

The algorithm works both with and without integer requirements in the first stage. It is obviously finite in case $X$ is bounded and all $x$-components have to be integers. If $x$ is mixed-integer (or continuous, as in the foregoing presentation), some stopping criterion to avoid endless branching on the continuous components has to be employed. Some first computational experiments with Algorithm 4.4 are reported in [36].

5. Multistage Extension

The two-stage stochastic programs introduced in Section 2 are based on the assumptions that uncertainty is unveiled at once and that decisions subdivide into those before and those after the unveiling of uncertainty. Often, a more complex view is appropriate. Multistage stochastic programs address the situation where uncertainty is unveiled stepwise, with intermediate decisions. The modeling starts with a finite-horizon sequential decision process under uncertainty where the decision $x_t \in \mathbb{R}^{n_t}$ at stage $t \in \{1, \dots, T\}$ is based on information available up to time $t$ only. Information is modeled as a discrete-time stochastic process $\{\xi_t\}_{t=1}^T$ on some probability space $(\Omega, \mathcal{A}, \mathbb{P})$, with $\xi_t$ taking values in $\mathbb{R}^{s_t}$. The random vector $\xi^t := (\xi_1, \dots, \xi_t)$ then reflects the information available up to time $t$.
Nonanticipativity, i.e., the requirement that $x_t$ must not depend on future information, is formalized by saying that $x_t$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}_t \subseteq \mathcal{A}$ which is generated by $\xi^t$, $t = 1, \dots, T$. Clearly, $\mathcal{A}_t \subseteq \mathcal{A}_{t+1}$ for all $t = 1, \dots, T - 1$. As in the two-stage case, the first-stage decision $x_1$ usually is deterministic; therefore, $\mathcal{A}_1 = \{\emptyset, \Omega\}$. Moreover, we assume that $\mathcal{A}_T = \mathcal{A}$.

The constraints of our multistage extensions can be subdivided into three groups. The first group comprises conditions on $x_t$ arising from the individual time stages:

  $x_t(\omega) \in X_t,\quad B_t(\xi_t(\omega))\, x_t(\omega) \ge d_t(\xi_t(\omega))$  (17)

$\mathbb{P}$-almost surely, $t = 1, \dots, T$. Here, $X_t \subseteq \mathbb{R}^{n_t}$ is a set whose convex hull is a polyhedron. In this way, integer requirements on components of $x_t$ are allowed for. For simplicity we assume that $X_t$ is compact. The next group of constraints models linkage between different time stages:

  $\sum_{\tau=1}^t A_{t\tau}(\xi_t(\omega))\, x_\tau(\omega) \ge g_t(\xi_t(\omega))$ $\mathbb{P}$-almost surely, $t = 1, \dots, T$.  (18)

Finally, there is the nonanticipativity of $x_t$, i.e.,

  $x_t$ is measurable with respect to $\mathcal{A}_t$, $t = 1, \dots, T$.  (19)

In addition to the constraints we have a linear objective function

  $\sum_{t=1}^T c_t(\xi_t(\omega))^T x_t(\omega).$

The matrices $A_{t\tau}(\cdot)$, $B_t(\cdot)$ as well as the right-hand sides $d_t(\cdot)$, $g_t(\cdot)$ and the cost coefficients $c_t(\cdot)$ all have conformal dimensions and depend affinely linearly on the relevant components of $\xi_t$. The decisions $x_t$ are understood as members of the function spaces $L_\infty(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{R}^{n_t})$, $t = 1, \dots, T$. The constraints (17), (18) then impose pointwise conditions on the $x_t$, whereas (19) imposes functional constraints, in fact, membership in a linear subspace of $\times_{t=1}^T L_\infty(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{R}^{n_t})$, see e.g. [31] and the references therein. Now we are in the position to formulate the multistage extensions of the expectation- and probability-based stochastic programs (3) and (4), respectively.
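Before formulating these extensions, a minimal sketch (with hypothetical data, not from the paper) of what the functional constraint (19) means on a finite scenario set: decisions at stage $t$ must coincide for scenarios sharing the same history $\xi^t$.

```python
# Four hypothetical scenarios of a three-stage process, omega -> (xi_1, xi_2, xi_3).
scenarios = {
    "w1": (0, 0, 0), "w2": (0, 0, 1), "w3": (0, 1, 0), "w4": (0, 1, 1),
}

def nonanticipative(policy):
    """policy: omega -> (x_1, x_2, x_3); check measurability stage by stage."""
    for t in range(3):
        seen = {}                    # history xi^t -> decision x_t
        for w, xi in scenarios.items():
            hist = xi[:t + 1]        # information available up to time t
            if hist in seen and seen[hist] != policy[w][t]:
                return False         # same information, different decision
            seen[hist] = policy[w][t]
    return True

# x_1 is scenario-independent, x_2 depends on xi^2 only: admissible.
good = {"w1": (5, 1, 7), "w2": (5, 1, 9), "w3": (5, 2, 0), "w4": (5, 2, 3)}
# x_1 differs across scenarios, i.e. it anticipates the future: inadmissible.
bad  = {"w1": (5, 1, 7), "w2": (6, 1, 9), "w3": (5, 2, 0), "w4": (5, 2, 3)}
print(nonanticipative(good), nonanticipative(bad))
```

With a discrete distribution of $\xi$, the pairwise equalities collected by this check are exactly the linear system of nonanticipativity constraints mentioned at the end of this section.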
The multistage extension of (3) is the minimization of expected minimal costs subject to nonanticipativity of decisions:

  $\min \{ \int_\Omega \min_{x(\omega)} \{ \sum_{t=1}^T c_t(\xi_t(\omega))^T x_t(\omega) : (17), (18) \}\, \mathbb{P}(d\omega) : x \text{ fulfilling } (19) \}.$  (20)

To have the integral in the objective well-defined, the additional assumption $c_t(\xi_t) \in L_1(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{R}^{n_t})$, $t = 1, \dots, T$, is imposed in model (20); see [31] for further details. The multistage extension of (4) is the minimization of the probability that the minimal costs exceed a preselected threshold $\varphi_0 \in \mathbb{R}$. Again this minimization takes place over nonanticipative decisions only:

  $\min \{ \mathbb{P}(\{ \omega \in \Omega : \min_{x(\omega)} \{ \sum_{t=1}^T c_t(\xi_t(\omega))^T x_t(\omega) : (17), (18) \} > \varphi_0 \}) : x \text{ fulfilling } (19) \}.$  (21)

The minimization in the integrand of (20) being separable with respect to $\omega \in \Omega$, it is possible to interchange integration and minimization. Then the problem can be restated as follows:

  $\min \{ \int_\Omega \sum_{t=1}^T c_t(\xi_t(\omega))^T x_t(\omega)\, \mathbb{P}(d\omega) : x \text{ fulfilling } (17), (18), (19) \}.$  (22)

Extending the argument from Lemma 4.3, we introduce an additional variable $\theta \in L_\infty(\Omega, \mathcal{A}, \mathbb{P}; \{0, 1\})$ as well as a sufficiently big constant
Their mathematical foundations are laid out in [11, 12, 28]. The arguments can be outlined as follows: Problem (22) concerns the minimization of an abstract expectation over a function space, subject to measurability with respect to a filtered sequence of σ-algebras. Theorems 1 and 2 in [12] (whose assumptions can be verified for (22) using statements from [11, 28]) provide sufficient conditions for the solvability of such minimization problems and for the solutions to be obtainable recursively by dynamic programming. The stage-wise recursion rests on minimizing in the t-th stage the regular conditional expectation (with respect to A_t) of the optimal value from stage t+1. When arriving at the first stage, a deterministic optimization problem in x_1 remains (recall that A_1 = {∅, Ω}). Its objective function Q_P(x_1) can be regarded as the multistage counterpart to the function Q_P(x) that we have studied in Section 3.

Given that (22) is a well-defined and solvable optimization problem, Sections 3 and 4 provide several points of departure for future research. For instance, unveiling the structure of Q_P(x_1) may be possible by analysing the interplay of conditional expectations and mixed-integer value functions. Regarding solution techniques, the extension of Algorithm 4.4 to the multistage situation may be fruitful. Indeed, it is well known that the nonanticipativity in (19) is a linear constraint. With a discrete distribution of ξ this leads to a system of linear equations. Lagrangian relaxation of these constraints produces single-scenario subproblems, and the scheme of Algorithm 4.4 readily extends. However, compared with the two-stage situation, the relaxed constraints are more complicated, such that primal heuristics are not that obvious, and the dimension l of the Lagrangian dual (14) may require approximate instead of exact solution of (14). Further algorithmic ideas for (22) may arise from Lagrangian relaxation of either (17) or (18).
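The scenario decomposition idea sketched above can be made concrete on a toy two-scenario problem: relaxing the linear nonanticipativity constraint x¹ = x² with a multiplier λ separates the Lagrangian into single-scenario subproblems, and by weak duality every dual value bounds the optimum from below. All data are invented for illustration:

```python
# Lagrangian relaxation of nonanticipativity: two scenarios, one integer
# first-stage variable x in {0,...,5}.  Relaxing x1 = x2 with multiplier lam
# splits the problem into independent single-scenario minimizations.
# Scenario cost functions are invented for illustration.

X = range(6)
p = [0.5, 0.5]                      # scenario probabilities
f = [lambda x: (x - 1) ** 2,        # scenario 1 cost
     lambda x: (x - 4) ** 2]        # scenario 2 cost

# Primal: a single nonanticipative decision x used in both scenarios.
primal_opt = min(p[0] * f[0](x) + p[1] * f[1](x) for x in X)

def dual(lam):
    # L(lam) = min_x1 [p1 f1(x1) + lam*x1] + min_x2 [p2 f2(x2) - lam*x2]
    sub1 = min(p[0] * f[0](x) + lam * x for x in X)
    sub2 = min(p[1] * f[1](x) - lam * x for x in X)
    return sub1 + sub2

# Weak duality: every dual value is a lower bound on the primal optimum.
for lam in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert dual(lam) <= primal_opt + 1e-12

print(primal_opt)   # 2.5  (attained at x = 2 or x = 3)
print(dual(0.0))    # 0.0  (scenarios optimized independently)
```

The gap between max_λ L(λ) and the primal optimum is the duality gap that the primal heuristics mentioned above must close in the mixed-integer case.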
In [31] this is discussed for the expectation-based model (22).

Acknowledgement. I am grateful to Morten Riis (University of Aarhus), Werner Römisch (Humboldt-University Berlin), and Stephan Tiedemann (Gerhard-Mercator University Duisburg) for stimulating discussions and fruitful cooperation.

References

[1] Bank, B.; Mandel, R.: Parametric Integer Optimization, Akademie-Verlag, Berlin, 1988.
[2] Bereanu, B.: Minimum risk criterion in stochastic optimization, Economic Computation and Economic Cybernetics Studies and Research 2 (1981), 31-39.
[3] Billingsley, P.: Convergence of Probability Measures, Wiley, New York, 1968.
[4] Birge, J.R.: Stochastic programming computation and applications, INFORMS Journal on Computing 9 (1997), 111-133.
[5] Birge, J.R.; Louveaux, F.: Introduction to Stochastic Programming, Springer, New York, 1997.
[6] Blair, C.E.; Jeroslow, R.G.: The value function of a mixed integer program: I, Discrete Mathematics 19 (1977), 121-138.
[7] Carøe, C.C.; Schultz, R.: Dual decomposition in stochastic integer programming, Operations Research Letters 24 (1999), 37-45.
[8] Dempster, M.A.H.: On stochastic programming II: Dynamic problems under risk, Stochastics 25 (1988), 15-42.
[9] Dupačová, J.: Stochastic programming with incomplete information: a survey of results on postoptimization and sensitivity analysis, Optimization 18 (1987), 507-532.
[10] Dupačová, J.: Multistage stochastic programs: The state-of-the-art and selected bibliography, Kybernetika 31 (1995), 151-174.
[11] Dynkin, E.B.; Evstigneev, I.V.: Regular conditional expectation of correspondences, Theory of Probability and its Applications 21 (1976), 325-338.
[12] Evstigneev, I.V.: Measurable selection and dynamic programming, Mathematics of Operations Research 1 (1976), 267-272.
[13] Higle, J.L.; Sen, S.: Duality in multistage stochastic programs, in: Prague Stochastics '98 (M. Hušková, P. Lachout, J.Á. Víšek, Eds.), JČMF, Prague, 1998, 233-236.
[14] Kall, P.: On approximations and stability in stochastic programming, in: Parametric Optimization and Related Topics (J. Guddat, H.Th. Jongen, B. Kummer, F. Nožička, Eds.), Akademie-Verlag, Berlin, 1987, 387-407.
[15] Kall, P.; Wallace, S.W.: Stochastic Programming, Wiley, Chichester, 1994.
[16] Kibzun, A.I.; Kan, Y.S.: Stochastic Programming Problems with Probability and Quantile Functions, Wiley, Chichester, 1996.
[17] Kiwiel, K.C.: Proximity control in bundle methods for convex nondifferentiable optimization, Mathematical Programming 46 (1990), 105-122.
[18] Kiwiel, K.C.: User's Guide for NOA 2.0/3.0: A Fortran Package for Convex Nondifferentiable Optimization, Systems Research Institute, Polish Academy of Sciences, Warsaw, 1994.
[19] Nemhauser, G.L.; Wolsey, L.A.: Integer and Combinatorial Optimization, Wiley, New York, 1988.
[20] Prékopa, A.: Stochastic Programming, Kluwer, Dordrecht, 1995.
[21] Raik, E.: Qualitative research into the stochastic nonlinear programming problems, Eesti NSV Teaduste Akadeemia Toimetised / Füüsika, Matemaatika (News of the Estonian Academy of Sciences / Physics, Mathematics) 20 (1971), 8-14. In Russian.
[22] Raik, E.: On the stochastic programming problem with the probability and quantile functionals, Eesti NSV Teaduste Akadeemia Toimetised / Füüsika, Matemaatika (News of the Estonian Academy of Sciences / Physics, Mathematics) 21 (1971), 142-148. In Russian.
[23] Riis, M.; Schultz, R.: Applying the minimum risk criterion in stochastic recourse programs, Computational Optimization and Applications 24 (2003), 267-287.
[24] Robinson, S.M.; Wets, R.J-B: Stability in two-stage stochastic programming, SIAM Journal on Control and Optimization 25 (1987), 1409-1416.
[25] Rockafellar, R.T.: Duality and optimality in multistage stochastic programming, Annals of Operations Research 85 (1999), 1-19.
[26] Rockafellar, R.T.; Wets, R.J-B: Nonanticipativity and L¹-martingales in stochastic optimization problems, Mathematical Programming Study 6 (1976), 170-187.
[27] Rockafellar, R.T.; Wets, R.J-B: The optimal recourse problem in discrete time: L¹-multipliers for inequality constraints, SIAM Journal on Control and Optimization 16 (1978), 16-36.
[28] Rockafellar, R.T.; Wets, R.J-B: Variational Analysis, Springer-Verlag, Berlin, 1997.
[29] Römisch, W.; Schultz, R.: Stability analysis for stochastic programs, Annals of Operations Research 30 (1991), 241-266.
[30] Römisch, W.; Wakolbinger, A.: Obtaining convergence rates for approximations in stochastic programming, in: Parametric Optimization and Related Topics (J. Guddat, H.Th. Jongen, B. Kummer, F. Nožička, Eds.), Akademie-Verlag, Berlin, 1987, 327-343.
[31] Römisch, W.; Schultz, R.: Multistage stochastic integer programs: an introduction, in: Online Optimization of Large Scale Systems (M. Grötschel, S.O. Krumke, J. Rambau, Eds.), Springer-Verlag, Berlin, 2001, 581-600.
[32] Ruszczyński, A.: Decomposition methods in stochastic programming, Mathematical Programming 79 (1997), 333-353.
[33] Schultz, R.: On structure and stability in stochastic programs with random technology matrix and complete integer recourse, Mathematical Programming 70 (1995), 73-89.
[34] Schultz, R.: Rates of convergence in stochastic programs with complete integer recourse, SIAM Journal on Optimization 6 (1996), 1138-1152.
[35] Schultz, R.: Some aspects of stability in stochastic programming, Annals of Operations Research 100 (2000), 55-84.
[36] Tiedemann, S.: Probability Functionals and Risk Aversion in Stochastic Integer Programming, Diploma Thesis, Department of Mathematics, Gerhard-Mercator University Duisburg, 2001.

PARAMETRIC SENSITIVITY ANALYSIS: A CASE STUDY IN OPTIMAL CONTROL OF FLIGHT DYNAMICS

Christof Büskens
Lehrstuhl für Ingenieurmathematik, Universität Bayreuth
Universitätsstr.
30, D-95440 Bayreuth, Germany
christof.bueskens@uni-bayreuth.de

Kurt Chudej
Lehrstuhl für Ingenieurmathematik, Universität Bayreuth
Universitätsstr. 30, D-95440 Bayreuth, Germany
kurt.chudej@uni-bayreuth.de

Abstract

Realistic optimal control problems from flight mechanics are currently solved by sophisticated direct or indirect methods in a fast and reliable way. Often one is not only interested in the optimal solution of one control problem, but also in the sensitivity of the optimal solution to perturbations in certain parameters (constants or model functions) of the process. In the past this problem was solved by time-consuming parameter studies: a large number of almost similar optimal control problems were solved numerically, and sensitivity derivatives were approximated by finite differences. Recently a new approach, called parametric sensitivity analysis, was adapted to the direct solution of optimal control processes [3]. It uses the information gathered in the optimal solution of the unperturbed (nominal) optimal control problem to compute sensitivity differentials of all problem functions with respect to these parameters. This new approach is described in detail for an example from trajectory optimization.

Keywords: parametric sensitivity analysis, optimal control, direct methods, trajectory optimization.

Introduction

Realistically modelled optimal control problems can be solved efficiently and reliably by sophisticated direct and indirect methods (see e.g. the survey articles [1], [10]). Trajectory optimization problems for aircraft and space vehicles usually pose hard challenges for the direct and indirect solution algorithms. A couple of direct algorithms have proved their ability to solve trajectory optimization problems accurately and reliably in the last decade, such as e.g. SOCS (Betts [2]), GESOP (Jansch, Well, Schnepper [9]), DIRCOL (von Stryk [12]) and NUDOCCCS (Büskens [3]).
Trajectory optimization problems in general use complicated models of the surrounding atmospheric effects, the performance and consumption of the engines, and the ability to maneuver. Usually optimal solutions are computed at first for a nominal data set of the model. Later, huge parameter studies are done for perturbed model data. This means that the whole optimization process is started again for the huge number of perturbed models.

We present a new approach of Büskens [3]: exploiting already computed information from the solution of the nominal optimal control problem to derive sensitivity information. This substitutes for the additional solution of perturbed optimal control problems.

We explain the new approach of parametric sensitivity analysis in detail for an example from flight mechanics. The trajectory optimization problem is concerned with minimizing the amount of fuel used per travelled range over ground, with periodic boundary conditions. It is interesting that periodic trajectories and controls can achieve savings in fuel consumption in comparison to the steady-state solution. In order to normalize the changing effects of the atmosphere due to the weather, one uses data of a reference atmosphere in the computational model. Unfortunately, there exist a couple of reference atmospheres. Additionally, one is also interested in the effect of realistic changes of the air density on the computed optimal solution.

We therefore provide not only the nominal solution but also the sensitivity with respect to the air density as an example. Note that no parameter studies are needed; information gathered during the computation of the nominal solution is used. Applications to further perturbation parameters in the model are straightforward.

1. A Trajectory Optimization Problem

Aircraft usually use steady-state cruise to cover long distances. It is interesting that these steady-state trajectories are non-optimal with respect to minimizing fuel [11].
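The difference between a finite-difference parameter study and integrating sensitivity differentials can be illustrated on a scalar toy problem (not the aircraft model, and without any optimization): for y' = -p·y, y(0) = 1, the sensitivity s = ∂y/∂p satisfies the sensitivity equation s' = -y - p·s, s(0) = 0, which is integrated once along the nominal trajectory instead of re-solving the model for a perturbed p:

```python
import math

# Toy comparison (not the aircraft model): sensitivity dy/dp for y' = -p*y,
# y(0) = 1, via (a) a finite-difference parameter study and (b) the sensitivity
# ODE s' = -y - p*s, s(0) = 0, integrated along the nominal trajectory.

def euler(p, n=20000, xf=1.0):
    """Explicit Euler for y' = -p*y together with its sensitivity s = dy/dp."""
    dx, y, s = xf / n, 1.0, 0.0
    for _ in range(n):
        y, s = y + dx * (-p * y), s + dx * (-y - p * s)
    return y, s

p0, dp = 1.0, 1e-4
y0, s_ode = euler(p0)          # one nominal solve gives the sensitivity too

# (a) parameter study: solve a second, perturbed problem and difference.
y1, _ = euler(p0 + dp)
s_fd = (y1 - y0) / dp

# Exact solution: y = exp(-p*x), so dy/dp at x = 1 is -exp(-1).
exact = -math.exp(-1.0)
print(abs(s_ode - exact) < 1e-3, abs(s_fd - exact) < 1e-3)  # True True
```

With several perturbation parameters the finite-difference study needs one extra solve per parameter, while the sensitivity equations reuse the nominal trajectory.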
The following optimal control problem from [8], enlarged by a perturbation parameter p, describes the problem of minimizing fuel per travelled range over ground for a realistically modelled aircraft flying in a vertical plane. State variables are the velocity v, the flight path angle γ, the altitude h and the weight W. The range x is used as the independent variable. The lift coefficient C_L and the throttle setting δ are the control variables.

For a given value of the perturbation parameter p (nominal value is here p_0 = 1), find control functions C_L(x;p) and δ(x;p) and the final range x_f(p) such that the cost functional

    I = [W_0 - W(x_f)] / x_f

is minimized and the following equations of motion, control constraints and boundary conditions are fulfilled:

    dv/dx = g [T(h,M) δ - D(v,h,C_L) - W sin γ] / (W v cos γ),
    dγ/dx = g [L(v,h,C_L) - W cos γ] / (W v² cos γ),
    dh/dx = tan γ,
    dW/dx = -c(h,M) T(h,M) δ / (v cos γ),   (1)

    0 ≤ C_L ≤ C_L,max ,   δ_min ≤ δ ≤ 1,   (2)

    v(0) = v(x_f), γ(0) = γ(x_f), h(0) = h(x_f), W(0) = W_0.   (3)

Model functions are the Mach number M, the speed of sound a, the thrust T, the specific fuel consumption c, the lift L, the drag D, and the air density ρ. S denotes the constant reference area, g denotes the gravitational constant.

    M(v,h) = v / a(h),   a(h) = Σ_{i=0}^{3} a_i h^i,
    T(h,M) = c_1(h) + c_2(h) M + c_3(h) M² + c_4(h) M³,
    c(h,M) = d_1(h) + d_2(h) M + d_3(h) M² + d_4(h) M³,
    L(v,h,C_L) = ρ(h) v² S C_L / 2,
    D(v,h,C_L) = ρ(h) v² S [C_D0(M) + ΔC_D(M,C_L)] / 2,
    C_D0(M) = α_1 arctan[α_2 (M - α_3)] + α_4,
    ΔC_D(M,C_L) = b_1(M) C_L + b_2(M) C_L² + b_3(M) C_L³ + b_4(M) C_L⁴,
    ρ(h) = p · ρ_0 · exp[ ... ]   (reference-atmosphere fit with coefficients β_i).

The coefficients of the polynomials b_i(M), c_i(h), d_i(h) and the constants ρ_0, β_i, a_i, α_i, W_0, δ_min, C_L,max can be found in [8].

Figure 1. Nominal optimal state v(x;p_0). Figure 2. Nominal optimal state γ(x;p_0). Figure 3. Nominal optimal state h(x;p_0). Figure 4. Nominal optimal state W(x;p_0). (All plotted over the normalized range x/x_f.)

The perturbation parameter p scales the air density ρ(h); its nominal value is p_0 = 1.
A solution by an indirect multiple shooting algorithm is presented in [8] for the nominal value p_0 = 1. At that time, before sophisticated direct methods were developed, elaborate homotopies were required for the indirect solution. Today the initial estimates for the indirect method can be provided easily by direct methods.

Figures 1-4 show the optimal states and Figures 5-6 the optimal controls for the nominal value p = p_0 = 1, computed by the direct method NUDOCCCS (Büskens [3]).

Figure 5. Nominal optimal control C_L(x;p_0). Figure 6. Nominal optimal control δ(x;p_0).

Moreover, the direct method NUDOCCCS can also compute, in a post-processing step, the sensitivities of the optimal states ∂v/∂p(x;p), ..., ∂W/∂p(x;p) and of the optimal controls ∂C_L/∂p(x;p), ∂δ/∂p(x;p) with respect to perturbation parameters p.

2. Parametric Sensitivity Analysis

The general mathematical approach for a parametric sensitivity analysis of perturbed optimal control problems is based on NLP methods. The following autonomous perturbed control problem of Mayer form will be referred to as problem OCP(p): For a given perturbation parameter p ∈ P, find control functions u(x;p) and the final time x_f(p) such that the cost functional

    J = g(y(x_f), p)   (4)

is minimized subject to the following constraints:

    y'(x) = f(y(x), u(x), p),   x ∈ [0, x_f],
    ψ(y(0), y(x_f), p) = 0,
    C(y(x), u(x), p) ≤ 0,   x ∈ [0, x_f].   (5)

Herein y(x) ∈ ℝ^n denotes the state of a system and u(x) ∈ ℝ^m the control with respect to an independent variable x, which is often the time.
In the previously introduced trajectory optimization problem the independent variable x denotes the range, the state is given by y := (v, γ, h, W) and the control by u := (C_L, δ). The functions g : ℝ^n × P → ℝ, f : ℝ^{n+m} × P → ℝ^n, ψ : ℝ^{2n} × P → ℝ^r, and C : ℝ^{n+m} × P → ℝ^k are assumed to be sufficiently smooth on appropriate open sets. The final time x_f is either fixed or free. Note that the formulation of mixed control-state constraints C(y(x), u(x), p) ≤ 0 in (5) includes pure control constraints C(u(x), p) ≤ 0 as well as pure state constraints C(y(x), p) ≤ 0.

It is well known that problems of the form OCP(p) can be solved efficiently by approximating the control functions, u^i ≈ u(x_i) for given mesh points x_i ∈ [0, x_f], i = 1, ..., N, and solving for the state variables by standard integration methods. This leads to approximations y(x_i; z, p) ≈ y(x_i), z := (u^1, ..., u^N), of the state at the mesh points x_i. For a more detailed discussion please refer to [3]-[6]. Therefore the optimal control problem OCP(p) is replaced by the finite dimensional perturbed nonlinear optimization problem NLP(p): For a given p ∈ P,

    min_z  g(y(x_N; z, p), p)
    s.t.   ψ(y(x_N; z, p), p) = 0,
           C(y(x_i; z, p), u^i, p) ≤ 0,  i = 1, ..., N.   (6)

Several reliable optimization codes have been developed for solving NLP problems (6), like e.g. SQP methods. This idea is implemented e.g. in the direct methods SOCS, GESOP, DIRCOL and NUDOCCCS. An additional, and to our knowledge unique, feature of NUDOCCCS (Büskens [3]) is the ability to accurately compute the sensitivity differentials ∂y/∂p(x;p_0), ∂u/∂p(x;p_0) of the approximations

    y(x; p_0 + Δp) ≈ y(x; p_0) + ∂y/∂p(x; p_0) · Δp,
    u(x; p_0 + Δp) ≈ u(x; p_0) + ∂u/∂p(x; p_0) · Δp.   (7)

This is done by the following idea: Let z_0 denote the unperturbed solution of NLP(p_0) for a nominal parameter p = p_0 and let h^a denote the collection of active constraints in (6). Then

    L(z, μ, p) := g(y(x_N; z, p), p) + μ^T h^a(z, p)

is the Lagrangian function with the associated Lagrange multiplier μ.
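Differentiating the first-order conditions of this Lagrangian with respect to p yields the sensitivities dz/dp and dμ/dp from a single linear solve with the KKT matrix. This can be checked on a tiny equality-constrained NLP with a known closed-form solution (an illustrative toy, unrelated to the flight problem): for min (z_1 - p)² + z_2² subject to z_1 + z_2 = 1 one has z(p) = ((1+p)/2, (1-p)/2) and μ(p) = p - 1, hence dz/dp = (1/2, -1/2) and dμ/dp = 1.

```python
# Sensitivity differentials of a toy NLP (illustrative, not the flight
# problem):  min (z1 - p)^2 + z2^2  s.t.  z1 + z2 = 1.
# Closed form: z(p) = ((1+p)/2, (1-p)/2), mu(p) = p - 1, hence
# dz/dp = (1/2, -1/2), dmu/dp = 1.

def solve3(K, rhs):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    A = [row[:] + [r] for row, r in zip(K, rhs)]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

# KKT matrix [[L_zz, h_z^T], [h_z, 0]] at the nominal solution:
K = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0]]
# Right-hand side -[L_zp; h_p] with L_zp = (-2, 0) and h_p = 0:
rhs = [2.0, 0.0, 0.0]

dz1, dz2, dmu = solve3(K, rhs)
print(round(dz1, 6), round(dz2, 6), round(dmu, 6))  # 0.5 -0.5 1.0
```

The same KKT solve, with the discretized NLP (6) in place of this toy, is what delivers the control sensitivities at the mesh points.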
Then the following results hold [7]:

Solution Differentiability for NLP Problems: Suppose that the optimal solution (z_0, μ_0) of the nominal problem NLP(p_0) satisfies a maximal rank condition, second order sufficient optimality conditions and strict complementarity of the multiplier μ_0. Then the unperturbed solution (z_0, μ_0) can be embedded into a C¹-family of perturbed solutions (z(p), μ(p)) for NLP(p) with z(p_0) = z_0, μ(p_0) = μ_0. The sensitivity differentials of the optimal solutions are given by the formula

    ( dz/dp(p_0), dμ/dp(p_0) )^T = - [ L_zz  (h^a_z)^T ; h^a_z  0 ]^{-1} ( L_zp , h^a_p )^T   (8)

evaluated at the optimal solution. This formula provides good approximations for the sensitivity of the perturbed optimal controls at the mesh points, i.e. for the quantities ∂u/∂p(x_i;p_0), i = 1, ..., N. The state sensitivities are then obtained by differentiating the control-state relation in (6), y(x_i) = y(x_i; z, p), with respect to the parameter p:

    ∂y/∂p(x_i;p_0) ≈ ∂y/∂z(x_i; z_0, p_0) · dz/dp(p_0) + ∂y/∂p(x_i; z_0, p_0).   (9)

The sensitivity differentials of the adjoint variables or of the objective functional can be calculated analogously.

Figure 7. Sensitivity ∂v/∂p(x;p_0). Figure 8. Sensitivity ∂γ/∂p(x;p_0).

We return to the example. In the first step the optimal nominal solution is calculated by the code NUDOCCCS of Büskens [3], see Figures 1-6. In the second step the sensitivity differentials of the model functions (states, controls, adjoint variables, cost functional and further interesting model functions) are calculated from equations (8) and (9), see Figures 7-12.

These figures provide valuable information for the engineers. Additional perturbation parameters can be added to the model. Basically only an additional matrix-vector multiplication is needed in order to compute the sensitivity differentials of the states and controls for each component of the perturbation parameter.

Figure 11. Sensitivity ∂C_L/∂p(x;p_0). Figure 12.
Sensitivity ∂δ/∂p(x;p_0).

References

[1] Betts, J.T. (1998) Survey of Numerical Methods for Trajectory Optimization. Journal of Guidance, Control, and Dynamics, Vol. 21, pp. 193-207.
[2] Betts, J.T. (2001) Practical Methods for Optimal Control Using Nonlinear Programming. SIAM, Philadelphia.
[3] Büskens, C. (1998) Optimierungsmethoden und Sensitivitätsanalyse für optimale Steuerprozesse mit Steuer- und Zustands-Beschränkungen. Dissertation, Universität Münster.
[4] Büskens, C., Maurer, H. (2000) SQP-Methods for Solving Optimal Control Problems with Control and State Constraints: Adjoint Variables, Sensitivity Analysis and Real-Time Control. Journal of Computational and Applied Mathematics, Vol. 120, pp. 85-108.
[5] Büskens, C., Maurer, H. (2001) Sensitivity Analysis and Real-Time Optimization of Parametric Nonlinear Programming Problems. In: Grötschel, M., Krumke, S.O., Rambau, J. (Eds.): Online Optimization of Large Scale Systems: State of the Art. Springer Verlag, Berlin, pp. 3-16.
[6] Büskens, C., Maurer, H. (2001) Sensitivity Analysis and Real-Time Control of Parametric Optimal Control Problems Using Nonlinear Programming Methods. In: Grötschel, M., Krumke, S.O., Rambau, J. (Eds.): Online Optimization of Large Scale Systems: State of the Art. Springer Verlag, Berlin, pp. 57-68.
[7] Fiacco, A.V. (1983) Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Academic Press, New York.
[8] Grimm, W., Well, K.H., Oberle, H.J. (1986) Periodic Control for Minimum-Fuel Aircraft Trajectories. Journal of Guidance, Vol. 9, pp. 169-174.
[9] Jansch, C., Well, K.H., Schnepper, K. (1994) GESOP - Eine Software Umgebung zur Simulation und Optimierung. In: Proc. des SFB 255 Workshops Optimalsteuerungsprobleme von Hyperschall-Flugsystemen, Ernst-Moritz-Arndt Universität Greifswald, pp. 15-23.
[10] Pesch, H.J.
(1994) A Practical Guide to the Solution of Real-Life Optimal Control Problems. Control and Cybernetics, 23, pp. 7-60.
[11] Speyer, J.L. (1976) Nonoptimality of the Steady-State Cruise for Aircraft. AIAA Journal, Vol. 14, pp. 1604-1610.
[12] von Stryk, O. (1995) Numerische Lösung optimaler Steuerungsprobleme: Diskretisierung, Parameteroptimierung und Berechnung der adjungierten Variablen. VDI-Verlag, Reihe 8, Nr. 441.

SOLVING QUADRATIC MULTICOMMODITY PROBLEMS THROUGH AN INTERIOR-POINT ALGORITHM

Jordi Castro
Department of Statistics and Operations Research
Universitat Politècnica de Catalunya
Pau Gargallo 5, 08028 Barcelona, Spain *
jcastro@eio.upc.es

Abstract

Standard interior-point algorithms usually show a poor performance when applied to multicommodity network flow problems. A recent specialized interior-point algorithm for linear multicommodity network flows overcame this drawback, and was able to efficiently solve large and difficult instances. In this work we perform a computational evaluation of an extension of that specialized algorithm to multicommodity problems with convex and separable quadratic objective functions. As in the linear case, the specialized method for convex separable quadratic problems is based on the solution of the positive definite system that appears at each interior-point iteration through a scheme that combines direct (Cholesky) and iterative (preconditioned conjugate gradient) solvers. The preconditioner considered for linear problems, which was instrumental in the performance of the method, has shown to be even more efficient for quadratic problems. The specialized interior-point algorithm is compared with the general barrier solver of CPLEX 6.5, and with the specialized codes PPRN and ACCPM, using a set of convex separable quadratic multicommodity instances of up to 500000 variables and 180000 constraints.
The specialized interior-point method was, on average, about 10 times and two orders of magnitude faster than the CPLEX 6.5 barrier solver and the other two codes, respectively.

Keywords: interior-point methods, network optimization, multicommodity flows, quadratic programming, large-scale optimization.

* Work supported by grant CICYT TAP99-1075-C02-02.

1. Introduction

Multicommodity flows are widely used as a modeling tool in many fields, e.g., in telecommunications and transportation problems. The multicommodity network flow problem is a generalization of the minimum cost network flow problem where k different items (the commodities) have to be routed from a set of supply nodes to a set of demand nodes using the same underlying network. These models are usually very large and difficult linear programming problems, and there is a wide literature on specialized approaches for their efficient solution. However, most of them only deal with the linear objective function case. In this work we consider a specialized interior-point algorithm for multicommodity network flow problems with convex and separable quadratic objective functions. The algorithm has been able to solve large and difficult quadratic multicommodity problems in a fraction of the time required by alternative solvers.

In recent years there has been a significant amount of research in the field of multicommodity flows, mainly for linear problems. The new solution strategies can be classified into four main categories: simplex-based methods [6, 15], decomposition methods [10, 12], approximation methods [13], and interior-point methods [4, 12]. Some of these algorithms were compared in [7] for linear problems. The available literature for nonlinear multicommodity flows is not so extensive.
For instance, of the above approaches, only the codes of [6] and [12] (named PPRN, nonlinear primal partitioning, and ACCPM, analytic center cutting plane method, respectively) were extended to nonlinear (possibly non-quadratic) objective functions. In this work we compared the specialized interior-point algorithm with those two codes using a set of large-scale quadratic multicommodity problems. The specialized interior-point algorithm turned out to be the most efficient strategy for all the instances. A description and empirical evaluation of additional nonlinear multicommodity algorithms can be found in the survey [14].

The specialized interior-point method presented here is an extension to convex and separable quadratic objective functions of the algorithm introduced in [4] for linear multicommodity flows. The solution strategy suggested for linear problems (i.e., solving the positive definite system at each interior-point iteration through a scheme that combines direct and iterative solvers) can also be applied to convex and separable quadratic multicommodity problems. Moreover, as will be shown in the computational results, this solution strategy turned out to be even more efficient for quadratic than for linear problems.

Up to now most applications of multicommodity flow models dealt with linear objective functions. Quadratic multicommodity problems are not usually recognized as a modeling tool, mainly due to the lack of an efficient solver for them. The specialized interior-point method can help to fill this void. The efficient solution of large and difficult quadratic multicommodity problems would open new modeling perspectives (e.g., they could be used in network design algorithms [9]).

The structure of the document is as follows. In Section 2 we formulate the quadratic multicommodity flow problem.
In Section 3 we sketch the specialized interior-point algorithm for multicommodity flow problems, and show that it can also be applied to the quadratic case. Finally, in Section 4 we perform an empirical evaluation of the algorithm using a set of large-scale quadratic multicommodity flow instances and three alternative solvers (i.e., CPLEX 6.5, PPRN and ACCPM).

2. The quadratic multicommodity flow problem

Given a network of m nodes, n arcs and k commodities, the quadratic multicommodity network flow problem can be formulated as

    min  Σ_{i=0}^{k} [ (c^i)^T x^i + (x^i)^T Q^i x^i ]

    subject to
        [ N           0 ]  [ x^1 ]     [ b^1 ]
        [    ⋱          ]  [  ⋮  ]  =  [  ⋮  ]
        [         N   0 ]  [ x^k ]     [ b^k ]
        [ 1   ⋯   1   1 ]  [ x^0 ]     [  u  ]

        0 ≤ x^i ≤ u^i,  i = 1, ..., k,
        0 ≤ x^0 ≤ u.   (1)

Vectors x^i ∈ ℝ^n, i = 1, ..., k, are the flows for each commodity, while x^0 ∈ ℝ^n are the slacks of the mutual capacity constraints. N ∈ ℝ^{m×n} is the node-arc incidence matrix of the underlying network, and 1 denotes the n×n identity matrix. c^i ∈ ℝ^n are the arc linear costs for each commodity and for the slacks, u^i ∈ ℝ^n and u ∈ ℝ^n are respectively the individual capacities for each commodity and the mutual capacity for all the commodities, and b^i ∈ ℝ^m are the supply/demand vectors at the nodes of the network for each commodity. Finally, Q^i ∈ ℝ^{n×n} are the arc quadratic costs for each commodity and for the slacks. We restrict ourselves to the case where Q^i is a positive semidefinite diagonal matrix, thus having a convex and separable quadratic objective function. Note that (1) is a quadratic problem with m̄ = km + n constraints and n̄ = (k+1)n variables.

Most of the applications of multicommodity flows in the literature only involve linear costs. However, quadratic costs can be useful in the following situations:

■ Adding a quadratic penalty term to the occupation of a line in a transmission/transportation network, through an appropriate choice of the diagonal matrices Q^i, i = 1, ..., k. This penalizes saturation of lines, guaranteeing a reserve capacity to redistribute the current pattern of flows when line failures occur.
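The block structure of (1) can be materialized for a toy network. The sketch below uses an invented 3-node, 3-arc network with k = 2 commodities (all data hypothetical), checks the dimension counts m̄ = km + n and n̄ = (k+1)n, and verifies a hand-built feasible flow:

```python
# Assemble the constraint matrix of (1) for a toy network: m = 3 nodes,
# n = 3 arcs (0->1, 1->2, 0->2), k = 2 commodities.  Network data invented.

m, n, k = 3, 3, 2
arcs = [(0, 1), (1, 2), (0, 2)]

# Node-arc incidence matrix N (m x n): +1 at tail node, -1 at head node.
N = [[0] * n for _ in range(m)]
for j, (tail, head) in enumerate(arcs):
    N[tail][j], N[head][j] = 1, -1

# A = [ diag(N, ..., N)  0 ]   (k blocks of flow conservation)
#     [ I    ...    I    I ]   (mutual capacity rows with slacks x^0)
rows, cols = k * m + n, (k + 1) * n
A = [[0] * cols for _ in range(rows)]
for i in range(k):                        # flow conservation per commodity
    for r in range(m):
        for c in range(n):
            A[i * m + r][i * n + c] = N[r][c]
for j in range(n):                        # linking rows: sum_i x^i_j + x^0_j = u_j
    for i in range(k + 1):
        A[k * m + j][i * n + j] = 1

assert (rows, cols) == (k * m + n, (k + 1) * n) == (9, 9)

# One unit of each commodity from node 0 to node 2, mutual capacity u_j = 2:
x = [1, 1, 0,    # commodity 1 uses path 0->1->2 (arcs 0 and 1)
     0, 0, 1,    # commodity 2 uses the direct arc 0->2 (arc 2)
     1, 1, 1]    # slacks x^0 = u - total arc occupation
b = [1, 0, -1,   # supply/demand of commodity 1
     1, 0, -1,   # supply/demand of commodity 2
     2, 2, 2]    # mutual capacities u
Ax = [sum(A[r][c] * x[c] for c in range(cols)) for r in range(rows)]
print(Ax == b)  # True
```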
■ Replacing a convex and separable nonlinear function by its quadratic approximation.

■ Finding the closest pattern of flows x to the currently used one, x̄, when changes in capacities/demands are performed. In this case the quadratic term would be (x - x̄)^T (x - x̄).

■ Solution of the subproblems in an augmented Lagrangian relaxation scheme for the network design problem [9, 11].

3. The specialized interior-point algorithm

The multicommodity problem (1) is a quadratic program that can be written in standard form as

    min { c^T x + (1/2) x^T Q x : Ax = b, x + s = u, x, s ≥ 0 },   (2)

where x, s, u ∈ ℝ^n̄, Q ∈ ℝ^{n̄×n̄} and b ∈ ℝ^m̄. The dual of (2) is

    max { b^T y - (1/2) x^T Q x - u^T w : A^T y - Qx + z - w = c, z, w ≥ 0 },   (3)

where y ∈ ℝ^m̄ and z, w ∈ ℝ^n̄. For problem (1), matrix Q is made of k+1 diagonal blocks; blocks Q^i, i = 1, ..., k, are related to the flows for each commodity, while Q^0 contains the quadratic costs of the mutual capacity slacks. The solution of (2) and (3) by an interior-point algorithm is obtained through the following system of nonlinear equations (see [18] for details):

    r_xz = μe - XZe = 0,
    r_sw = μe - SWe = 0,
    r_b = b - Ax = 0,
    r_c = c - (A^T y - Qx + z - w) = 0,
    (x, s, z, w) > 0,   (4)

where e is a vector of 1's of appropriate dimension, and X, Z, S, W are diagonal matrices made from the vectors x, z, s, w. The set of unique solutions of (4) for each μ value is known as the central path, and when μ → 0 these solutions superlinearly converge to those of (2) and (3) [18]. System (4) is usually solved by a damped version of Newton's method, reducing the μ parameter at each iteration. This procedure is known as the path-following algorithm [18]. Figure 1 shows the main steps of the path-following algorithm for quadratic problems.

Figure 1. Path-following algorithm for quadratic problems.
    Algorithm Path-following(A, Q, b, c, u):
    1   Initialize x > 0, s > 0, y, z > 0, w > 0;
    2   while (x, s, y, z, w) is not a solution do
    3       Θ = (X⁻¹Z + S⁻¹W + Q)⁻¹;
    4       r = r_c + S⁻¹ r_sw - X⁻¹ r_xz;
    5       (AΘA^T) Δy = r_b + AΘr;
    6       Δx = Θ(A^T Δy - r);
    7       Δw = S⁻¹(r_sw + W Δx);
    8       Δz = r_c + Δw + Q Δx - A^T Δy;
    9       Compute α_P ∈ (0, 1], α_D ∈ (0, 1];
    10      x ← x + α_P Δx;
    11      (y, z, w) ← (y, z, w) + α_D (Δy, Δz, Δw);
    12  end_while
    End_algorithm

The specialized interior-point algorithm introduced in [4] for linear multicommodity problems exploited the structure of the constraints matrix when solving (AΘA^T)Δy = b̄ (line 5 of Figure 1), which is by far the most computationally expensive step. Considering the structure of A in (1) and accordingly partitioning the diagonal matrix Θ defined in line 3 of Figure 1, we obtain

    AΘA^T = [ B    C ]      B = diag(NΘ¹N^T, ..., NΘ^k N^T),
            [ C^T  D ] ,    C = [NΘ¹; ...; NΘ^k],   D = Σ_{i=0}^{k} Θ^i,   (5)

where Θ^i = ((X^i)⁻¹Z^i + (S^i)⁻¹W^i + Q^i)⁻¹, i = 0, 1, ..., k. Note that the only difference between the linear and the quadratic case is the term Q^i in Θ^i. Moreover, as we are assuming that Q^i is a diagonal matrix, Θ^i can be easily computed. Using (5), and appropriately partitioning Δy and b̄, we can write (AΘA^T)Δy = b̄ as

    [ B    C ] [ Δy₁ ]     [ b̄₁ ]
    [ C^T  D ] [ Δy₂ ]  =  [ b̄₂ ].   (6)

By block multiplication, we can reduce (6) to

    (D - C^T B⁻¹ C) Δy₂ = b̄₂ - C^T B⁻¹ b̄₁,   (7)
    B Δy₁ = b̄₁ - C Δy₂.   (8)

System (8) is solved by performing a Cholesky factorization of each diagonal block NΘ^i N^T, i = 1, ..., k, of B. The system with matrix H = D - C^T B⁻¹C, the Schur complement of (5), is solved by a preconditioned conjugate gradient (PCG) method. A good preconditioner is instrumental for the performance of the method. In [4] it was proved that if D and D - C^T B⁻¹C are positive definite, then the inverse of the Schur complement can be computed as

    H⁻¹ = ( Σ_{i=0}^{∞} (D⁻¹(C^T B⁻¹ C))^i ) D⁻¹.   (9)

The preconditioner is thus obtained by truncating the infinite power series (9) at some term h (in practice h = 0 or h = 1; all the computational results in this work have been obtained with h = 0).
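The h = 0 truncation of (9) amounts to preconditioning the PCG solve of (7) with the diagonal matrix D⁻¹. The following self-contained sketch uses small hand-made matrices B, C, D (an illustration only, not a real multicommodity instance; B is taken diagonal so B⁻¹ is trivial):

```python
# Preconditioned conjugate gradient for the Schur complement system (7),
# with the h = 0 truncation of the power series (9): preconditioner D^{-1}.
# B, C, D below are small hand-made matrices; D is diagonal and
# H = D - C^T B^{-1} C is positive definite by construction.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

Ddiag = [10.0, 10.0, 10.0, 10.0]
Bdiag = [2.0, 2.0, 2.0]                  # B diagonal here for simplicity
C = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0, 0.0]]

# H = D - C^T B^{-1} C  (dense 4x4)
nH = len(Ddiag)
H = [[(Ddiag[i] if i == j else 0.0)
      - sum(C[r][i] * C[r][j] / Bdiag[r] for r in range(len(Bdiag)))
      for j in range(nH)] for i in range(nH)]

def pcg(H, b, Minv_diag, tol=1e-10, maxit=100):
    """PCG with a diagonal preconditioner; returns (solution, iterations)."""
    x = [0.0] * len(b)
    r = b[:]
    z = [mi * ri for mi, ri in zip(Minv_diag, r)]
    p = z[:]
    rz = dot(r, z)
    for it in range(1, maxit + 1):
        Hp = matvec(H, p)
        alpha = rz / dot(p, Hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = [mi * ri for mi, ri in zip(Minv_diag, r)]
        rz, rz_old = dot(r, z), rz
        p = [zi + (rz / rz_old) * pi for zi, pi in zip(z, p)]
    return x, maxit

rhs = [1.0, 2.0, 3.0, 4.0]
Minv = [1.0 / d for d in Ddiag]          # h = 0 preconditioner: D^{-1}
x, iters = pcg(H, rhs, Minv)
res = max(abs(hx - b) for hx, b in zip(matvec(H, x), rhs))
print(res < 1e-8)  # True
```

When D (and hence the preconditioner) is diagonal, each PCG iteration costs only matrix-vector products, which is the property exploited by the specialized algorithm.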
Since D_q = Σ_{i=0}^{k} Θ_q^i with (Θ_q^i)⁻¹ = (Θ_l^i)⁻¹ + Qⁱ, D_q and D_l denoting the D matrix for a quadratic and a linear problem respectively, it is clear that for quadratic multicommodity problems the above two conditions are also guaranteed, and then the same preconditioner can also be applied. Moreover, since we are assuming diagonal Qⁱ matrices, for h = 0 the preconditioner is equal to H⁻¹ ≈ D⁻¹, which is also diagonal, as for linear multicommodity problems. This is instrumental in the overall performance of the algorithm. More details about this solution strategy can be found in [4].

The effectiveness of the preconditioner is governed by the spectral radius of D⁻¹(C^T B⁻¹ C), which always lies in [0, 1): the farther from 1, the better the preconditioner. According to the computational results obtained, this value seems to be smaller for quadratic problems than for the equivalent linear problems without the quadratic term, since fewer conjugate gradient iterations are performed for solving (7). Moreover, the number of interior-point iterations also decreases in some instances. This can be observed in Figures 2, 3 and 4. Figures 2 and 3 show the overall number of PCG and IP iterations for the linear and quadratic versions of the Mnetgen problems in Table 1 of Section 4. Both versions differ only in the Q matrix. Clearly, fewer IP and PCG iterations are performed for the quadratic problems. The number of PCG iterations per IP iteration has also been observed to decrease for quadratic problems. For instance, Figure 4 shows the number of PCG iterations per IP iteration for the linear and quadratic versions of problem PDS20 in Table 2 of Section 4. We chose this instance because it can be considered a good representative of the general behavior observed and, in addition, the number of IP iterations is similar for the linear and quadratic problems.
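The block structure (5)-(9) can be exercised numerically. In the sketch below the network, the number of commodities and the Θⁱ values are invented; the Schur complement system (7) is solved by conjugate gradients with the h = 0 preconditioner D⁻¹, and the spectral radius of D⁻¹(C^T B⁻¹ C) is checked to lie in [0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Node-arc incidence of a tiny 3-node, 3-arc network; one redundant node row
# is dropped so that each diagonal block N Θ^i N^T is nonsingular.
N = np.array([[ 1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
k = 3                                    # number of commodities (toy choice)
mn, na = N.shape

thetas = [rng.uniform(0.5, 2.0, na) for _ in range(k)]  # diagonals of Θ^1..Θ^k
theta0 = rng.uniform(0.5, 2.0, na)                      # diagonal of Θ^0

# Blocks of (5): B = diag(N Θ^i N^T), C = [N Θ^1; ...; N Θ^k], D = Σ_{i=0}^k Θ^i
B = np.zeros((k * mn, k * mn))
for i, th in enumerate(thetas):
    B[i*mn:(i+1)*mn, i*mn:(i+1)*mn] = (N * th) @ N.T
C = np.vstack([N * th for th in thetas])
D = np.diag(theta0 + sum(thetas))

H = D - C.T @ np.linalg.solve(B, C)      # Schur complement, system (7)

def pcg(Hm, rhs, Minv, tol=1e-12, maxit=100):
    """Conjugate gradients on H dy2 = rhs with the diagonal h = 0
    preconditioner H^-1 ~ D^-1 (Minv holds the diagonal of D^-1)."""
    x = np.zeros_like(rhs); r = rhs.copy()
    zv = Minv * r; p = zv.copy(); rz = r @ zv
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol:
            break
        Hp = Hm @ p
        a = rz / (p @ Hp)
        x += a * p; r -= a * Hp
        zv = Minv * r; rz2 = r @ zv
        p = zv + (rz2 / rz) * p; rz = rz2
    return x
```

The design point is that both B (small per-commodity Cholesky factorizations) and the h = 0 preconditioner D⁻¹ (diagonal) are cheap, so only matrix-vector products with H are needed.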
A better understanding of the relationship between the spectral radii of D⁻¹(C^T B⁻¹ C) for the linear and quadratic problems is part of the further work to be done.

Figure 2. Overall number of PCG iterations for the quadratic and linear Mnetgen instances.
Figure 3. Overall number of IP iterations for the quadratic and linear Mnetgen instances.

4. Computational results

The specialized algorithm of the previous section was tested using two sets of quadratic multicommodity instances. As far as we know, there is no standard set of quadratic multicommodity problems. Thus

Figure 4. Number of PCG iterations per interior-point iteration, for the quadratic and linear PDS20 instance.

we developed a meta-generator that adds the quadratic term

    (1/2) Σ_{i=1}^{k} Σ_{j=1}^{n} q_j^i (x_j^i)²

to the objective function of a linear multicommodity problem. The coefficients q_j^i are randomly obtained from a uniform distribution U[0, q̄], where

    q̄ = ( Σ_{i=1}^{k} Σ_{j=1}^{n} c_j^i ) / (kn),

in an attempt to guarantee that the linear and quadratic terms are of the same order. We applied our meta-generator to two sets of linear multicommodity instances obtained with the well-known Mnetgen [1] and PDS [3] generators. Tables 1 and 2 show the dimensions of the instances. Columns "m", "n", and "k" give the number of nodes, arcs and commodities of the network. Columns "n̄" and "m̄" give the number of variables and constraints of the quadratic problem. The Mnetgen and PDS generators can be downloaded from http://www.di.unipi.it/di/groups/optimize/Data/MMCF.html. We solved both sets with an implementation of the specialized interior-point algorithm, referred to as IPM [4], and with CPLEX 6.5 [8], a state-of-the-art interior-point code for quadratic problems. The IPM code, as well as a parallel version [5], can be downloaded for research purposes from

Table 1. Dimensions of the quadratic Mnetgen instances.
Instance   m    n     k    n̄       m̄
M64-4      64   524   4    2620    780
M64-8      64   532   8    4788    1044
M64-16     64   497   16   8449    1521
M64-32     64   509   32   16797   2557
M64-64     64   511   64   33215   4607
M128-4     128  997   4    4985    1509
M128-8     128  1089  8    9801    2113
M128-16    128  1114  16   18938   3162
M128-32    128  1141  32   37653   5237
M128-64    128  1171  64   76115   9363
M128-128   128  1204  128  155316  17588
M256-4     256  2023  4    10115   3047
M256-8     256  2165  8    19485   4213
M256-16    256  2308  16   39236   6404
M256-32    256  2314  32   76362   10506
M256-64    256  2320  64   150800  18704
M256-128   256  2358  128  304182  35126
M256-256   256  2204  256  566428  67740

Table 2. Dimensions of the quadratic PDS instances.

Instance   m      n      k   n̄       m̄
PDS1       126    372    11  4464    1758
PDS10      1399   4792   11  57504   20181
PDS20      2857   10858  11  130296  42285
PDS30      4223   16148  11  193776  62601
PDS40      5652   22059  11  264708  84231
PDS50      7031   27668  11  332016  105009
PDS60      8423   33388  11  400656  126041
PDS70      9750   38396  11  460752  145646
PDS80      10989  42472  11  509664  163351
PDS90      12186  46161  11  553932  180207

Table 3. Results for the quadratic Mnetgen problems.

Instance   CPLEX CPU  CPLEX n.it.  IPM CPU  IPM n.it.  (f_CPLEX − f_IPM)/f_CPLEX
M64-4      0.7        12           0.3      18         -2.0e-6
M64-8      3.1        12           0.8      20         5.6e-7
M64-16     10.7       15           1.6      21         -1.6e-6
M64-32     20.8       16           4.3      25         1.9e-6
M64-64     46.8       14           10.7     31         -1.1e-6
M128-4     2.8        11           0.8      17         1.2e-7
M128-8     12.6       11           2.1      21         2.5e-6
M128-16    80.5       13           5.9      28         6.9e-6
M128-32    153.6      14           15.7     35         2.8e-6
M128-64    305.5      14           35.3     36         -1.4e-6
M128-128   741.9      15           98.8     48         -5.7e-7
M256-4     13.1       13           2.7      20         -7.9e-6
M256-8     73.8       14           6.7      22         2.4e-5
M256-16    634.1      15           22.5     34         2.3e-5
M256-32    1105.2     16           49.9     36         2.4e-6
M256-64    2102.2     16           140.0    53         4.9e-7
M256-128   4507.3     17           327.6    62         5.0e-6
M256-256   11761.3    24           835.3    85         7.0e-6

Table 4. Results for the quadratic PDS problems.
Instance   CPLEX CPU  CPLEX n.it.  IPM CPU  IPM n.it.  (f_CPLEX − f_IPM)/f_CPLEX
PDS1       1.6        23           1.3      29         -2.7e-7
PDS10      234.8      43           78.6     62         -6.6e-7
PDS20      1425.6     55           271.0    69         1.9e-6
PDS30      5309.8     76           938.3    96         -6.0e-6
PDS40      10712.3    79           1965.2   105        -4.1e-6
PDS50      14049.7    80           3163.3   114        -4.1e-7
PDS60      17133.4    71           3644.2   95         3.6e-6
PDS70      25158.3    74           5548.7   101        -1.9e-7
PDS80      26232.1    74           7029.9   100        -1.3e-6
PDS90      32412.9    77           9786.7   109        -1.2e-6

http://www-eio.upc.es/~jcastro. For each instance, Tables 3 and 4 give the CPU time in seconds required by IPM and CPLEX 6.5 (columns "CPU"), the number of interior-point iterations performed by IPM and CPLEX 6.5 (columns "n.it."), and the relative error (f_CPLEX − f_IPM)/f_CPLEX of the solution obtained with IPM (assuming CPLEX 6.5 provides the exact optimum). Executions were carried out on a Sun Ultra2 2200 workstation with a 200 MHz clock, 1 Gb of main memory, and ≈45 Linpack Mflops. Figures 5-8 summarize the information of Tables 3 and 4. Figures 5 and 6 show, respectively, the ratio between the CPU times of CPLEX 6.5 and IPM, and the number of interior-point iterations performed by CPLEX 6.5 and IPM, with respect to the dimension of the problem (i.e., the number of variables), for the Mnetgen instances. The same information is shown in Figures 7 and 8 for the PDS problems.

Figure 5. Ratio of the execution times of CPLEX 6.5 and IPM for the quadratic Mnetgen problems.
Figure 6. Number of IP iterations performed by CPLEX 6.5 and IPM for the quadratic Mnetgen problems.
Figure 7. Ratio of the execution times of CPLEX 6.5 and IPM for the quadratic PDS problems.
Figure 8. Number of IP iterations performed by CPLEX 6.5 and IPM for the quadratic PDS problems.

From Figures 5 and 7, IPM was in all cases more efficient than CPLEX 6.5 (the time ratio was always greater than 1.0). For some Mnetgen and PDS instances IPM was about 20 and 5 times faster, respectively.
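The quadratic instances of Tables 1-4 were produced by the meta-generator described above; its coefficient rule can be sketched in a few lines (the instance shape below is invented for illustration, and only the drawing of the q_j^i coefficients follows the text).

```python
import numpy as np

def quadratic_coefficients(c, seed=0):
    """Draw q_j^i ~ U[0, q_bar] with q_bar = (sum of all c_j^i) / (k n),
    so that the added term 0.5 * sum_i sum_j q_j^i (x_j^i)^2 is of the
    same order as the linear objective.  c has shape (k, n)."""
    k, n = c.shape
    q_bar = c.sum() / (k * n)
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, q_bar, size=(k, n))

# Hypothetical linear costs for k = 3 commodities on n = 5 arcs
c = np.abs(np.random.default_rng(1).normal(10.0, 2.0, size=(3, 5)))
q = quadratic_coefficients(c)
```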
It is important to note that IPM makes use of standard Cholesky routines [16], whereas CPLEX 6.5 includes a highly tuned and optimized factorization code [2]. Therefore, in principle, the performance of IPM could be improved even further. Looking at Figures 6 and 8, it can be seen that IPM performed many more interior-point iterations than CPLEX 6.5. This is because, unlike CPLEX 6.5, the current version of IPM does not implement Mehrotra's predictor-corrector heuristic. In [4] it was shown that Mehrotra's heuristic was not appropriate for linear multicommodity problems. However, for quadratic problems, and because of the good behavior of the preconditioner, it could be an efficient option. Adding Mehrotra's strategy to IPM is part of the additional tasks to be performed.

Finally, we compared IPM and CPLEX 6.5 with PPRN [6] and with an implementation of the ACCPM [12] that we developed using the standard ACCPM library distribution [17]. For this purpose we chose some of the smallest Mnetgen and PDS instances, whose dimensions are shown in Tables 5 and 6 (columns m, n, k, n̄ and m̄, with the same meaning as before). These tables also give the execution time in seconds (columns "CPU") for each solver. Clearly, CPLEX 6.5 and IPM outperformed both PPRN and ACCPM. Moreover, PPRN and ACCPM seemed not to be competitive approaches for quadratic multicommodity flows. (On the other hand, unlike CPLEX 6.5 and IPM, they can deal with nonlinear objective functions.)

Table 5. Dimensions and results for the small quadratic Mnetgen problems.

Instance   m    n    k   n̄      m̄     CPLEX CPU  IPM CPU  PPRN CPU  ACCPM CPU
M64-4      64   524  4   2620   780   0.7        0.3      6.0       158.0
M64-8      64   532  8   4788   1044  3.1        0.8      38.0      2116.9
M64-16     64   497  16  8449   1521  10.7       1.6      184.6     5683.4
M64-32     64   509  32  16797  2557  20.8       4.3      failed    15753.4
M64-64     64   511  64  33215  4607  46.8       10.7     12710.1   34027.3

5.
Conclusions and future tasks

From the computational experience reported, it can be stated that the specialized interior-point algorithm is a promising approach for separable quadratic multicommodity problems. Among the future tasks to be performed we find a deeper study of the behavior of the spectral radius of D⁻¹(C^T B⁻¹ C), the addition of Mehrotra's predictor-corrector method, and the use of the algorithm in a network design framework.

Table 6. Dimensions and results for the small quadratic PDS problems.

Instance   m    n     k   n̄      m̄     CPLEX CPU  IPM CPU  PPRN CPU  ACCPM CPU
PDS1       126  372   11  4464   1758  1.6        1.3      75.5      failed
PDS2       252  746   11  8952   3518  5.2        7.9      293.3     failed
PDS3       390  1218  11  14616  5508  10.2       10.5     903.4     failed
PDS4       541  1790  11  21480  7741  22.4       33.9     1702.8    failed
PDS5       686  2325  11  27900  9871  53.2       44.7     2631.3    failed

References

[1] A. Ali and J.L. Kennington. (1977). Mnetgen program documentation, Technical Report 77003, Dept. of Ind. Eng. and Operations Research, Southern Methodist University, Dallas.
[2] R.E. Bixby, M. Fenelon, Z. Gu, E. Rothberg and R. Wunderling. (2000). MIP: Theory and practice - closing the gap, in: System Modelling and Optimization. Methods, Theory and Applications, eds. M.J.D. Powell and S. Scholtes, Kluwer, 19-49.
[3] W.J. Carolan, J.E. Hill, J.L. Kennington, S. Niemi and S.J. Wichmann. (1990). An empirical evaluation of the KORBX algorithms for military airlift applications, Operations Research, 38, 240-248.
[4] J. Castro. (2000). A specialized interior-point algorithm for multicommodity network flows, SIAM J. on Optimization, 10(3), 852-877.
[5] J. Castro. (2000). Computational experience with a parallel implementation of an interior-point algorithm for multicommodity flows, in: System Modelling and Optimization. Methods, Theory and Applications, eds. M.J.D. Powell and S. Scholtes, Kluwer, 75-95.
[6] J. Castro and N. Nabona. (1996).
An implementation of linear and nonlinear multicommodity network flows, European J. of Operational Research, 92, 37-53.
[7] P. Chardaire and A. Lisser. (1999). Simplex and interior point specialized algorithms for solving non-oriented multicommodity flow problems, Operations Research (to appear).
[8] ILOG CPLEX. (1999). ILOG CPLEX 6.5 Reference Manual Library, ILOG.
[9] A. Frangioni. (2000). Personal communication.
[10] A. Frangioni and G. Gallo. (1999). A bundle type dual-ascent approach to linear multicommodity min cost flow problems, INFORMS J. on Comp., 11(4), 370-393.
[11] B. Gendron, T.G. Crainic and A. Frangioni. (1999). Multicommodity capacitated network design, in: Telecommunications Network Planning, B. Sanso and P. Soriano (eds.), Kluwer Academic Publishers, 1-19.
[12] J.-L. Goffin, J. Gondzio, R. Sarkissian and J.-P. Vial. (1996). Solving nonlinear multicommodity flow problems by the analytic center cutting plane method, Math. Programming, 76, 131-154.
[13] A.V. Goldberg, J.D. Oldham, S. Plotkin and C. Stein. (1998). An implementation of a combinatorial approximation algorithm for minimum-cost multicommodity flow, in: Lecture Notes in Computer Science. Proceedings of the 6th International Integer Programming and Combinatorial Optimization Conference, eds. R.E. Bixby, E.A. Boyd and R.Z. Rios-Mercado, Springer.
[14] A. Ouorou, P. Mahey and J.-Ph. Vial. (2000). A survey of algorithms for convex multicommodity flow problems, Management Science, 46(1), 126-147.
[15] R.D. McBride. (1998). Progress made in solving the multicommodity flow problem, SIAM J. on Opt., 8, 947-955.
[16] E. Ng and B.W. Peyton. (1993). Block sparse Cholesky algorithms on advanced uniprocessor computers, SIAM J. Sci. Comput., 14, 1034-1056.
[17] O. Peton and J.-Ph. Vial. (2000). A tutorial on ACCPM, Technical Report, HEC/Logilab, University of Geneva.
[18] S.J. Wright. (1997). Primal-Dual Interior-Point Methods, SIAM, Philadelphia, PA.
STABILITY AND LOCAL GROWTH NEAR BOUNDED-STRONG OPTIMAL CONTROLS

Ursula Felgenhauer
Institute of Mathematics, Technical University Cottbus, Germany
felgenh@math.tu-cottbus.de

Abstract: Nonlinear constrained optimal control problems as a rule suffer from the so-called two-norm discrepancy, which in particular says that under stable optimality conditions the objective functionals satisfy a quadratic local growth estimate in terms of the L2 norm, but only in L∞ neighborhoods of the solution. Furthermore, in the case of weak local optima with continuous control functions, stability w.r.t. parameter changes can usually be expected to hold in the L∞ sense rather than in Lp. Whenever we consider problems with discontinuous optimal control behavior, these results are too restrictive to discuss general variations of the solution, including changes in the break points or switches in the active sets. In the paper we show how the use of certain integrated optimality criteria obtained via a duality approach allows for estimates also in the case of discontinuous controls. We consider L2 and L1 quadratic growth estimates and discuss consequences for the behavior of minimizing sequences.

Keywords: Constrained control problems, sufficient optimality conditions, stability.

1. Local optimality criteria in integrated form

Consider first a general nonlinear constrained optimal control problem (primal problem formulation):

(P)  min J(x,u) = k(x(0), x(T)) + ∫₀ᵀ r(t, x(t), u(t)) dt
     s.t.  ẋ(t) = f(t, x(t), u(t))  a.e. in [0,T],        (1)
           β(x(0), x(T)) = 0,        (2)
           g(t, x(t), u(t)) ≤ 0  a.e. in [0,T].        (3)

The pair (x,u) ∈ W¹∞(0,T; ℝⁿ) × L∞(0,T; ℝᵐ) is called admissible for (P) if the state equation (1) (including the boundary conditions (2)) together with the inequality constraints (3) is fulfilled (where f : [0,T] × ℝⁿ × ℝᵐ → ℝⁿ, g : [0,T] × ℝⁿ × ℝᵐ → ℝˢ, and β : ℝⁿ × ℝⁿ → ℝ^q). All data functions are assumed to be sufficiently smooth.
Denote by H the Hamiltonian, and by H̃ the augmented Hamiltonian, related to the problem (P):

    H(t,x,u,p) = r(t,x,u) + p^T f(t,x,u),
    H̃(t,x,u,p,μ) = H(t,x,u,p) + μ^T g(t,x,u),  μ ≥ 0.

Further, let W stand for the set W = { (t,x,u) : t ∈ [0,T], g(t,x,u) ≤ 0 }. Consider the dual variable S given by a function S : [0,T] × ℝⁿ → ℝ and the auxiliary functional

    Φ(ξ1, ξ2, S) = k(ξ1, ξ2) + S(0, ξ1) − S(T, ξ2).

We assume that S is at least Lipschitz continuous w.r.t. (t,x) whenever (t,x,u) ∈ W. Define

    T(S) = inf { Φ(ξ1, ξ2, S) : β(ξ1, ξ2) = 0 }.

Then the following problem is dual to the original control problem:

(D)  max T(S)
     s.t.  H(t,x,u,S_x(t,x)) + S_t(t,x) ≥ 0  on  W = { (t,x,u) : t ∈ [0,T], g(t,x,u) ≤ 0 }.

If (x,u) is an admissible pair for problem (P) and S is feasible for (D), then the duality relation ([9], also [21] or [4]) holds, i.e.

    J(x,u) ≥ T(S).

The relation turns into an equality if and only if, for some admissible (x0,u0) and feasible dual S,

    Ψ(x0,u0,S) = ∫₀ᵀ [ H(t,x0(t),u0(t),S_x(t,x0(t))) + S_t(t,x0(t)) ] dt = 0,        (4)
    ψ(x0(0), x0(T), S) = Φ(x0(0), x0(T), S) − T(S) = 0.

In this case, the pair (x0,u0) is a solution of (P).

The analysis of the behavior of Ψ and ψ can further be used to characterize local minima of (P) in detail, including estimates for local growth terms if available (cf. e.g. [7]). One can also distinguish between weak and strong local optima, depending on the reference sets for which optimality holds. To this aim, let us introduce the sets

    W̃_ε = W ∩ ( [0,T] × B_ε(x0) × ℝᵐ )  and  W_ε^M = W ∩ ( [0,T] × B_ε(x0) × B_M(0) ),

where W_ε^M with a given constant M > 0 is used to check for so-called bounded-strong local optima (see e.g. [18] or [20]). Abstract optimality criteria have been given in [17], [21] or (in slightly generalized formulation) in [7], [6]. The results are summed up in the following theorem:

Theorem 1. Let (x0,u0) be admissible for (P).
Suppose that a function S : [0,T] × ℝⁿ → ℝ exists which is Lipschitz continuous w.r.t. x and piecewise continuously differentiable w.r.t. t such that, for a suitably chosen positive constant ε, the following relations hold with γ = 1, D(t) = W̃_ε(t) and B = B_ε(x0(0), x0(T)):

(R1) Ψ(x,u;S) > 0 for all (x,u) ≠ (x0,u0) with (x(t),u(t)) ∈ D(t) a.e. in [0,T];
(R2) Ψ(x0,u0;S) = 0;
(R3) ψ(ξ1,ξ2;S) ≥ 0 for all (ξ1,ξ2) ∈ B.

Then (x0,u0) is a strict weak local minimizer of (P). If (R1)-(R3) hold true for a certain constant ε > 0 with γ = 0, D(t) = W̃_ε(t) and B = B_ε(x0(0),x0(T)), then (x0,u0) is a (strict) strong local minimizer. If the conditions are satisfied with D(t) = W_ε^M(t) for some M > 0, the point (x0,u0) is a (strict) bounded-strong local optimum.

The above theorem differs from former results ([21], [17]) mainly by the consequent usage of the Hamilton-Jacobi inequality (cf. (D)) in integrated form. This corresponds to some relaxation in the characterization of local minima by duality means, and was first used in [4] for the theoretical convergence analysis of certain discretization methods. Notice that in some cases it is possible to find estimates for Ψ(x,u;S) − Ψ(x0,u0;S) in terms of ‖x − x0‖2² + ‖u − u0‖2², i.e. a local quadratic growth condition w.r.t. the L2 topology ([17], [7]), although the reference sets are neighborhoods w.r.t. L∞ (at least with respect to x). This fact illustrates once more the effects of the so-called two-norm discrepancy appearing in optimal control problems (cf. [10] for details).

The optimality criteria of Theorem 1 in their original parametric formulation have been used in [17] to derive sufficient optimality conditions for general control problems. The approach leads to conditions which, in a different way, had been obtained via the investigation of so-called stable weak optima ([12] or [15], see also [14]).
In particular, it was shown that the following criteria ensure the strict weak local optimality of a solution pair (x0,u0):

(PMP) Pontryagin's maximum principle:

    ẋ = H_p = f,  β(x(0), x(T)) = 0;
    ṗ = −H̃_x,  p(0) = −∇1 k − ∇1 β^T ρ,  p(T) = ∇2 k + ∇2 β^T ρ,  ρ ∈ ℝ^q;
    H̃_u = 0;
    μ^T g = 0,  μ ≥ 0,  g ≤ 0.

Notice that in the transversality condition given above the subscripts 1, 2 stand for the gradient components corresponding to the initial and to the final state vectors, respectively. For a given admissible pair (x0,u0), let H̃, f, g, … be evaluated along the state-control trajectory. We will denote by g^σ, σ > 0, the set of σ-active constraints at t, i.e. the set of g_i such that 0 ≥ g_i(t, x0(t), u0(t)) ≥ −σ.

(C1) Invertibility: For some positive constant σ, the gradients w.r.t. u of the σ-active constraints, ∇_u g^σ, are uniformly linearly independent a.e. on [0,T].

(C2) Controllability: There are functions y ∈ W¹∞, v ∈ L∞ satisfying

    (∇_u g^σ) v = 0;  ẏ − ∇_x f y − ∇_u f v = 0  a.e.,

together with the boundary condition ∇1 β^T y(0) + ∇2 β^T y(T) = 0.

In order to formulate second-order conditions, the index set I_δ = { i : μ_i > δ } and the related tangent spaces

    T_δ = { ζ : ∇_{(x,u)} g_i ζ = 0  ∀ i ∈ I_δ },  T'_δ = { v : (∇_u g_i) v = 0  ∀ i ∈ I_δ }

are useful. The conditions then can be given as follows:

(C3) Legendre-Clebsch condition: For some positive δ, a constant α > 0 exists such that the estimate v^T H̃_uu v ≥ α|v|² holds for all v ∈ T'_δ, uniformly a.e. on the interval [0,T].
on [0,T] has a bounded in [0, 1] solution satisfying the boundary restrictions ^^(v?fc + v?^P + Q(0))^ > Tl-ei" V^eiR" e{^lk + VlPp-Q[l))^ > 7iei' VeGiR^ It is known that (Cl) - (C4) not only guarantee (Rl) - (R3) (see [7]), but also a) the Lipschitz stability of the solution in L^o w.r.t. small data perturbations ([11], [15], [3]), b) (in the case of a continuous control uo) the convergence of the Euler and related discretization methods ([14], [4], [2]). 2. Bounded-strong minima. Piecewise conditions are formulated as integral conditions in terms of When we are interested in local growth estimations, this fact gives us the chance to combine piecewise changing growth characterizations as they are typical for switches in the optimal control or for the junction of free arcs and arcs where certain constraints are active. We begin our optimality analysis with the auxiliary functional ^ — S) from (4), where the integrand has the form R[t] = {H{t,x,u,VxS) + St) [t] . (6) As it has been shown in [7], using ^(xo,uo, S) = 0 together with an expansion S{t,x) - So{t) + p{t)'^ {x - xo(t)) + 0.5 {x - xo{t))'^Q{t) {x - xo{t)) 218 and the optimal data VxS = po + Q{x — xq) and So = — ro, one can express R in the following way: i?[i] = H{x,u,po,tio) - H{xo,uo,po,fJ-o) - Hx{xo,uo,po,IJ-o)'^{x - xo) +0.5 {x - xofQ{x - xo) + (x- xofQ ( /(a;, u) - f{xo, uq) ) + IXo{9{x,u) -g{xo,uo)) By rearranging the terms related to variations w.r.t x or u, one can separate two terms i?i^2 such that i?[t] = and (with p — Po + Q(x - Xo)) = H{x,uo,p,p.o) - H{xo,uo,p,p.o) - Hx{xo, uo,po, Po)^{x - Xo) + 0.5 {x - xq)'^Q{x - xq) « 0.5 {x - Xo)'^ [Hxx + Qfx + /JQ + Q) (a: - a:o); R 2 [t] = H{x,u,p,go) - H{x,uo,p,po) - Po 9 {x,u) Under condition (C4), in particular Ri[t] will be uniformly positive if 11^ — ^olloo is sufficiently small. In order to estimate ^ resp. in [6] the following general result has been proved (Theor. 
5 in the cited paper): Theorem 2 Suppose {xq^uq) to be a weak local minimizer satisfying together with some matrix function Q G the conditions (Cl) - (C4). Let R2[t] > ci\u — uo{t)\^ — C2\x — xo{t)\ Via : |ia| < M ( 7 ) hold almost everywhere on [0,1] with v{t) G {1,2} and constants not depending on t. Then {xo^uo) is a bounded- strong local minimizer, and positive constants c, e exist such that J{x,u) — J{xq,uo) > c\\x — X0W2 (8) for all admissible {x,u) with \\x — a;o||oo ^ Halloo ^ Important special cases are the following: Case 1: H{xo,v,po, jJio) is strongly convex w.r.t. v. In this case, R 2 > ci|u — uqP — 0(| x — xq\), - a situation which can be often observed when H takes its minimum in an inner point of the control set. Case 2: uq arg min H{xo,v,po, / jLo), and —pQ g{xo>,u) > c[\u — uo\. Here one can conclude that i ?2 > c[\u — uq\ — 0(| rr — o:o|). The situation Stability and Local Growth near Bounded- Strong Optimal Controls 219 occurs when a certain strict complementarity condition holds together with the invertibility assumption (Cl). The criteria given above have been discussed in detail in [6] and tested on a nonlinear example with generically discontinuous optimal control regime in [7]. As a further illustration, we consider here an example from [14]: Example: The tunneldiode oscillator. min J{x^u) = j (^u^{t) + x^{t)^ dt s.t. ±i = X 2 , a;i(0) = 3 : 2 ( 0 ) = -5, 3;i(T) == X 2 {T) = 0, X 2 — —3:1 + 3:2(1 . 4 — 0.143:2) + for I I < 1 CL.e. on[0,T]. In the paper [14], the conditions (Cl) - (C4) were checked numerically. In particular, by using a multiple shooting method a bounded matrix function Q was constructed so that the RiCCATl condition was satisfied. Thus it is reasonable to assume that a solution to (C4) exists with llQIloo < and that (Cl) - (C3) hold true for some positive a and 6. 
In the example situation for T = 4.5, the structure of the optimal control was obtained as u = 1 on [0, τ1), u = −1 on (τ2, τ3) and u = −2p2 elsewhere (where 0 < τ1 < τ2 < τ3 < T are found approximately from the numerical solution). The points τi are the so-called junction points. Consider the term R2 from our above growth analysis. Using the example data (with p = p0 + Q(x − x0), and λ0,1, λ0,2 denoting the multipliers of the constraints u − 1 ≤ 0 and −u − 1 ≤ 0), we get

    R2 = u² − u0² + 4 p2 (u − u0) + λ0,1 (1 − u0) + λ0,2 (u0 + 1)
       = (2 u0 + 4 p0,2)(u − u0) + (u − u0)² + 4 (u − u0) [Q(x − x0)]2
       ≥ min { (λ1 + λ2) |u − u0| , 0.5 |u − u0|² } − c |x − x0|²

whenever |x − x0| ≤ ε < 1. Therefore, the conditions of Theorem 2 are applicable in the example, and the considered solution turns out to be a bounded-strong local minimum satisfying a local L2 quadratic growth estimate.

3. Bounded-strong optimality in the bang-bang case

Consider a dynamical system with an equation that is linear in state and control, and with known initial position. We ask for a control regime which, in a given time, steers the system to a final state as close as possible to the origin. In general, the system cannot always be terminated exactly in an arbitrarily given time. Since a (small) deviation from zero in the final position is allowed, this problem class is also called soft termination control. As a model case, consider a problem with box control constraints:

(Ps)  min J(x,u) = (1/2) ‖x(T)‖²
      s.t.  ẋ(t) = A(t) x(t) + B(t) u(t)  a.e. in [0,T];        (9)
            x(0) = a;        (10)
            |u_i(t)| ≤ 1,  i = 1, …, m,  a.e. in [0,T].        (11)

The Hamilton function related to (Ps) has the form

    H(t,x,u,p) = p^T A(t) x + p^T B(t) u,

whereas the augmented Hamiltonian reads

    H̃(t,x,u,p,μ) = H + μ1^T (u − e) − μ2^T (u + e),  with μ1,2 ≥ 0, e = (1,1,…,1)^T.

From Pontryagin's maximum principle, we obtain the switching function

    σ(t) = B(t)^T p(t),        (12)

where the costate p satisfies the adjoint equation

    ṗ(t) = −A(t)^T p(t),  p(T) = x(T),

and the optimal control is given by

    u0,i(t) ∈ {+1}      if σi(t) < 0,
              {−1}      if σi(t) > 0,        i = 1, …, m.        (13)
              [−1, +1]  if σi(t) = 0,

Further, the multiplier functions μj satisfy the relations

    μ1(t) = ( B(t)^T p(t) )− ,  μ2(t) = ( B(t)^T p(t) )+ ,

where the right-hand sides denote the negative resp. positive parts of the related vector components. In the case that σi(t) = 0 on a certain interval I ⊂ [0,T], we say the control u0 has a singular arc (on I). In our optimality analysis, we restrict ourselves to the case of piecewise constant u0 without singular arcs:

Assumption 1. The optimal control has no singular arcs. In addition, the set of switching points Σ = { t ∈ [0,T] : ∃ i ∈ {1,…,m} with σi(t) = 0 } is finite, i.e. Σ = { t_s | 1 ≤ s ≤ l } for some l ∈ ℕ.

Remark: It is well known that the above assumption holds true, e.g., in the case that A is time-independent and has exclusively real eigenvalues.

Under Assumption 1, the solution obviously satisfies (C1) and (C2): indeed, ∇_u g^δ = diag(γ_j) with γ_j ∈ {+1, −1}, and the controllability assumption (C2) in the case of a (linear) initial value problem always trivially holds. When we try, however, to apply the second-order optimality criteria (C3) and (C4) to our problem, serious difficulties occur.

First of all, the Legendre-Clebsch condition (C3) becomes singular since H̃_uu = 0 a.e. on [0,T]. It can be fulfilled formally only in the limit sense with δ = 0, where T'_0 = {0} for all t ∈ [0,T] \ Σ. The limit case of the corresponding Riccati inequality for δ → 0 gives

    Q̇ ⪰ −A^T Q − Q A + γI  a.e. on [0,T],  Q(0) ⪰ γI,  I − Q(T) ⪰ γI,        (14)

which obviously can be fulfilled with positive γ. In particular, one can choose Q1 such that in both parts of (14) equality holds with γ = 0.5. Then, for γ ≤ 0.5, the matrix function Q_γ = 2γ Q1 solves (14). Moreover, it belongs to L∞(0,T; ℝ^{n×n}) with ‖Q_γ‖∞ = 2γ ‖Q1‖∞ ≤ c(A) γ.

It has to be mentioned, however, that in the case of a singular matrix H̃_uu the conditions (C3), (C4) with δ = 0 are in general not sufficient to show the optimality of the solution, even in the weak local sense. For our problem class, e.g., the linearization in the (nearly-)active constraints allows only for zero control variation, which together with the admissibility assumption reduces, in the linear state equation case, the state variation to zero, too. The local technique of deriving estimates for Ψ (resp. for J − J*) from its Taylor expansion (see [21] and [7], [6]) therefore fails in the given situation.

Having these arguments in mind, let us restart the optimality analysis for Ψ(·,·,S) (cf. (4)) with the integrand R = R1 + R2 from (6). Notice that from f(x,u) = Ax + Bu we obtain in particular the following estimate for R1 near x0:

    R1[t] ≈ 0.5 (x − x0)^T ( Q̇ + QA + A^T Q + H̃_xx ) (x − x0).

In the case when (Ps) is considered, we have H̃_xx = 0. Choosing Q = Q_γ and y = x − x0 such that ‖y‖∞ ≤ ε1 with sufficiently small ε1 > 0, we get

    R1[t] ≥ 0.25 y^T ( Q̇ + QA + A^T Q ) y ≥ (γ/4) |y|².        (15)
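For a time-invariant instance of (Ps) with real eigenvalues (so that Assumption 1 is plausible), the law (12)-(13) can be evaluated by propagating the costate backwards, p(t) = e^{A^T (T − t)} p(T), and iterating crudely on the terminal condition p(T) = x(T). The double-integrator data below are invented for illustration; the naive fixed-point loop is not a method proposed in the paper.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # time-invariant, real (zero) eigenvalues
Bm = np.array([[0.0],
               [1.0]])
a = np.array([1.0, 1.0])            # hypothetical initial state x(0) = a
T, nsteps = 2.0, 2000
h = T / nsteps

def expm_At(t):
    # e^{A t} in closed form for this nilpotent A
    return np.array([[1.0, t], [0.0, 1.0]])

def terminal_state(pT):
    """Forward Euler under the bang-bang law (13):
       sigma(t) = B^T p(t), p(t) = e^{A^T (T - t)} pT, u = -sign(sigma)."""
    x = a.copy()
    for i in range(nsteps):
        t = i * h
        sigma = Bm.T @ (expm_At(T - t).T @ pT)   # switching function (12)
        u = -np.sign(sigma)
        x = x + h * (A @ x + Bm @ u)
    return x

x_free = expm_At(T) @ a             # terminal state of the uncontrolled system
pT = x_free.copy()
for _ in range(6):                  # crude fixed-point iteration on p(T) = x(T)
    pT = terminal_state(pT)
```

In this run the bang-bang control steers the system markedly closer to the origin than the free motion; an actual solution of (Ps) would solve the two-point boundary value problem instead of this naive iteration.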
Moreover, it belongs to Loo(0,T; with IIQ 7 II 00 = 27||Qi||oo < c(^)7- It has to be mentioned, however, that in the case of a singular matrix Huu the conditions (C3), (C4) with 6 = 0 in general are not sufficient to show the optimality of the solution even in the weak local sense. For our problem class e.g. the linearization in the (nearly-)active constraints will allow only for zero control variation, which together with the admis- sibility assumption for the linear state equation case reduces the state variation to zero, too. The local technique of deriving estimates for ^ (resp. for J — J*) from its TAYLOR expansion (see [21] and [7], [ 6 ]) therefore fails in the given situation. Having in mind these arguments, let us restart the optimality analysis for S) (cf. (4)) with the integrand R = Ri R 2 from ( 6 ). Notice that from f{x^ u) = Ax+Bu we obtain in particular the following estimate for R\ near xq: Ri[t] ^ 0.5 {x - xo)^ (^Q + Q A + A^Q + Hxx^ (^ - ^ 0 ) • In the case when (P 5 ) is considered, we have Hxx = 0. Choosing Q = Qy and y = X — xq such that ||y||oo < with sufficiently small ei > 0 we get R,[t] > 0.25y^(g + QA + A^q) y > | |yp , (15) 222 We will further derive appropriate estimates for i ?2 and the integrals ^ under the following regularity assumption on the zeros of the switching function a: Assumption 2 There exist positive constants cq, S with the following property: For 5 E (0, S)^ denote us = Ui<s<z(^s — S^tg + S). Then, min I ( B^p{t)^ I > cqS Vt E [0,Tl\a;j . i I V / i\ In the case of problem (P 5 ), the term R 2 connected with variations w.r.t. u may be expressed by R 2 [t] = H{x,u,pq,ijlq) - H{x,uq,pq,ij,()) + {x - xof'QB{u- uq) - uo) + /io,2(^^ - ^o)- Denoting v = u — uq with an arbitrary feasible control u, we have: (^B^po^, >0 ^ (^o)i = — 1 > 0 (^B'^Po^.<0 ^ {uo)i= +1 ^ Vi<0. 
Therefore, T R 2 [t] = V + y^QBv > ^ jB'^po .■ \vi\ - y^QBv i=l In order to estimate the integral over R 2 , each part of the right-hand side will be integrated separately and estimated now: Ji = [ y^QBv dt < ||y||oo||Q||oo||-B||oo [ \v{t)\dt . Jo Jo From the state equation it follows that ||y||oo ^ c(A, 5) ||u||i, thus Ji < ||Q||oo||5||ooc(A5)lkll? =: 7C2\\v\\1 when Q = and for a certain constant C 2 = C 2 (A, B) > 0. . . . dt “h Stability and Local Growth near Bounded- Strong Optimal Controls 223 it is easy to see that the second part is nonnegative. Neglecting this integral (which for small 6 is of small size), from Assumption 2 we con- clude « m ^ J 2 ^ \ viit)\dt > co^ / cJt (where |i?| stands for the EuKLiDean vector norm of = v{t) in IRL^). On the other hand side, Iblli — [ \v{t)\dt = [ b(^)| dt + f |t’(^)| dt Jo J[ 0 ,T]\ljs Jcjs ^ “h J[0,T]\us for Cl = 4tkM if only ||r^||oo and ||uo||oo are bounded by M (M = 1 in (P5) e.g.). Inserting this estimate into our relation for J2, we obtain J2 > cqS ( - cid) . Combining now the estimates for Ji and J2, an estimate for / R2dt results: ^ [ R 2 [t]dt > CqS (ll'^lli - cid) - C 27 ||'y||i • ( 16 ) Jo LEMMA 1 (Weak local optimality.) Let the Assumptions 1, 2 hold for the optimal data. Then positive con- stants e, Cyj exist such that R[t]dt > Cyj (^\\ X — X 0 W 2 + \\ u — uoWi ) (1"^) for all admisssible (x,u) G Therefore^ (xq^uq) is a strict weak local minimizer (with J{x^u) — J{xq^uq) satisfying a local quadratic growth condition similiar to (17)). Proof. Our previous analysis allows to discuss the estimate (16) for various S. In particular, choose 5 Then, for ||^>||oo - '^olloo < e < 62 = (2ci5)/T, 224 and we obtain ^ R2[t]dt > (^^ - C 27 ) Iklll =: c(7)||w||f . If 7 is taken from ^ 0, ^ then c(j) is positive, so that together with (15) the last estimate leads to [ R[t] dt > 7 II 3; - a;o||2 + 0 ( 7 ) || u - uo\\l . Jo 4 Thus, the desired conclusion follows by setting = max^>o min{^, 0 ( 7 )} and e = min{ei, 62 }. 
□

LEMMA 2 (Strong local optimality.) Let the Assumptions 1, 2 hold true. Then positive constants $\varepsilon$, $c_s$ exist such that
$$\int_0^T R[t]\,dt \ \ge\ c_s\,\|x - x_0\|_\infty^2 \quad \text{for all admissible } (x,u) \text{ with } \|x - x_0\|_\infty \le \varepsilon\,. \tag{18}$$
Therefore, $(x_0,u_0)$ is a strict strong local minimizer (with $J(x,u) - J(x_0,u_0)$ satisfying a local quadratic growth condition similar to (18)).

Proof: Consider (16) with
$$\delta \ = \ \min\Big\{\frac{1}{2c_1},\ \frac{\bar S}{2MT}\Big\}\,\|v\|_1 \ =: \ c_3\,\|v\|_1\,.$$
Since $\|v\|_\infty \le 2M$, we have $\delta \le 2MT\,c_3 \le \bar S$, so that, for $\gamma \le (c_0c_3)/(4c_2)$ and $\|y\|_\infty < \varepsilon_1$,
$$\int_0^T R[t]\,dt \ \ge\ \frac{c_0c_3}{4}\,\|v\|_1^2 \tag{19}$$
follows. Notice that due to the state equation we have $\|y\|_\infty \le c(A,B)\,\|v\|_1$, and consequently (18) holds for $c_s \le c_0c_3/(4c^2(A,B))$. □

Remark: Notice that in the case of a compact control set, as in the model problem $(P_S)$, the definitions of strong and of bounded-strong local optimality coincide.

As a conclusion from the last two Lemmas we easily obtain:

Theorem 3. Let the Assumptions 1, 2 be satisfied for the solution $(x_0,u_0)$ of problem $(P_S)$. Then $(x_0,u_0)$ is a (bounded-) strong local minimizer, and positive constants $c$, $\varepsilon$ exist such that
$$J(x,u) - J(x_0,u_0) \ \ge\ c\,\big(\|x - x_0\|_2^2 + \|u - u_0\|_1^2\big) \tag{20}$$
for all admissible pairs with $\|x - x_0\|_\infty \le \varepsilon$.

4. Minimizing sequence stabilization

In this final section, it will be shown how the results of Theorems 2 and 3 can be used to obtain certain preliminary convergence results for minimizing sequences of (P) resp. $(P_S)$. The results are oriented on [6] (Propos. 6). For convenience, the following assumption is made on the system dynamics:

Assumption 3. There exists a constant $M > 0$ such that for any piecewise continuous $u$ with $\|u\|_\infty \le M$ the state boundary value problem (1), (2) has a bounded solution on $[0,T]$.
If the solutions corresponding to $u_1$, $u_2$ are denoted by $x_1$, $x_2$ resp., then $\|x_1 - x_2\|_\infty \le c_G\,\|u_1 - u_2\|_1$ holds true for some constant $c_G > 0$.

Consider first the situation of Section 3, where the Legendre-Clebsch condition is fulfilled in the sense of (C3), and where Theorem 2 allows for a local quadratic growth estimate for the objective functional.

LEMMA 3. Let $(x_0,u_0)$ be a bounded-strong local minimizer for (P) and suppose the Assumptions 1-3 to hold true. Further, assume the estimate (8), i.e. $J(x,u) - J(x_0,u_0) \ge c\,\|x - x_0\|_2^2$, to be valid for $\|x - x_0\|_\infty \le \varepsilon$. If $(x_k,u_k)$ is a minimizing sequence with uniformly bounded and piecewise continuous controls $u_k$, and if for all $k$, $\|u_k - u_0\|_1$ is sufficiently small, then $x_k \to x_0$ in the $L_2$ sense.

The proof is similar to that of Proposition 6 in [6]. Notice that in practice a minimizing sequence can be obtained, e.g., by the Euler discretization approach ([1], also [2]). The error estimates in the cited papers are given in terms of the maximal deviation in the discretization grid points, a condition which for piecewise continuous controls $u_0$ is sufficient for their $L_1$ closeness to the solution.

We add an analogous result for the bang-bang situation, although in this case (without an appropriate coercivity assumption like (C3)) the convergence of discretization schemes and their benefit for constructing minimizing sequences is theoretically still an open question:

LEMMA 4. Let $(x_0,u_0)$ be a bounded-strong local minimizer for $(P_S)$ and suppose the Assumptions 1-3 to hold true. Further, assume the estimate (20), i.e. $J(x,u) - J(x_0,u_0) \ge c\,\big(\|x - x_0\|_2^2 + \|u - u_0\|_1^2\big)$, to be valid for $\|x - x_0\|_\infty \le \varepsilon$. If $(x_k,u_k)$ is a minimizing sequence with uniformly bounded and piecewise continuous controls $u_k$, and if for all $k$, $\|u_k - u_0\|_1$ is sufficiently small, then $x_k \to x_0$ in $L_2$, and moreover, $u_k \to u_0$ w.r.t. the $L_1$ topology.

References

[1] Dontchev, A. L., and Hager, W. W. Lipschitzian stability in nonlinear control and optimization, SIAM J.
Control and Optimization, 31:569-603, 1993.
[2] Dontchev, A. L., and Hager, W. W. The Euler approximation in state constrained optimal control, Math. Comp., 70(233):173-203, 2001.
[3] Dontchev, A. L., and Malanowski, K. A characterization of Lipschitzian stability in optimal control. In Ioffe et al. [8], pages 62-76.
[4] Felgenhauer, U. Diskretisierung von Steuerungsproblemen unter stabilen Optimalitätsbedingungen, Habilitationsschrift, Brandenburgische Technische Universität Cottbus, 1999. (122 p., in German)
[5] Felgenhauer, U. On smoothness properties and approximability of optimal control functions. In D. Klatte, J. Rückmann, and D. Ward, editors, Optimization with Data Perturbation II, Ann. Oper. Res., 101:23-42. Baltzer Sc. Publ., The Netherlands, 2001.
[6] Felgenhauer, U. Structural properties and approximation of optimal controls, Nonlinear Analysis (TMA), 47(3):1869-1880, 2001.
[7] Felgenhauer, U. Weak and strong optimality conditions for constrained control problems with discontinuous control, J. Optim. Theor. Appl., 110(2):361-387, 2001.
[8] A. Ioffe, S. Reich, and I. Shafrir, editors. Calculus of Variations and Optimal Control, Haifa 1998, vol. 411 of Chapman & Hall/CRC Res. Notes Math. Chapman & Hall/CRC, Boca Raton, FL, 2000.
[9] Klötzler, R. On a general conception of duality in optimal control. In vol. 703 of Lect. Notes Math., 189-196. Springer-Verlag, New York, Heidelberg, Berlin, 1979.
[10] Malanowski, K. Two-norm approach in stability and sensitivity analysis of optimization and optimal control problems, Adv. in Math. Sc. and Applic., 2:397-443, 1993.
[11] Malanowski, K. Stability and sensitivity analysis of solutions to infinite-dimensional optimization problems. In J. Henry and J.-P. Yvon, editors, Proc. 16th IFIP-TC7 Conference on System Modelling and Optimization, vol. 197 of Lect. Notes in Control and Inf. Sci., 109-127, London, 1994.
Springer-Verlag.
[12] Malanowski, K. Stability and sensitivity analysis of solutions to nonlinear optimal control problems, Appl. Math. and Optim., 32:111-141, 1995.
[13] Malanowski, K. Stability analysis of solutions to parametric optimal control problems. In J. Guddat, H. T. Jongen, F. Nozicka, G. Still, and F. Twilt, editors, Proc. IV. Conference on "Parametric Optimization and Related Topics", Enschede 1995, Ser. Approximation and Optimization, 227-244, Frankfurt, 1996. Peter Lang Publ. House.
[14] Malanowski, K., Büskens, C., and Maurer, H. Convergence of approximations to nonlinear control problems. In A. V. Fiacco, editor, Mathematical Programming with Data Perturbation, vol. 195 of Lect. Notes Pure Appl. Mathem., 253-284. Marcel Dekker, Inc., New York, 1997.
[15] Malanowski, K., and Maurer, H. Sensitivity analysis for parametric optimal control problems with control-state constraints, Comput. Optim. Appl., 5:253-283, 1996.
[16] Malanowski, K., and Maurer, H. Sensitivity analysis for state constrained optimal control problems, Discrete Contin. Dynam. Systems, 4:241-272, 1998.
[17] Maurer, H., and Pickenhain, S. Second order sufficient conditions for optimal control problems with mixed control-state constraints, J. Optim. Theor. Appl., 86:649-667, 1995.
[18] Milyutin, A. A., and Osmolovskii, N. P. Calculus of Variations and Optimal Control, Amer. Mathem. Soc., Providence, Rhode Island, 1998.
[19] Osmolovskii, N. P. Quadratic conditions for nonsingular extremals in optimal control (a theoretical treatment), Russian J. of Mathem. Physics, 2:487-512, 1995.
[20] Osmolovskii, N. P. Second-order conditions for broken extremals. In Ioffe et al. [8], pages 198-216.
[21] Pickenhain, S. Sufficiency conditions for weak local minima in multidimensional optimal control problems with mixed control-state restrictions, Zeitschr. f. Analysis u. Anwend. (ZAA), 11:559-568, 1992.
GRAPH ISOMORPHISM ALGORITHM BY PERFECT MATCHING

Kazuma Fukuda
Department of Internet Media System, Information Technology R&D Center, MITSUBISHI ELECTRIC Corporation, Kamakura, Kanagawa, JAPAN
kfukuda@isl.melco.co.jp

Mario Nakamori
Department of Computer Science, Tokyo A&T University, Koganei, Tokyo, JAPAN
nakamori@cc.tuat.ac.jp

Abstract: No polynomial time algorithm is known for the graph isomorphism problem. In this paper, we determine graph isomorphism with the help of a perfect matching algorithm, to limit the range of the search for 1-to-1 correspondences between the two graphs: We reconfigure the graphs into layered graphs, labeling vertices by partitioning the set of vertices by degrees. We prepare a correspondence table by means of whether labels on the two layered graphs match or not. Using that table, we seek a 1-to-1 correspondence between the two graphs. By limiting the search for 1-to-1 correspondences between the two graphs to information in the table, we are able to determine graph isomorphism more efficiently than by other known algorithms. The algorithm was timed on experimental data, and we obtained a complexity of O(n^).

Keywords: Graph Isomorphism, Regular Graph

1. Introduction

The graph isomorphism problem is to determine whether two given graphs are isomorphic or not. It is not known whether the problem belongs to the class P or is NP-complete. It has been shown, however, that the problem can be reduced to a group theory problem (van Leeuwen, 1990). Most studies of graph isomorphism (Hopcroft and Wong, 1974; Lueker, 1979; Babai et al., 1980; Galil et al., 1987; Hirata and Inagaki, 1988; Akutsu, 1988) restrict graphs by their characteristics. Some studies are undertaken based on group theory.
Most studies are concerned with the existence of algorithms (Filotti and Mayer, 1980; Babai et al., 1982; Luks, 1982; Babai and Luks, 1983; Agrawal and Arvind, 1996), and only a few papers report the implementation of algorithms (Corneil and Gotlieb, 1970) and experimental results. At present the best computational complexity by worst case analysis (Babai and Luks, 1983; Kreher and Stinson, 1998) is of exponential order. This algorithm makes use of the unique certification of a graph.

In the present paper, we consider the graph isomorphism problem for non-oriented connected regular graphs whose vertices and edges have no weights. We seek graph isomorphism by means of perfect matching to limit the range of 1-to-1 correspondences between the two graphs as follows. First, we choose one vertex as the root for each graph and reconfigure the graphs into layered graphs corresponding to the chosen vertices. Next, we label those vertices by partitioning the set of vertices by the distance from the root vertex. We construct a correspondence table which reflects whether labels on the two layered graphs are the same or not. Then, referring to that table, we search for a 1-to-1 correspondence between the two graphs. In other words, we create a bipartite graph between $V_1$ and $V_2$ and find a perfect matching in this bipartite graph. In the worst case, we might enumerate all the combinations of vertices among the two graphs, which would be of exponential order. However, we have been successful in determining the isomorphism of graphs within a reasonable time using experimental data; these results are also reported in the present paper. We consider only regular graphs. Since the general graph isomorphism problem can be reduced to the regular graph isomorphism problem in polynomial time (Booth, 1978), this restriction does not lose generality.

1.1.
Perfect Matching Problem

The matching problem on a bipartite graph is the problem of finding a set of edges such that no two edges share the same vertex (Iri, 1969). If the set covers all the vertices, the set is called a perfect matching.

Figure 1. Layered Graph (the original graph and the layered graph constructed from it)

It is known that there exist polynomial algorithms for finding a perfect matching (Micali and Vazirani, 1980, etc.).

1.2. Preliminaries

Let the two given regular graphs be $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, where $|V_1| = |V_2| = |V| = n$ and $|E_1| = |E_2| = |E|$ $(= O(n^2))$. Each vertex is uniquely labeled and is stored in an array of size $n$. Graph isomorphism is defined as follows.

Definition 1. Two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic if there is a 1-to-1 correspondence $f : V_1 \to V_2$ such that $(v, v') \in E_1$ iff $(f(v), f(v')) \in E_2$ for any $(v, v') \in E_1$. This function $f$ is called an isomorphism between $G_1$ and $G_2$. Similarly we could define graph isomorphism in the case where one vertex is fixed in each graph.

We consider only regular graphs whose vertex degree $d$ satisfies $3 \le d$, with an upper bound on $d$ imposed because of the relation between a graph and its complement.

2. Reconfiguring Graphs to Layered Graphs

In the present paper, we make use of layered graphs to determine isomorphism.

2.1. Layered Graphs

Given a graph $G$ and a vertex $r \in V$, the layered graph $L(G, r)$ with root $r$ consists of
■ the vertices of $G$,
■ the edges of $G$,
■ level$(u)$ for each vertex, where level$(u)$ is the shortest distance (or the depth) from $r$ to $u$ (Figure 1).
Transforming a graph with $n$ vertices to a layered graph can be done in $O(n^2)$ time.

2.2. Characteristics of Layered Graphs

We divide the set of vertices adjacent to $v$ into 3 subsets, $D_u(v)$, $D_s(v)$, and $D_d(v)$, as follows:
■ $D_u(v) = \{v' \mid (v,v') \in E$ and level$(v') =$ level$(v) - 1\}$,
■ $D_s(v) = \{v' \mid (v,v') \in E$ and level$(v') =$ level$(v)\}$,
■ $D_d(v) = \{v' \mid (v,v') \in E$ and level$(v') =$ level$(v) + 1\}$.
Let the number of vertices in each subset be $d_u$, $d_s$, and $d_d$:
■ $d_u(v) = |D_u(v)|$, (upper degree)
■ $d_s(v) = |D_s(v)|$, (same level degree)
■ $d_d(v) = |D_d(v)|$. (lower degree)
It follows that the degree of $v$, $d(v)$, is equal to $d_u(v) + d_s(v) + d_d(v)$. It is straightforward to derive the following:
■ $d_u(r) = d_s(r) = 0$, $d_d(r) = d(r)$,
■ each vertex $v$ except the root vertex satisfies $d_u(v) \ge 1$,
■ all vertices adjacent to the vertices in level $i$ have level $i$ or $i \pm 1$.
Given these assumptions, we propose the following.

Proposition 1. Two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic if and only if there are vertices $v_1 (\in V_1)$ and $v_2 (\in V_2)$ such that the two layered graphs $L(G_1, v_1)$ and $L(G_2, v_2)$ are isomorphic.

Each vertex $v (\in V)$ has a label, $($level$(v)$, $d_u(v)$, $d_s(v)$, $d_d(v))$. Let the label be denoted by $M(v)$. We call the set of vertices that have the same label a "class," which we denote by $B_i$ $(1 \le i \le k$, where $k$ is the number of classes$)$. For example, data from Figure 1 are shown in Table 1, sorted by label. We denote by $C(G, v)$ the vertices of $G$ partitioned into classes. (For a general, non-regular graph, a vertex label is constructed by appending the vertex's degree $d(v)$ to the level.)

Table 1. Example of Labeling. Data are from the graph shown in Figure 1.

  vertex   8  1  6  7  2  4  5  3
  level    1  2  2  2  3  3  3  3
  d(v)     3  3  3  3  3  3  3  3
  d_u(v)   0  1  1  1  1  1  1  1
  d_s(v)   0  0  1  1  2  2  2  2
  d_d(v)   3  2  1  1  0  0  0  0
  class    1  2  3  3  4  4  4  4

3. Finding a 1-to-1 correspondence between two graphs

In this section, we consider how to make use of a perfect matching algorithm in order to determine the isomorphism of graphs.

3.1. Correspondence between 2 Layered Graphs

For two given graphs, we consider all layered graphs for which a vertex of the graph is the root. For $v_i \in V_1$ and $v_j \in V_2$, we set $c_{ij} = 1$ if $C(G_1, v_i)$ and $C(G_2, v_j)$ have the same labels and partitions, and otherwise $c_{ij} = 0$. Thus, we have a correspondence table as shown in Table 2.
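As a concrete sketch of the pipeline of Sections 2 and 3 (BFS layering, the (level, d, d_u, d_s, d_d) labels, the class structure C(G, v), the table c_ij, and the matching test), here is a hypothetical Python rendering. The authors' implementation is in C; the function names and the two example graphs (the 3-cube and the Möbius ladder on 8 vertices, both 3-regular and connected) are illustrative choices of mine, not the graph of Figure 1, and Kuhn's augmenting-path algorithm stands in for whatever matching routine the paper used:

```python
from collections import deque

def signature(adj, root):
    """Class structure C(G, root): the sorted multiset of
    ((level, d, d_u, d_s, d_d), class size) pairs (Section 2.2).
    Assumes a connected graph, as in the paper."""
    level = {root: 0}
    queue = deque([root])
    while queue:                          # BFS yields the layered graph
        v = queue.popleft()
        for w in adj[v]:
            if w not in level:
                level[w] = level[v] + 1
                queue.append(w)
    count = {}
    for v, nbrs in adj.items():
        label = (level[v], len(nbrs),
                 sum(1 for w in nbrs if level[w] == level[v] - 1),  # d_u
                 sum(1 for w in nbrs if level[w] == level[v]),      # d_s
                 sum(1 for w in nbrs if level[w] == level[v] + 1))  # d_d
        count[label] = count.get(label, 0) + 1
    return tuple(sorted(count.items()))

def corr_table(adj1, adj2):
    """c[(i, j)] = 1 iff C(G1, v_i) and C(G2, v_j) coincide (Section 3.1)."""
    sig2 = {j: signature(adj2, j) for j in adj2}
    return {(i, j): int(signature(adj1, i) == sig2[j])
            for i in adj1 for j in adj2}

def has_perfect_matching(c, left, right):
    """Kuhn's augmenting-path algorithm on the 0/1 table.  Existence of a
    perfect matching is only necessary for isomorphism: no matching
    means the graphs are certainly not isomorphic."""
    match = {}                            # right vertex -> matched left vertex
    def augment(i, seen):
        for j in right:
            if c[(i, j)] and j not in seen:
                seen.add(j)
                if j not in match or augment(match[j], seen):
                    match[j] = i
                    return True
        return False
    return all(augment(i, set()) for i in left)

# Two 3-regular graphs on 8 vertices: the cube Q3 and the Moebius ladder.
cube = {v: [v ^ (1 << b) for b in range(3)] for v in range(8)}
ladder = {v: [(v + 1) % 8, (v - 1) % 8, (v + 4) % 8] for v in range(8)}
```

Both example graphs are vertex-transitive, so `corr_table(cube, cube)` is all 1's and a perfect matching exists, while every class signature of the Möbius ladder differs from the cube's and `corr_table(cube, ladder)` is all 0's: non-isomorphism is detected from the table alone, without any enumeration. As the paper stresses, a matching found here is only a candidate correspondence and still has to be verified edge by edge.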
It is easy to see that each table entry's value is unique and does not depend on the representations of the two graphs.

Table 2. Correspondence table between the layered graphs of $G_1$ (columns indexed by $i \in V_1$) and $G_2$ (rows indexed by $j \in V_2$); each entry $c_{ij}$ is 0 or 1.

The entries with a value of 1 are candidates for a 1-to-1 correspondence between the vertices of the two graphs. As a result, we can obtain such a correspondence by finding perfect matchings according to the table (Figure 2).

In the graph isomorphism problem, we have to determine whether there exists a 1-to-1 correspondence between vertices in the two graphs by checking all possible perfect matchings¹ (in the correspondence table). Of course, a perfect matching does not always indicate isomorphism, so we have to enumerate all perfect matchings and test each for isomorphism. However, the table limits the range searched for a 1-to-1 correspondence. If there is no perfect matching between two graphs based on this table, they are not isomorphic.

3.2. Solutions and Issues

We have implemented the above algorithm and, in Section 4, applied it experimentally to determine isomorphism. We test for a 1-to-1 correspondence between vertices in two graphs as follows.
■ Construct a 1-to-1 correspondence table as preprocessing.
■ Test for a 1-to-1 correspondence between vertices in the two graphs.
Next, we enumerate 1-to-1 correspondences one by one until we find a perfect matching between the vertices of the graphs. The program based on our algorithm and described in the next section has not adopted stronger methods to bound recursion, because we want to make it easy to understand the effectiveness of using a table.

However, if all entries in the table are 1's, we have to enumerate all perfect matchings. This results in many combinations of 1-to-1 correspondences to test. This would be the worst situation for our algorithm. In such a situation, however, we can consider two cases, according to whether the two graphs are isomorphic or not.
¹In practice, it is not necessary to enumerate those perfect matchings to determine isomorphism.

In the former case, since both graphs would have much symmetry, we can expect to find a 1-to-1 correspondence quickly. In the latter case, we do not need to enumerate all perfect matchings, as follows:
■ Consider two layered graphs whose root vertices correspond in the table.
■ Within each corresponding class between the two layered graphs, test 1-to-1 correspondences.
  - Examine the number of common vertices adjacent to 2 vertices in each corresponding class.
  - If there is no 1-to-1 correspondence for at least one class, the graphs are not isomorphic.
Thus, we can reduce the complexity of the enumeration.

Also we need to consider what features of graphs lead to the worst-case complexity. Among other known algorithms, the best complexity in a worst case analysis is that of Babai and Luks, 1983 (see also Kreher and Stinson, 1998). That algorithm determines isomorphism by certifying graphs uniquely. Though it certifies by partitioning the set of vertices recursively, the basic idea in partitioning is as follows: "which partitioned set contains the vertices adjacent to a certain vertex?" To prevent unnecessary recursions, it takes advantage of certification results. The complexity of certification is of exponential order.

4. Experiments

We have implemented the program described above and experimented on various regular graphs.

4.1. Environment and Graph Data

Our experiment was carried out with a Celeron 450 MHz, 128 MB memory (and 128 MB swap) and C (gcc-2.91.66) on Linux (2.2.14). We measured running time using the UNIX-like OS command "time."

We constructed various regular graphs for input using a program implemented according to Matsuda et al., 1992. Those graphs have numbers of vertices from 20 to 120 with a vertex degree of 10.
We constructed not only various isomorphic graphs that have the same number of vertices and degree but also non-isomorphic ones.

As a result, we conclude that the experimental time complexity is proportional to O(n^), regardless of whether the graphs are isomorphic or not. These results tend to be closer to the complexity of constructing the correspondence table than to that of examining 1-to-1 correspondences (perfect matchings) between the two graphs. Besides, we have seen almost the same results in both the isomorphic and the non-isomorphic cases. We anticipated that the complexity might be larger in the non-isomorphic case, since all the perfect matchings might be enumerated, but our experiment showed the algorithm to be much more efficient. Differences between average time, maximum time and minimum time for each number of vertices and degree are very small, so the program is quite stable. Standard deviations in the results are also very small (though not shown here), and no run took over 1 second. Furthermore, in the non-isomorphic case, we could determine the lack of isomorphism by testing only the table (at least in the graphs used).

5. Conclusions

In the present paper, targeting unweighted, undirected and connected regular graphs, we considered graph isomorphism by means of perfect matching to limit the range of 1-to-1 correspondences between two graphs as follows. First, we reconfigured the given graphs as layered graphs, labeled vertices by partitioning the set of vertices by distance from a root vertex, and prepared a correspondence table by means of whether labels on the two layered graphs matched or not. Using that table, we find 1-to-1 correspondences between the two graphs. In our experiments, we could determine isomorphism within a practical and stable time. For further research, we have to examine other types of graphs, and analyse the complexity of the program for them.
Also, we wish to compare our results with practical running results of the best algorithm, described in Babai and Luks, 1983 and Kreher and Stinson, 1998, whose worst-case complexity is known to be exponential.

References

Agrawal, M. and Arvind, V. (1996). A note on decision versus search for graph automorphism. Information and Computation, 131:179-189.
Akutsu, T. (1988). A polynomial time algorithm for subgraph isomorphism of tree-like graphs. IPSJ 90-AL-17-2.
Babai, L., Erdős, P., and Selkow, S. M. (1980). Random graph isomorphism. SIAM J. Comput., 9:628-635.
Babai, L., Grigoryev, D. Y., and Mount, D. M. (1982). Isomorphism of graphs with bounded eigenvalue multiplicity. Proc. 14th Annual ACM Symp. Theory of Computing, pages 310-324.
Babai, L. and Luks, E. M. (1983). Canonical labeling of graphs. Proc. 15th Annual ACM Symp. on Theory of Computing, Boston, pages 171-183.
Babel, L., Baumann, S., Lüdecke, M., and Tinhofer, G. (1997). STABCOL: Graph isomorphism testing based on the Weisfeiler-Leman algorithm. Technical Report Preprint TUM-M9702, Munich.
Barrett, J. W. and Morton, K. W. (1984). Approximate symmetrization and Petrov-Galerkin methods for diffusion-convection problems. Comput. Methods Appl. Mech. Engrg., 45:97-122.
Booth, K. S. (1978). Isomorphism testing for graphs, semigroups, and finite automata are polynomially equivalent problems. SIAM J. Comput., 7:273-279.
Corneil, D. G. and Gotlieb, C. C. (1970). An efficient algorithm for graph isomorphism. J. ACM, 17:51-64.
Cull, P. and Pandy, R. (1994). Isomorphism and the n-queens problem. ACM SIGCSE Bulletin, 26:29-36.
Filotti, I. S. and Mayer, J. N. (1980). A polynomial-time algorithm for determining the isomorphism of graphs of fixed genus. Proc. 12th Annual ACM Symp. Theory of Computing, pages 236-243.
Galil, Z., Hoffmann, C. M., Luks, E. M., Schnorr, C. P., and Weber, A. (1987). An O(n^3 log n) deterministic and an O(n^3) Las Vegas isomorphism test for trivalent graphs. J. ACM, 34:513-531.
Hirata, T. and Inagaki, Y. (1988). Tree pattern matching algorithm. IPSJ 88-AL-4-1.
Hopcroft, J. and Wong, J. (1974). Linear time algorithms for isomorphism of planar graphs. Proc. 6th Annual ACM Symp. Theory of Computing, pages 172-184.
Iri, M. (1969). Network Flow, Transportation and Scheduling. Academic Press.
Köbler, J., Schöning, U., and Torán, J. (1992). Graph isomorphism is low for PP. J. of Computer Complexity, 2:301-330.
Kreher, D. L. and Stinson, D. R. (1998). Combinatorial Algorithms: Generation, Enumeration and Search. CRC.
Lueker, G. S. (1979). A linear time algorithm for deciding interval graph isomorphism. J. ACM, 26:183-195.
Luks, E. M. (1982). Isomorphism of graphs of bounded valence can be tested in polynomial time. J. Computer and System Sciences, 25:42-65.
Matsuda, Y., Enohara, H., Nakano, H., and Horiuchi, S. (1992). An algorithm for generating regular graphs. IPSJ 92-AL-25-3.
Micali, S. and Vazirani, V. V. (1980). An O(√|V|·|E|) algorithm for finding maximum matching in general graphs. Proc. 21st Ann. IEEE Symp. Foundations of Computer Science, pages 17-27.
Torán, J. (2000). On the hardness of graph isomorphism. Proc. 41st Annual Symposium on Foundations of Computer Science, California, pages 180-186.
van Leeuwen, J. (1990). Handbook of Theoretical Computer Science, Vol. A: Algorithms and Complexity. Elsevier.

A REDUCED SQP ALGORITHM FOR THE OPTIMAL CONTROL OF SEMILINEAR PARABOLIC EQUATIONS

Roland Griesse
Lehrstuhl für Ingenieurmathematik, Universität Bayreuth, Germany
roland.griesse@uni-bayreuth.de

Abstract: This paper deals with optimal control problems for semilinear time-dependent partial differential equations. Apart from the PDE, no additional constraints are present. Solving the necessary conditions for such problems via the Newton-Lagrange method is discussed. Motivated by issues of computational complexity and convergence behavior, the Reduced Hessian SQP algorithm is introduced.
Application to a system of reaction-diffusion equations is outlined, and numerical results are given to illustrate the performance of the reduced Hessian algorithm.

Keywords: optimal control, parabolic equation, semilinear equation, reduced SQP method, reaction-diffusion equation

Introduction

There exist two basic classes of algorithms for the solution of optimal control problems governed by partial differential equations (PDEs). Both are iterative and differ in that Newton-type methods require the repeated solution of the (non-linear) PDE, while the algorithms of SQP type deal with the linearized PDE only. Newton-type methods have been successfully applied, e.g., to control problems for the Navier-Stokes equations in [4] and will not be discussed here.

The main focus of this paper is on SQP-type methods, which basically use Newton's algorithm in order to solve the first order necessary conditions. This scheme leads to a linear boundary value problem for the state and adjoint variables. It is the size of the discretized linear boundary value problem that motivates a variant of this approach in the first place: the reduced SQP method, which has been the subject of the following papers: [5] introduces reduced Hessian methods in Hilbert
Section 2 covers optimal control problems for these PDEs and establishes the first order necessary con- ditions. Section 3 describes the basic SQP method in function spaces (also called the Newton- Lagrange method in this context), that can be used to solve these conditions. The reduced Hessian method is derived as a variant thereof. It will be seen that this method is applicable only if the linearized PDE is uniquely solvable with continuous dependence on the right hand side data. The purpose of the reduced Hessian method is to significantly decrease the size of the discretized SQP steps. The asso- ciated algorithm which requires the repeated solution of the linearized state equation and of the corresponding adjoint is presented in detail. In Section 4, this procedure is applied to a system of reaction-diffusion PDEs. Finally, numerical results are given in Section 5. While the ideas and algorithm are worked out for distributed control problems throughout this paper, boundary and mixed control problems can be treated in the very same manner with only minor modification of notation. 1. Semilinear Parabolic Equations Let be a bounded domain in with sufficiently smooth boundary r and (5 = X (0,T), S = P x (0,T) with given final time T > 0. We consider semilinear parabolic initial-boundary value problems of the following type: yt{x, t) -h A{x)y{x, t) + n(x, t, y{x, t),u{x, t)) == 0 in Q dny{x,t) + b{x,t,y{x,t)) = 0 on S (1) y{x,0) - yo{x) = 0 on fi. The elliptic differential operator A{x)y = — Dj{aij{x)Diy) is rep- resented by the matrix A{x) = {aij{x)) G which is assumed to be symmetric, and dny{x, t)=n{xY A{x)Vy{x, t) = aijni{x)Djy{x, t) is the so-called co-normal derivative along the boundary P. When A is the negative Laplace operator —A, A gives the identity matrix and dny{x^t) is simply the normal derivative or Neumann trace of y{x^t). 
Questions of solvability, uniqueness and regularity for non-linear PDEs shall not be answered here; please refer to [7] and the references cited therein. We assume that there exist Banach spaces $Y$ for the state, $U$ for the control and $Z$ for the adjoint variable such that the semilinear parabolic problem (1) is well-posed in the abstract form
$$e(y,u) = 0 \quad \text{with} \quad e : Y \times U \to Z' \tag{2}$$
where $Z'$ is the dual space of $Z$. The operator $e$ may represent a strong or weak form of the state equation (1). Casting the PDE in this convenient form will allow us later to view the control problem as a PDE-constrained optimization problem and hence supports a solution approach based on the Lagrange functional. However, in the detailed presentation of the algorithms, we will return to interpreting the operator $e$ and its linearization $e_y$ as time-dependent PDEs.

2. Optimal Control Problems

In the state equation (1), the function $u$ defined on $Q$ is called the distributed control function. A Neumann boundary control problem arises when, instead of $u$, a control function $v$ is present in the boundary nonlinearity $b(x,t,y(x,t),v(x,t))$. Other possibilities include Dirichlet boundary control or even combinations of all of the above. Examples of boundary control problems can be found, e.g., in [3] and [1]. Everything presented in this paper can be, and in fact has been, applied to boundary control problems with only minor modifications.

The core of optimal control problems is to choose the control function $u \in U$ in order to minimize a given objective function. In practical terms, the objective can, e.g., aim at energy minimization or at tracking a given desired state. We shall use the objective for the distributed control case from [7]:
$$f(y,u) = \int_\Omega \varphi\big(x,y(x,T)\big)\,dx + \int_Q g(x,t,y,u)\,dx\,dt \tag{3}$$
where $\varphi$ assesses the terminal state and $g$ evaluates the distributed control effort and the state trajectory in $(0,T)$.
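To make the structure of (3) concrete, here is a toy Python discretization of the objective on the unit interval with a uniform time grid. The particular state, control, $\varphi$ and $g$ are illustrative choices of mine, not taken from the paper; the point is only that the objective splits into a terminal-state integral over $\Omega$ and a running-cost integral over $Q$:

```python
from math import exp, pi, sin

# Hypothetical discretization of objective (3) on Omega = (0,1), T = 1.
NX, NT, T = 200, 200, 1.0
hx, ht = 1.0 / NX, T / NT

def y(x, t):           # toy state: a decaying sine profile
    return sin(pi * x) * exp(-t)

def u(x, t):           # toy control (identically zero here)
    return 0.0

def phi(x, yT):        # terminal-state cost density
    return 0.5 * yT ** 2

def g(x, t, yv, uv):   # running cost: state tracking + control effort
    return 0.5 * yv ** 2 + 0.05 * uv ** 2

# Riemann sums for the two integrals in (3)
terminal = sum(phi(i * hx, y(i * hx, T)) for i in range(NX)) * hx
running = sum(g(i * hx, k * ht, y(i * hx, k * ht), u(i * hx, k * ht))
              for i in range(NX) for k in range(NT)) * hx * ht
f_val = terminal + running
```

For this smooth toy data the sums converge to the exact integrals $\int_0^1 \tfrac12\sin^2(\pi x)e^{-2}\,dx = \tfrac14 e^{-2}$ and $\tfrac14\cdot\tfrac{1-e^{-2}}{2}$, which gives a quick sanity check of the discretization.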
The abstract optimal control problem considered throughout the rest of this paper can now be stated:
$$\text{Minimize } f(y,u) \text{ over } (y,u) \in Y \times U \quad \text{s.t. } e(y,u) = 0\,. \tag{4}$$
A particularly simple situation arises when the state equation (1) is in fact linear in $(y,u)$ and the objective (3) is convex or even quadratic and positive definite. However, in the general case, our given problem (4) of finding an optimal control $u$ and a corresponding optimal state $y$ minimizing (3) while satisfying the state equation $e(y,u) = 0 \in Z'$ is a non-convex problem. We will not address the difficult question of global optimal solutions but rather assume that a local optimizer $(y,u)$ exists. The following first order necessary conditions involving the adjoint variable $\lambda$ are well-known, see, e.g., [7] (with $-\lambda$ instead of $\lambda$):
$$\begin{aligned} -\lambda_t + A(x)\lambda + n_y(x,t,y,u)\,\lambda + g_y(x,t,y,u) &= 0 &&\text{in } Q\\ \partial_n\lambda + b_y(x,t,y)\,\lambda &= 0 &&\text{on } \Sigma\\ \lambda(T) + \varphi_y\big(x,y(T)\big) &= 0 &&\text{in } \Omega\\ g_u(x,t,y,u) + n_u(x,t,y,u)\,\lambda &= 0 &&\text{in } Q \qquad (5)\\ y_t + A(x)y + n(x,t,y,u) &= 0 &&\text{in } Q\\ \partial_n y + b(x,t,y) &= 0 &&\text{on } \Sigma\\ y(0) - y_0(x) &= 0 &&\text{on } \Omega\,. \end{aligned}$$
These can be derived by constructing the Lagrangian
$$L(y,u,\lambda) = f(y,u) + \langle e(y,u),\lambda\rangle_{Z',Z} \tag{6}$$
and evaluating the conditions
$$\begin{aligned} L_y(y,u,\lambda) &= 0 &&\text{in } Y' &&\text{(adjoint equation)} \qquad &(7)\\ L_u(y,u,\lambda) &= 0 &&\text{in } U' &&\text{(optimality condition)} \qquad &(8)\\ L_\lambda(y,u,\lambda) = e(y,u) &= 0 &&\text{in } Z' &&\text{(state equation)} \qquad &(9) \end{aligned}$$
in their strong form.

Triplets $(y,u,\lambda)$ that satisfy the first order necessary conditions are called stationary points. Obviously, the conditions (5) or (7)-(9) constitute a non-linear two-point boundary value problem involving the non-linear forward equation (initial values given) for the state $y$ and the linear backward equation (terminal conditions given) for the adjoint $\lambda$. In the next section we introduce an algorithm to solve this problem.

3. SQP Algorithms

As we have seen in the previous section, finding stationary points $(y,u,\lambda)$, and thus candidates for the optimal control problem, requires the solution of the non-linear operator equation system (7)-(9).
This task can be attacked by Newton's method, which is commonly used to find zeros of non-linear differentiable functions. Suppose that we are given a triplet (y^k, u^k, λ^k), the current iterate. The Newton step to compute updates (δy, δu, δλ) reads

[ L_yy(y^k,u^k,λ^k)  L_yu(y^k,u^k,λ^k)  e_y(y^k,u^k)* ] [ δy ]     [ L_y(y^k,u^k,λ^k) ]
[ L_uy(y^k,u^k,λ^k)  L_uu(y^k,u^k,λ^k)  e_u(y^k,u^k)* ] [ δu ] = − [ L_u(y^k,u^k,λ^k) ]   (10)
[ e_y(y^k,u^k)       e_u(y^k,u^k)       0             ] [ δλ ]     [ e(y^k,u^k)       ]

with L_y = f_y(y^k,u^k) + e_y(y^k,u^k)* λ^k in Y' and L_u = f_u(y^k,u^k) + e_u(y^k,u^k)* λ^k in U'. This method is referred to as the Newton-Lagrange algorithm. It falls under the category of SQP solvers since the equations (10) are also the necessary conditions of an auxiliary QP problem, see, e.g., [6]. Note that, in contrast to the so-called Newton approach (cf. [4]), the iterates (y^k, u^k) of the SQP method are infeasible w.r.t. the non-linear state equation, i.e. the method generates control/state pairs that satisfy the PDE (1) only in the limit.

The operators appearing in the matrix on the left hand side (the Hessian of the Lagrangian) deserve some further explanation. First it is worth recalling that the first partial Fréchet derivative of a mapping g : X_1 × X_2 → Y between normed linear spaces X = X_1 × X_2 and Y at a given point x = (x_1, x_2) ∈ X is a bounded linear operator, e.g., g_{x_1}(x) ∈ L(X_1, Y). Consequently, the second partial Fréchet derivatives at x are g_{x_1 x_1}(x) ∈ L(X_1, L(X_1, Y)), g_{x_1 x_2}(x) ∈ L(X_2, L(X_1, Y)), etc. They can equivalently be viewed as bounded bilinear operators, e.g., the latter taking its first argument from X_2 and its second from X_1 and mapping this pair to an element of Y.

The adjoint operators (or, precisely speaking, conjugate operators) appearing in equation (10) can most easily be explained by their property of switching the arguments' order in bilinear maps:

e_y(y^k, u^k) ∈ L(Y, Z')  and  e_y(y^k, u^k)* ∈ L(Z'', Y') ≅ L(Z, Y')  since  Z ⊂ Z'',

with e_y(y^k, u^k)*(z, y) = e_y(y^k, u^k)(y, z) for all y ∈ Y, z ∈ Z.
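In finite dimensions, the Newton-Lagrange step has a transparent matrix form. The following sketch is not the paper's PDE setting: it uses invented data for a linear-quadratic model problem, where L_yy = I, L_uu = γI, L_yu = 0, e_y = A and e_u = B, so that a single Newton step on the saddle-point system analogous to (10) reaches the stationary point exactly.

```python
import numpy as np

# Toy analogue of (10): minimize 0.5*||y - yd||^2 + 0.5*gamma*||u||^2
# subject to e(y, u) = A @ y + B @ u - b = 0 (all data illustrative).
rng = np.random.default_rng(0)
n, m, gamma = 6, 3, 0.5
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # e_y (assumed invertible)
B = rng.standard_normal((n, m))                     # e_u
yd, b = rng.standard_normal(n), rng.standard_normal(n)

y, u, lam = np.zeros(n), np.zeros(m), np.zeros(n)   # current iterate
Ly = (y - yd) + A.T @ lam                           # L_y = f_y + e_y* lam
Lu = gamma * u + B.T @ lam                          # L_u = f_u + e_u* lam
e = A @ y + B @ u - b                               # constraint residual

# Assemble and solve the saddle-point (KKT) system of the Newton step
kkt = np.block([[np.eye(n),        np.zeros((n, m)), A.T],
                [np.zeros((m, n)), gamma * np.eye(m), B.T],
                [A,                B,                 np.zeros((n, n))]])
step = np.linalg.solve(kkt, -np.concatenate([Ly, Lu, e]))
dy, du, dlam = step[:n], step[n:n + m], step[n + m:]
y, u, lam = y + dy, u + du, lam + dlam              # updated triplet
```

After the step, the updated triplet satisfies all three stationarity conditions (7)-(9) of the toy problem, since the KKT system is linear here.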
Exploiting the fact that the adjoint variable λ appears linearly in the Lagrangian L, the Newton step (10) can be rewritten in terms of the new iterate λ^{k+1} rather than the update δλ. For brevity, the arguments (y^k, u^k, λ^k) will be omitted from now on:

[ L_yy  L_yu  e_y* ] [ δy      ]     [ f_y ]
[ L_uy  L_uu  e_u* ] [ δu      ] = − [ f_u ]   (11)
[ e_y   e_u   0    ] [ λ^{k+1} ]     [ e   ]

As can be expected, this system (obtained by linearization of (7)-(9)) represents a linear two-point boundary value problem whose solution is now the main focus.

To render problem (11) amenable to computer treatment, some discretization has to be carried out. Inevitably, its discretized version will be a large system of linear equations, since it ultimately contains the values of the state, control and adjoint at all discrete time steps and all nodes of the underlying spatial grid. Thus, one seeks to minimize the dimension of the system by decomposing it into smaller parts. The reduced Hessian algorithm is designed just for this purpose: roughly speaking, it solves the linear operator equation (11) for δu first, using Gaussian elimination on the symbols in the matrix. A prerequisite to this procedure is the bounded invertibility of e_y(y, u) for all (y, u) which are taken as iterates in the course of the algorithm. In other words, the linearized state equation e_y(y, u)h = f has to be uniquely solvable for h (with continuous dependence on the right hand side f ∈ Z') at these points (y, u). One obtains the reduced Hessian step

(e_u* e_y^{-*} L_yy e_y^{-1} e_u + L_uu − L_uy e_y^{-1} e_u − e_u* e_y^{-*} L_yu) δu
        = e_u* e_y^{-*} (f_y − L_yy e_y^{-1} e) − f_u + L_uy e_y^{-1} e   (12)
e_y δy = −e − e_u δu   (13)
e_y* λ^{k+1} = −f_y − L_yy δy − L_yu δu.   (14)

The operator preceding δu is called the reduced Hessian H_δu, in contrast to the full Hessian operator H appearing in (11). Note that both the full and the reduced Hessian are self-adjoint operators. After discretization, the reduced Hessian will be small and dense, whereas the full Hessian will be large and sparse.
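The elimination of (11) into (12)-(14) can be checked numerically in a small linear-quadratic analogue (a sketch under the simplifying assumptions L_yy = I, L_uu = γI, L_yu = 0; all matrices invented): the reduced step must reproduce the solution of the full saddle-point system exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, gamma = 7, 2, 0.3
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))    # e_y
B = rng.standard_normal((n, m))                      # e_u
fy = rng.standard_normal(n)                          # f_y at the iterate
fu = rng.standard_normal(m)                          # f_u at the iterate
e = rng.standard_normal(n)                           # state-equation residual

# Full system (11): unknowns (dy, du, lam_new)
full = np.block([[np.eye(n),        np.zeros((n, m)), A.T],
                 [np.zeros((m, n)), gamma * np.eye(m), B.T],
                 [A,                B,                 np.zeros((n, n))]])
sol = np.linalg.solve(full, -np.concatenate([fy, fu, e]))
dy_full, du_full, lam_full = sol[:n], sol[n:n + m], sol[n + m:]

# Reduced step (12)-(14); with L_yy = I and L_yu = 0 the operators simplify
h1 = np.linalg.solve(A, e)                           # e_y^{-1} e
AiB = np.linalg.solve(A, B)                          # e_y^{-1} e_u
H = gamma * np.eye(m) + AiB.T @ AiB                  # reduced Hessian
du = np.linalg.solve(H, AiB.T @ (fy - h1) - fu)      # (12)
dy = np.linalg.solve(A, -e - B @ du)                 # (13)
lam = np.linalg.solve(A.T, -fy - dy)                 # (14)
```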
Aiming at solving a discretized version of (12) with an iterative solver, the action of the reduced Hessian on given elements δu ∈ U has to be computed, plus the right hand side of (12). It can be shown that once an approximate solution δu of (12) is found, the remaining unknowns δy and λ^{k+1} obeying (13) and (14) can be expressed in terms of quantities already calculated. The overall procedure to solve (7)-(9), applying the reduced Hessian method in the inner loop, decomposes nicely into the steps described in figure 1, using the auxiliary variables h_1, h_3 ∈ Y and h_2, h_4 ∈ Z.

In many practical cases, the objective and the PDE separate as

f(y, u) = f_1(y) + f_2(u)  and  e(y, u) = e_1(y) + e_2(u)   (15)

which entails L_uy = L_yu = 0.

Reduced SQP Algorithm

1  Set k = 0 and initialize (y^0, u^0, λ^0).
2  Solve
   (a) e_y h_1 = e
   (b) e_y* h_2 = f_y − L_yy h_1
   and set b := e_u* h_2 − f_u + L_uy h_1.
3  For every evaluation of H_δu δu inside some iterative solver of H_δu δu = b, solve
   (a) e_y h_3 = e_u δu
   (b) e_y* h_4 = L_yy h_3 − L_yu δu
   and set H_δu δu := e_u* h_4 − L_uy h_3 + L_uu δu.
4  Set u^{k+1} := u^k + δu.
5  Set δy := −h_1 − h_3 and y^{k+1} := y^k + δy.
6  Set λ^{k+1} := −h_2 + h_4.
7  Set k := k + 1 and go back to step 2.

Figure 1. Reduced SQP Algorithm

We observe that the computation of the right hand side b, as well as every evaluation of H_δu δu, requires solving one equation involving e_y and another involving e_y*. It will be seen in the sequel that, in our case of e representing a time-dependent PDE, these are in fact solutions of the linearized forward (state) equation and the corresponding backward (adjoint) equation, see figure 2 in the following section. Note that the linear system involving the reduced Hessian H_δu is significantly reduced in size as compared to the full Hessian of the Lagrangian, the more so as in practical applications there are many more state than control variables.
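The seven steps of figure 1 can be traced in the same finite-dimensional setting. The sketch below uses invented linear-quadratic data (L_yy = I, L_uu = γI, L_yu = 0, e_y = A, e_u = B), so one outer iteration is exact; the reduced Hessian is applied matrix-free via the two auxiliary solves of step 3 and then assembled column by column, as it is small.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, gamma = 8, 3, 0.1
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
yd, bvec = rng.standard_normal(n), rng.standard_normal(n)

y, u = np.zeros(n), np.zeros(m)          # step 1: initial iterate
fy, fu = y - yd, gamma * u
e = A @ y + B @ u - bvec                 # state residual e(y, u)

h1 = np.linalg.solve(A, e)               # step 2(a): e_y h1 = e
h2 = np.linalg.solve(A.T, fy - h1)       # step 2(b): e_y* h2 = f_y - L_yy h1
rhs = B.T @ h2 - fu                      # b = e_u* h2 - f_u  (L_uy = 0)

def apply_H(du):
    """Step 3: action of the reduced Hessian, two linear solves per call."""
    h3 = np.linalg.solve(A, B @ du)      # e_y h3 = e_u du
    h4 = np.linalg.solve(A.T, h3)        # e_y* h4 = L_yy h3
    return B.T @ h4 + gamma * du         # e_u* h4 + L_uu du

H = np.column_stack([apply_H(ei) for ei in np.eye(m)])
du = np.linalg.solve(H, rhs)
h3 = np.linalg.solve(A, B @ du)
h4 = np.linalg.solve(A.T, h3)
dy = -h1 - h3                            # step 5
lam = -h2 + h4                           # step 6: new adjoint iterate
y, u = y + dy, u + du                    # steps 4-5: updated iterates
```

In the paper's setting each `np.linalg.solve` with A or A.T would be one linearized forward or backward PDE solve, so the cost per reduced-Hessian application is two PDE solves.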
Example

As an example, distributed control of a semilinear parabolic system of reaction-diffusion equations will be discussed. The PDE system describes a chemical reaction C_1 + C_2 → C_3, where the three substances are subject to diffusion and a simple non-linear reaction law. While in the discussion so far only one (scalar) PDE appears, the generalization to systems of PDEs is straightforward. In the example, the state y = (c_1, c_2, c_3)^T as well as the adjoint λ = (λ_1, λ_2, λ_3)^T now have three scalar components, while the control is still one-dimensional. The linearized systems occurring in the computation of the auxiliary variables h_1, ..., h_4 feature a coupling between their components which is generated by the non-linearity in the state equation (16). Also note that this example satisfies the separation condition (15). The reaction-diffusion system under consideration is given by

c_1t = D_1 Δc_1 − k_1 c_1 c_2      ∂_n c_1 = 0     c_1(0) = c_10
c_2t = D_2 Δc_2 − k_2 c_1 c_2 + u  ∂_n c_2 = 0     c_2(0) = c_20   (16)
c_3t = D_3 Δc_3 + k_3 c_1 c_2      ∂_n c_3 = 0     c_3(0) = c_30

where the control acts only through component two. The boundary conditions simply mean that the boundary of the reaction vessel is impermeable. The constants D_i and k_i are all non-negative and denote diffusion and reaction coefficients, respectively.

The objective in this case is a standard least-squares-type functional

f(y, u) = ∫_Ω [c_1(x,T) − c_1d(x)]² dx + γ ∫_Q u(x,t)² dx dt

in order to minimize the distance of component one's terminal state c_1(x,T) to a given desired state c_1d while taking control cost into account, weighted by a factor γ > 0. In case one is interested in maximum product yield, the term −∫_Ω c_3(x,T) dx can be inserted into the objective.

The individual steps of the reduced Hessian algorithm for this particular example are given in figure 2. There the vector (c_1^k, c_2^k, c_3^k, λ_1^k, λ_2^k, λ_3^k)^T denotes the current iterate.
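The reaction law in (16) can be illustrated by a zero-dimensional (well-mixed) sketch that drops the diffusion terms and sets the control to zero; only the reaction coefficients below are those of the later numerical tests, while the initial concentrations are invented for illustration.

```python
import numpy as np

# Well-mixed sketch of the reaction C1 + C2 -> C3 with rate term c1*c2:
# c1' = -k1*c1*c2, c2' = -k2*c1*c2, c3' = k3*c1*c2 (u = 0, no diffusion).
k1, k2, k3 = 0.5, 1.5, 2.5        # reaction coefficients as in the tests
c = np.array([1.1, 1.1, 0.0])     # assumed initial concentrations (c1, c2, c3)
dt, nsteps = 0.01, 1000
history = [c.copy()]
for _ in range(nsteps):
    r = c[0] * c[1]               # the bilinear reaction term c1*c2
    c = c + dt * np.array([-k1 * r, -k2 * r, k3 * r])   # explicit Euler step
    history.append(c.copy())
history = np.array(history)
```

With positive concentrations and a small step, the educts c_1, c_2 decay monotonically while the product c_3 grows, mirroring the sign pattern of the coupling terms in (16).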
It stands out that the linear systems for h_1, ..., h_4 can equivalently be written as

h_it + K h_i = f_i   for i ∈ {1, 3}   (17)
−h_jt + K^T h_j = g_j   for j ∈ {2, 4}   (18)

where the operator matrix

K = [ −D_1Δ − k_1 c_2   −k_1 c_1          0     ]
    [ −k_2 c_2          −D_2Δ − k_2 c_1   0     ]   (19)
    [ k_3 c_2           k_3 c_1           −D_3Δ ]

is non-symmetric. Please notice that this phenomenon does not occur in scalar PDE control problems.

Reduced Hessian steps for the reaction-diffusion example

Solve for h_1 = (h_11, h_12, h_13)^T:
  h_11t − D_1Δh_11 − k_1 c_2^k h_11 − k_1 c_1^k h_12 = c_1t^k − D_1Δc_1^k − k_1 c_1^k c_2^k
  h_12t − D_2Δh_12 − k_2 c_1^k h_12 − k_2 c_2^k h_11 = c_2t^k − D_2Δc_2^k − k_2 c_1^k c_2^k − u^k
  h_13t − D_3Δh_13 + k_3 c_2^k h_11 + k_3 c_1^k h_12 = c_3t^k − D_3Δc_3^k + k_3 c_1^k c_2^k
  ∂_n h_11 = 0,  ∂_n h_12 = 0,  ∂_n h_13 = 0
  h_11(0) = c_1(0) − c_10,  h_12(0) = c_2(0) − c_20,  h_13(0) = c_3(0) − c_30

Solve for h_2 = (h_21, h_22, h_23)^T:
  −h_21t − D_1Δh_21 − k_1 c_2^k h_21 − k_2 c_2^k h_22 + k_3 c_2^k h_23 = g_21
  −h_22t − D_2Δh_22 − k_2 c_1^k h_22 − k_1 c_1^k h_21 + k_3 c_1^k h_23 = g_22
  −h_23t − D_3Δh_23 = 0
  g_21 = −k_1 h_12 λ_1^k − k_2 h_12 λ_2^k − k_3 h_12 λ_3^k
  g_22 = −k_1 h_11 λ_1^k − k_2 h_11 λ_2^k − k_3 h_11 λ_3^k
  ∂_n h_21 = 0,  ∂_n h_22 = 0,  ∂_n h_23 = 0
  h_21(T) = 2[c_1^k(T) − c_1d − h_11(T)],  h_22(T) = 0,  h_23(T) = 0
Set b = −h_22 − 2γ u^k.

Solve for h_3 = (h_31, h_32, h_33)^T:
  h_31t − D_1Δh_31 − k_1 c_2^k h_31 − k_1 c_1^k h_32 = 0
  h_32t − D_2Δh_32 − k_2 c_1^k h_32 − k_2 c_2^k h_31 = −δu
  h_33t − D_3Δh_33 + k_3 c_2^k h_31 + k_3 c_1^k h_32 = 0
  ∂_n h_31 = 0,  ∂_n h_32 = 0,  ∂_n h_33 = 0
  h_31(0) = 0,  h_32(0) = 0,  h_33(0) = 0

Solve for h_4 = (h_41, h_42, h_43)^T:
  −h_41t − D_1Δh_41 − k_1 c_2^k h_41 − k_2 c_2^k h_42 + k_3 c_2^k h_43 = g_41
  −h_42t − D_2Δh_42 − k_2 c_1^k h_42 − k_1 c_1^k h_41 + k_3 c_1^k h_43 = g_42
  −h_43t − D_3Δh_43 = 0
  g_41 = k_1 h_32 λ_1^k + k_2 h_32 λ_2^k + k_3 h_32 λ_3^k
  g_42 = k_1 h_31 λ_1^k + k_2 h_31 λ_2^k + k_3 h_31 λ_3^k
  ∂_n h_41 = 0,  ∂_n h_42 = 0,  ∂_n h_43 = 0
  h_41(T) = 2 h_31(T),  h_42(T) = 0,  h_43(T) = 0
Set H_δu δu = −h_42 + 2γ δu.

Figure 2. Reduced SQP Algorithm for the Reaction-Diffusion Example

5. Numerical Results

In this section, results obtained from an implementation of the reduced Hessian algorithm will be presented. All coding has been done in Matlab 6.0, using the PDE toolbox to generate the spatial mesh and the finite element matrices.
The performance of the reduced Hessian algorithm will be demonstrated in comparison to an iterative algorithm working on the full Hessian of the Lagrangian H given in (11). To this end, the convergence behavior over the iteration count of one particular SQP step (corresponding to steps 2 and 3 of the algorithm) will be shown. For the tests we chose

c_1^0(x,t) = 0.5   c_10(x) = 0.1 + χ_{x_1>0.3}(x)   λ_1^0(x,t) = 0
c_2^0(x,t) = 0.5   c_20(x) = 0.1 + χ_{x_2>0.3}(x)   λ_2^0(x,t) = 0
c_3^0(x,t) = 0.5   c_30(x) = 0                      λ_3^0(x,t) = 0
c_1d(x) = 0        u^0(x,t) = 0                     γ = 1
D_1 = 0.01   k_1 = 0.5
D_2 = 0.05   k_2 = 1.5
D_3 = 0.15   k_3 = 2.5

on some finite element discretization of the unit circle Ω ⊂ ℝ², where χ_A denotes the indicator function of the set A ⊂ Ω̄. The final time was T = 10.

As was seen earlier in equation (11), there are three block rows in H, corresponding to the linearizations of the adjoint equation, the optimality condition and the state equation, respectively. For our tests, these have been semi-discretized using piecewise linear triangular finite elements in space. The ODE systems obtained by the method of lines are of the following form:

M ẏ + K y = f   (forward equations)   (20)
−M λ̇ + K^T λ = g   (backward equations)   (21)

They were treated by means of the implicit Euler scheme with constant step size. Of course, suitable higher order integrators can be used as well. Using this straightforward approach has one drawback that becomes apparent in figure 3: the discretized full Hessian matrix H is no longer symmetric, although the continuous operator H is self-adjoint. The same holds for the discretized reduced Hessian H_δu. This is due to the treatment of initial and terminal conditions in the linearized state and the adjoint equation. There are methods that reestablish symmetry, but these will not be pursued in the course of this paper since, qualitatively, the convergence results remain unchanged. For that reason, the non-symmetry will be accepted,
Figure 3. Non-symmetry of the discretized full Hessian; nt = 4 time steps, implicit Euler; dotted lines indicate the blocks corresponding to (11)

thereby waiving the possibility to use, e.g., a conjugate gradient method to solve the reduced problem, and relying instead on iterative solvers capable of handling non-symmetric problems. In the tests, GMRES proved quite efficient on the full Hessian problem, while CGS and BICGSTAB failed to generate reasonably better iterates than the initial all-zero guess. For the reduced Hessian, all three algorithms found the solution to high accuracy, and CGS needed the fewest iterations to do so. As a common basis, GMRES with no restarts was used for both the full and the reduced Hessian problem.

Note that while the discretized state and adjoint allocate nt (equal to 4 in figure 3) discrete time steps, the discretized control needs only nt − 1. This is attributed to the use of the Euler method where, after discretization, u(t = 0) does not appear in any of the equations.

In order to illustrate the convergence properties, it is convenient to have the exact discretized solution of the full SQP step (11) at hand. To that end, the full Hessian matrix was set up explicitly for a set of relatively coarse discretizations, and the exact solution was computed using a direct solver based on Gaussian elimination (Matlab's backslash operator). The exact solution δu of (12) was obtained in the same way after setting up the reduced Hessian matrix, where the corresponding δy and λ^{k+1} were calculated by performing the forward/backward integration given by (13) and (14). These two reference solution triplets differ only by entries of order 1e−15 and will be considered equal. It has to be mentioned that for these low-dimensional examples (cf. table 1), a direct solver is a lot faster than any iterative algorithm.
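The implicit Euler treatment of systems like (20) can be sketched on a small one-dimensional model problem (M = I, f = 0, homogeneous Dirichlet data; all sizes here are invented for illustration, not the paper's finite element setup).

```python
import numpy as np

# Method of lines in space, backward (implicit) Euler in time, for y' + K y = 0.
N, dt, nsteps = 50, 0.01, 40
h = 1.0 / N
x = np.linspace(h, 1 - h, N - 1)                     # interior grid points
K = (np.diag(2 * np.ones(N - 1)) - np.diag(np.ones(N - 2), 1)
     - np.diag(np.ones(N - 2), -1)) / h**2           # 1-D FD Laplacian
y = np.sin(np.pi * x)                                # initial state
norms = [np.linalg.norm(y)]
step_matrix = np.eye(N - 1) + dt * K                 # (M + dt*K), fixed step
for _ in range(nsteps):
    y = np.linalg.solve(step_matrix, y)              # one implicit Euler step
    norms.append(np.linalg.norm(y))
```

Because K is symmetric positive definite, the implicit Euler step is unconditionally stable here: the norm of the solution decays monotonically regardless of dt, which is why a fixed constant step is viable in the full discretization.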
However, setting up the exact reduced Hessian matrix is of course not an option for fine discretizations.

Figures 4-6 illustrate the convergence behavior of GMRES working on the reduced versus the full Hessian matrix: for δu_ref denoting the exact discretized solution, the graphs show the relative error history in the L² norm, where δu^j(t) denotes the approximate solution generated by the iterative solver after j iterations, taken at the time grid point t ∈ [0, T]. The same relative errors can be defined with δu replaced by δc_1, ..., δc_3 or λ_1^{k+1}, ..., λ_3^{k+1}, which are the components of the state update δy and the new adjoint estimate λ^{k+1}. Each figure shows the relative error history e_j(t) of either δu or δc_1 obtained using GMRES with no restarts after j = 4, 8, ..., 28 iterations on the reduced problem and after j = 100, 200, ..., 600 iterations on the full problem. The figures for δc_2, δc_3 and λ_1^{k+1}, ..., λ_3^{k+1} look very much the same and are not shown here.

The discretization level is characterized by the number of discrete time steps nt and the number of grid points poi in the finite element mesh. Table 1 lists the number of optimization variables in the full and reduced case for the individual discretizations used.

nt | poi | # of vars (reduced) | # of vars (full)
 9 |  25 |                 200 |             1550
 9 |  81 |                 648 |             5022
19 |  81 |                1458 |            10692

Table 1. Number of optimization variables for different discretizations

It can clearly be seen that the iterative solver works very well on the reduced system, while it needs many iterations on the full matrix. This was to be expected, since it is a well-known fact (see, e.g., [1] and [2]) that iterative solvers working on the full Hessian require preconditioning. Although the evaluation of H_δu times a vector is computationally more expensive than H times a vector, the reduced Hessian algorithm is by far the better choice over the unpreconditioned full algorithm.
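The counts in table 1 are consistent with the following bookkeeping, stated here as an assumption made explicit: the full system carries three state and three adjoint components on all nt time steps and poi grid points, while the control occupies only nt − 1 time steps (the Euler-scheme effect noted above).

```python
# Reproduce the variable counts of table 1 for the reaction-diffusion example.
def var_counts(nt, poi):
    reduced = (nt - 1) * poi                  # control variables only
    full = 2 * 3 * nt * poi + reduced         # states + adjoints + controls
    return reduced, full

table = [(9, 25), (9, 81), (19, 81)]
counts = [var_counts(nt, poi) for nt, poi in table]
```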
To give some idea why the reduced Hessian algorithm outperforms the full Hessian version, let us define

    [ −e_u* e_y^{-*}   I   e_u* e_y^{-*} L_yy e_y^{-1} ]
P = [ 0               0   I                            ]   (23)
    [ I               0   0                            ]

as the left preconditioner for the full Hessian problem (11) with the first two columns permuted (for simplicity, the separation condition (15) is assumed to hold): from (11), we get

    [ 0     L_yy  e_y* ] [ δu      ]       [ f_y ]
P   [ L_uu  0     e_u* ] [ δy      ] = −P  [ f_u ]   (24)
    [ e_u   e_y   0    ] [ λ^{k+1} ]       [ e   ]

Figure 4. Relative error history for δu (left) and δc_1 (right) on the reduced (solid lines) problem for j = 4, 8, ..., 28 iterations and on the full (dotted lines) problem for j = 100, 200, ..., 600 iterations at discretization level nt = 9, poi = 25

Figure 5. Relative error history for δu (left) and δc_1 (right) on the reduced (solid lines) problem for j = 4, 8, ..., 28 iterations and on the full (dotted lines) problem for j = 100, 200, ..., 600 iterations at discretization level nt = 9, poi = 81

Figure 6. Relative error history for δu (left) and δc_1 (right) on the reduced (solid lines) problem for j = 4, 8, ..., 28 iterations and on the full (dotted lines) problem for j = 100, 200, ..., 600 iterations at discretization level nt = 19, poi = 81

which is equivalent to the block-triangular system

[ H_δu  0     0    ] [ δu      ]   [ e_u* e_y^{-*} (f_y − L_yy e_y^{-1} e) − f_u ]
[ e_u   e_y   0    ] [ δy      ] = [ −e                                          ]   (25)
[ 0     L_yy  e_y* ] [ λ^{k+1} ]   [ −f_y                                        ]

whose rows are just the equations (12)-(14). Hence the reduced Hessian problem is nothing else than the full problem after preconditioning with P. Comparing (11) to (25), it turns out that the preconditioning actually provides the iterative solver with some insight into the interdependence of the unknown variables. While in the full Hessian system the solver takes all variables as degrees of freedom, in the reduced system only the true free variables (i.e. the controls) appear, and the state and the adjoint are calculated consistently.
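The block-triangularization by P can be verified numerically in finite dimensions. The sketch assumes L_yy = I, L_uu = γI and the separation (15), with invented matrices standing in for the operators: multiplying the permuted system matrix by P must produce the reduced Hessian in the upper-left corner and zeros to its right.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, gamma = 6, 2, 0.3
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))    # e_y
B = rng.standard_normal((n, m))                      # e_u
Ainv = np.linalg.inv(A)
W = B.T @ Ainv.T                                     # e_u* e_y^{-*}
Z = np.zeros

# System (11) with the first two columns permuted: unknowns (du, dy, lam)
Kperm = np.block([[Z((n, m)),         np.eye(n), A.T],
                  [gamma * np.eye(m), Z((m, n)), B.T],
                  [B,                 A,         Z((n, n))]])
# The preconditioner (23); the middle column blocks act on the second row block
P = np.block([[-W,         np.eye(m), W @ Ainv],
              [Z((n, n)),  Z((n, m)), np.eye(n)],
              [np.eye(n),  Z((n, m)), Z((n, n))]])
T = P @ Kperm
H = gamma * np.eye(m) + W @ Ainv @ B                 # reduced Hessian
```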
From this point of view, the reduced Hessian method resembles what is usually called a direct single shooting approach, applied to a linear-quadratic model.

The necessity to have the full and reduced Hessian matrices explicitly available for the numerical tests limits the discretization levels to very coarse ones throughout this paper. In practice, however, control problems for time-dependent PDEs with about 275 000 unknowns (including 40 000 control variables) have been successfully solved on a desktop PC within 2 hours using the reduced Hessian SQP algorithm.

References

[1] Battermann, A. & Heinkenschloss, M.: Preconditioners for Karush-Kuhn-Tucker Matrices Arising in the Optimal Control of Distributed Systems, in: W. Desch, F. Kappel, K. Kunisch (eds.), Optimal Control of Partial Differential Equations, Vorau 1997, Birkhäuser Verlag, Basel, Boston, Berlin, 1998, pp. 15-32.

[2] Biros, G. & Ghattas, O.: Parallel Lagrange-Newton-Krylov-Schur Methods for PDE-Constrained Optimization. Part I: The Krylov-Schur Solver, Technical Report, Laboratory for Mechanics, Algorithms, and Computing, Carnegie Mellon University, 2000.

[3] Biros, G. & Ghattas, O.: Parallel Lagrange-Newton-Krylov-Schur Methods for PDE-Constrained Optimization. Part II: The Lagrange-Newton Solver, and its Application to Optimal Control of Steady Viscous Flows, Technical Report, Laboratory for Mechanics, Algorithms, and Computing, Carnegie Mellon University, 2000.

[4] Hinze, M. & Kunisch, K.: Second Order Methods for Optimal Control of Time-dependent Fluid Flow, Bericht Nr. 165 des Spezialforschungsbereichs F003 Optimierung und Kontrolle, Karl-Franzens-Universität Graz (1999), to appear in SIAM J. Control Optim.

[5] Kupfer, F.-S.: An infinite-dimensional convergence theory for reduced SQP methods in Hilbert space, SIAM J. Optimization 6, 1996.

[6] Nocedal, J.
& Wright, S.: Numerical Optimization, Springer, 1999.

[7] Tröltzsch, F.: On the Lagrange-Newton-SQP Method for the Optimal Control of Semilinear Parabolic Equations, SIAM J. Control Optim. 38, No. 1, pp. 294-312, 1999.

ON NUMERICAL PROBLEMS CAUSED BY DISCONTINUITIES IN CONTROLS

Christian Großmann, Antje Noack, Reiner Vanselow*
Dresden University of Technology
Institute of Numerical Mathematics
D-01062 Dresden, Germany
{grossm, noack, vanselow}@math.tu-dresden.de

Abstract: The regularity of solutions of parabolic initial-boundary value problems directly depends upon the regularity of the boundary data. Reduced regularity of the boundary data arises, e.g., in optimal boundary control problems governed by evolution equations when the control is discretized by piecewise constant functions, and it results in refined grids if automatic step size procedures in time are applied. In the present study, the effects on numerical methods for solving the state equations are illustrated. Moreover, an appropriate splitting of the solution is used to improve the numerical behavior of the discretization technique as well as of the optimization method applied to the control problem itself.

Keywords: Boundary control, parabolic equation, discretization.

1. Introduction

Smoothness properties of solutions of parabolic initial-boundary value problems directly depend upon the smoothness of the initial and boundary data. As a consequence, discretizing the boundary control by piecewise given functions generically results in reduced smoothness of the solutions of the related state equations. However, the efficiency of numerical methods for partial differential equations depends on the regularity of the desired solution. This yields specific effects, like severe local grid refinements in time, when standard discretization techniques are applied.
In the present paper we investigate such effects and, in the case of piecewise constant Dirichlet controls, we use a splitting of the solution to improve the numerical behavior of the discretization technique as well as of the optimization method applied to the control problem.

*Partial funding of this research provided by DFG grant GR1777/2-2.

Throughout the paper we consider spatial one-dimensional boundary heat control problems

J(u) := 1/2 ∫₀¹ [w(x,T;u) − q(x)]² dx + α/2 ∫₀ᵀ [u(t) − p(t)]² dt → min   (1.1)

subject to the state equations

∂w/∂t − σ² ∂²w/∂x² = f   in Q := (0,1) × (0,T],
w = 0   on Γ_l := {0} × (0,T],
γ_D w + γ_N ∂w/∂x = u   on Γ_r := {1} × (0,T],   (1.2)
w = g   on Q_0 := [0,1] × {0},

with control u and the state w = w(·,·;u) in the weak sense (cf. [11], [12]). Here γ_D, γ_N ≥ 0 are given coefficients satisfying γ_D + γ_N > 0, and α > 0 is a fixed regularization parameter. Further, U ≠ ∅, U ⊂ L_∞(0,T) denotes a set of admissible controls, q ∈ L_2(0,1) is the given target temperature, and p ∈ U denotes some fixed reference control. For controls u ∈ U we restrict ourselves to discretizations u^τ ∈ U^τ defined over a given time grid

0 = t⁰ < t¹ < ··· < t^{M^τ} = T   (1.3)

by piecewise constant functions, i.e.

u^τ ∈ U^τ  ⟺  u^τ(t) = u^k ∈ ℝ,  ∀ t ∈ (t^{k−1}, t^k],  k = 1, ..., M^τ.   (1.4)

Here M^τ denotes the number of time intervals for the control discretization. To distinguish between discretizations of controls and states, we indicate the first by upper scripts (as above) and the second by lower scripts. In the case of Dirichlet controls, i.e. γ_D = 1, γ_N = 0, jumps of u^τ at the inner grid points t^k, k = 1, ..., M^τ − 1, cause discontinuities of the related solution w(·,·;u^τ).

In the literature, there are several results on the numerical treatment of the heat equation with irregular solutions, where the irregularities result from the functions f or g (cf. [2], [6], [9], [10]), and one can find three approaches to overcome such difficulties. In the first one, fitted methods are constructed with coefficients which are adapted to the singularities (cf. [6]).
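A piecewise constant control in the sense of (1.3)-(1.4) is easy to realize programmatically. The helper below is hypothetical (grid and coefficients invented for illustration); it returns the value u^k on the half-open interval (t^{k−1}, t^k], matching the convention of (1.4).

```python
import numpy as np

def piecewise_control(tgrid, coeffs):
    """Return u^tau as a callable; coeffs[k-1] is the value on (t^{k-1}, t^k]."""
    tgrid, coeffs = np.asarray(tgrid), np.asarray(coeffs)
    def u(t):
        # side="left" makes the intervals left-open and right-closed,
        # so u(t^k) = u^k as required by (1.4)
        k = np.searchsorted(tgrid, t, side="left") - 1
        k = np.clip(k, 0, len(coeffs) - 1)
        return coeffs[k]
    return u

tgrid = [0.0, 0.25, 0.5, 1.0]     # 0 = t^0 < t^1 < ... < t^{M^tau} = T
u = piecewise_control(tgrid, [1.0, -1.0, 0.5])
```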
In the second approach, a standard method is chosen, but with specifically refined meshes in the neighborhood of the singularities (cf. [2], [9], [10]). The third approach splits off the singularities. In the present paper we apply this splitting; related details are discussed in the following section.

2. Numerical treatment of the state equations

2.1 Splitting in case of Dirichlet conditions

In the present subsection we consider the case γ_D = 1, γ_N = 0. At first, we assume compatibility at x = 0, i.e. g(0) = 0, and (for the sake of a uniform description) we extend the piecewise constant function u^τ to t = 0 by u^τ(0) := u⁰ := g(1). Let us now introduce functions w^k : Q̄ → ℝ, k = 1, ..., M^τ, by

w^k(x,t) := (u^k − u^{k−1}) [1 − erf( (1−x) / (2σ √(t − t^{k−1})) )]  if t > t^{k−1},
w^k(x,t) := 0  otherwise,  x ∈ [0,1],   (2.1)

with the error function

erf(ξ) := (2/√π) ∫₀^ξ e^{−s²} ds,  ξ ∈ ℝ.   (2.2)

Definition (2.1), (2.2) yields w^k ∈ C^∞(Q̄ \ {(1, t^{k−1})}) and

∂w^k/∂t − σ² ∂²w^k/∂x² = 0  in Q.

Further, w^k has a jump w.r.t. t at (1, t^{k−1}). Hence, the discontinuities of the solution of (1.2) at the points (1, t^k), k = 0, ..., M^τ − 1, originated by the jumps in u^τ, can be captured by the functions w^k. Namely, using superposition, the solution w(·,·;u^τ) of (1.2) can be written as

w(x,t;u^τ) = w̄(x,t;u^τ) + v(x,t;u^τ),  (x,t) ∈ Q̄   (2.3)

for any given u^τ ∈ U^τ, where w̄(·,·;u^τ) is defined by

w̄(x,t;u^τ) := Σ_{k=1}^{M^τ} w^k(x,t),  (x,t) ∈ Q̄   (2.4)

and v(·,·;u^τ) denotes the solution of the related parabolic problem

∂v/∂t − σ² ∂²v/∂x² = f   in Q,
v = −w̄   on Γ_l,
v = 0   on Γ_r,   (2.5)
v = g   on Q_0.

Due to the smoothness of w̄ at x = 0, and for sufficiently smooth functions f and g, the discontinuities of w are completely captured by w̄. Hence, problem (2.5) allows a better numerical treatment than the original PDE.

2.2 Discretization of the state equations

In the preceding section we described the principal impact of piecewise discretizations of the controls on the smoothness of the solutions of the state equations.
Now we sketch the consequences of reduced regularity for numerical methods applied to (1.2) with discretized boundary data. Among the variety of methods, let us consider semi-discretization by the standard method of lines (MOL) as well as full discretization schemes. The major difference between the two approaches is that in the first one standard ODE solvers with efficient step size control can be applied, while the full scheme provides direct access to the time grid, which will later be advantageous in evaluating adjoint states for the optimal control problem.

Consider some spatial grid {x_i}_{i=0}^N in the interval [0,1], i.e.

0 = x_0 < x_1 < ··· < x_{N−1} < x_N = 1.   (2.6)

Using simple finite differences, we obtain a spatial semi-discretization of the PDE by

dw_i/dt − (σ²/h_{i+1/2}) [ (w_{i+1}(t) − w_i(t))/h_{i+1} − (w_i(t) − w_{i−1}(t))/h_i ] = f(x_i, t),  i = 1, ..., N−1,   (2.7)

with h_i := x_i − x_{i−1}, i = 1, ..., N, and h_{i+1/2} := (h_i + h_{i+1})/2. Here and in the sequel, w_i denote functions which approximate w(x_i, ·; u^τ). In addition to (2.7), the boundary conditions from (1.2) at x = 1 are taken into account by

γ_D w_N(t) = u^τ(t),  t ∈ (0,T],   (2.8a)

and

(h_N/2) dw_N(t)/dt = (σ²/γ_N) [u^τ(t) − γ_D w_N(t)] − σ² (w_N(t) − w_{N−1}(t))/h_N + (h_N/2) f(x_N, t),  t ∈ (0,T],   (2.8b)

for γ_N = 0 and γ_N ≠ 0, respectively, while at x = 0 we have in both cases w_0(t) = 0. If we consider splitting, then instead of (1.2) we apply the semi-discretization to problem (2.5), and we have

v_0(t) = −w̄(0, t; u^τ),  v_N(t) = 0.

Together with the initial conditions

w_i(0) = g(x_i),  i = 0, ..., N,   (2.9)

we obtain an IVP system for the functions w_i. Notice that in the case of Dirichlet control the number of unknowns is N − 1, otherwise N. We will not explicitly distinguish these cases and, for simplicity, write just N in the sequel. In our first approach we treat the IVP (2.7)-(2.9) by standard ODE codes for stiff IVPs; in particular, in our study we applied BDF codes and the trapezoidal rule with automatic step size control.
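The difference quotient in (2.7) can be checked on a non-equidistant grid: applied to the quadratic w(x) = x², it reproduces the exact second derivative w'' = 2 even on non-uniform grids, which makes a convenient correctness test. The grid values below are invented for illustration.

```python
import numpy as np

sigma = 0.5
x = np.array([0.0, 0.1, 0.25, 0.45, 0.7, 1.0])      # 0 = x_0 < ... < x_N = 1
h = np.diff(x)                                       # h_i = x_i - x_{i-1}

def second_difference(w):
    """sigma^2 / h_{i+1/2} * [(w_{i+1}-w_i)/h_{i+1} - (w_i-w_{i-1})/h_i]."""
    hmid = (h[:-1] + h[1:]) / 2.0                    # h_{i+1/2}
    return sigma**2 / hmid * ((w[2:] - w[1:-1]) / h[1:]
                              - (w[1:-1] - w[:-2]) / h[:-1])

vals = second_difference(x**2)                       # should equal 2*sigma^2
```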
Alternatively to semi-discretization with standard ODE codes (to which we refer in the sequel, shortly, as semi-discretization), in a second approach we apply the implicit Euler method with a fixed time step T/M to (2.7)-(2.9), which we denote in the sequel as full discretization. In both approaches, discrete states are denoted by w_ij, i = 0, 1, ..., N, j = 0, 1, ..., M, where M is the number of time steps.

3. Numerical treatment of the control problem

3.1 Gradient evaluation

Discretization of the controls and the state equations leads to an approximation of the original optimal control problem (1.1), (1.2) by a finite dimensional quadratic programming problem. Let us consider the case that no constraints are imposed upon the controls. The state equations result in an affine mapping transferring discrete controls u^τ ∈ U^τ into discrete terminal states w_{·,M}, i.e. we have

w_{·,M} = S_{h,τ} u^τ + a_{h,τ}   (3.1)

with some matrix S_{h,τ} ∈ L(ℝ^{M^τ}, ℝ^N) and some vector a_{h,τ} ∈ ℝ^N. With discrete scalar products (·,·) in ℝ^N and ℝ^{M^τ}, respectively, we obtain problems of the type

J_{h,τ}(u^τ) → min !  s.t.  u^τ ∈ U^τ   (3.2)

with

J_{h,τ}(u^τ) := 1/2 (S_{h,τ} u^τ − q_{h,τ}, S_{h,τ} u^τ − q_{h,τ}) + α/2 (u^τ − p^τ, u^τ − p^τ).   (3.3)

Here q_{h,τ} := q_h − a_{h,τ}, and p^τ ∈ U^τ denotes some approximation of p. Further, the necessary optimality conditions are given by

S_{h,τ}^T (S_{h,τ} u^τ − q_{h,τ}) + α (u^τ − p^τ) = 0.

It should be noticed that in the case of full discretization S_{h,τ} is known, but will not be constructed explicitly because of the dynamic nature of the discrete state equations. In the case of semi-discretization, where some ODE software code is applied to (2.7)-(2.9), S_{h,τ} and a_{h,τ} additionally depend on various features of the code, like built-in automatic step size controls. Whether semi-discretization or full discretization is applied, the image S_{h,τ} u^τ can be determined for any u^τ ∈ U^τ by discrete time integration. Moreover, adjoint equations provide an efficient tool for gradient evaluations, replacing the calculation of S_{h,τ}^T (S_{h,τ} u^τ − q_{h,τ}). For the optimal control problem (1.1), (1.2) the corresponding adjoint problem is defined by (cf.
[1], [4], [7], [11])

−∂z/∂t − σ² ∂²z/∂x² = 0   in Q,
z = 0   on Γ_l,
γ_D z + γ_N ∂z/∂x = 0   on Γ_r,   (3.4)
z = w − q   on Q_T := [0,1] × {T},

and the reduced gradient of the objective at u ∈ U in direction s ∈ L_∞(0,T) is given by

J'(u) s = ( (σ² / (γ_D + γ_N)) [ z(1,·;u) − ∂z/∂x(1,·;u) ] + α(u − p), s )_{L_2(0,T)}.   (3.5)

Notice that after reversing the time orientation, the adjoint problem (3.4) is of parabolic type, like the state equation (1.2). However, unlike in the state equation, in the adjoint equation we meet an incompatibility only at one time level, namely t = T.

For the remaining part of this section we restrict ourselves to the case γ_D = 1, γ_N = 0. Further, for simplicity, in the sequel we consider equidistant spatial grids and denote the step size by h > 0. When applying standard ODE solvers to the related semi-discrete IVP (2.7)-(2.9), and an appropriate discretization to the scalar product in (3.5), we obtain an approximation (3.6) of the discrete directional derivative: the boundary flux term in (3.5) is replaced by a one-sided difference quotient of the discrete adjoint solution (z_ij), accumulated with the local steps τ_k := ϑ_k − ϑ_{k−1} over the index sets

K_j := { k ∈ {1, ..., M} : ϑ_k ∈ (t^{j−1}, t^j] },  j = 1, ..., M^τ,

and multiplied by the coefficients s^j ∈ ℝ, j = 1, ..., M^τ, of s^τ ∈ U^τ; here {ϑ_k}_{k=0}^M denotes the time grid generated by the applied ODE solver. To obtain (3.6) from (3.5), besides simple integration, the derivative of z at x = 1 is approximated by one-sided finite differences, where we take into account the boundary condition z(1,·) = 0. Equation (3.6) provides the representation of the discrete gradient via the adjoints.

In the case of full discretization, the discrete gradient can be evaluated directly via the corresponding discrete adjoint system. Similarly to the continuous adjoint system, after time reversal it turns out to be an implicit Euler scheme again. The obtained formula for the discrete gradient (3.6) can also be interpreted as an approximation of the continuous one.
Our numerical experiments confirmed the fact that the discrete adjoints of the full discretization lead to exact gradients, as generated by automatic differentiation tools (see [3]). However, if software tools are applied to the semi-discretization of (2.5) and of the adjoint equations (3.4), then only an approximation of the gradients is obtained. One reason for this deviation is that applications of ODE solvers with time step control lead to discretizations of the states and the adjoint states on different time grids. Thus the discretization of the adjoint states is not adjoint to the discrete states in the sense of the discrete L_2-norm, but only an approximation. Moreover, the summation in the formula for the discrete gradient (3.6) causes a further amplification of the error. Hence, to guarantee convergence of optimization techniques based on this approach, a sufficiently high order of accuracy is required in the applications of the ODE software, which becomes rather expensive for fine discretizations.

3.2 Selected minimization techniques

Since the gradient can be obtained quite easily via adjoint states, conjugate gradient methods as well as quasi-Newton techniques (e.g. Broyden's symmetric update, the DFP method, ...) are appropriate for solving the discrete quadratic minimization problem (3.2). To make the paper self-contained, we briefly describe the major steps of the methods used in our tests for solving (3.2). Let us denote the elements of a sequence {u^l} of discrete controls by u^l ∈ U^τ. In the considered piecewise constant approximation we can represent u^l by its coefficients u^l_k ∈ ℝ, k = 1, ..., M^τ, l = 0, 1, ....

As one of the methods of choice we applied conjugate gradient methods. Starting with some u^0 ∈ U^τ and β_0 := 0, these methods generate a minimizing sequence {u^l} ⊂ U^τ recursively by

s^l := −J'(u^l) + β_l s^{l−1},   β_{l+1} := ‖J'(u^{l+1})‖² / ‖J'(u^l)‖²,   u^{l+1} := u^l + α_l s^l,   (3.7)

with Cauchy step size α_l > 0.
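The CG iteration (3.7) with Cauchy (exact minimizing) step sizes can be sketched on a small quadratic model of the same shape as (3.3); the data below are invented for illustration. On an n-dimensional quadratic, CG terminates (up to rounding) after at most n steps.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
Q = M.T @ M + np.eye(n)               # SPD Hessian of the quadratic model
c = rng.standard_normal(n)            # J(u) = 0.5 u^T Q u - c^T u

u = np.zeros(n)
g = Q @ u - c                         # gradient J'(u)
s = -g                                # first direction (beta_0 = 0)
for _ in range(n + 2):
    if np.linalg.norm(g) < 1e-12:     # converged; avoid a 0/0 step below
        break
    alpha = -(g @ s) / (s @ Q @ s)    # Cauchy step: exact 1-D minimization
    u = u + alpha * s
    g_new = Q @ u - c
    beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves coefficient as in (3.7)
    s = -g_new + beta * s
    g = g_new
```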
(3.7)

CG methods terminate with the optimal control, given by the final minimizer, in a finite number of steps, provided exact function and gradient evaluations are applied and no rounding errors occur. This is, however, unrealistic in the problems under consideration, but the convergence can be accelerated by appropriate preconditioning (cf. [5], [8]). We found that in the case of Dirichlet control the analytic solution (2.4), which captures the jumps in the boundary data, serves for preconditioning. In the case of unconstrained controls the Cauchy (i.e. minimizing) step size $a_l$ is easily obtained. However, penalty methods for the treatment of constraints require additional step size procedures.

As other methods of choice we included quasi-Newton methods in our study. Their basic idea is to define the search direction at $u^l$ by

$s^l = -H_l^{-1} J'_l,$  (3.8)

where $J'_l := J'(u^l)$ and $H_l$, $l = 0, 1, \dots$, denote matrices satisfying the related quasi-Newton equation

$H_l (u^l - u^{l-1}) = J'_l - J'_{l-1}, \qquad l = 1, 2, \dots .$  (3.9)

Starting with the identity $H_0 = I$, the matrices $H_l$ are updated by appropriate formulas. In particular, we considered Broyden's symmetric update. Let $r^{l+1} := J'_{l+1} - J'_l - H_l (u^{l+1} - u^l)$. Then the new matrix is defined by

$H_{l+1} = H_l + \frac{r^{l+1} (r^{l+1})^T}{(r^{l+1})^T (u^{l+1} - u^l)}.$  (3.10)

In the evaluation $u^{l+1} := u^l + a_l s^l$ the step size $a_l > 0$ has been selected according to a simplified Armijo rule. For a detailed description of CG methods and quasi-Newton methods we refer e.g. to [5], [8]. Constraints $|u_j^\Delta| \le 1$, $j = 1, \dots, M_\Delta$, on the controls have been included by the penalty term ($p > 0$)

$P_p(u^\Delta) := \frac{c}{2} \sum_{j=1}^{M_\Delta} \Big[ \sqrt{(u_j^\Delta + 1)^2 + p} + \sqrt{(u_j^\Delta - 1)^2 + p} - 2 \Big].$  (3.11)

For $p \to 0^+$ this tends uniformly to the well-known non-smooth penalty

$P_0(u^\Delta) := c \sum_{j=1}^{M_\Delta} \Big[ \max\{0, -u_j^\Delta - 1\} + \max\{0, u_j^\Delta - 1\} \Big],$

which is exact for a sufficiently large constant $c > 0$. For $p > 0$ the penalty $P_p$ is infinitely often differentiable. This is an advantage in comparison with non-smooth loss functions.
Further, unlike for barrier functions, the values $P_p(u^\Delta)$ are finite for any discrete control $u^\Delta$. For the first derivative and the Hessian we have

$\frac{\partial P_p}{\partial u_j^\Delta}(u^\Delta) = \frac{c}{2}\left[\frac{u_j^\Delta + 1}{\sqrt{(u_j^\Delta + 1)^2 + p}} + \frac{u_j^\Delta - 1}{\sqrt{(u_j^\Delta - 1)^2 + p}}\right]$

and

$P_p''(u^\Delta) = \operatorname{diag}\left(\frac{c}{2}\left[\frac{p}{((u_j^\Delta + 1)^2 + p)^{3/2}} + \frac{p}{((u_j^\Delta - 1)^2 + p)^{3/2}}\right]\right),$

respectively. These derivatives have been used directly in the quasi-Newton methods, i.e. only the components related to $J(\cdot)$ are taken into consideration by the quasi-Newton update. On the other hand, in Armijo's step size rule only the penalty terms have to be repeatedly evaluated, due to the quadratic nature of $J(\cdot)$. This accelerates the code compared to an application of an all-purpose minimization routine.

4. Numerical experiments

4.1 Preliminaries

In our numerical experiments we tested the performance of different techniques applied to IBVPs (1.2) with discontinuous boundary data, and studied effects in connection with boundary control problems of tracking type. All experiments are implemented in MATLAB. The focus in Examples 1 and 2 was directed towards the behavior of automatic step size procedures in ODE codes and towards an improvement of the efficiency of such codes by using the splitting described in Subsection 2.1. In connection with optimal control, in Examples 3 and 4 we studied the influence of discontinuities in the boundary data on the convergence of minimization techniques. In all examples we choose equidistant grids $x_i = i/N$ and $t_k = kT/M_\Delta$. Further, in the first two examples we choose $\sigma = 1/2$, $T = 1$, but in the last two $\sigma = 1$, $T = 0.1$.

The following tables and figures report on numerical results obtained by the BDF code ode15s (option BDF=on) using several maximal orders of consistency (option MaxOrder) and by the trapezoidal rule code ode23t, respectively. If not stated otherwise, the default values of the relative and absolute error tolerances, RelTol=1e-3 and AbsTol=1e-6, respectively, are used. Further, in the Dirichlet case ($\gamma_D = 1$,
$\gamma_N = 0$) we split the experiments into solving problem (1.2) directly by the method of lines (named 'direct' in the following tables) and into applying the superposition (2.3) to treat the occurring jumps in the boundary data. In the latter case we solve numerically the remaining smooth problem (2.5). All described effects depend on $N$ and on the height of the jumps.

4.2 Example 1 (state equations)

For the first example we choose $f = 0$, $g = 0$, $M_\Delta = 3$, $N = 50$, with boundary data in (1.4) according to $(u_j^\Delta) = (1, -2, 3)^T$. Fig. 1 shows the obtained solution $w(x,t;u^\Delta)$ for Dirichlet and Neumann boundary conditions, respectively. The number of required time steps is reported in Tab. 1.

Figure 1. Solution for $\gamma_D = 1$, $\gamma_N = 0$ and for $\gamma_D = 0$, $\gamma_N = 1$

For the trapezoidal rule code the related results are marked with T instead of the order, as done for the BDF code.

treatment           |      direct          |    superposition
maximal order       |  1   |  2  |  5 |  T |  1  |  2  |  5 |  T
obtained time steps | 2331 | 518 | 322| 377| 364 | 111 | 58 | 81

Table 1. Comparison of different approaches

The left two graphs in Fig. 2 illustrate the behavior of the automatic step size control when applied directly or after splitting in the case of Dirichlet boundary conditions. Further, in the right graph the step size results in the case of Neumann boundary conditions are reported. The numerical experiments show (see Fig. 2) that each jump in the control reduces the time step size drastically. On the other hand, splitting off the discontinuities (in the case of Dirichlet boundary conditions) in advance avoids these time step size reductions and, hence, yields a more effective numerical procedure.

4.3 Example 2 (state equation, known exact solution)

In this example we consider a problem with Dirichlet boundary conditions where the exact solution is known. The required discontinuous boundary data are generated by means of the function $\bar w$ introduced in Section 2. Unlike in the previous tests, here we concentrate on the error behavior.
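The step-size effect of Example 1 can be reproduced with any method-of-lines code. A sketch using SciPy's BDF solver in place of ode15s (grid size, tolerances and boundary data below are illustrative, not the paper's values):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Method of lines for z_t = sigma * z_xx on (0,1) with z(0,t) = 0 and a
# Dirichlet datum z(1,t) = g(t): a jump in g forces an ODE solver with
# automatic step-size control to take many more steps than smooth data.
sigma, N = 0.5, 30
h = 1.0 / (N + 1)

def rhs(boundary):
    def f(t, z):
        zp = np.concatenate(([0.0], z, [boundary(t)]))
        return sigma * (zp[:-2] - 2.0 * zp[1:-1] + zp[2:]) / h**2
    return f

z0 = np.zeros(N)
smooth = solve_ivp(rhs(lambda t: t), (0.0, 1.0), z0, method="BDF")
jump = solve_ivp(rhs(lambda t: 0.0 if t < 0.5 else 1.0), (0.0, 1.0), z0,
                 method="BDF")
assert smooth.success and jump.success
# the discontinuity triggers drastic step-size reductions near t = 0.5
assert len(jump.t) > len(smooth.t)
```

This mirrors the 'direct' columns of Tab. 1; splitting off the jump in advance, as in the superposition approach, removes the incompatibility that causes the extra steps.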
Let the exact solution be given by $w(x,t;u^\Delta) = g(x) - \bar w(x,t;u^\Delta) - (1-x)\,\bar w(0,t;u^\Delta)$ with $g(x) := 16 x^2 (1-x)^2$. To study one internal jump only, we choose $M_\Delta = 2$ and boundary data in (1.4) according to $(u_j^\Delta) = (1, -1)^T$. Fig. 3 shows the obtained solution $w(x,t;u^\Delta)$ and, together with Fig. 4, the error of the BDF code with MaxOrder=5 for superposition and for the direct approach, respectively. In the right picture of Fig. 4 the neighborhood of the point $(x,t) = (1,0)$, where a jump is located, is cut off.

Figure 3. Solution $w(x,t;u^\Delta)$ and error for superposition

Figure 4. Error in the case of the direct solution

Choosing different numbers $N$ of spatial grid points with fixed accuracy RelTol=AbsTol=1e-8 we obtain

N                   |   1600   |   800    |   200   |   50
obtained time steps | 1274/133 | 1138/135 | 894/140 | 666/150
error at t = 1      |  9e-07   |  3e-06   |  6e-05  |  9e-04

Table 2. Comparison of required time steps for different N

where in the second row of Tab. 2 the first number relates to the direct treatment and the second to superposition. The numerical experiments show (see Tab. 2 and Figs. 3, 4) that the step size reduction is the more severe the larger $N$ is. Finally, we notice that the numerical solution converges at $t = T$, although there is no convergence locally near the jumps.

4.4 Example 3 (unconstrained control problem)

We consider the optimal control problem (1.1), (1.2) with $p = 0$, $f = 0$, $g = 0$ and $q(x) = 0.05 \sin(4\pi x)$. The convergence behavior of a CG algorithm as well as of a quasi-Newton method with Broyden's update is compared for both the approaches discussed in Subsection 3.1, i.e. the calculation of the discrete gradient (3.5) is based on semi-discretization with discretized continuous adjoints and on full discretization with discrete adjoints, respectively. In the case of semi-discretization, superposition is used for the solution of the state as well as of the adjoint state equations. The remaining regular problems were treated by the BDF code of MATLAB with MaxOrder=5. In Fig.
5 we include the results for semi-discretization with RelTol=AbsTol=1e-5, semi-discretization with RelTol=AbsTol=1e-12, and full discretization with M = 500. The related curves are marked by triangles, inverted triangles and circles, respectively. Further, in Fig. 6 the corresponding optimal controls obtained by the CG algorithm are reported. We note that further slow improvements were obtained beyond the iteration steps plotted in Fig. 6.

Figure 5. CG algorithm and Broyden's method

Figure 6. Optimal control obtained by the CG algorithm: a) semi-discretization, 1e-5; b) semi-discretization, 1e-12; c) full discretization

In Tab. 3 the influence of the control grid is given for full discretization. Semi-discretizations with sufficiently high accuracy in the ODE solvers show a similar behavior.

M_Delta | CG method | Broyden's update
   10   | 4.38e-03  | 4.84e-03
   25   | 2.59e-03  | 2.78e-03
   50   | 1.42e-03  | 1.87e-03
  100   | 1.40e-03  | 1.73e-03

Table 3. Comparison of convergence behavior for different control grids

In general we remark that, in addition to slower convergence, semi-discretization for both accuracy levels is more expensive, i.e. consumes significantly more computer time, than the full discretization.

4.5 Example 4 (constrained control problem)

We choose $f = 0$, $g = 0$. Further, we start with a control problem (1.1), (1.2) which possesses the optimal solution

$u_{ref}(t) = 1.5 \sin\!\left(\frac{\pi t}{T}\right), \quad t \in [0,T],$

if no constraints are imposed on the controls. Using this, the functions $q$ and $p$ are defined by $q(x) := w(x,T;u_{ref})$ and $p := u_{ref}$, respectively, with the solution $w(\cdot,\cdot;u_{ref})$ of the state equation (1.2) for $u = u_{ref}$.

M_Delta | semi-discretization | full discretization | clipping
   10   |      1.41e-03       |      1.36e-03       | 6.93e-03
   25   |      8.83e-04       |      2.71e-04       | 2.30e-03
   50   |      3.23e-04       |      1.06e-05       | 1.10e-03
  100   |      7.28e-04       |      9.84e-06       | 6.60e-04

Table 4. Obtained objective values for different control grids

In Tab. 4 the achieved optimal values are reported for the two approaches.
In addition, we show in the last column the objective value for the discrete control which is obtained from the unconstrained optimal one by simple clipping along the constraints. Fig. 7 shows the discrete optimal controls obtained by applying Broyden's update (3.10) to the quadratic part (from the state equations) and by direct use of up to second order derivatives of the penalties, as given in Section 3. Further, in Fig. 8 the approximation of the tracked target and a comparison between the constrained and the unconstrained optimal controls are given.

Figure 8. Approximation of the target; constrained and unconstrained control

The computational experiments showed a very similar behavior as in the unconstrained case. In semi-discretization the state as well as the adjoint system have to be solved with a sufficiently high accuracy to ensure a good approximation of the gradient. This, however, results in a high time consumption in the applied ODE solver. On the other hand, full discretization in more general cases (in particular in higher spatial dimensions) requires additional preparatory work compared with the use of available software codes.

5. Conclusions

Piecewise constant discretization of boundary controls yields a reduced smoothness of the solutions of the state equations. In all our considered examples this resulted in locally small step sizes if ODE solvers were applied to a semi-discretization of the state equations. These problems could be avoided by considering in advance a specific splitting of the state equations.

In the examples of optimal control problems, semi-discretization was only used in connection with a separation of the discontinuities. Hence, the ODE solvers were, in fact, applied to the regular subproblem. Nevertheless, this approach turned out to be more time consuming than full discretization combined with discrete adjoints.
In addition, full discretization often yielded better values of the objectives and proved to be faster for comparable accuracy. Further, if lower accuracies were applied to speed up the ODE codes in semi-discretization, then the optimization became slow, due to the fact that discretizations of the continuous adjoint problems lead to only rough approximations of the gradients.

References

[1] Casas, E. (1997). Pontryagin's principle for state-constrained boundary control problems of semilinear parabolic equations. SIAM J. Control Optim. 35:1297-1327.
[2] Crouzeix, M. and Thomee, V. (1987). On the discretization in time of semilinear equations with nonsmooth initial data. Math. Comput. 49:359-377.
[3] Griewank, A. (2000). Evaluating derivatives: Principles and techniques of algorithmic differentiation. SIAM Publ., Philadelphia.
[4] Grossmann, C. and Noack, A. (2001). Linearizations and adjoints of operator equations - constructions and selected applications. TU Preprint MATH-NM-08-01.
[5] Grossmann, C. and Terno, J. (1993). Numerik der Optimierung. Teubner, Stuttgart.
[6] Hemker, P.W. and Shishkin, G.I. (1993). Approximation of parabolic PDEs with a discontinuous initial condition. East-West J. Numer. Math. 1:287-302.
[7] Kelley, C.T. and Sachs, E.W. (1999). A trust region method for parabolic boundary control problems. SIAM J. Optim. 9:1064-1081.
[8] Nocedal, J. and Wright, S.J. (1999). Numerical optimization. Springer, New York.
[9] Rannacher, R. (1984). Finite element solution of diffusion problems with irregular data. Numer. Math. 43:309-327.
[10] Sammon, P. (1983). Fully discrete approximation methods for parabolic problems with nonsmooth initial data. SIAM J. Numer. Anal. 20:437-470.
[11] Troltzsch, F. (1984). Optimality conditions for parabolic control problems and applications. Teubner, Leipzig.
[12] Troltzsch, F. (1994). Semidiscrete Ritz-Galerkin approximation of non-linear parabolic boundary control problems - strong convergence of optimal controls.
Appl. Math. Optim. 29:309-329.
[13] Tychonoff, A.N. and Samarsky, A.A. (1959). Differentialgleichungen der Mathematischen Physik. Verlag d. Wissenschaft, Berlin.

SOLUTIONS DIFFERENTIABILITY OF PARAMETRIC OPTIMAL CONTROL FOR ELLIPTIC EQUATIONS

Kazimierz Malanowski
Systems Research Institute, Polish Academy of Sciences
ul. Newelska 6, 01-447 Warszawa, Poland
kmalan@ibspan.waw.pl

Abstract: A family of parameter dependent elliptic optimal control problems with nonlinear boundary control is considered. The control function is subject to amplitude constraints. It is shown that under standard coercivity conditions the solutions to the problems are Bouligand differentiable (in $L^s$, $s < \infty$) functions of the parameter. The differentials are characterized as the solutions of accessory linear-quadratic problems.

Keywords: Parametric optimal control, elliptic equation, nonlinear boundary control, control constraints, Bouligand differentiability of the solutions

1. Introduction

In this paper, we analyse differentiability, with respect to the parameter, of solutions to a nonlinear boundary optimal control problem for an elliptic equation. Our aim is to show that, under a standard coercivity condition, the solutions to the optimal control problem are Bouligand differentiable functions of the parameter. Let us recall this concept of differentiability (see [3, 8, 9]).

Definition 1 A function $\phi$ from an open set $Q$ of a normed linear space $H$ into another normed linear space $X$ is called Bouligand differentiable (or B-differentiable) at a point $h_0 \in Q$ if there exists a positively homogeneous mapping $D_h\phi(h_0) : H \to X$, called the B-derivative, such that

$\phi(h_0 + \lambda h) = \phi(h_0) + D_h\phi(h_0)\lambda h + o(\|\lambda h\|_H).$  (1)

Clearly, if $D_h\phi(h_0)$ is linear, it becomes the Fréchet derivative.

As in [4] and in [5], the sensitivity, i.e., differentiability analysis for the original nonlinear problem is reduced to the same analysis for the accessory linear-quadratic problem.
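A scalar illustration of Definition 1 (the function below is an example of ours, not from the paper): $\phi(u) = \max(u,0)$ is B-differentiable at $h_0 = 0$ with the positively homogeneous but nonlinear B-derivative $D_h\phi(0)h = \max(h,0)$, so $\phi$ is not Fréchet differentiable there:

```python
# phi(u) = max(u, 0): B-differentiable at 0, with B-derivative max(h, 0);
# the B-derivative is positively homogeneous but not linear, hence phi is
# not Frechet differentiable at 0.
def phi(u):
    return max(u, 0.0)

def Dphi0(h):
    return max(h, 0.0)

for h in (-2.0, -0.3, 0.0, 0.7, 5.0):
    for lam in (1e-1, 1e-3, 1e-6):
        # the remainder in (1) is exactly zero for this phi
        assert phi(0.0 + lam * h) - phi(0.0) - Dphi0(lam * h) == 0.0
    # positive homogeneity of the B-derivative
    assert Dphi0(2.0 * h) == 2.0 * Dphi0(h)
```

The same phenomenon, applied to the projection onto the admissible control set, is the source of the merely Bouligand (not Fréchet) differentiability established below.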
The starting point of the analysis is the Lipschitz stability result for the solutions to linear-quadratic elliptic problems due to A. Unger [11]. Using this result, B-differentiability is proved in two steps. First, passing to the limit in the difference quotient, we show the directional differentiability and characterize the directional differential as the solution of an auxiliary linear-quadratic optimal control problem. Using this characterization, in the second step we show that an estimate of the form (1) holds, i.e., the solutions are Bouligand differentiable. This result can be considered as a generalization of that obtained in [1], where a different methodology was used to prove the directional differentiability of the solutions to a parametric elliptic problem, under the assumption that the cost functional is quadratic with respect to the control.

2. Preliminaries

Let $\Omega \subset \mathbb{R}^n$ denote a bounded domain with boundary $\Gamma$. As usual, by $\Delta y$ and $\partial_\nu y$ we denote the Laplace operator and the co-normal derivative of $y$ at $\Gamma$, respectively. Moreover, let $H$ be a Banach space of parameters and $G \subset H$ an open and bounded set of feasible parameters. For any $h \in G$ consider the following elliptic optimal control problem:

$(O_h)$ Find $(y_h, u_h) \in (W^{1,2}(\Omega) \cap C(\bar\Omega)) \times L^\infty(\Gamma)$ such that

$F(y_h, u_h, h) = \min\Big\{F(y,u,h) := \int_\Omega \varphi(y(x),h)\,dx + \int_\Gamma \psi(y(x),u(x),h)\,dS_x\Big\}$  (2)

subject to

$-\Delta y(x) + y(x) = 0$ in $\Omega$, $\quad \partial_\nu y(x) = b(y(x),u(x),h)$ on $\Gamma$,  (3)

$u \in U := \{v \in L^\infty(\Gamma) \mid m_1 \le v(x) \le m_2 \text{ a.e. in } \Gamma\}.$  (4)

In this setting, $m_1 < m_2$ are fixed real numbers, $dS_x$ denotes the surface measure induced on $\Gamma$, and the subscript $x$ indicates that the integration is performed with respect to $x$. We assume:

(A1) The domain $\Omega$ has a $C^{1,1}$-boundary $\Gamma$.

(A2) For any $h \in H$ the functions $\varphi(\cdot,h) : \mathbb{R} \to \mathbb{R}$, $\psi(\cdot,\cdot,h) : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ and $b(\cdot,\cdot,h) : \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ are of class $C^2$. Moreover, for any fixed $u \in \mathbb{R}$ and $h \in G$, $b(\cdot,u,h) : \mathbb{R} \to \mathbb{R}$ is monotonically decreasing.
There is a bound $c_G > 0$ such that

$|b(0,0,h)| + |D_{(y,u)}b(0,0,h)| + |D^2_{(y,u)}b(0,0,h)| \le c_G \quad \forall h \in G.$

Moreover, for any $K > 0$ there exists a constant $l(K)$ such that

$|D^2_{(y,u)}b(y_1,u_1,h) - D^2_{(y,u)}b(y_2,u_2,h)| \le l(K)\,(|y_1 - y_2| + |u_1 - u_2|)$

for all $y_i, u_i$ such that $|y_i| \le K$, $|u_i| \le K$, and all $h \in G$. The same conditions as above are also satisfied by $\varphi$ and $\psi$.

(A3) The functions $b(y,u,\cdot)$, $D_y b(y,u,\cdot)$ and $D_u b(y,u,\cdot)$ are Fréchet differentiable in $h$. Similar properties are possessed by the functions $\varphi$ and $\psi$.

By the following lemma, proved in [6], problem $(O_h)$ is well posed.

Lemma 1 If (A1) - (A3) hold, then for any $u \in U$ and any $h \in G$ there exists a unique weak solution $y(u,h) \in W^{1,2}(\Omega) \cap C(\bar\Omega)$ of (3). Moreover, there exists $c > 0$ such that

$\|y(u',h') - y(u'',h'')\|_{C(\bar\Omega)} \le c\,\big(\|u' - u''\|_{L^\infty(\Gamma)} + \|h' - h''\|_H\big).$  (5)

Define the following Hamiltonian and Lagrangian

$\mathcal{H} : \mathbb{R}^3 \times G \to \mathbb{R}, \quad \mathcal{L} : W^{1,2}(\Omega) \times L^\infty(\Gamma) \times W^{1,2}(\Omega) \times G \to \mathbb{R},$

$\mathcal{H}(y,u,p,h) := \psi(y,u,h) + p\,b(y,u,h),$  (6)

$\mathcal{L}(y,u,p,h) := F(y,u,h) - \int_\Omega p(-\Delta y + y)\,dx = \int_\Omega [\varphi(y,h) - (\nabla p, \nabla y) - p\,y]\,dx + \int_\Gamma \mathcal{H}(y,u,p,h)\,dS_x.$  (7)

We assume:

(A4) For a given reference value $h_0 \in G$ of the parameter, there exists a local solution $(y_0, u_0)$ of $(O_{h_0})$ with the associated adjoint state $p_0 \in W^{1,2}(\Omega) \cap C(\bar\Omega)$, such that the following first-order necessary optimality conditions hold:

$D_y\mathcal{L}(y_0,u_0,p_0,h_0)\,z = 0$ for all $z \in W^{1,2}(\Omega)$,  (8)

$D_u\mathcal{L}(y_0,u_0,p_0,h_0)(u - u_0) \ge 0$ for all $u \in U$.  (9)

In a standard way, conditions (8) and (9) yield the adjoint equation and the pointwise stationarity of the Hamiltonian:

$-\Delta p_0(x) + p_0(x) = D_y\varphi(y_0(x),h_0)$ in $\Omega$, $\quad \partial_\nu p_0(x) = D_y\mathcal{H}(y_0(x),u_0(x),p_0(x),h_0)$ on $\Gamma$,  (10)

$D_u\mathcal{H}(y_0(x),u_0(x),p_0(x),h_0)(u - u_0(x)) \ge 0$ for all $u \in [m_1,m_2]$ and a.a. $x \in \Gamma$.  (11)

Conditions (10) and (11), together with the state equation (3), constitute the optimality system for $(O_{h_0})$. It will be convenient to rewrite this optimality system in the form of a generalized equation.
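Condition (11) says that, pointwise on $\Gamma$, $u_0(x)$ solves a variational inequality over $[m_1, m_2]$. For a Hamiltonian that is quadratic and strictly convex in $u$, this is equivalent to clipping the unconstrained minimizer onto the interval; a toy check (the coefficients $a$, $b$ and the bounds are illustrative numbers, not from the paper):

```python
import numpy as np

# Pointwise stationarity (11) for a quadratic model Hamiltonian
# H(u) = 0.5*a*u**2 + b*u with a > 0: the variational inequality
# D_u H(u0)(u - u0) >= 0 for all u in [m1, m2] holds exactly at the
# clipped unconstrained minimizer.
def vi_holds(u0, a, b, m1, m2, n=1001):
    g = a * u0 + b                       # D_u H at u0
    us = np.linspace(m1, m2, n)
    return bool(np.all(g * (us - u0) >= -1e-12))

a, b, m1, m2 = 2.0, 3.0, -1.0, 1.0
u_star = float(np.clip(-b / a, m1, m2))  # projected unconstrained minimizer
assert vi_holds(u_star, a, b, m1, m2)
assert not vi_holds(0.5, a, b, m1, m2)   # a non-minimizer violates (11)
```

Here the lower bound is active ($-b/a = -1.5$ is clipped to $m_1 = -1$), so the multiplier $D_u\mathcal{H}(u_0) > 0$, matching the active-set notation introduced in Section 4.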
To do that, define the spaces

$X^s := W^{1,s}(\Omega) \times L^s(\Gamma) \times W^{1,s}(\Omega), \quad \Lambda^s := L^s(\Omega) \times L^s(\Gamma) \times L^s(\Omega) \times L^s(\Gamma) \times L^s(\Gamma), \quad s \in [2,\infty],$  (12)

and the following set-valued mapping with closed graph:

$M(u) := \begin{cases} \{\mu \in L^\infty(\Gamma) \mid \int_\Gamma \mu\,(v-u)\,dS_x \le 0 \ \ \forall v \in U\} & \text{if } u \in U, \\ \emptyset & \text{otherwise.} \end{cases}$  (13)

Denote $\xi = (y,u,p) \in X^s$. Let the function $\mathcal{F} : X^s \times G \to \Lambda^s$, as well as the multivalued mapping $\mathcal{T} : X^s \to 2^{\Lambda^s}$, be defined as follows:

$\mathcal{F}(\xi,h) := \begin{pmatrix} -\Delta y + y & \text{in } \Omega \\ \partial_\nu y - b(y,u,h) & \text{on } \Gamma \\ -\Delta p + p - D_y\varphi(y,h) & \text{in } \Omega \\ \partial_\nu p - D_y\mathcal{H}(y,u,p,h) & \text{on } \Gamma \\ D_u\mathcal{H}(y,u,p,h) & \text{on } \Gamma \end{pmatrix}, \qquad \mathcal{T}(\xi) := \begin{pmatrix} \{0\} \\ \{0\} \\ \{0\} \\ \{0\} \\ M(u) \end{pmatrix}.$  (14)

Then the optimality system (3), (10), (11) for $(O_{h_0})$ can be expressed in the form of the following generalized equation:

$0 \in \mathcal{F}(\xi, h_0) + \mathcal{T}(\xi).$  (15)

3. Application of abstract theorems for generalized equations

We are going to investigate conditions under which there exists a neighborhood $G_0$ of $h_0$ such that, for each $h \in G_0$, the generalized equation

$0 \in \mathcal{F}(\xi, h) + \mathcal{T}(\xi)$  (16)

has a locally unique solution $\xi_h = (y_h, u_h, p_h)$, which is a Bouligand differentiable function of $h$. We will follow the same scheme as in [4, 5]. Namely, the proof will be in two steps. First, we show existence, local uniqueness and Lipschitz continuity of the solutions to (16). In the second step, we use these properties to show differentiability of the solutions. In both steps we need the following auxiliary generalized equation, obtained from (16) by linearization of $\mathcal{F}$ at the reference solution and by perturbation:

$\delta \in \mathcal{F}(\xi_0, h_0) + D_\xi\mathcal{F}(\xi_0, h_0)(\xi - \xi_0) + \mathcal{T}(\xi),$  (17)

where $\delta \in \Lambda^s$ is the perturbation. Clearly, for $\delta = 0$, $\xi_0$ is a solution to (17). We will denote by $B^X_\rho(x_0) := \{x \in X \mid \|x - x_0\|_X \le \rho\}$ the closed ball of radius $\rho$ centered at $x_0$ in a Banach space $X$. The following implicit function theorem of Robinson (see Theorem 2.1 and Corollary 2.2 in [7]) allows to deduce existence and local Lipschitz continuity of the solutions to the nonlinear generalized equation (16) from the same properties of the solutions to the linearized equation (17).
Theorem 1 If there exist $\rho_1 > 0$ and $\rho_2 > 0$ such that, for each $\delta \in B^{\Lambda^\infty}_{\rho_1}(0)$ there is a unique solution $\xi_\delta$ in $B^{X^\infty}_{\rho_2}(\xi_0)$ of (17), which is Lipschitz continuous in $\delta$, then there exist $\sigma_1 > 0$ and $\sigma_2 > 0$ such that, for each $h \in B^H_{\sigma_1}(h_0)$ there is a unique solution $\xi_h$ in $B^{X^\infty}_{\sigma_2}(\xi_0)$ of (16), which is Lipschitz continuous in $h$.

Similarly, the following theorem due to Dontchev (see Theorem 2.4 and Remark 2.6 in [2]) allows to reduce the differentiability analysis for the solutions to (16) to the same analysis for the solutions to (17).

Theorem 2 If the assumptions of Theorem 1 are satisfied and, in addition, the solutions $\xi_\delta \in B^{X^\infty}_{\rho_2}(\xi_0)$ of (17) are Bouligand differentiable functions of $\delta$ in a neighborhood of the origin, with the differential $(D_\delta\xi_0; \eta)$, then the solutions $\xi_h$ of (16) are Bouligand differentiable in a neighborhood of $h_0$. For a direction $g \in H$, the differential at $h_0$ is given by

$(D_h\xi_0;\, g) = (D_\delta\xi_0;\, -D_h\mathcal{F}(\xi_0, h_0)\,g).$  (18)

Remark 1 In Theorem 1, Lipschitz continuity of $\xi_\delta$ and $\xi_h$ is understood in the sense of that norm in the space $X$ in which $\mathcal{F}(\cdot,h)$ is differentiable. On the other hand, Theorem 2 remains true if B-differentiability of $\xi_\delta$ is satisfied in a norm in the image space $X$ weaker than that in which Lipschitz continuity in Theorem 1 holds (see Remark 2.11 in [2]); e.g., in $L^s$ ($s < \infty$), rather than in $L^\infty$. This property will be used in Section 4.

In order to apply Theorems 1 and 2 to $(O_h)$, we have to find the form of the linearization (17) of the optimality system (16), for $\mathcal{F}$ and $\mathcal{T}$ given in (14). To simplify notation, the functions evaluated at the reference point will be denoted by the subscript "0", e.g., $\varphi_0 := \varphi(y_0,h_0)$, $\mathcal{H}_0 := \mathcal{H}(y_0,u_0,p_0,h_0)$. Moreover, we denote $\xi_0 := (y_0,u_0,p_0)$. Let $\delta = (\delta^1,\dots,\delta^5) \in \Lambda^\infty$ be a vector of perturbations.
By simple calculations we obtain the following form of (17):

$(LO_\delta)$

$-\Delta z + z = e^1 + \delta^1$, $\quad \partial_\nu z - D_y b_0\,z = e^2 + \delta^2 + D_u b_0\,v,$  (19)

$-\Delta q + q = e^3 + \delta^3 + D^2_{yy}\varphi_0\,z$, $\quad \partial_\nu q - D_y b_0\,q = e^4 + \delta^4 + D^2_{yy}\mathcal{H}_0\,z + D^2_{yu}\mathcal{H}_0\,v,$  (20)

$D^2_{uy}\mathcal{H}_0\,z + D^2_{uu}\mathcal{H}_0\,v + D_u b_0\,q - e^5 - \delta^5 \in -M(v),$  (21)

where $e = (e^1, e^2, e^3, e^4, e^5) \in \Lambda^\infty$ is a given vector. Note that

$(z_0, v_0, q_0) = (y_0, u_0, p_0)$  (22)

is a solution to $(LO_\delta)$ for $\delta = 0$. An inspection shows that $(LO_\delta)$ can be treated as an optimality system for the following linear-quadratic accessory problem:

$(LP_\delta)$ Find $(z_\delta, v_\delta) \in W^{1,2}(\Omega) \times L^\infty(\Gamma)$ such that $\mathcal{I}(z_\delta, v_\delta, \delta) = \min \mathcal{I}(z, v, \delta)$ subject to

$-\Delta z(x) + z(x) = \delta^1(x)$ in $\Omega$, $\quad \partial_\nu z(x) = D_y b_0(x)z(x) + D_u b_0(x)v(x) + e^2(x) + \delta^2(x)$ on $\Gamma$,  (23)

$v \in U,$

where

$\mathcal{I}(z,v,\delta) := \tfrac12\langle (z,v), D^2\mathcal{L}_0(z,v)\rangle + \int_\Omega (e^3 + \delta^3)\,z\,dx + \int_\Gamma [(e^4 + \delta^4)\,z + (e^5 + \delta^5)\,v]\,dS_x,$

with the quadratic form

$\langle (z,v), D^2\mathcal{L}_0(z,v)\rangle := \int_\Omega D^2_{yy}\varphi(y_0,h_0)\,z^2\,dx + \int_\Gamma (z,\,v)\begin{pmatrix} D^2_{yy}\mathcal{H}_0 & D^2_{yu}\mathcal{H}_0 \\ D^2_{uy}\mathcal{H}_0 & D^2_{uu}\mathcal{H}_0 \end{pmatrix}\begin{pmatrix} z \\ v \end{pmatrix} dS_x.$  (24)

To verify the assumptions of Theorems 1 and 2, we have to show that there exist constants $\rho_1, \rho_2 > 0$ such that for each $\delta \in B^{\Lambda^\infty}_{\rho_1}(0)$ there is a unique stationary point $\zeta_\delta := (z_\delta, v_\delta, q_\delta)$ in $B^{X^\infty}_{\rho_2}(\zeta_0)$ of $(LP_\delta)$, which is a Lipschitz continuous and Bouligand differentiable function of $\delta$.
a; G U Note that (AC) implies the following pointwise coercivity condition (see, e.g., Lemma 5.1 in [10]). Dlun^{x) > 7 for a.a. x G F \ (^ U J^). (28) By a slight modification of Satz 18 in [11] we get the following Lips- chitz continuity result for (LPj): Proposition 1 If (AC) holds, then there exist constants pi > 0 and P 2 > 0 such that, for all 6 G (0) there is a unique stationary point Cd '= of (LPs)- Moreover, there exists a constant £ > 0 such that for all 6', 5" G B^^ {0) and all s G [2, 00 ] (26) (27) ( 29 ) 278 The proof of B-differentiability of the stationary points of (LP<^) is in two steps. In the first step, directional differentiability is proved and the directional differential is characterized. This characterization is used in the second step to show that the differential is actually Bouligand. Let us start with the directional differentiability. The proof of the following result is very similar to that of Proposition 4.3 in [5]. Proposition 2 Let (A1)-(A3) as well as (AC) be satisfied and let PItP2 >0 be as in Proposition 1. Then the mapping Cs ■= {zs,vs,qs) ■■ where Q G denote a unique stationary point of (LP^^), is direc- tionally differentiable. The directional differential at 5 = Q, in a direc- tion r] G is given by where is the solution and rrj the associated adjoint state of the following linear- quadratic optimal control problem: (LQ,) Find {wrj^Wr^) G x T^(P) that minimizes Jr,{w,w) = ^{{w,w),D‘^Co{w,w)) + J rj^vo dx + J {rj'^w + rfw) dSx n (30) r subject to — /S.W VO = in (31) d^w = Dybovo + DJjqw + rf on r. w{x) < ' = 0 forxe{I^U J°), >0 forxe{I\I°), <0 for X e {J\ J°), , free for a; G F \ (/ U J). (32) Note that, by the same argument as in Proposition 1, we find that the stationary points of (LQ^^) are Lipschitz continuous functions of rj. Since (wo^wo^ro) = (0,0,0), we have lk77llL^(r). \\rr)\\w^,s^Q) <i\\r]\\As, 5G[2,oo]. 
(33) We are now going to show that {wjj^Wrj) and are actually B- differentials at 0 of (zs^vs) and qs^ respectively. Theorem 3 Let (A1)-(A3) as well as (AC) be satisfied and let pi, P2 > 0 be as in Proposition 1. Then the mapping Cs — {zs,V5,qs) ■■ -)■ X®, (W Solutions Differentiability of Parametric Optimal Control 279 where Q ^ denote a unique stationary point of (LP^^)^ is B- differentiable for any s G [2,oo). The B-differential at S = G in a direc- tion T] e A is given by \= {vOr^^w>q^rr^), where is the solution and Vn the associated adjoint state of problem (LQ^). Proof The optimality system for (LQ^^) takes the form: —Azu + tu — ? 7 ^, 1 di,zu - Dybow = ri‘^ + Dubow, j -Ar + r = T]^ + D^yifow, dyV - Dybor = r/^ + DyyUow + D'^^'Hqw. {DlyTio ta + Dl^Uow - Dubo r - r)^,v - w) >0 for all V E L^(Q) satisfying (32). We have to show that the solution of (35)-(37) are B- differentials of the solution to (LO^). Clearly, {wy,Wy,ry) is a positively homogeneous function of 77 , so, by Definition 1, it is enough to show that Z'q = ZQ + '€ay + ai{r}), Vy = vq + Wy + a2{ri), qr) = Qo + + ai{r]), „here ^ 0, for any s G [ 2 , oo). Denote 9 -s ||??||a°° 0 ) (38) (z^ - ^o) = (v^ - ^^o) = Wt)^ {qri - qo) -- Dj- (39) It follows from (19) and (20) that (ro^, Wrj, f^) satisfies equations identical with (35) and (36): —Aw + W = Tj^, \ d„w - Dybow = 77 ^ + Dubow, J -Ar + r = r]^ + D^ycpow, duT - Dybor = 77“^ + Dyy'Ho^ + D^^Tiow. (40) (41) To characterize {wrj,Wy,ry), we still need a condition analogous to (37). To this end, let us choose ^ E (0, a), where a is given in (AC). Define the sets iff = {x G /o I Dl^Hoix) E (0,13)}, K^ = {xEJ^ I -D 2 „?^o(:r)G( 0 ,^)}, = {a; G r I 7io(a:) G (7771,7711 + / 3 )U (m2 — , 0 , 7712)}- ( 42 ) 280 Note that meas {K^ U iif| U L^) ^ 0 as 0. (43) Let us split up the set V into the following subsets ^ = r\(7UJU L^), B = {I^\ k() U (JO \ K^), C = (J\/°)U(J\ J°), V = K(yjK^iM^. 
We will analyze conditions analogous to (37) on each of these subsets successively. Subset A Choose ^(/3) = Then by (22) and (25), as well as by Proposition 1, for all r] G ^^J)(0) we get Vrj{x) G (mi, m 2 ) for a.a. x e (44) i.e., by (21) DlyHoix) Zr^{x) + DlJio{x)vr,{x) + L»„6o(a:, t) qr,{x, t) —e^(x) — rf^(x) = 0 for a.a. x E A. Subtracting from (45) the analogous equation for (zo,vo,qo) and using notation (39), we obtain DlyV-dix) Wr,{x) + Dl^noix)wr,{x) - Duao{x) r^{x) —rf{x) = 0 for a.a. x E A. Subset B It follows from Proposition 1 that, shrinking g{^) > 0 if nec- essary, for all T] E J) (0) we obtain DlyHo{x) Zy{x) + DlyHo{x)vy{x) + Dubo{x) qj,{x) —e^{x) — r]^{x) > 0 <0 for a.a. x E \ iff, for a.a. xEJ^\K^, which, by (21) implies Vr,{x) = mi{x) m2{x) for a.a. x E I^\ iff, for a.a. x E \ iff) (47) i.e.. Wrj{x) = 0 Subset C By (22) and (25) we have for a.a. x E B. Vq{x) = Uo{x) - mi{x) m2{x) for a.a. x E I \ I^, for a.a. x E J \ (48) (49) Solutions Differentiability of Parametric Optimal Control and 281 DlyUoix) zq{x) + DI^%q{x)vq{x) + Dubo{x) qo{x) = 0 for a.a. x£ (/ \ 7°) U (J \ J“). Proposition 1, together with (49) implies that, shrinking g{/3) if neces- sary, for any r] € we get V V [mi, m 2 ) (mi, m 2 ] for a.a. x E I\I^, for a.a. x G j\ jo. (51) Hence, in view of (21) we have DlyTioix) Zr,{x) + DlyTloix) Vr,{x) + Dubo{x) qr,{x) -r)5tri / - ° ^ S ^ \ ^ \ < 0 for a.a. x E J \ J^. Conditions (49)-(52) imply: ~ \ f > 0 for a.a. x E I\I^, "^vi^) I < 0 for a.a. x E J\J^ ’ Dly'Ho{x) Wr,{x) + Dl^Hoix) Wr,(x) + Dubo(x) rr,{x) (53) f > 0 for a.a. xEl\I^, ^ L < 0 for a.a. x E J \ J^, and (Dly'Hoix) Wr,{x) + Dl^noix) Wnix) + Dubo{x) r^(x) -q^{x)){w - Wr, for all la > 0 for all la < 0 on 7 \ 7°, onJ\J^. (54) (55) Subset V The analysis of subset V is the most difficult, because we do not know a priori if for x G P the constraints are active or not at Vrj^ no matter how small r] is chosen. Without this information, we can say very few about Wrj{x) = Vrj{x) — vq{x). 
Let us denote {?}^y{x) = Dly%o{x) {Zr,{x) - zo(a:)) + Dly'Hoix) (vr,{x) - vo{x)) +Dubo{x) {qr]{x) - qo{x)) for a.a. x eT>. (56) By definition (39) we have Dly'Hoix) zor,{x) + DlyHoix)vr,{x) + Duboix) Triix) — {rf')'ix) = 0 for a.a. x E'D. ( 57 ) 282 Denote rj' = {rj^ ,rj^ , {'n^)')i where {ri^)'{x) rf{x) for X otherwise. ( 58 ) It is easy to see that (40) and (41) together with (46), (48), (53)-(55) and (57) can be interpreted as an optimality system for the optimal control problem (LQ^^/), where (LQ^^) is the following slight modification of (LQ,): (LQ,) Find {wrj^Wr]) ^ x L‘^{T) that minimizes Tr^{w^w) subject to —Aw{x) + w{x) =rj^{x) infi, dyvo + Dyb^vo = Dyh{){x)w{x) + DubQ{x)w{x) + rf' onF, r =0 w{x) > 0 < 0 t free for xe{I^\ K^) U {J^ \ K^), for rr G (/ \ /®), for X € {J \ J^), for X G r \ (7 U J)) U (TCf U K^). Similarly can be interpreted as a stationary point of (LQ^//), where ry" = with ^ I («"(-) f” - e O, [ r]^{x) otherwise, {rj^)"{x) = DlyHoix) TUr,{x) + DlJio{x)wr,{x) + Dubo{x) ry{x). (59) It can be easily checked that, as in the case of (LQyy), the stationary points of (LQy^) are Lipschitz continuous functions of r]. Hence, in view of (58) and (59), we have 1 s < i IIV - 77"||a* = i ' j \{^)'{x) - {rf')"{x K^UK^UL^ ( 60 ) Solutions Differentiability of Parametric Optimal Control 283 Using the definitions (56), (59) and taking advantage of (29) and of (33) we get l(^?")'(^) - (^T(^)I < l(^")'(^)l + l(^')"(^)l = \DlyHQix) {Zy{x) - zo{x)) + DlJio{x){vr,{x) - no(a;)) +Dubo{x) {qy{x) - 9o(a;))| (61) +\D‘ly'H.Q{x) Wr,{x) + D‘lJio{x)vr,{x) + Dyl)(i{x)ry{x)\ < c ||??||a°° for a.a. x € -fff U U L^. Substituting (61) to (60) we obtain \\wn - cc7,,||v^i,s(q), \\wr, - 'w^7?||i^(r)? Ilu; - < c||7/||a°° jmeas (iff U U I/^)| ^ . 
In view of (39) and (43), we find that for any ε > 0 and any s ∈ [2, ∞) we can choose β(ε, s) > 0, and the corresponding ρ(β(ε, s)), so small that

    ‖z_η − z₀ − W_η‖_{W^{1,s}(Ω)}, ‖v_η − v₀ − w_η‖_{L^s(Γ)}, ‖q_η − q₀ − r_η‖_{W^{1,s}(Ω)} ≤ ε ‖η‖_{Λ^∞} for all η ∈ B_ρ(β(ε,s))(0).

This shows that (38) holds and completes the proof of the theorem. □

Remark 2. The proof of Theorem 3 cannot be repeated for s = ∞, and the counterexample in [4] shows that B-differentiability of (34) cannot be expected for s = ∞.

5. Differentiability of the solutions to nonlinear problems

By Theorems 2 and 3, for any h in a neighborhood of h₀, (O_h) has a unique stationary point (y_h, u_h, p_h), which is a B-differentiable function of h. On the other hand, by Theorem 3.7 in [6], for h sufficiently close to h₀, condition (AC) implies that (y_h, u_h) is a solution to (O_h). Thus, we obtain the following principal result of this paper:

Theorem 4. If (A1)–(A7) and (AC) hold, then there exist constants σ₁, σ₂ > 0 such that, for any h ∈ B^H_{σ₁}(h₀), there is a unique stationary point (y_h, u_h, p_h) in B^∞_{σ₂}(ζ₀) of (O_h), where (y_h, u_h) is a solution of (O_h). The mapping

    h ↦ (y_h, u_h, p_h) ∈ W^{1,s}(Ω) × L^s(Γ) × W^{1,s}(Ω), s ∈ [2, ∞),   (63)

is B-differentiable, and the B-differential evaluated at h₀ in a direction g ∈ H is given by the solution and adjoint state of the following linear-quadratic optimal control problem:

(L_g) Find (z_g, v_g) ∈ H¹(Ω) × L²(Γ) that minimizes

    K_g(z, v) = ½ ⟨(z, v), D²L₀(z, v)⟩ + ∫_Ω D²_{yh}f₀ g z dx + ∫_Γ D²_{yh}H₀ g z dS_x + ∫_Γ D²_{uh}H₀ g v dS_x

subject to

    −Δz + z = 0 in Ω,
    ∂_ν z = D_y b₀ z + D_u b₀ v + D_h b₀ g on Γ,

and

    v(x) = 0 for x ∈ (I⁰ ∪ J⁰),
    v(x) ≥ 0 for x ∈ (I \ I⁰),
    v(x) ≤ 0 for x ∈ (J \ J⁰),
    v(x) free for x ∈ Γ \ (I ∪ J).

As was noticed in the Introduction, the Bouligand differential becomes Fréchet if it is linear. Hence, from the form of (L_g), we obtain immediately:

Corollary 1. If meas(I \ I⁰) = meas(J \ J⁰) = 0, then the mapping (63) is Fréchet differentiable.
In sensitivity analysis of optimization problems an important role is played by the so-called optimal value function, which on B^H_{σ₁}(h₀) is defined by

    Ĵ(h) = J_h(y_h, u_h),

i.e., to each h ∈ B^H_{σ₁}(h₀) it assigns the (local) optimal value of the cost functional. In exactly the same way as in Corollary 5.3 in [5], we obtain the following result, showing that Bouligand differentiability of the solutions implies a second order expansion of Ĵ, uniform in a neighborhood of h₀.

Corollary 2. If the assumptions of Theorem 4 hold, then for each h = h₀ + g ∈ B^H_{σ₁}(h₀)

    Ĵ(h) = Ĵ(h₀) + ⟨D_h L₀, g⟩ + ½ ⟨(z_g, v_g, g), D²L₀(z_g, v_g, g)⟩ + o(‖g‖²_H),   (64)

where (z_g, v_g) is the B-differential of (y_h, u_h) at h₀ in the direction g, i.e., it is given by the solution to (L_g).

References

[1] Bonnans, J.F. (1998). "Second order analysis for control constrained optimal control problems of semilinear elliptic systems", Appl. Math. Optim., 38, 303-325.
[2] Dontchev, A.L. (1995). "Characterization of Lipschitz stability in optimization". In: R. Lucchetti, J. Revalski, eds., Recent Developments in Well-Posed Variational Problems, Kluwer, pp. 95-116.
[3] Dontchev, A.L. (1995). "Implicit function theorems for generalized equations", Math. Program., 70, 91-106.
[4] Malanowski, K. (2001). "Bouligand differentiability of solutions to parametric optimal control problems", Num. Funct. Anal. and Optim., 22, 973-990.
[5] Malanowski, K. (2002). "Sensitivity analysis for parametric optimal control of semilinear parabolic equations", J. Convex Anal., 9, 543-561.
[6] Malanowski, K. and Tröltzsch, F. (2000). "Lipschitz stability of solutions to parametric optimal control problems for elliptic equations", Control Cybern., 29, 237-256.
[7] Robinson, S.M. (1980). "Strongly regular generalized equations", Math. Oper. Res., 5, 43-62.
[8] Robinson, S.M. (1987). "Local structure of feasible sets in nonlinear programming. Part III: Stability and sensitivity", Math. Program.
Study, 30, 97-116.
[9] Shapiro, A. (1990). "On concepts of directional differentiability", J. Math. Anal. Appl., 66, 477-487.
[10] Tröltzsch, F. (2000). "Lipschitz stability of solutions to linear-quadratic parabolic control problems with respect to perturbations", Discr. Cont. Dynam. Systems, 6, 289-306.
[11] Unger, A. (1997). Hinreichende Optimalitätsbedingungen 2. Ordnung und Konvergenz des SQP-Verfahrens für semilineare elliptische Randsteuerprobleme. Ph.D. Thesis, Technische Universität Chemnitz-Zwickau.

SHAPE OPTIMIZATION FOR DYNAMIC CONTACT PROBLEMS WITH FRICTION

A. Myslinski
System Research Institute, Polish Academy of Sciences
01-447 Warsaw, ul. Newelska 6, Poland
myslinsk@ibspan.waw.pl

Abstract: The paper deals with shape optimization of a dynamic contact problem with Coulomb friction for viscoelastic bodies. The mass nonpenetrability condition is formulated in terms of velocities. The friction coefficient is assumed to be bounded. Using the material derivative method, as well as results concerning the regularity of solutions to the dynamic variational inequality, the directional derivative of the cost functional is calculated and a necessary optimality condition is formulated.

Keywords: Dynamic unilateral problem, shape optimization, sensitivity analysis, necessary optimality condition

1. Introduction

This paper deals with the formulation of a necessary optimality condition for a shape optimization problem of a viscoelastic body in unilateral dynamic contact with a rigid foundation. It is assumed that the contact with given friction, described by Coulomb's law [2], occurs at a portion of the boundary of the body. The contact condition is described in terms of velocities. This first order approximation seems to be physically realistic for the case of a small distance between the body and the obstacle and for small time intervals. The friction coefficient is assumed to be bounded.
The equilibrium state of this contact problem is described by a hyperbolic variational inequality of the second order [2, 3, 5, 7, 17]. The shape optimization problem for the elastic body in contact consists in finding, in a contact region, such a shape of the boundary of the domain occupied by the body that the normal contact stress is minimized. It is assumed that the volume of the body is constant.

Shape optimization of static contact problems was considered, among others, in [3, 8, 9, 10, 11, 16]. In [3, 8] the existence of optimal solutions and the convergence of finite-dimensional approximations were shown. In [9, 10, 11, 16] necessary optimality conditions were formulated using the material derivative approach (see [16]). Numerical results are reported in [3, 11].

In this paper we shall study this shape optimization problem for a viscoelastic body in unilateral dynamic contact. The essential difficulty in dealing with the shape optimization problem for a dynamic contact problem is the regularity of solutions to the state system. Assuming a small friction coefficient and suitable regularity of the data, it can be shown [6, 7] that the solution to the dynamic contact problem is regular enough to be differentiated with respect to the parameter. Using the material derivative method [16], as well as the results on the regularity of solutions to the dynamic variational inequality [6, 7], we calculate the directional derivative of the cost functional and formulate a necessary optimality condition for this problem. The present paper extends the author's results contained in [12].

We shall use the following notation: Ω ⊂ R² will denote a bounded domain with Lipschitz continuous boundary Γ. The time variable will be denoted by t and the time interval by I = (0, T), T > 0. By H^k(Ω), k ∈ (0, ∞), we will denote the Sobolev space of functions having derivatives of order k, in all directions, belonging to L²(Ω) [1].
For an interval I and a Banach space B, L^p(I; B), p ∈ (1, ∞), denotes the usual Bochner space [2]. u_t = ∂u/∂t and u_tt = ∂²u/∂t² will denote the first and second order derivatives, respectively, with respect to t of the function u. u_tN and u_tT will denote the normal and tangential components, respectively, of the function u_t. Q = I × Ω, S_i = I × Γ_i, i = 0, 1, 2, where the Γ_i are pieces of the boundary Γ.

2. Contact problem formulation

Consider deformations of an elastic body occupying the domain Ω ⊂ R². The boundary Γ of the domain Ω is Lipschitz continuous. The body is subjected to body forces f = (f₁, f₂). Moreover, surface tractions p = (p₁, p₂) are applied to a portion Γ₁ of the boundary Γ. We assume that the body is clamped along the portion Γ₀ of the boundary Γ and that contact conditions are prescribed on the portion Γ₂ of the boundary Γ. Moreover Γ_i ∩ Γ_j = ∅, i ≠ j, i, j = 0, 1, 2, Γ = Γ̄₀ ∪ Γ̄₁ ∪ Γ̄₂. We denote by u = (u₁, u₂), u = u(t, x), x ∈ Ω, t ∈ [0, T], T > 0, the displacement of the body and by σ = {σ_ij(u(t, x))}, i, j = 1, 2, the stress field in the body. We shall consider elastic bodies obeying Hooke's law [2, 3, 5, 17]:

    σ_ij(u) = c⁰_ijkl(x) e_kl(u) + c¹_ijkl(x) e_kl(u_t), x ∈ Ω,
    e_kl(u) = ½ (u_k,l + u_l,k), u_k,l = ∂u_k/∂x_l, i, j, k, l = 1, 2.   (1)

We use here the summation convention over repeated indices [2]. c⁰_ijkl(x) and c¹_ijkl(x), i, j, k, l = 1, 2, are components of Hooke's tensor. It is assumed that the elasticity coefficients c⁰_ijkl and c¹_ijkl satisfy the usual symmetry, boundedness and ellipticity conditions [2, 3, 5]. In an equilibrium state the stress field σ satisfies the system [2, 3, 6, 7]:

    u_tt,i − σ_ij(u),j = f_i in (0, T) × Ω, i, j = 1, 2,   (2)

where σ_ij(x),j = ∂σ_ij(x)/∂x_j, i, j = 1, 2.
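The constitutive law (1) is straightforward to evaluate numerically. The sketch below is illustrative only: the paper assumes only symmetry, boundedness and ellipticity of the tensors c⁰ and c¹, so the isotropic (Lamé) form chosen here, and all numeric values, are assumptions for the example.

```python
import numpy as np

def strain(grad_u):
    """e_kl(u) = (u_{k,l} + u_{l,k}) / 2 from a 2x2 displacement gradient."""
    return 0.5 * (grad_u + grad_u.T)

def isotropic_tensor(lam, mu):
    """Assumed isotropic Hooke tensor: c_ijkl = lam d_ij d_kl + mu (d_ik d_jl + d_il d_jk)."""
    d = np.eye(2)
    return (lam * np.einsum('ij,kl->ijkl', d, d)
            + mu * (np.einsum('ik,jl->ijkl', d, d)
                    + np.einsum('il,jk->ijkl', d, d)))

def stress(grad_u, grad_ut, c0, c1):
    """sigma_ij = c0_ijkl e_kl(u) + c1_ijkl e_kl(u_t), summed over repeated k, l as in (1)."""
    return (np.einsum('ijkl,kl->ij', c0, strain(grad_u))
            + np.einsum('ijkl,kl->ij', c1, strain(grad_ut)))

c0 = isotropic_tensor(1.0, 0.5)   # elastic part (illustrative values)
c1 = isotropic_tensor(0.2, 0.1)   # viscous part (illustrative values)
grad_u = np.array([[0.01, 0.002], [0.0, -0.005]])    # sample displacement gradient
grad_ut = np.array([[0.001, 0.0], [0.0, 0.001]])     # sample velocity gradient
sig = stress(grad_u, grad_ut, c0, c1)
```

Because e_kl is symmetrized and the tensors have the required symmetries, the resulting stress matrix is symmetric, as the formulation assumes.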
There are given the following boundary conditions:

    u_i(x) = 0 on (0, T) × Γ₀, i = 1, 2;  σ_ij(x) n_j = p_i on (0, T) × Γ₁, i, j = 1, 2;   (3)

    u_tN ≤ 0, σ_N ≤ 0, u_tN σ_N = 0 on (0, T) × Γ₂;   (4)

    u_tT = 0 ⟹ |σ_T| ≤ T̂ |σ_N|;  u_tT ≠ 0 ⟹ σ_T = −T̂ |σ_N| u_tT / |u_tT|.   (5)

Here we denote: u_N = u_i n_i, σ_N = σ_ij n_i n_j, (u_T)_i = u_i − u_N n_i, (σ_T)_i = σ_ij n_j − σ_N n_i, i, j = 1, 2; n = (n₁, n₂) is the unit outward normal vector to the boundary Γ. There are given the following initial conditions:

    u_i(0, x) = u₀_i, u_ti(0, x) = u₁_i, i = 1, 2, x ∈ Ω.   (6)

We shall consider problem (2)–(6) in variational form. Let us assume that

    f ∈ (H¹(Q; R²))′ ∩ L²(Q; R²), p ∈ L²(I; (H^{1/2}(Γ₁; R²))′), u₀ ∈ H^{1/2}(Ω; R²), u₁ ∈ H^{1/2}(Ω; R²), u₀N = 0,   (7)

and that the friction coefficient T̂ ∈ L^∞(Γ₂) is continuous for a.e. x ∈ Γ₂. The space L²(Q; R²) and the Sobolev spaces H^{1/2} are defined in [1, 2]. Let us introduce:

    F = {z ∈ L²(I; H¹(Ω; R²)) : z_i = 0 on (0, T) × Γ₀, i = 1, 2},   (8)

    K = {z ∈ F : z_tN ≤ 0 on (0, T) × Γ₂}.   (9)

The problem (1)–(6) is equivalent to the following variational problem [6, 7]: find u ∈ L^∞(I; H¹(Ω; R²)) ∩ K such that u_t ∈ L^∞(I; L²(Ω; R²)) ∩ H^{1/2}(I; L²(Γ₂; R²)) and u_tt ∈ L²(Q; R²), satisfying the following inequality [6, 7]:

    ∫_Q u_tt,i (v_i − u_t,i) dx dτ + ∫_Q σ_ij(u) e_ij(v − u_t) dx dτ + ∫_{S₂} T̂ |σ_N(u)| (|v_T| − |u_tT|) ds dτ ≥ ∫_Q f_i (v_i − u_t,i) dx dτ + ∫_{S₁} p_i (v_i − u_t,i) ds dτ ∀ v ∈ H^{1/2}(I; H¹(Ω; R²)) ∩ K.   (10)

Note that from (7), as well as from the Imbedding Theorem for Sobolev spaces [1], it follows that u₀ and u₁ in (6) are continuous on the boundary of the cylinder Q. The existence of solutions to system (1)–(6) was shown in [6, 7]:

Theorem 2.1. Assume:
(i) the data are smooth enough, i.e., (7) is satisfied;
(ii) the boundary portion Γ₂ is sufficiently smooth;
(iii) the friction coefficient is small enough.
Then there exists a unique weak solution to the problem (1)–(6).

Proof. The proof is based on penalization of the inequality (10), friction regularization, and the employment of a localization and shifting technique due to Lions and Magenes. For details of the proof see [7].
□

For the sake of brevity we shall consider the contact problem with prescribed friction, i.e., we shall assume

    T̂ |σ_N| ≡ 1.   (11)

The condition (5) is replaced by the following one:

    u_tT σ_T + |u_tT| = 0, |σ_T| ≤ 1 on I × Γ₂.   (12)

Let us introduce the space

    Λ = {λ ∈ L^∞(I × Γ₂) : |λ| ≤ 1 on I × Γ₂}.   (13)

Taking into account (12), the system (10) takes the form: find u ∈ K and λ ∈ Λ such that

    ∫_Q u_tt,i (v_i − u_t,i) dx dτ + ∫_Q σ_ij(u) e_ij(v − u_t) dx dτ − ∫_{S₂} λ_T (v_T − u_tT) ds dτ ≥ ∫_Q f_i (v_i − u_t,i) dx dτ + ∫_{S₁} p_i (v_i − u_t,i) ds dτ,   (14)

    ∫_{S₂} σ_T u_tT ds dτ ≤ ∫_{S₂} λ_T u_tT ds dτ ∀ λ_T ∈ Λ.   (15)

3. Formulation of the shape optimization problem

We consider a family of domains Ω_s depending on a parameter s. For each Ω_s we formulate a variational problem corresponding to (10). In this way we obtain a family of variational problems depending on s, and for this family we shall study a shape optimization problem, i.e., we minimize with respect to s a cost functional associated with the solutions to (10). The domain Ω_s is considered as an image of a reference domain Ω under a smooth mapping T_s. To describe the transformation we shall use the speed method [16]. Let us denote by V(s, x) a sufficiently regular vector field depending on the parameter s ∈ [0, δ), δ > 0:

    V(·,·) : [0, δ) × R² → R², V(s, ·) ∈ C²(R²; R²) ∀ s ∈ [0, δ), V(·, x) ∈ C([0, δ); R²) ∀ x ∈ R².   (16)

Let T_s(V) denote the family of mappings T_s(V) : R² ∋ X ↦ x(s, X) ∈ R², where the vector function x(·, X) satisfies the system of ordinary differential equations

    (d/dτ) x(τ, X) = V(τ, x(τ, X)), τ ∈ [0, δ), x(0, X) = X ∈ R².   (17)

We denote by DT_s the Jacobian of the mapping T_s(V) at a point X ∈ R², and by DT_s⁻¹ and *DT_s⁻¹ the inverse and the transposed inverse of the Jacobian DT_s, respectively. J_s = det DT_s will denote the determinant of the Jacobian DT_s. The family of domains {Ω_s}, depending on the parameter s ∈ [0, δ), δ > 0, is defined as follows:

    Ω₀ = Ω, Ω_s = T_s(Ω)(V) = {x ∈ R² : ∃ X ∈ R² s.th.
x = x(s, X), where the function x(·, X) satisfies (17) for 0 ≤ τ ≤ s}.   (18)

Let us consider problem (14)–(15) in the domain Ω_s. Let F_s, K_s, Λ_s be defined, respectively, by (8), (9), (13) with Ω_s instead of Ω. We shall write u_s = u(Ω_s), σ_s = σ(Ω_s). The problem (14)–(15) in the domain Ω_s takes the form: find u_s ∈ K_s and λ_s ∈ Λ_s such that

    ∫_{Q_s} u_tts,i (v_i − u_ts,i) dx dτ + ∫_{Q_s} σ_ij(u_s) e_ij(v − u_ts) dx dτ − ∫_{S_{2s}} λ_sT (v_T − u_tsT) ds dτ ≥ ∫_{Q_s} f_i (v_i − u_ts,i) dx dτ + ∫_{S_{1s}} p_i (v_i − u_ts,i) ds dτ ∀ v ∈ H^{1/2}(I; H¹(Ω_s; R²)) ∩ K_s,   (19)

    ∫_{S_{2s}} σ_sT u_tsT ds dτ ≤ ∫_{S_{2s}} λ_sT u_tsT ds dτ ∀ λ_sT ∈ Λ_s.   (20)

We are ready to formulate the optimization problem. By Ω̂ ⊂ R² we denote a domain such that Ω_s ⊂ Ω̂ for all s ∈ [0, δ), δ > 0. Let φ ∈ M be a given function. The set M is determined by:

    M = {φ : φ_N ≤ 0 on I × Ω̂, ‖φ‖_{L^∞(I; L²(Ω̂; R²))} ≤ 1}.   (21)

Let us introduce, for given φ ∈ M, the following cost functional:

    J_φ(Ω_s) = ∫_{S_{2s}} σ_sN φ_tN ds dτ,   (22)

where φ_tN and σ_sN are the normal components of φ_t and σ_s, respectively, depending on the parameter s. Note that the cost functional (22) approximates the normal contact stress [3, 8, 11]. We shall consider such a family of domains {Ω_s} that every Ω_s, s ∈ [0, δ), δ > 0, has constant volume c > 0, i.e., every Ω_s belongs to the constraint set U given by:

    U = {Ω_s : ∫_{Ω_s} dx = c}.   (23)

We shall consider the following shape optimization problem:

    For given φ ∈ M, find the boundary Γ_{2s} of the domain Ω_s occupied by the body, minimizing the cost functional (22) subject to Ω_s ∈ U.   (24)

The set U given by (23) is assumed to be nonempty, and (u_s, λ_s) ∈ K_s × Λ_s satisfy (19)–(20). Note that the goal of the shape optimization problem (24) is to find such a boundary Γ₂ of the domain occupied by the body
Remark, that the cost functional (22) can be written in the following form [3, 17] : / asN(f>tNdsdT = / utts<l>tsdxdT+ asij{us)eki{(j)ts)dxdT - (25) •^725 Qs '^Qs / f^tsdxdr - / Pa4)tsdsdT - / asTfptTsdsdr. J Oc, J « We shall assume there exists at least one solution to the optimization problem (24). It implies a compactness assumption of the set (23) in suitable topology. For detailed discussion concerning the conditions as- suring the existence of optimal solutions see [3, 16]. 4. Shape derivatives of contact problem solution In order to calculate the Euler derivative (44) of the cost functional (22) we have to determine shape derivatives {u'^ A') G F x A of a solution (us^ As) e Ks X As of the system (19)-(20). Let us recall from [16] : Definition 4.1 The shape derivative u' E F of the function Ug G Fg is determined by : {ug)^Q = u + su' + 0 ( 5 ), (26) where || 0 ( 5 ) \\f /s —> 0 for s 0, u = uq E F, Ug E F{B?) is an extension of the function Ug E Fg into the space F{B?). F{B?) is defined by (8) with B? instead of Ft. In order to calculate shape derivatives (ia', A') G F x A of a solution {ug^Xg) E Kg X Ag of the system (19), (20) first we calculate material derivatives (^i. A) G F x A of the solution {ug^Xg) E Kg x Ag to the system (19), (20). Let us recall the notion of the material derivative [16]: Definition 4.2 The material derivative u E F of the function Ug E Kg at a point X E Ct is determined by : lim II [(lis o Tg) — a]/s — u ||ir= 0, (27) where u E K , Ug o’Tg ^ K is an image of function Ug E Kg in the space F under the mapping Tg. Taking into account Definition 4.2 we can calculate material deriva- tives of a solution to the system (19), (20) : Lemma 4.1 The material derivatives (fi. 
λ̇) ∈ K₁ × Λ of a solution (u_s, λ_s) ∈ K_s × Λ_s to the system (19)–(20) are determined as the unique solution to the following system:

    ∫_Q {u̇_tt η + u_tt η div V(0) + u_tt (DV(0) η) − ḟ η − f (∇η · V(0)) + σ_ij(u̇) e_kl(η) + [σ_ij(u) e_kl(η) − f η] div V(0)} dx dτ − ∫_{S₁} {ṗ η + p η̇ + p η D} ds dτ − ∫_{S₂} {λ̇ η_tT + λ η̇_tT + ∇(λ η_tT) V(0) n + λ η_tT D} ds dτ ≥ 0 ∀ η ∈ K₁,   (28)

    ∫_{S₂} {(λ̇ − μ) u_tT + (λ − μ) u̇_tT + ∇((λ − μ) u_tT) V(0) n + (λ − μ) u_tT D} ds dτ ≥ 0 ∀ μ ∈ L₁,   (29)

where V(0) = V(0, X) and DV(0) denotes the Jacobian matrix of the field V(0). Moreover:

    K₁ = {ξ ∈ F : ξ = u̇ − DV(0) u on Γ₀; ξ_N ≥ n DV(0) u on A₁, ξ_N = n DV(0) u on A₂},   (30)

    A₀ = {x ∈ Γ₂ : u_tN = 0}, A₁ = {x ∈ A₀ : σ_N = 0}, A₂ = {x ∈ A₀ : σ_N < 0},   (31)

    B₀ = {x ∈ Γ₂ : |λ_T| = 1, u_tT ≠ 0}, B₁ = {x ∈ Γ₂ : λ_T = −1, u_tT = 0}, B₂ = {x ∈ Γ₂ : λ_T = 1, u_tT = 0},   (32)

    L₁ = {μ ∈ Λ : μ ≥ 0 on B₂, μ ≤ 0 on B₁, μ = 0 on B₀},   (33)

and D is given by

    D = div V(0) − ⟨DV(0) n, n⟩.   (34)

Proof: It is based on the approach proposed in [16]. First we transport the system (19)–(20) to the fixed domain Ω. Let u^s = u_s ∘ T_s ∈ F, u = u₀ ∈ F, λ^s = λ_s ∘ T_s ∈ Λ, λ = λ₀ ∈ Λ. Since in general u^s ∉ K(Ω), we introduce a new variable z^s = *DT_s⁻¹ u^s ∈ K; moreover ż = u̇ − DV(0) u [7, 15]. Using this new variable, as well as the formulae for the transformation of a function and its gradient into the reference domain Ω [15, 16], we write the system (19)–(20) in the reference domain Ω. Using the estimates on the time derivative of the function u [7], the Lipschitz continuity with respect to s of u and λ satisfying (19)–(20) can be proved. Applying to this system the results concerning the differentiability of solutions to variational inequalities [15, 16], we obtain that the material derivative (u̇, λ̇) ∈ K₁ × Λ satisfies the system (28)–(29). Moreover, from the ellipticity condition on the elasticity coefficients it follows, by a standard argument [15], that (u̇, λ̇) ∈ K₁ × Λ is the unique solution to the system (28)–(29).
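The transformation T_s(V) used throughout this section is defined by integrating the ODE (17); its Jacobian determinant J_s = det DT_s enters the transported integrals. The following sketch transports a point numerically and estimates J_s by finite differences; the velocity field V chosen here is an illustrative assumption (it gives x(s, X) = e^s X and J_s = e^{2s} in R², so the result can be checked in closed form).

```python
import numpy as np
from scipy.integrate import solve_ivp

def transport(X, s, V):
    """T_s(V): X -> x(s, X), where dx/dtau = V(tau, x), x(0, X) = X  (cf. (17))."""
    sol = solve_ivp(V, (0.0, s), np.asarray(X, dtype=float), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

def jacobian_det(X, s, V, h=1e-6):
    """Finite-difference estimate of J_s = det DT_s at the point X."""
    base = transport(X, s, V)
    cols = [(transport(X + h * e, s, V) - base) / h for e in np.eye(len(X))]
    return np.linalg.det(np.column_stack(cols))

# Illustrative (assumed) field V(tau, x) = x, not one from the paper.
V = lambda tau, x: x
X = np.array([0.3, -0.4])
s = 0.25
xs = transport(X, s, V)       # should approximate exp(s) * X
Js = jacobian_det(X, s, V)    # should approximate exp(2 s)
```

A divergence-free field V would instead give J_s ≡ 1, which is the natural choice when the constant-volume constraint (23) must be preserved along the family {Ω_s}.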
□

Recall [16] that if the shape derivative u′ ∈ F of the function u_s ∈ F_s exists, then the following condition holds:

    u′ = u̇ − ∇u V(0),   (35)

where u̇ ∈ F is the material derivative of the function u_s ∈ F_s. From the regularity results in [7] it follows that

    ∇u̇ V(0) ∈ F, ∇λ̇ V(0) ∈ Λ,   (36)

where the spaces F and Λ are determined by (8) and (13), respectively. Integrating by parts the system (28)–(29) and taking into account (35) and (36), we obtain a system, similar to (28)–(29), determining the shape derivative (u′, λ′) ∈ F × Λ of the solution (u_s, λ_s) ∈ K_s × Λ_s of the system (19)–(20):

    ∫_Q {u′_tt η + u_tt η′ + (DV(0) + *DV(0)) u_tt η} dx dτ + ∫_Γ u_tt η V(0) n ds dτ + ∫_Q σ_ij(u′) e_kl(η) dx dτ − ∫_{S₂} {λ′ η_tT + λ η′_tT} ds dτ + I₁(u_t, η) + I₂(λ, u, η) ≥ 0 ∀ η ∈ N₁,   (37)

    ∫_{S₂} {u̇_tT (μ − λ) − u_tT λ′} ds dτ + I₃(u, μ − λ) ≥ 0 ∀ μ ∈ L₁,   (38)

    N₁ = {η ∈ F : η = χ − Du V(0), χ ∈ K₁},   (39)

    I₁(p, φ) = ∫ σ_ij(p) e_kl(φ) − f φ − ∫_Γ {(∇p n) φ + (p ∇φ) n + p φ H} V(0) n ds dτ,   (40)

    I₂(p, β, φ) = ∫_{S₂} {(∇p) n ∇φ + p ∇(∇φ n) φ + β ∇φ n H + β ∇φ n} V(0) n ds dτ,   (41)

    I₃(φ, μ − λ) = ∫_{S₂} {(φ n)(μ − λ) + φ (∇μ n) − φ (∇λ n) + φ (μ − λ) H} V(0) n ds dτ,   (42)

where H denotes the mean curvature of the boundary Γ [16].

5. Necessary optimality condition

Our goal is to calculate the directional derivative of the cost functional (22) with respect to the parameter s. We will use this derivative to formulate a necessary optimality condition for the optimization problem (24). First, let us recall from [16] the notion of the Euler derivative of a cost functional depending on the domain:

Definition 5.1. The Euler derivative dJ(Ω; V) of the cost functional J at a point Ω in the direction of the vector field V is given by:

    dJ(Ω; V) = limsup_{s→0} [J(Ω_s) − J(Ω)]/s.
(43)

The form of the directional derivative dJ_φ(u; V) of the cost functional (22) is given in:

Lemma 5.1. The directional derivative dJ_φ(u; V) of the cost functional (22), for given φ ∈ M, at a point u ∈ K in the direction of the vector field V is determined by:

    dJ_φ(u; V) = ∫_Q [u′_tt φ_t + u_tt φ′_t + (DV(0) + *DV(0)) u_tt φ_t] dx dτ + ∫_{S₂} u_tt φ_t V(0) n ds dτ + ∫_{S₂} (σ_ij e_kl(φ) − f φ) V(0) n ds − ∫_{S₁} (∇p φ V(0) + p ∇φ V(0) + p φ D) ds − ∫_{S₂} σ′_T φ_T ds + I₁(u, φ) − I₂(λ, u, φ),   (44)

where σ′ is the shape derivative of the function σ_s with respect to s; this derivative is defined by (26). ∇p is the gradient of the function p with respect to x. Moreover V(0) = V(0, X); φ_T and σ_T are the tangential components of the functions φ and σ, respectively; D is given by (34); DV(0) denotes the Jacobian matrix of the field V(0), and div denotes the divergence operator.

Proof: Taking into account (22) and (25), as well as the formulae for the transformation of the gradient of a function defined on the domain Ω_s into the reference domain Ω [16], and using the mapping (16)–(17), we can express the cost functional (22), defined on the domain Ω_s, in the form of a functional J_φ(u^s) defined on the domain Ω:

    J_φ(u^s) = ∫_Q u^s_tt (DT_s⁻¹ φ) det DT_s dx dτ + ∫_{S₂} σ_ij(u^s) e_kl(DT_s⁻¹ φ) ‖det DT_s *DT_s⁻¹ n‖ ds dτ − ∫_{S₂} λ^s_T (DT_s⁻¹ φ)_T ‖det DT_s *DT_s⁻¹ n‖ ds dτ,   (45)

where u^s = u_s ∘ T_s ∈ F, u = u₀ ∈ F and λ = λ₀ ∈ Λ. By (43) we have:

    dJ_φ(u; V) = limsup_{s→0} [J_φ(u^s) − J_φ(u)]/s.   (46)

Remark that it follows by standard arguments [3] that the pair (u^s, λ^s) ∈ K_s × Λ_s, s ∈ [0, δ), δ > 0, satisfying the system (19)–(20) is Lipschitz continuous with respect to the parameter s. Passing to the limit s → 0 in (46), taking into account the formulae for the derivatives of DT_s⁻¹ and det DT_s with respect to the parameter s [16], and using (26), we obtain (44).
□

In order to eliminate the shape derivative (u′, λ′) from (44), we introduce an adjoint state (r, q) ∈ K₂ × L₂ defined as follows:

    ∫_Q r_tt ζ dx dτ + ∫_Q σ_ij(ζ) e_kl(φ + r) dx dτ + ∫_{S₂} ζ_tT (q − λ) ds dτ = 0 ∀ ζ ∈ K₂,   (47)

with r(T, x) = 0, r_t(T, x) = 0, and

    ∫_{S₂} (r_tT + φ_tT − u_tT) δ ds dτ = 0 ∀ δ ∈ L₂,   (48)

    K₂ = {ζ ∈ K₁ : ζ_N = 0 on A₀},   (49)

    L₂ = {δ ∈ Λ : δ = 0 on A₀ ∩ B₀}.   (50)

Since φ ∈ M is a given element, by the same arguments as used to show the existence of a solution (u, λ) ∈ K × Λ to the system (19)–(20), we can show the existence of a solution (r, q) ∈ K₂ × L₂ to the system (47)–(48). From (44), (37), (38), (47), (48) we obtain:

    dJ_φ(u; V) = I₁(u, φ + r) + I₂(λ, u, φ + r) + I₃(u, q − λ).   (51)

The necessary optimality condition has a standard form:

Theorem 5.1. There exists a Lagrange multiplier μ ∈ R such that for all vector fields V determined by (16)–(17) the following condition holds:

    dJ_φ(u; V) + μ ∫_Γ V(0) n ds ≥ 0,   (52)

where dJ_φ(u; V) is given by (51).

Proof: It is given in [3, 4, 5, 16, 17]. □

6. Conclusions

In this paper a necessary optimality condition for the shape optimization problem for the dynamic contact problem was formulated. Preliminary numerical results can be found in [13], where the continuous optimization problem was discretized by piecewise linear and piecewise constant functions on each finite element. The discretized problem was solved numerically by an augmented Lagrangian algorithm combined with an active set strategy and updating of the dual variables.

References

[1] R.A. Adams, Sobolev Spaces, Academic Press, New York, 1975.
[2] G. Duvaut and J.L. Lions, Les inéquations en mécanique et en physique, Dunod, Paris, 1972.
[3] J. Haslinger and P. Neittaanmäki, Finite Element Approximation for Optimal Shape Design. Theory and Application, John Wiley & Sons, 1988.
[4] E.J. Haug, K.K. Choi, V. Komkov, Design Sensitivity Analysis of Structural Systems, Academic Press, 1986.
[5] I. Hlavacek, J. Haslinger, J.
Necas, J. Lovisek, Solution of Variational Inequalities in Mechanics (in Russian), Mir, Moscow, 1986.
[6] J. Jarusek and C. Eck, Dynamic Contact Problems with Small Coulomb Friction for Viscoelastic Bodies. Existence of Solutions, Mathematical Models and Methods in Applied Sciences, 9, pp. 11-34, 1999.
[7] J. Jarusek, Dynamical Contact Problem with Given Friction for Viscoelastic Bodies, Czech. Math. Journal, 46, pp. 475-487, 1996.
[8] A. Klarbring, J. Haslinger, On Almost Constant Contact Stress Distributions by Shape Optimization, Structural Optimization, 5, pp. 213-216, 1993.
[9] A. Myslinski, Mixed Variational Approach for Shape Optimization of Contact Problem with Prescribed Friction, in: Numerical Methods for Free Boundary Problems, P. Neittaanmäki, ed., International Series of Numerical Mathematics, Birkhäuser, Basel, 99, pp. 286-296, 1991.
[10] A. Myslinski, Shape Optimization of Contact Problems Using Mixed Variational Formulation, Lecture Notes in Control and Information Sciences, Springer, Berlin, 160, pp. 414-423, 1992.
[11] A. Myslinski, Mixed Finite Element Approximation of a Shape Optimization Problem for Systems Described by Elliptic Variational Inequalities, Archives of Control Sciences, 3, No. 3-4, pp. 243-257, 1994.
[12] A. Myslinski, Shape Optimization for Dynamic Contact Problems, Discussiones Mathematicae, Differential Inclusions, Control and Optimization, 20, pp. 79-91, 2000.
[13] A. Myslinski, Augmented Lagrangian Techniques for Shape Optimal Design of Dynamic Contact Problems, Preprint, System Research Institute, Warsaw, 2001; to be published in Proceedings of the WCSMO4 Conference, 2001.
[14] J. Necas, Les méthodes directes en théorie des équations elliptiques, Masson, Paris, 1967.
[15] J. Sokolowski and J.P. Zolesio, Shape Sensitivity Analysis of Contact Problem with Prescribed Friction, Nonlinear Analysis, Theory, Methods and Applications, 12, pp.
1399-1411, 1988.
[16] J. Sokolowski and J.P. Zolesio, Introduction to Shape Optimization. Shape Sensitivity Analysis, Springer, Berlin, 1992.
[17] J. Telega, Variational Methods in Contact Problems of Mechanics (in Russian), Advances in Mechanics, 10, pp. 3-95, 1987.

OPTIMAL SHAPE DESIGN USING DOMAIN TRANSFORMATIONS AND CONTINUOUS SENSITIVITY EQUATION METHODS

Lisa Stanley
Department of Mathematical Sciences
Montana State University
Bozeman, Montana*
stanley@math.montana.edu

Abstract: In this paper, we consider two approaches to solving an optimization based design problem where "shape" is the design parameter. Both methods use domain transformations to compute gradients. However, they differ in that the second method is based on solving a transformed optimization problem completely in the computational domain. We illustrate the methods using a simple 1D problem and discuss the benefits and drawbacks of each approach.

Keywords: Continuous Sensitivity Equation Methods, Optimal Design

1. Introduction

The focus of the paper is an optimal design problem where the design parameter determines the shape of the domain of the constraint equation. The cost function is given in terms of an integral expression describing the L² difference between some target function and the state variable. The constraint equation, or state equation, takes the form of an elliptic partial differential equation defined on a parameter dependent domain. Under the assumption that each point in the design space determines a unique state variable through the solution of the state equation, we pose the unconstrained optimal design problem.

*Work supported by the National Science Foundation under grant DMS-0072438 and the Air Force Office of Scientific Research under AASERT grant F49620-97-1-0329.
Since the domain of the constraint equation changes with perturbations in the design, numerical solution of the optimal design problem is often hampered by burdensome grid generation requirements at each iteration of an optimization algorithm. One technique that can be used to avoid this problem is to transform the domain of the constraint equation to one that is fixed and no longer depends on the shape parameter. An equivalent transformed constraint equation is posed on this fixed, computational domain; see [4, 8], for example. In this paper, we present two approaches to the optimal design problem. Each approach uses the transformation technique mentioned above along with CSEMs (Continuous Sensitivity Equation Methods) in order to solve the optimal design problem. The main difference between the two methods is that one solves the optimal design problem using the parameter dependent domain of the constraint equation, while the second approach applies a mapping technique in order to transform both the cost function and the constraint equation to a fixed computational space. This results in a transformed optimization problem. In each case, gradient based optimization is applied, and CSEMs are used to supply gradient information.

One of the major topics of concern for using CSEMs with optimal design is the issue of consistent derivatives. Within the optimization literature, the assumption is usually made that the gradient information is the derivative (with respect to the design parameter) of the numerical approximation of the cost function. There is a great deal of concern that convergence and robustness are compromised if the derivative approximations are computed using techniques which do not account for truncation and roundoff errors implicitly contained in the cost function. In [1, 2], the notion of asymptotically consistent derivatives is introduced, and CSEMs, when coupled with a trust region method, are shown to be applicable within optimal design algorithms. More precise definitions are introduced in Section 5.1. We first pose an example optimal design problem, and the computational approaches mentioned above are sketched out in the context of this example. Numerical results are shown, and we conclude with some general remarks concerning these approaches in Section 6.

2. A 1D Optimal Design Problem

Let Q = [1, +∞) denote the design space, and for q ∈ Q let Ω_q = (0, q). Consider the boundary value problem

    −(d²/dx²) w(x; q) = f(x), x ∈ Ω_q,   (1)
In [1,2], the notion of asymptotically consistent derivatives is introduced, and CSEMs, when coupled with a trust region method, are shown to be applicable within optimal design algorithms. More precise definitions are introduced in Section 5.1. We first pose an example optimal design prob- lem, and the computational approaches mentioned above are sketched out in the context of this example. Numerical results are shown, and we conclude with some general remarks concerning these approaches in Section 6. 2. A ID Optimal Design Problem Let Q — [l,+oo) denote the design space, and for q E let Q.q = (0, q). Consider the boundary value problem d^ ;g) =f(a;), X ^ ( 1 ) 303 Optimal Shape Design using Domain Transformations and CSEMs with homogeneous Dirichlet boundary conditions w(0) = 0, w(q') = 0. ( 2 ) The forcing function, f : (0, +oo) — >■ R, is the piecewise continuous function defined by f 0, 0 < aj < 1 \ —1, 1 < a: < + 00 . ( 3 ) For each q £ Q, (l)-(2) has a unique solution w(- ;q) G H^{0,q) H (0) ^)- Thus, we define a cost function F: Q IR by 1 F{q) ^ 2 Jo ’’’ ( 4 ) and we focus on the optimal design problem: min F{q). (5) qeQ Observe that the state equation, (1) - (2), is defined on the “physical” space and the cost function, F{')^ is defined over a fixed subset of this space. For this simple example, q can be interpreted as a “shape” parameter in the sense that it determines the length of the interval over which the state w(-; g) is defined. 2.1 Domain Transformations For large scale problems where the shape of the domain of the state equation is parameter dependent, grid generation often poses a major difficulty in the optimal design process. As mentioned earlier, one way to overcome this obstacle is to apply a domain transformation from the physical space to the fixed, computational space. For the model problem discussed in this paper, transforming is clearly very simple. 
We note that determining the domain transformation for any given two-dimensional or three-dimensional set can be much more complicated. Moreover, this calculation often requires the application of a numerical method. In order to focus on the issues related to sensitivity computation and the resulting gradient approximations, the application of an algebraic do- main mapping to the model problem is justified. Here we describe the transformation of the parameter dependent do- main [0, q] to the fixed computational domain, [0, 1]. Once this mapping is constructed, the transformed state equation is defined accordingly. For Of > 0, let = (0, a), and for each fixed g > 1, define the transforma- tion M(- ; g): by =(q = x. ( 6 ) 304 Note that the spatial variable on the fixed domain is and we use x to denote the spatial variable on The transformations given above are used to define the “transformed” functions. Let ^ G fill and ^ > 1, and for any function u G Lfo(fi^), define the transformed function u G i?o(fii) as follows u(C; 9 ) = = u(a:;g). (7) It can be shown that for a given value of if w(- ; is a solution to the boundary value problem given in (l)-(2), then the corresponding function w(- ; g) is a solution to the boundary value problem ^ 6 ( 0 , 1 ) ( 8 ) with boundary conditions w(0) = 0, w(l) = 0. (9) The forcing function f (^; g) is obtained by using the mapping M and the relation = f(M(^,g),g) == f(rr) and has the form 0, 0 < ^ < i ( 10 ) Henceforth, the boundary value problem (8)- (9) is referred to as the transformed state equation^ and it is used in each of the computational approaches described in the following sections. 3. Computational Approach 1 In this section, we describe one approach for solving the optimal design problem in (5). This approach can be described as a “differentiate-then- map” scheme. 
Observe that the gradient of the cost function has the form

  ∇F(q) = ∫₀¹ [w(x; q) + sin(πx)] s(x; q) dx,   (11)

where the sensitivity is defined as

  s(·; q) = ∂w/∂q (·; q).   (12)

In order to compute the sensitivity, we use the CSEM approach: we derive a sensitivity equation, an equation for which the sensitivity in (12) is a solution. Formally speaking, this equation is derived by implicit "differentiation" of the state equation and boundary conditions in (1)-(2). For the model problem considered here, it can be shown that the sensitivity equation and associated boundary conditions are given by

  -s''(x; q) = 0,  x ∈ Ω_q,   (13)

with boundary conditions

  s(0) = 0,  s(q) = -(∂w/∂x)(q; q).   (14)

Observe that the normal derivative of w appears in the right boundary condition in (14). This is typical for shape sensitivity problems, and these boundary conditions are tricky to derive correctly for more complicated problems.

Gradient based optimization requires that we numerically approximate both the cost function and its gradient for a given value of the parameter q. Aside from the implementation of a quadrature rule, each iteration of the optimization algorithm involves a numerical calculation of both the state and the sensitivity for a given design parameter value. The following section describes the numerical scheme employed for these computations.

3.1 State and Sensitivity Calculations

Here we illustrate the use of the mapping technique discussed in Section 2.1. Both a transformed state equation and a transformed sensitivity equation are constructed on the computational domain Ω₁. The derivation of the transformed state equation is presented in detail in Section 2.1 and is given explicitly in equations (8)-(9). In a similar fashion, we define the transformed sensitivity

  s̃(ξ; q) = s(M(ξ; q); q) = s(x; q),   (15)

and the transformed sensitivity equation is constructed.
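For this model problem the state and sensitivity are available in closed form (integrating (1)-(2) by hand gives w = αx on (0, 1) with α(q) = -(q-1)²/(2q), and (13)-(14) make s linear with s(x; q) = α'(q)x there), so the "differentiate-then-map" gradient (11) can be checked against a finite difference of the cost. This is our own illustration, not the paper's computation:

```python
import numpy as np

def alpha(q):
    """w(x; q) = alpha(q) * x on (0, 1), from integrating (1)-(2) by hand."""
    return -(q - 1.0)**2 / (2.0 * q)

def dalpha(q):
    """d(alpha)/dq; the sensitivity (12) is s(x; q) = dalpha(q) * x on (0, 1)."""
    return -(q * q - 1.0) / (2.0 * q * q)

def cost(q):
    """Closed form of (4): F(q) = alpha^2/6 + alpha/pi + 1/4."""
    return alpha(q)**2 / 6.0 + alpha(q) / np.pi + 0.25

def grad_csem(q, n=1000):
    """Evaluate (11) with the sensitivity from the sensitivity equation (13)-(14),
    using a trapezoidal rule on (0, 1)."""
    x = np.linspace(0.0, 1.0, n + 1)
    y = (alpha(q) * x + np.sin(np.pi * x)) * (dalpha(q) * x)
    h = x[1] - x[0]
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])
```

A central difference of `cost` agrees with `grad_csem` to quadrature accuracy, which is exactly the consistency property examined in Section 5.1.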
This boundary value problem has the form

  -s̃''(ξ) = 0,  ξ ∈ (0, 1),   (16)

with boundary conditions

  s̃(0) = 0,  s̃(1) = -(∂w/∂x)(q; q) = -(1/q) (∂w̃/∂ξ)(1; q).   (17)

Once the transformed equations are constructed, a discretization is applied. For the numerical approximations presented here, we apply a piecewise linear finite element method to (8)-(9) and to (16)-(17). For the sake of brevity, the details of the implementation are omitted; however, the interested reader can refer to [3] for a more complete exposition. Once the numerical calculations are performed, the recovery of the state and sensitivity approximations (defined on the physical space Ω_q) is achieved through the relations in (7) and (15).

4. Computational Approach 2

In this section, we present an approach to the optimal design problem which is similar to an idea considered in [5]. Like the previous scheme, this strategy uses both domain transformations and CSEMs. The fundamental difference between the following approach and the one presented in Section 3 is the order in which these techniques are applied. In this section, the domain transformation is applied to the cost function as well as to the state equation. First, we construct a transformed optimal design problem which is equivalent to the original in (4)-(5) and which uses information from the transformed state equation. A CSEM is then used to supply gradient information for the transformed cost function.

Before presenting the transformed optimal design problem, we remark that under the mapping M defined in (6), the following equality holds:

  ∫₀^b g(x) dx = ∫₀^{b/q} g(M(ξ; q)) |dM/dξ| dξ = q ∫₀^{b/q} g(M(ξ; q)) dξ,

where g is any continuous function defined on Ω_q and 0 < b ≤ q. Along with the previous equality, the definitions in (6) and (7) give rise to the transformed cost function

  F̃(q) = (q/2) ∫₀^{1/q} [w̃(ξ; q) + sin(qπξ)]² dξ.   (18)

Here w̃(ξ; q) is the solution to the transformed state equation given by the boundary value problem (8)-(9) for each q ∈ Q.
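A piecewise linear finite element solve of the transformed state equation (8)-(9) takes only a few lines. The sketch below is ours; it assumes a uniform mesh with a node placed exactly at the forcing discontinuity ξ = 1/q (in 1D, P1 elements with exactly integrated loads reproduce the true solution at the nodes, so the result can be checked against the closed-form state):

```python
import numpy as np

def fem_transformed_state(q=2.0, n=100):
    """P1 finite elements for -w~'' = q^2 * f~ on (0, 1), w~(0) = w~(1) = 0,
    with f~ = 0 on (0, 1/q) and f~ = -1 on (1/q, 1) as in (10).
    Assumes n is chosen so that xi = 1/q is a mesh node."""
    h = 1.0 / n
    xi = np.linspace(0.0, 1.0, n + 1)
    j = int(round(n / q))                       # index of the node at 1/q
    # stiffness matrix for the interior nodes: (1/h) * tridiag(-1, 2, -1)
    K = (np.diag(2.0 * np.ones(n - 1))
         - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h
    # load vector: q^2 * int f~ * phi_i; the hat function at 1/q
    # overlaps the support of f~ on only half of its base
    b = np.zeros(n - 1)
    b[j - 1] = q * q * (-h / 2.0)
    b[j:] = q * q * (-h)
    wt = np.zeros(n + 1)
    wt[1:-1] = np.linalg.solve(K, b)
    return xi, wt
```

With q = 2 the nodal values match w̃(ξ; 2) = w(2ξ; 2), e.g. w̃(1/2; 2) = w(1; 2) = -1/4, to machine precision.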
Hence, the transformed optimal design problem is given by

  min_{q ∈ Q} F̃(q),   (19)

where the design space Q remains the same as in Section 2. Observe that a factor of q appears in the expression (18). Recall that the mapping M depends explicitly on the parameter q. Hence, the absolute value of the derivative of the mapping (and, more generally, the determinant of the Jacobian matrix) is also parameter dependent, and this term appears explicitly in (18). For this particular example, the derivative is very simple, but we remark that the issue of parameter dependent derivatives is more complicated for two-dimensional and three-dimensional domains. A two-dimensional illustration can be found on pages 365-366 of [4].

Once the transformed optimal design problem is constructed, we proceed in much the same manner as previously discussed. Using Leibnitz' formula, the gradient of the transformed cost function has the form

  ∇F̃(q) = (1/2) ∫₀^{1/q} [w̃(ξ; q) + sin(qπξ)]² dξ + (q/2) [w̃(1/q; q)]² (-1/q²)
           + (q/2) ∫₀^{1/q} 2 [w̃(ξ; q) + sin(qπξ)] [p(ξ; q) + πξ cos(qπξ)] dξ,

which can be simplified to the expression

  ∇F̃(q) = q ∫₀^{1/q} [w̃(ξ; q) + sin(qπξ)] [p(ξ; q) + πξ cos(qπξ)] dξ
           + (1/(2q)) (2 F̃(q) - [w̃(1/q; q)]²).   (20)

In the equation above, the notation p(ξ; q) is used to denote the sensitivity of the transformed state; that is, we define

  p(ξ; q) = ∂w̃/∂q (ξ; q).   (21)

It is important to note that the sensitivity of the transformed state, p(ξ; q), is related to, but not the same function as, the transformed sensitivity s̃(ξ; q). The notation used above reflects this important distinction.
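The extra terms that the parameter dependent mapping injects into (20) can be verified numerically. The sketch below evaluates (20) using our closed-form transformed state w̃(ξ; q) = α(q)·qξ and its sensitivity p(ξ; q) = (α'(q)q + α(q))ξ on (0, 1/q) (both follow from integrating (1)-(2) by hand), and compares the result with the physical-space gradient (11); treat it as an illustration rather than the paper's computation:

```python
import numpy as np

def trap(y, h):
    """Trapezoidal rule on a uniform grid."""
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

def grad_transformed(q, n=4000):
    """Evaluate the simplified gradient expression (20)."""
    a = -(q - 1.0)**2 / (2.0 * q)            # w = a*x on (0, 1)
    da = -(q * q - 1.0) / (2.0 * q * q)      # a'(q)
    xi = np.linspace(0.0, 1.0 / q, n + 1)
    h = xi[1] - xi[0]
    wt = a * q * xi                          # transformed state on (0, 1/q)
    p = (da * q + a) * xi                    # sensitivity (21) of the transformed state
    F = 0.5 * q * trap((wt + np.sin(q * np.pi * xi))**2, h)   # cost (18)
    integrand = (wt + np.sin(q * np.pi * xi)) * (p + np.pi * xi * np.cos(q * np.pi * xi))
    return q * trap(integrand, h) + (2.0 * F - a**2) / (2.0 * q)   # w~(1/q; q) = a

def grad_physical(q):
    """Closed form of the physical-space gradient (11) for comparison."""
    a = -(q - 1.0)**2 / (2.0 * q)
    da = -(q * q - 1.0) / (2.0 * q * q)
    return da * (a / 3.0 + 1.0 / np.pi)
```

The two expressions agree to quadrature accuracy, confirming that the boundary term (1/(2q))(2F̃ - w̃(1/q)²) exactly compensates for the moving endpoint of the integral.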
The following section describes the techniques used for obtaining numerical approximations of the transformed state and the sensitivity p(ξ; q).

4.1 State and Sensitivity Calculations

For this approach, the optimization algorithm requires that we compute a numerical approximation to the transformed state, w̃(·; q), and to the sensitivity of the transformed state, p(·; q). As in the previous section, a piecewise linear finite element method is used to approximate the transformed state w̃(ξ; q).

In order to calculate an approximation for p(·; q), we derive a sensitivity equation for which p(·; q) is a solution. In particular, the transformed state equation (8)-(9) is "differentiated" with respect to q. Although the parameter appears explicitly in the right hand side of equation (8) and determines the point of discontinuity of the forcing function, one can still derive the sensitivity equation in a mathematically precise fashion. A rigorous mathematical construction is presented in [6] and references therein. Here we simply state that the sensitivity p(·; q) satisfies the second order, linear elliptic boundary value problem given by

  -p''(ξ; q) = 2q f̃(ξ; q) - δ_{1/q}(ξ),   (22)

  p(0) = 0,  p(1) = 0.   (23)

Here δ_{1/q}(ξ) is the Dirac delta function with mass at ξ = 1/q. Since the domain does not depend on q, the boundary conditions are clear. Observe that the sensitivity equation is decoupled from the transformed state equation, but we caution the reader that this decoupling is merely a consequence of the linearity of the transformed state equation. We also note that the linear elliptic problem (22)-(23) does not have a solution in H²(0, 1); the system must be interpreted in the weak sense, that is, in integral form. For the results presented in this paper, a piecewise linear finite element method is used to approximate both w̃(·; q) and p(·; q). For the sake of brevity, the details of the finite element implementations are omitted, and we proceed directly to the computational results.
5. Computational Results

In this section, numerical results are presented for two cases. The first is a comparison using a four-point Gauss quadrature rule for both the cost function approximations and the gradient approximations. From the second we draw an anecdotal observation concerning the importance of choosing a quadrature rule with the appropriate degree of accuracy.

Recall that each computational approach involves discretizing and numerically computing an approximation to the transformed state equation (8)-(9). The distinction between the calculations is that Computational Approach 1 recovers an approximation to the original state through the mapping M and implements the quadrature rule in the physical space, while Computational Approach 2 applies the quadrature rule in the computational space. Since M is a straightforward algebraic manipulation which can be "hard-wired", there is no loss in accuracy for the state approximation during the recovery process of Computational Approach 1. We briefly note that a four-point quadrature rule is sufficient to obtain an extremely accurate approximation to the true cost function in each case. Figures 1 and 2 show the respective cost function approximations plotted against the graph of the true cost function. The step in the parameter is Δq = 0.1 over the parameter range given, and the transformed state approximations are obtained using N = 3 grid points for these graphs. We also note that the error (measured in the vector norm ||·||_∞) in the cost function approximations is uniformly small for each of the computational approaches. Now we move to the more interesting issue of gradient approximations.

Figure 1. True Cost Function and Approximations for Computational Approach 1

5.1 Gradient Approximations

This section briefly addresses the issue of gradient approximations for each computational approach.
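For reference, a four-point Gauss-Legendre rule of the kind used here for the cost and gradient approximations can be sketched as follows (a single-panel rule, our own helper; in practice the rule is applied cell by cell over the finite element mesh):

```python
import numpy as np

def gauss4(f, a, b):
    """Four-point Gauss-Legendre quadrature on [a, b];
    exact for polynomials of degree <= 2*4 - 1 = 7."""
    xi, wi = np.polynomial.legendre.leggauss(4)   # nodes/weights on [-1, 1]
    x = 0.5 * (b - a) * xi + 0.5 * (a + b)        # affine map to [a, b]
    return 0.5 * (b - a) * np.dot(wi, f(x))
```

The degree-7 exactness is what makes four points sufficient for the integrands arising here, while Section 5.2 shows that dropping to three points is not.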
We preface the numerical results with two definitions regarding gradient approximations. The following discussion and definitions are taken from [1, 2]. We remark that the notation in [1] is slightly different because the authors explore the issue of applying different approximation schemes to obtain the state and sensitivity approximations. For our results, the discretization applied to compute the state approximations, and subsequently the cost function approximations, is exactly the same as that applied to compute the sensitivity approximations and the subsequent gradient approximations.

Figure 2. True Cost Function and Approximations for Computational Approach 2

In the following discussion, we also refer to the discretization as an N-discretization, in the sense that N refers to the number of grid points in the finite element mesh. To be more precise, we should include notation identifying the quadrature rule here as well. However, since we are comparing approximations using a four-point quadrature rule in both cases, we choose to simplify the notation as much as possible. Furthermore, J denotes an arbitrary cost function which depends on the design parameter q. A sensitivity approach is said to produce consistent derivatives with respect to the state and sensitivity approximations using the N-discretization if

  ∇(J^N)(q) = [∇J(q)]^N  for all q ∈ Q.   (24)

This definition states that the gradient of the approximate cost function is the same as the approximation of the true gradient. A more relaxed definition stipulates that the difference between the two gradient approximations should approach zero with grid refinement. In particular, a sensitivity approach is said to produce asymptotically consistent derivatives with respect to the state and sensitivity approximations using the N-discretization if

  ∇(J^N)(q) - [∇J(q)]^N → 0  for all q ∈ Q   (25)

as N → ∞, that is, as the grid is refined. The computational approaches presented in Sections 3 and 4 fall into the category of an approximation of the true gradient. Hence, Computational Approach 1 produces [∇F(q)]^N, and Computational Approach 2 yields [∇F̃(q)]^N.

In the following figures, we present a sample of the gradient approximations obtained using each computational approach. The gradient approximations are compared with both a centered difference gradient approximation (solid curve with o's, representing ∇F^N and ∇F̃^N, respectively) and the true gradient (solid curve). In Figure 3, the gradient approximations generated using Computational Approach 1 converge to the finite difference gradient (and to the true gradient) with mesh refinement. Hence, Computational Approach 1 yields asymptotically consistent derivatives. Figure 4 indicates that Computational Approach 2 produces consistent derivatives.

Figure 3. True Gradient and Approximations for Computational Approach 1

Figure 4. True Gradient and Approximations for Computational Approach 2

5.2 Anecdotal Observation

In the case where a three-point Gauss quadrature rule is used, the quadrature rule is insufficient for convergence of the cost function approximations as the mesh is refined. That is, if we use three quadrature points for the integral approximations, then the cost function approximations for each approach are as given in Figures 5 and 6. We have included the graphs for N = 3 grid points and N = 33 grid points to show that the accuracy of the approximations does not improve with mesh refinement; the error (again measured in ||·||_∞) in these approximations does not decrease. All of the approximations generated using values of N between 3 and 33 exhibit exactly the same behavior. The gradient approximations for this case are somewhat interesting.
In particular, Figures 7 and 8 suggest that Computational Approach 1 produces asymptotically consistent gradients, while Computational Approach 2 produces inconsistent or "non-consistent" gradient approximations. This behavior may be a result of the fact that we use the transformed cost function approximation during the gradient calculation on the computational space; recall the expression in (20). The gradient expression for Computational Approach 1, in (11), does not explicitly involve the cost function F(q).

6. Computational Issues

We conclude with some observations gathered during the course of the research. Since the domain transformations depend explicitly on the parameter, spatial derivatives are also parameter dependent and appear explicitly in both the transformed cost function and the transformed state equation. As a result, the derivation of a gradient expression is tedious and involves several terms, including the transformed cost function F̃(q). We approximate F̃(q) at each iteration of the optimization algorithm, and the approximation is reused in the gradient approximation routine. However, this may require good judgement for the quadrature rules, as shown in Section 5.2.

Figure 5. True Cost Function and Approximations for Computational Approach 1 using three-point quadrature rule

Figure 6. True Cost Function and Approximations for Computational Approach 2 using three-point quadrature rule

Figure 7. True gradient, finite difference gradient and approximations for Computational Approach 1 using three-point quadrature rule

Figure 8. True gradient, finite difference gradient and approximations for Computational Approach 2 using three-point quadrature rule
The results given here indicate that CSEMs can yield accurate, consistent gradients provided that the numerical schemes are chosen with care. Using domain transformations is advantageous for the rigorous mathematical derivation of sensitivity equations. However, the issue of differentiability of the mappings becomes an important question for both gradient derivation and sensitivity analysis in Computational Approach 2 for problems with complicated geometries.

In Computational Approach 1, the derivation of the sensitivity equation is somewhat ad hoc; however, differentiation of the domain mapping is not required. One must also be willing to accept the asymptotically consistent derivatives that this method produces. For many problems, we observe that the gradient approximations for this approach tend to accurately pinpoint the location of the root of the gradient even on coarse meshes. Finally, the results given in Section 5.2 indicate that for certain problems, CSEMs can produce asymptotically consistent gradients even if the cost function approximations are inaccurate. Each computational approach exhibits specific characteristics that can be viewed as advantageous. Further research is needed to determine which computational approach best fits a given problem.

Acknowledgments

The author wishes to acknowledge Dr. John Burns and Dr. Eugene Cliff for raising questions which led to this research, and Dr. Jeff Borggaard for many helpful conversations concerning optimization.

References

[1] J.T. Borggaard and J.A. Burns (1997). Asymptotically Consistent Gradients in Optimal Design. In Multidisciplinary Design Optimization: State of the Art, 303-314.

[2] J.T. Borggaard (1994). The Sensitivity Equation Method for Optimal Design. PhD thesis, Virginia Polytechnic Institute and State University, Blacksburg, Virginia.

[3] J.A. Burns and L.G. Stanley (2001). A Note on the Use of Transformations in Sensitivity Computations for Elliptic Systems.
Mathematical and Computer Modelling, 33, 101-114.

[4] K.A. Hoffman and S.T. Chiang (1993). Computational Fluid Dynamics for Engineers. Engineering Education System.

[5] M. Laumen (2000). Newton's Method for a Class of Optimal Shape Design Problems. SIAM Journal on Optimization, 10(2), 503-533.

[6] L.G. Stanley (2001). Sensitivity Equation Methods for Parameter Dependent Elliptic Equations. Numerical Functional Analysis and Optimization, 10(5&6), 721-748.

[7] L.G. Stanley (2001). Shape Sensitivities for Optimal Design: A Case Study on the Use of Continuous Sensitivity Equation Methods. In David Gao, Ray Ogden and Georgios Stavroulakis, editors, Nonsmooth/Nonconvex Mechanics: Modeling, Analysis and Numerical Methods, pages 369-384. Kluwer Academic Publishers, Nonconvex Optimization and Its Applications Series (50), The Netherlands.

[8] J.F. Thompson, Z.U. Warsi and C.W. Mastin (1985). Numerical Grid Generation: Foundations and Applications. Elsevier.

ADJOINT CALCULATION USING TIME-MINIMAL PROGRAM REVERSALS FOR MULTI-PROCESSOR MACHINES

Andrea Walther
Institute of Scientific Computing, Technical University Dresden
awalther@math.tu-dresden.de

Uwe Lehmann
Center for High Performance Computing, Technical University Dresden
lehmann@zhr.tu-dresden.de

Abstract: For computational purposes such as debugging, derivative computations using the reverse mode of automatic differentiation, or optimal control by Newton's method, one may need to reverse the execution of a program. The simplest option is to record a complete execution log and then to read it backwards; as a result, massive amounts of storage are normally required. This paper proposes a new approach to reversing program executions. The presented technique runs the forward simulation and the reversal process at the same speed.
For that purpose, one employs only a fixed and usually small number of memory pads called checkpoints to store intermediate states, together with a certain number of processors. The execution log is generated piecewise by restarting the evaluation repeatedly and concurrently from suitably placed checkpoints. The paper illustrates the principal structure of time-minimal parallel reversal schedules and quotes the required resources. Furthermore, some specific aspects of adjoint calculations are discussed. Initial results for the steering of a Formula 1 car are shown.

Keywords: Adjoint calculation, checkpointing, parallel computing

1. Introduction and Notation

For many industrial applications, rather complex interactions between various components have been successfully simulated with computer models. This is true for several production processes, e.g. steel manufacturing with regard to various product properties, for example stress distribution. However, the simulation stage frequently cannot be followed by an optimization stage, which would be very desirable. This situation is very often caused by the lack or inaccuracy of derivatives, which are needed in optimization algorithms. Hence, enabling the transition from simulation to optimization represents a challenging research task. The technique of algorithmic or automatic differentiation (AD), which is not yet well enough known, offers an opportunity to provide the required derivative information [5]. Therefore, AD can contribute to making the step from pure simulation, and hence "trial and error" improvements, to exact analysis and systematic derivative-based optimization.

The key idea of algorithmic differentiation is the systematic application of the chain rule. The mathematical specification of many applications involves nonlinear vector functions F: Rⁿ → Rᵐ, x ↦ F(x), that are typically defined and evaluated by computer programs.
This computation can be decomposed into a (normally large) number of very simple operations, e.g. additions, multiplications, and trigonometric or exponential function evaluations. The derivatives of these elementary operations are easily calculated with respect to their arguments. A systematic application of the chain rule then yields the derivatives of a hierarchy of intermediate values. Depending on the starting point of this methodology, either at the beginning or at the end of the sequence of operations considered, one distinguishes between the forward mode and the reverse mode of AD.

The reverse mode of algorithmic differentiation is a discrete analog of the adjoint method known from the calculus of variations. The gradient of a scalar-valued function is yielded by the reverse mode, in its basic form, for no more than five times the operations count of evaluating the function itself. This bound is completely independent of the number of independent variables. More generally, this mode allows the computation of Jacobians for at most five times the number of dependents times the effort of evaluating the underlying vector function. However, the spatial complexity of the basic reverse mode, i.e. its memory requirement, is proportional to the temporal complexity of the evaluation of the function itself. This behaviour is caused by the fact that one has to record a complete execution log onto a data structure called the tape and subsequently read this tape backward. For each arithmetic operation, the execution log contains an operation code and the addresses of the arguments as well as the computed value. It follows that the practical exploitation of the advantageous temporal complexity bound for the reverse mode is severely limited by the amount of memory required.

The reversal of a given function F is already being used extensively to calculate hand-coded adjoints. In particular, there are several contributions on weather data assimilation (e.g. [11]). Here, the desired gradients can be obtained with a low temporal complexity by integrating the linear co-state equation backwards along the trajectory of the original simulation. This well-known technique is closely related to the reverse mode of AD [3]. Moreover, debugging and interactive control may require the reconstruction of previous states by some form of running the program that evaluates F backwards. The need for some kind of logging arises whenever the process described by F is not invertible or is ill conditioned. In these cases one cannot simply apply an inverse process to evaluate the inverse mapping F⁻¹. Consequently, the reversal of a program execution within a reasonable memory requirement has received some (but only perfunctory) attention in the computer science literature (see e.g. [12]).

This paper presents a new approach to reversing the calculation of F. For that reason, in the remainder of this section, the structure of the function F is described in detail. The reversal technique proposed in this article employs only a fixed and usually small number of memory pads to store intermediate states, and a certain number of processors, for reversing F in minimal time. The corresponding time-minimal parallel reversal schedules are introduced in Section 2. The simulation of a Formula 1 car is considered in Section 3: the underlying ODE system is introduced, two different ways to calculate adjoints are discussed, and the initial numerical results are presented. Finally, some conclusions are drawn in Section 4.

Throughout, it is assumed that the evaluation of F comprises the evaluation of subfunctions F_i, 1 ≤ i ≤ l, called physical steps, where F_i acts on the state x^{i-1} to calculate the subsequent intermediate state x^i, depending on a control u^{i-1}. Hence, one has

  x^i = F_i(x^{i-1}, u^{i-1}),  1 ≤ i ≤ l.

Therefore, F can be thought of as a discrete evolution.
The intermediate states of the evolution F, represented by the counter i, should be thought of as vectors of large dimension. The physical steps F_i describe mathematical mappings that in general cannot be reversed at a reasonable cost, even for given u^{i-1}. Hence, it is impossible to simply apply the inverses F_i⁻¹ in order to run the program backwards from state l to state 0. It will also be assumed that, due to their size, only a limited number of intermediate states can be kept in memory. Furthermore, it is supposed that for each i there exist functions F̂_i that cause the recording of intermediate values generated during the evaluation of F_i onto the tape, and corresponding functions F̄_i that perform the reversal of the i-th physical step using this tape. More precisely, one has the reverse steps

  (x̄^{i-1}, ū^{i-1}) = x̄^i F_i'(x^{i-1}, u^{i-1}),

where F_i' denotes the Jacobian of F_i with respect to x^{i-1} and u^{i-1}. The calculation of adjoints using the basic approach is depicted in Figure 1.

Figure 1. Naive approach to calculate adjoints

Applying a checkpointing technique, the execution log is generated piecewise by restarting the evaluation repeatedly from suitably placed checkpoints, according to requests by the reversal process. Here, the checkpoints can be thought of as pointers to nodes representing intermediate states x^i. Using a checkpointing strategy on a uni-processor machine, the calculation of F can be reversed even in cases where the basic reverse mode fails due to excessive memory requirements (see e.g. [7, 6]). However, the runtime for the reversal process increases compared to the naive approach. For multi-processor machines, this paper presents a checkpointing technique with concurrent recalculations that reverses the program execution in minimal wall-clock time.

2. Time-minimal Parallel Reversal Schedules

To derive an optimal reversal of the evaluation procedure F, one has to take into account four kinds of parameters, namely:

1.) the number l of physical steps to be reversed;
2.) the number p of processors that are available;
3.) the number c of checkpoints that can be accommodated; and
4.) the step costs: τ = TIME(F_i), τ̂ = TIME(F̂_i), τ̄ = TIME(F̄_i).

Well known reversal schedules for serial machines, i.e. p = 1, and constant step costs τ allow an enormous reduction of the memory required to reverse a given evolution F in comparison with the basic approach (see e.g. [7, 6]). Even if the step costs τ_i = TIME(F_i) are not constant, it is possible to compute optimal serial reversal schedules [13]. However, one has to pay for the improvements in the form of a greater temporal complexity, because of repeated forward integrations.

If no increase in the time needed to reverse F is acceptable, the use of a sufficiently large number of additional processors makes it possible to reverse the evolutionary system F with drastically reduced spatial complexity and still minimal temporal complexity. Corresponding parallel reversal schedules that are optimal for given numbers l of physical steps, p > 1 processors, c checkpoints, and constant step costs were presented for the first time in [13]. For that purpose, it is supposed that τ = 1, τ̂ ≥ 1, and τ̄ ≥ 1, with τ̂, τ̄ ∈ N. Furthermore, it is always assumed that the memory requirement for storing the intermediate states is the same for all i. Otherwise, it is not clear whether and how parallel reversal schedules can be constructed and optimized, and the techniques developed in [13] can certainly not be applied. In practical applications, nonuniform state sizes might arise, for example as a result of adaptive grid refinements, or of function evaluations that do not conform naturally to our notion of an evolutionary system on a state space of fixed dimension.

Finding a time-minimal parallel reversal schedule can be interpreted as a very special kind of scheduling problem. The general problem class is known to be NP-hard (e.g. [4]).
Nevertheless, it is possible to specify suitable time-minimal parallel reversal schedules for an arbitrary number l of physical steps, because the reversal of a program execution has a very special structure. For the development of these time-minimal and resource-optimal parallel reversal schedules, first an exhaustive search algorithm was written. The input parameters were the number p of available processors and the number c of available checkpoints, with both τ̂ and τ̄ set to 1. The program then computed a schedule that reverses the maximal number of physical steps l(p, c) in minimal time using no more than the available resources p and c, for p + c ≤ 10. Here, minimal time means the wall-clock equivalent of the basic approach of recording all needed intermediate results. Examining the corresponding parallel reversal schedules, one finds that for p > c, only the resource number g = p + c has an influence on l(p, c) = l_g. Therefore, the development of time-minimal parallel reversal schedules that are also resource-optimal is focused on a given resource number g, under the tacit assumption p > c. The results obtained for g ≤ 10 provided sufficient insight to deduce the general structure of time-minimal parallel reversal schedules for arbitrary combinations of τ̂ ≥ 1, τ̄ ≥ 1, and g > 10. Neglecting communication cost, the following recurrence is established in [13]:

Theorem 1: Given the number of available resources g = p + c with p > c, and the temporal complexities τ̂ ∈ N and τ̄ ∈ N of the recording steps F̂_i and the reverse steps F̄_i, the maximal length of an evolution that can be reverted in parallel without interruption is given by

  l_g = g                                if g < 2 + τ̂/τ̄,
  l_g = l_{g-1} + τ̄ l_{g-2} - τ̄ + 1    else.   (1)

In order to prove this result, first an upper bound on the number of physical steps that can be reversed with a given number g of processors and checkpoints was established.
Subsequently, corresponding reversal schedules that attain this upper bound were constructed recursively. For this purpose, the resource profiles of the constructed parallel reversal schedules were analyzed in detail. In addition to the recursive construction of the desired time-minimal reversal schedules, the resource profiles yield an upper bound for the number p of processors needed during the reversal process. More precisely, for reversing l_g physical steps one needs roughly (g+1)/2 processors; the exact bound, which distinguishes the cases τ̄ ≥ τ̂ and τ̄ < τ̂, is given in [13]. Hence, roughly half of the resources have to be processors. This fact offers the opportunity to assign one checkpoint to each processor.

A time-minimal reversal schedule for l = 55 is depicted in Figure 2. Here, vertical bars represent checkpoints and slanted bars represent running processes. The shading indicates the physical steps F_i, the recording steps F̂_i and the reverse steps F̄_i to be performed.

Figure 2. Time-minimal Parallel Reversal Schedule for l = 55 and τ̂ = τ̄ = 1

Based on the recurrence (1), it is possible to describe the behaviour of l_g more precisely. For τ̂ = τ̄ = 1, one finds that l_g equals the Fibonacci number f_{g-1}. Moreover, for other combinations of τ̂, τ̄ ∈ N, the recurrence (1) produces generalized Fibonacci numbers (see e.g. [9]). More specifically, one finds that

  l_g ≈ ((1 + √(1 + 4τ̄)) / 2)^{g-1},

in the sense that the ratio between the two sides tends to 1 as g tends to infinity. In the important case τ̂ = 1, even their absolute difference tends to zero. Thus, l = l_g grows exponentially as a function of g ≈ 2p, and conversely p ≈ c grows logarithmically as a function of l. In order to illustrate the growth of l_g, assume 16 processors and 16 checkpoints are available.
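The recurrence (1) can be evaluated directly. A sketch of our reading of it (the base-case cutoff is partly illegible in the source, so take that inequality as an assumption); with τ̂ = τ̄ = 1 it reproduces the golden-ratio growth just described:

```python
from functools import lru_cache

def schedule_length(g, tau_rec=1, tau_rev=1):
    """Maximal number l_g of physical steps reversible without interruption
    with g = p + c resources, following recurrence (1); tau_rec and tau_rev
    are the recording and reverse step costs (the physical step cost is 1)."""
    @lru_cache(maxsize=None)
    def l(k):
        if k < 2 + tau_rec / tau_rev:      # assumed base case: l_g = g
            return k
        return l(k - 1) + tau_rev * l(k - 2) - tau_rev + 1
    return l(g)
```

With unit step costs the ratio l_g / l_{g-1} approaches the golden ratio, i.e. the number of reversible steps grows exponentially in the resource count, while larger reverse-step costs τ̄ make the growth even faster.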
These resources suffice to reverse an evolution of l = 2 178 309 physical steps when t̄ = t̂ = 1, and even more steps if t̄ = 1 and t̂ > 1. For t̂ = 1, i.e., if the forward simulation and the reversal of the time steps can be performed at the same speed, the implementation of this theory was done using the distributed memory programming model [10]. It is therefore possible to run the parallel reversal schedule framework on most parallel computers independent of their actual memory structure. To achieve a flexible implementation, MPI routines are used for the communication. The parallel reversal schedules are worked off in a process-oriented manner instead of a checkpoint-oriented manner (see [10] for details). This yields the optimal resource requirements of Theorem 1. In order to apply the parallel reversal schedule framework, one has to provide interfaces and define the main data structures for computing the adjoint. The data structures required are the checkpoints, the traces or tapes resulting from the recording steps, and the adjoint values. The structure and complexity of these data is independent of the framework, since the framework only calls routines such as

■ forward(..) for the evaluation of one physical step F_i,
■ recording(..) for the evaluation of one recording step F̄_i,
■ reverse(..) for the evaluation of one reverse step F̂_i,

provided by the user. These functions are equivalent to the functions used for a sequential calculation of the adjoint. The index i is an argument of each of the modules. The function recording(..) generates the trace or tape. The function reverse(..) obtains the trace of the last recording step and the adjoint computed so far as arguments. Furthermore, for i = l the function reverse(..) may initialize the adjoints. Additionally, the user must code communication modules, for example sendCheckpoint(..) and receiveCheckpoint(..). All user-defined routines have to be implemented applying MPI routines.
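The division of labour between these user routines can be sketched with a purely sequential log-all driver. Everything below — the scalar toy step, the tape layout, and all names and signatures — is an illustrative assumption, not the framework's actual API:

```python
import math

H = 0.01  # fixed step size of the toy physical step

def forward(i, x):
    """One physical step F_i: x^i = x^{i-1} + H*sin(x^{i-1}) (i unused here)."""
    return x + H * math.sin(x)

def recording(i, x):
    """One recording step: advance and put the local Jacobian on the tape."""
    return forward(i, x), 1.0 + H * math.cos(x)   # (next state, dF_i/dx)

def reverse(i, tape, xbar):
    """One reverse step: propagate the adjoint through F_i using the tape."""
    return tape * xbar

def sequential_adjoint(x0, steps):
    """Log-all reversal: record every step, then reverse from i = steps down to 1."""
    tapes, x = [], x0
    for i in range(1, steps + 1):
        x, jac = recording(i, x)
        tapes.append(jac)
    xbar = 1.0                  # seed dJ/dx^l for the toy functional J = x^l
    for i in range(steps, 0, -1):
        xbar = reverse(i, tapes[i - 1], xbar)
    return x, xbar              # (final state, dJ/dx^0)
```

The framework invokes the same three user routines, but replaces the complete tape storage by at most c checkpoints and distributes the work over the p processes.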
The required process identifications and message tags are arguments of routines provided by the parallel reversal schedule framework.

3. Model Problem: Steering a Formula 1 Car

In order to test the implementation of parallel reversal schedules, the simulation of an automobile is considered. The aim is to minimize the time needed to travel along a specific road. A simplified model of a Formula 1 racing car [1] is employed. It is given by the ODE system

    ẋ1 = x2
    ẋ2 = [(F_η1(x, u2) + F_η2(x, u2)) l_f − (F_η3(x, u2) + F_η4(x, u2)) l_r] / I
    ẋ3 = [F_η1(x, u2) + F_η2(x, u2) + F_η3(x, u2) + F_η4(x, u2)] / M
    ẋ4 = [F_ξ1(x, u2) + F_ξ2(x, u2) + F_ξ3(x, u2) + F_ξ4(x, u2) − F_a(x4)] / M
    ẋ5 = x4 sin(x1) + x3 cos(x1)
    ẋ6 = x4 cos(x1) − x3 sin(x1)
    ẋ7 = u1.

Hence, a go-kart model with rigid suspension and a body rolling about a fixed axis is considered. There are seven state variables representing the yaw angle and rate (x1, x2), the lateral and longitudinal velocity (x3, x4), the global position (x5, x6), and the vehicle steer angle (x7), as shown in Figure 3. The control variables are u1, denoting the front steer rate, and u2, denoting the longitudinal force. The lateral and longitudinal vehicle forces F_η and F_ξ are computed using the state and the control variables as well as the tire forces given by a tire model described in [2]. The force F_a represents the aerodynamic drag depending on the longitudinal velocity. All other values are fixed car parameters such as the mass M and the lengths l_f and l_r of the car. In order to judge the quality of the driven line, the cost functional

    J(s_l) = ∫₀^{s_l} Scf(x, s) (1 + g(x, s)) ds        (2)

is used. The scaling factor Scf(x, s) changes the original time integration within the cost function to distance integration. Therefore, an integration over the arc length is performed. This variable change has to be done because the end time t_l of the time integration is the value one actually wants to minimise. Hence, t_l is unknown.
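The off-road penalty g(x, s) entering (2) vanishes while the car stays within the road boundaries and equals the squared distance to the boundary once it leaves them. A minimal sketch, assuming a straight road so that the unsigned distance to the centre line can be passed in directly (name and signature are illustrative):

```python
def road_penalty(d_centre, width=2.5):
    """Off-road penalty g: zero inside the road, squared distance [m^2]
    to the road boundary outside. d_centre is the unsigned distance of
    the car from the road centre line; width is the full road width."""
    excess = d_centre - width / 2.0
    return excess * excess if excess > 0.0 else 0.0
```

With the constant road width of 2.5 m used in the example, a car 1 m beyond the boundary contributes a penalty of 1 m².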
The computation of the scaling factor Scf(x, s) is described in [1]. The function g(x, s) measures whether or not the car is still on the road. The road is defined by the road centre line and the road width.

Figure 3. Model of Formula 1 Car.

In the example presented here, the road width is constant at 2.5 m along the whole integration path. The function g(x, s) returns zero as long as the car drives within the road boundaries. If the car leaves the road, then g(x, s) returns the squared distance from the car to the road boundary.

3.1. The Forward Integration

For the numerical results presented here, a discretization has to be applied. Therefore, an appropriate initial vector x^0 and the starting position s^0 = 0 were chosen. The route is divided equidistantly with a step size of h = 10 cm. The well-known four-stage Runge-Kutta scheme

    k1 = f(x^{i-1}, u(s^{i-1}))
    k2 = f(x^{i-1} + h k1/2, u(s^{i-1} + h/2))
    k3 = f(x^{i-1} + h k2/2, u(s^{i-1} + h/2))          (3)
    k4 = f(x^{i-1} + h k3, u(s^{i-1} + h))
    x^i = x^{i-1} + h (k1 + 2 k2 + 2 k3 + k4)/6

serves as physical step F_i for i = 1, . . . , 1000. The calculations of a physical step F_i form the forward(..)-routine needed by the time-minimal parallel reversal schedules. As mentioned above, in addition to this, one has to provide two further routines, namely recording(..) and reverse(..). The content of these two modules is described in the next subsection.
For the computations shown below, the second option was applied, namely the adjoining of the discretized equation (3). Application of AD in reverse mode amounts to the following adjoint calculation F̂_i (see e.g. [5]):

    a4 = h x̄^i / 6,             b4 = a4 K4
    a3 = h x̄^i / 3 + h b4,      b3 = a3 K3
    a2 = h x̄^i / 3 + h b3 / 2,  b2 = a2 K2          (4)
    a1 = h x̄^i / 6 + h b2 / 2,  b1 = a1 K1
    x̄^{i-1} = x̄^i + b1 + b2 + b3 + b4

for i = l, . . . , 1, where K_j denotes the Jacobian of f evaluated at the argument of the stage k_j, 1 ≤ j ≤ 4, defined in (3). Here, ū^i denotes the adjoint of the control u at s^i. Note that the integration of the adjoint scheme (4) has to be performed in reverse order starting at i = l. One uses x̄^l_j = dJ/dx^l_j, 1 ≤ j ≤ 7, and ū^l_j = 0, j = 1, 2, as initial values because of the influence on the cost functional (2). After the complete adjoint calculation, each value ū^i denotes the sensitivity of the cost functional J with respect to the value u^i. Now the return value of the routine reverse(..) is clear. It has to contain the computations needed to perform an adjoint step F̂_i according to (4). However, there are two ways to implement the interface between the modules recording(..) and reverse(..). One can either store the stages k_j, 1 ≤ j ≤ 4, during the evaluation of the recording step F̄_i. Then the corresponding reverse step F̂_i comprises all calculations shown in (4), i.e. also the computation of the Jacobians K_j, 1 ≤ j ≤ 4. As an alternative, one can compute the Jacobians K_j, 1 ≤ j ≤ 4, already in the recording step F̄_i and store this information on the tape. Then the appropriate reverse step F̂_i only has to evaluate the remaining statements of Equation (4). The runtimes presented here are based on the second approach in order to achieve t̂ = 1. As a result, t̄ equals 5. This implementation has the advantage that the value of t̂ and hence the wall clock time are reduced at the expense of t̄. This can be seen for example in Figure 2, where an increase of t̂ would result in a bigger slope of the bars describing the adjoint or reverse computations.
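For a scalar state, the second recording variant and the adjoint recursion (4) look as follows. The function names are illustrative, and the toy right-hand side sin(x) with a hand-coded derivative replaces the car model:

```python
import math

def f(x):  return math.sin(x)    # toy autonomous right-hand side
def fp(x): return math.cos(x)    # its derivative f'

def recording_step(x, h):
    """Recording step: evaluate (3) and put the stage Jacobians on the tape."""
    k1 = f(x);               K1 = fp(x)
    k2 = f(x + h * k1 / 2);  K2 = fp(x + h * k1 / 2)
    k3 = f(x + h * k2 / 2);  K3 = fp(x + h * k2 / 2)
    k4 = f(x + h * k3);      K4 = fp(x + h * k3)
    x_next = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x_next, (K1, K2, K3, K4)

def reverse_step(xbar, tape, h):
    """Reverse step: the adjoint recursion (4), scalar case."""
    K1, K2, K3, K4 = tape
    a4 = h * xbar / 6;              b4 = a4 * K4
    a3 = h * xbar / 3 + h * b4;     b3 = a3 * K3
    a2 = h * xbar / 3 + h * b3 / 2; b2 = a2 * K2
    a1 = h * xbar / 6 + h * b2 / 2; b1 = a1 * K1
    return xbar + b1 + b2 + b3 + b4
```

Running l recording steps forward and then l reverse steps with the seed x̄^l = 1 yields the exact derivative of the discrete scheme, which can be checked against a central finite difference of the whole integration.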
As mentioned above, one has to be careful about the adjoint calculation because of the lack of commutativity between adjoining and discretizing in general. Therefore, it is important to note that the Runge-Kutta scheme (3) belongs to a class of discretizations for which both possibilities of adjoint calculation coincide, giving the same result [8].

3.3. Numerical Results

To test the parallel reversal schedule framework, one forward integration of the car model, shown in Figure 4, and one adjoint calculation were performed. As previously mentioned, the integration distance was 100 m and the step size 10 cm. Hence, there are 1000 forward steps F_i.

Figure 4. Position of Formula 1 Car.

Figure 5(a) shows the growth of the cost functional, for which we computed the sensitivities of the control variables u1 (Figure 5(b)) and u2 (Figure 5(c)). However, the resource requirements are of primary interest. One integration step in the example is relatively small in terms of computing time. In order to achieve reasonable timings, 18 integration steps form one physical step of the parallel reversal schedule. The remaining 10 integration steps were spread uniformly. Hence, one obtains 55 physical steps. Therefore, five processors were needed for the corresponding time-minimal parallel reversal schedule for t̄ = t̂ = 1. This reversal schedule is, with small modifications, also nearly optimal for the considered combination t̄ = 5 and t̂ = 1. A sixth processor (master) was used to organise the program run.

Figure 5. Cost Functional and Adjoint of Control Variables: (a) cost functional J(s); (b) adjoint of steering rate u1; (c) adjoint of longitudinal force u2.

Table 1. Memory Requirement

                                naive approach    parallel checkpointing
    double variables needed     266010            5092
    memory required in kByte    2128.1            40.7
    in %                        100.0             1.9

The main advantage of the parallel reversal schedules is the enormous reduction in memory requirement, as illustrated in Table 1. It shows that for this example less than a fiftieth of the original memory requirement is needed, i.e., less than 2%. On the other hand, only six times the original computing power, i.e., six processors, is used. The theoretical runtime is also confirmed by the example, as can be seen in Table 2. Due to the slower memory interface on a Cray T3E, the usage of less memory in parallel causes an enormous decrease in runtime. On the other hand, the problem is too small and the SGI Origin 3800 too fast to show this effect. Nevertheless, one obtains that the assumption of negligible communication cost is reasonable. This is caused by the fact that the processors have the duration of one full physical step to send and receive a checkpoint, because the checkpoint is not needed earlier. Only if the sending and receiving of one checkpoint needs more time than one physical step does the communication cost become critical.

Table 2. Runtime results

                                naive approach    parallel checkpointing
    T3E in sec.                 20.27             18.91
    in %                        100.0             93.3
    Origin 3800 in sec.         6.71              6.04
    in %                        100.0             90.0

4. Conclusions

The potentially enormous memory requirement of program reversal by complete logging often causes problems despite the ever increasing size of memory systems. This paper proposes an alternative method, where the memory requirement can be drastically reduced by keeping at most c intermediate states as checkpoints. In order to avoid an increase in runtime, p processors are used to reverse evolutions with minimal wall clock time. For the presented time-minimal parallel reversal schedules, the number l of physical steps that can be reversed grows exponentially as a function of the resource number g = c + p. A corresponding software tool has been coded using MPI. Initial numerical tests are reported. They confirm the enormous reduction in memory requirement.
Furthermore, the runtime behaviour is studied. It is verified that the wall clock time of the computation can be reduced compared to the logging-all approach if the memory access is comparatively costly. This fact is caused by the reduced amount of storage in use. If the memory access is comparatively cheap, the theoretical runtime of time-minimal parallel reversal schedules is also confirmed. The following overall conclusion can be drawn. For adjoining simulations, log_{a(t̂)}(# physical steps) processors and checkpoints are wall clock equivalent to 1 processor and (# physical steps) checkpoints, with

    a(t̂) = (1 + √(1 + 4 t̂)) / 2

and t̂ the temporal complexity of a reverse step.

Acknowledgments

The authors are indebted to Daniele Casanova for his support during the numerical experiments and to Andreas Griewank for many fruitful discussions.

References

[1] J. Allen. Computer optimisation of cornering line. Master's thesis, School of Mechanical Engineering, Cranfield University, 1997.
[2] E. Bakker, H. Pacejka, and L. Lidner. A new tire model with an application in vehicle dynamics studies. SAE-Paper 890087, 1989.
[3] Y. Evtushenko. Automatic differentiation viewed from optimal control. In G. F. Corliss and A. Griewank, editors, Computational Differentiation: Techniques, Implementations, and Application, Philadelphia, 1991. SIAM.
[4] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman and Company, New York, 1980.
[5] A. Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Frontiers in Applied Mathematics. SIAM, Philadelphia, 1999.
[6] A. Griewank and A. Walther. Revolve: An implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Trans. Math. Software, 26, 2000.
[7] J. Grimm, L. Pottier, and N. Rostaing-Schmidt.
Optimal time and minimum space-time product for reversing a certain class of programs. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational Differentiation: Techniques, Applications, and Tools, Philadelphia, 1996. SIAM.
[8] W. Hager. Runge-Kutta methods in optimal control and the transformed adjoint system. Numer. Math., 87:247-282, 2000.
[9] P. Hilton and J. Petersen. A fresh look at old favourites: The Fibonacci and Lucas sequences revisited. Australian Mathematical Society Gazette, 25:146-160, 1998.
[10] U. Lehmann and A. Walther. The implementation and testing of time-minimal and resource-optimal parallel reversal schedules. Technical Report ZHR-IR-0109, Tech. Univ. Dresden, Center for High Perf. Comp., 2001.
[11] O. Talagrand. The use of adjoint equations in numerical modeling of the atmospheric circulation. In G. F. Corliss and A. Griewank, editors, Computational Differentiation: Techniques, Implementations, and Application, Philadelphia, 1991. SIAM.
[12] J. van de Snepscheut. What Computing Is All About. Texts and Monographs in Computer Science. Springer, Berlin, 1993.
[13] A. Walther. Program Reversal Schedules for Single- and Multi-processor Machines. PhD thesis, Tech. Univ. Dresden, Inst. for Sci. Comp., 1999.