Preconditioned Nonlinear Conjugate Gradient methods

Giovanni Fasano and Massimo Roma

Contents

1. Introduction

2. Preconditioned Nonlinear Conjugate Gradient algorithm

3. A new Symmetric Rank-2 update

4. A preconditioner using a BFGS-like low-rank quasi-Newton update

5. Preliminary numerical experiences

6. Conclusions and future works

1. Introduction

We deal with the large scale unconstrained optimization problem

(1.1)
$$\min_{x \in \mathbb{R}^n} f(x)$$

where $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is a twice continuously differentiable function and $n$ is large. We assume that for a given $x_0 \in \mathbb{R}^n$ the level set $\Omega_0 = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ is compact. The huge number of real world applications which can be modelled as large scale optimization problems strongly motivates the growing interest in the solution of such problems.

Among the iterative methods for large scale unconstrained optimization, when the Hessian matrix is possibly dense, limited memory quasi-Newton methods are often the methods of choice. As is well known (see any textbook, e.g. [11]), they generate a sequence $\{x_k\}$ according to the following scheme

(1.2)
$$x_{k+1} = x_k + \alpha_k p_k, \qquad k = 0, 1, \ldots, \qquad \text{with} \qquad p_k = -H_k \nabla f(x_k),$$

where $H_k$ is an approximation of the inverse of the Hessian matrix $\nabla^2 f(x_k)$ and $\alpha_k$ is a steplength. In particular, instead of computing $H_k$ at each iteration $k$, these methods update $H_k$ in a simple manner, in order to obtain the new approximation $H_{k+1}$ to be used in the next iteration. Moreover, instead of storing full dense $n \times n$ approximations, they only save a few vectors of length $n$, which allow the approximations to be represented implicitly.

Among the quasi-Newton schemes, the L-BFGS method is usually considered one of the most efficient. It is well suited for large scale problems because the amount of storage is limited and controlled by the user. This method is based on the construction of the approximation of the inverse of the Hessian matrix, by exploiting curvature information gained only from the most recent iterations. The inverse of the Hessian matrix is updated at the $k$-th iteration as

(1.3)
$$H_{k+1} = V_k^T H_k V_k + \rho_k s_k s_k^T, \qquad \text{where} \qquad \rho_k = \frac{1}{y_k^T s_k}, \quad V_k = I - \rho_k y_k s_k^T,$$

and

(1.4)
$$s_k = x_{k+1} - x_k = \alpha_k p_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).$$

Observe that $H_k$ also satisfies the relation

$$\begin{aligned}
H_k ={}& \left(V_{k-1}^T \cdots V_{k-m}^T\right) H_k^0 \left(V_{k-m} \cdots V_{k-1}\right) \\
&+ \rho_{k-m} \left(V_{k-1}^T \cdots V_{k-m+1}^T\right) s_{k-m} s_{k-m}^T \left(V_{k-m+1} \cdots V_{k-1}\right) \\
&+ \rho_{k-m+1} \left(V_{k-1}^T \cdots V_{k-m+2}^T\right) s_{k-m+1} s_{k-m+1}^T \left(V_{k-m+2} \cdots V_{k-1}\right) \\
&+ \cdots + \rho_{k-1} s_{k-1} s_{k-1}^T,
\end{aligned}$$

where $m$ is the memory of the method and $H_k^0$ is an initial approximation of the inverse of the Hessian matrix.
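In practice $H_k$ is never formed explicitly: the relation above is applied to a vector through the standard two-loop recursion. The following Python/NumPy sketch is only illustrative; the stored pairs `s_list`, `y_list` and the common scaling used for $H_k^0$ are assumptions of the example rather than prescriptions of this paper.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Return -H_k * grad via the standard L-BFGS two-loop recursion.

    s_list, y_list hold the m most recent pairs s_j = x_{j+1} - x_j and
    y_j = grad f(x_{j+1}) - grad f(x_j), stored oldest first.
    """
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # first loop: newest pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    # H_k^0 = gamma * I with the usual scaling (an assumption of this sketch)
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # second loop: oldest pair to newest
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * np.dot(y, r)
        r += (a - b) * s
    return -r  # search direction p_k = -H_k * grad f(x_k)
```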

The well known reasons for the success of the L-BFGS method can be summarized in the following two points: firstly, even when $m$ is small, $H_{k+1}$ is an effective approximation of the inverse of the Hessian matrix; secondly, $H_{k+1}$ is the unique (positive definite) matrix which solves the problem

$$\min_{H} \ \|H - H_k\|_F \qquad \text{s.t.} \quad H = H^T, \ \ s_k = H y_k,$$

where $\|\cdot\|_F$ is the Frobenius norm. Namely, $H_{k+1}$ is the positive definite matrix closest to the current approximation $H_k$ satisfying the secant equation $s_k = H y_k$. However, the L-BFGS method presents some drawbacks, including

a possible loss of efficiency when the eigenvalues of the Hessian matrix are very spread. Moreover, on some applications, the performances of the L-BFGS method and the Nonlinear Conjugate Gradient method are comparable.

In this paper we focus on the latter method: the Nonlinear Conjugate Gradient method (NCG). As is well known (see any textbook, e.g. [11]), it is a natural extension to general functions of the linear Conjugate Gradient (CG) method for quadratic functions. It generates a sequence $\{x_k\}$ according to scheme (1.2), with

$$p_k = -\nabla f(x_k) + \beta_k p_{k-1},$$

where $\beta_k$ is a suitable scalar. Different values of $\beta_k$ give rise to different algorithms (see [8] for a survey). The most common are the Fletcher and Reeves (FR), the Polak and Ribière (PR) and the Hestenes and Stiefel (HS) algorithms.

Although the NCG methods have been widely studied and are often very efficient when solving large scale problems, a key point for increasing their efficiency is the use of a preconditioning strategy, especially when solving difficult ill-conditioned problems. Defining good preconditioners for NCG methods is currently still considered a challenging research topic. On this guideline, this work is devoted to investigating the use of quasi-Newton updates as preconditioners. In particular, we want to propose preconditioners which possibly inherit the effectiveness of the L-BFGS update. Indeed, here we build preconditioners which are iteratively defined and based on quasi-Newton updates of the inverse of the Hessian matrix. This represents an attempt to improve the efficiency of the NCG method by conveying information collected from a quasi-Newton method into a Preconditioned Nonlinear Conjugate Gradient method (PNCG). In particular, we study new symmetric low rank updates of the inverse of the Hessian matrix, in order to iteratively define preconditioners for PNCG.

It is worth noting that there exists a close connection between BFGS and NCG [10], and, on the other hand, NCG algorithms can be viewed as memoryless quasi-Newton methods (see e.g. [13], [12], [11]).

The idea of using a quasi-Newton update as a preconditioner within NCG algorithms is not new. In [2], when storage is available, a preconditioner defined by $m$ quasi-Newton updates is used within an NCG algorithm. In [1] a […] of NCG. Moreover, an automatic preconditioning strategy based on a limited memory quasi-Newton update for the linear CG is proposed in [9], within Hessian free Newton methods, and is extended to the solution of a sequence of linear systems.

In this paper, we propose two classes of parameter dependent preconditioners. In particular, in the next section we briefly recall the scheme of a general PNCG method. In Section 3 a new symmetric rank-2 update is introduced and its theoretical properties are studied. Section 4 is devoted to describing a new BFGS-like quasi-Newton update. Finally, in Section 5 the results of a preliminary numerical experience are reported, showing a comparison between one of our proposals and an L-BFGS based preconditioner for PNCG.

2. Preconditioned Nonlinear Conjugate Gradient algorithm

In this section we report the scheme of a general Preconditioned Nonlinear Conjugate Gradient (PNCG) algorithm (see e.g. [12]). In the PNCG scheme $M_k$ denotes the preconditioner at iteration $k$.

Preconditioned Nonlinear Conjugate Gradient (PNCG) algorithm

Step 1: Data $x_1 \in \mathbb{R}^n$. Set $p_1 = -M_1 \nabla f(x_1)$ and $k = 1$.

Step 2: Compute the steplength $\alpha_k$ by using a linesearch procedure which guarantees that the Wolfe conditions are satisfied, and set $x_{k+1} = x_k + \alpha_k p_k$.

Step 3: If $\|\nabla f(x_{k+1})\| = 0$ then stop, else compute $\beta_{k+1}$ and

(2.1)
$$p_{k+1} = -M_{k+1} \nabla f(x_{k+1}) + \beta_{k+1} p_k,$$

set $k = k + 1$ and go to Step 2.

By setting $M_k = I$ for any $k$, the popular (unpreconditioned) Nonlinear Conjugate Gradient (NCG) method is obtained. The parameter $\beta_{k+1}$ can be chosen in a variety of ways. For the PNCG algorithm the most recurrent choices are the following:

(2.2)
$$\beta_{k+1}^{FR} = \frac{\nabla f(x_{k+1})^T M_k \nabla f(x_{k+1})}{\nabla f(x_k)^T \nabla f(x_k)},$$

(2.3)
$$\beta_{k+1}^{PR} = \frac{[\nabla f(x_{k+1}) - \nabla f(x_k)]^T M_k \nabla f(x_{k+1})}{\nabla f(x_k)^T M_k \nabla f(x_k)},$$

(2.4)
$$\beta_{k+1}^{HS} = \frac{[\nabla f(x_{k+1}) - \nabla f(x_k)]^T M_k \nabla f(x_{k+1})}{[\nabla f(x_{k+1}) - \nabla f(x_k)]^T p_k}.$$

We recall that, with respect to other gradient methods, a more accurate linesearch procedure is required to determine the steplength $\alpha_k$ in a PNCG algorithm. This is due to the presence of the term $\beta_{k+1} p_k$ in (2.1). The latter fact motivates the use of the (strong) Wolfe conditions to compute the steplength $\alpha_k$, which also guarantee that $s_k^T y_k > 0$ for any $k$.
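For illustration, a minimal Python sketch of Steps 1-3 is given below, using the preconditioned Polak-Ribière formula (2.3) and a fixed preconditioner `M`; the tolerance-based stopping test, the fallback steplength and the use of SciPy's Wolfe line search are choices of this example, not part of the algorithm statement above.

```python
import numpy as np
from scipy.optimize import line_search  # Wolfe line search

def pncg(f, grad, x1, M=None, tol=1e-6, max_it=500):
    """PNCG iteration (Steps 1-3) with the PR formula (2.3) and a fixed M_k = M."""
    n = x1.size
    M = np.eye(n) if M is None else M   # M = I recovers the unpreconditioned NCG
    x = x1.copy()
    g = grad(x)
    p = -M @ g                          # Step 1: p_1 = -M_1 grad f(x_1)
    for _ in range(max_it):
        if np.linalg.norm(g) <= tol:    # Step 3 stopping test (with a tolerance)
            break
        # Step 2: Wolfe line search for alpha_k, then x_{k+1} = x_k + alpha_k p_k
        alpha = line_search(f, grad, x, p)[0]
        if alpha is None:
            alpha = 1e-8                # crude fallback, for illustration only
        x_new = x + alpha * p
        g_new = grad(x_new)
        # beta^{PR}_{k+1} from (2.3)
        beta = (g_new - g) @ (M @ g_new) / (g @ (M @ g))
        # (2.1): p_{k+1} = -M_{k+1} grad f(x_{k+1}) + beta_{k+1} p_k
        p = -M @ g_new + beta * p
        x, g = x_new, g_new
    return x

# usage sketch on a strictly convex quadratic
A = np.diag([1.0, 10.0, 100.0]); b = np.ones(3)
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = pncg(f, grad, np.zeros(3), M=np.linalg.inv(A))
print(np.allclose(x_star, np.linalg.solve(A, b)))
```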

As already said, preconditioning is applied to increase the efficiency of the NCG method. In this regard, we remark a noticeable difference between the linear CG and NCG. Whenever the linear CG is applied, the Hessian matrix does not change during the iterations of the algorithm. On the contrary, when NCG is applied to a nonlinear function, the Hessian matrix (possibly indefinite) changes at each iteration.

3. A new Symmetric Rank-2 update

In this section we study a new quasi-Newton updating formula, by considering the properties of a parameter dependent symmetric rank-2 (SR2) update of the inverse of the Hessian matrix. Suppose we generate after $k$ iterations the sequence of iterates $\{x_1, \ldots, x_{k+1}\}$. Then our quasi-Newton update $H_{k+1}$, which approximates $[\nabla^2 f(x)]^{-1}$, satisfies the secant equation along all previous directions; namely it results

$$H_{k+1} y_j = s_j, \qquad \text{for all } j \le k.$$

Observe that the latter appealing property is satisfied by all the updates of the Broyden class, provided that the linesearch adopted is exact (see e.g. [11]). We would like to recover the motivation underlying the latter class of updates, and by using rank-2 updates we would like to define a preconditioner for PNCG.

On this guideline, in order to build an approximate inverse of the Hessian matrix, we consider the update

(3.1)
$$H(\gamma_{k+1}, \omega_{k+1}) = H(\gamma_k, \omega_k) + \Delta_k, \qquad \Delta_k \in \mathbb{R}^{n \times n} \ \text{symmetric},$$

where the sequence $\{H(\gamma_k, \omega_k)\}$ depends on the parameters $\gamma_k$, $\omega_k$ and provides our quasi-Newton updates of $[\nabla^2 f(x)]^{-1}$.

It is first our purpose to propose the new update $H(\gamma_{k+1}, \omega_{k+1})$ such that:

(0) $H(\gamma_{k+1}, \omega_{k+1})$ is well-defined and nonsingular;

(1) $H(\gamma_{k+1}, \omega_{k+1})$ can be iteratively updated;

(2) $H(\gamma_{k+1}, \omega_{k+1})$ collects the information from the iterations $1, 2, \ldots, k$ of an NCG method;

(3) $H(\gamma_{k+1}, \omega_{k+1})$ satisfies the secant equation at iterations $j = 1, 2, \ldots, k$;

(4) $H(\gamma_{k+1}, \omega_{k+1})$ either tends to preserve the inertia of the inverse of $\nabla^2 f(x_{k+1})$, in case $f(x)$ is a general quadratic function, or, by suitably setting the two parameters, it can be used as a preconditioner for PNCG, i.e. $M_k = H(\gamma_k, \omega_k)$.

Observe that the Symmetric Rank-1 (SR1) quasi-Newton update (see Section 6.2 in [11]) satisfies properties (1)-(4) but not property (0), i.e. it might possibly not be well defined for a general nonlinear function. The latter result follows from the fact that the SR1 update provides only a rank-1 quasi-Newton update, unlike BFGS and DFP. On the other hand, while the BFGS and DFP quasi-Newton formulae provide only positive definite updates, the SR1 formula is able to recover the inertia of the Hessian matrix, by generating possibly indefinite updates. Thus, now we want to study an SR2 quasi-Newton update where […] is provided by information from the NCG method. To this aim, assuming that $H_k = H(\gamma_k, \omega_k)$ is given, we consider the relation (3.1) where we set (see (1.4))

$$\Delta_k = \gamma_k v_k v_k^T + \omega_k \frac{p_k p_k^T}{y_k^T p_k}, \qquad \gamma_k, \omega_k \in \mathbb{R}, \ v_k \in \mathbb{R}^n,$$

and $p_k$ is generated at the $k$-th iteration of the (unpreconditioned) NCG method. Thus, we will have the new update

(3.2)
$$H_{k+1} = H_k + \gamma_k v_k v_k^T + \omega_k \frac{p_k p_k^T}{y_k^T p_k}, \qquad \gamma_k, \omega_k \in \mathbb{R}, \ v_k \in \mathbb{R}^n,$$

and in order to satisfy the secant equation $H_{k+1} y_k = s_k$ the following equality must hold

$$H_k y_k + \gamma_k (v_k^T y_k) v_k + \omega_k \frac{p_k p_k^T}{y_k^T p_k} y_k = s_k,$$

that is

(3.3)
$$\gamma_k (v_k^T y_k) v_k = s_k - H_k y_k - \omega_k p_k.$$

Therefore it results

(3.4)
$$v_k = \sigma_k \left( s_k - H_k y_k - \omega_k p_k \right)$$

for some scalar $\sigma_k \in \mathbb{R}$. By substituting the expression (3.4) of $v_k$ in (3.3) we have

$$\gamma_k \sigma_k^2 \left[ y_k^T (s_k - H_k y_k - \omega_k p_k) \right] (s_k - H_k y_k - \omega_k p_k) = s_k - H_k y_k - \omega_k p_k.$$

Thus, the following relation among the parameters $\gamma_k$, $\sigma_k$ and $\omega_k$ must hold

(3.5)
$$\gamma_k \sigma_k^2 = \frac{1}{s_k^T y_k - y_k^T H_k y_k - \omega_k p_k^T y_k}.$$

Note that from the arbitrariness of $\gamma_k$, without loss of generality, we can set $\sigma_k \in \{-1, 1\}$.
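To make the construction concrete, the following NumPy fragment builds $v_k$ from (3.4) (with $\sigma_k = 1$) and $\gamma_k$ from (3.5) for random illustrative data, and checks that the update (3.2) then satisfies the secant equation $H_{k+1} y_k = s_k$; all numerical values below are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
H_k = np.eye(n)                      # current approximation of the inverse Hessian
p_k = rng.standard_normal(n)
alpha_k, omega_k = 0.7, 0.1
s_k = alpha_k * p_k                  # s_k = alpha_k p_k, cf. (1.4)
y_k = rng.standard_normal(n)         # illustrative gradient difference

v_k = s_k - H_k @ y_k - omega_k * p_k                                   # (3.4), sigma_k = 1
gamma_k = 1.0 / (s_k @ y_k - y_k @ H_k @ y_k - omega_k * (p_k @ y_k))   # (3.5)

# SR2 update (3.2)
H_next = H_k + gamma_k * np.outer(v_k, v_k) + omega_k * np.outer(p_k, p_k) / (y_k @ p_k)

print(np.allclose(H_next @ y_k, s_k))   # True: the secant equation holds
```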

Now, in the next proposition we first consider the case of quadratic functions, and prove that the update (3.2) satisfies the secant equation along all previous directions.

Proposition 3.1. Assume that $f$ is the quadratic function $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A \in \mathbb{R}^{n \times n}$ is symmetric and $b \in \mathbb{R}^n$. Suppose that $k$ steps of the (unpreconditioned) CG are performed, in order to detect the stationary point (if any) of the function $f$, and that the vectors $p_1, \ldots, p_k$ are generated. Then, the matrix $H_{k+1}$ in (3.2) satisfies the secant equations

(3.6)
$$H_{k+1} y_j = s_j, \qquad j = 1, \ldots, k,$$

provided that the coefficients $\gamma_j$, $\omega_j$, $j = 1, \ldots, k$, are computed such that

(3.7)
$$\gamma_j = \frac{1}{s_j^T y_j - y_j^T H_j y_j - \omega_j p_j^T y_j}, \qquad \omega_j \ne \frac{s_j^T y_j - y_j^T H_j y_j}{p_j^T y_j}, \qquad j = 1, \ldots, k.$$

Proof. The proof proceeds by induction. Equations (3.6) hold for $k = 1$, that is $H_2 y_1 = s_1$, as long as

$$s_1 = \left[ H_1 + \gamma_1 \sigma_1^2 (s_1 - H_1 y_1 - \omega_1 p_1)(s_1 - H_1 y_1 - \omega_1 p_1)^T + \omega_1 \frac{p_1 p_1^T}{y_1^T p_1} \right] y_1,$$

or equivalently

$$s_1 - H_1 y_1 - \omega_1 p_1 = \gamma_1 \left( s_1^T y_1 - y_1^T H_1 y_1 - \omega_1 p_1^T y_1 \right) \left[ s_1 - H_1 y_1 - \omega_1 p_1 \right],$$

which is satisfied by selecting $\gamma_1$ and $\omega_1$ according to (3.7).

Now, suppose that the relations (3.6) hold for the index $k - 1$. To complete the induction we need to prove that the relations (3.6) hold for the index $k$. Firstly, note that $H_{k+1} y_k = s_k$ holds. In fact

$$s_k = \left[ H_k + \gamma_k \sigma_k^2 (s_k - H_k y_k - \omega_k p_k)(s_k - H_k y_k - \omega_k p_k)^T + \omega_k \frac{p_k p_k^T}{y_k^T p_k} \right] y_k$$

holds if and only if

$$s_k - H_k y_k - \omega_k p_k = \gamma_k \left( s_k^T y_k - y_k^T H_k y_k - \omega_k p_k^T y_k \right) (s_k - H_k y_k - \omega_k p_k),$$

and the latter holds from (3.7) with $j = k$. Now, we have to prove that (3.6) hold for any $j < k$. For $j < k$ we have

$$H_{k+1} y_j = H_k y_j + \gamma_k \sigma_k^2 (s_k - H_k y_k - \omega_k p_k)(s_k - H_k y_k - \omega_k p_k)^T y_j + \omega_k \frac{p_k^T y_j}{y_k^T p_k} p_k,$$

where $H_k y_j = s_j$ by the inductive hypothesis. Moreover,

$$(s_k - H_k y_k)^T y_j = s_k^T y_j - y_k^T H_k y_j = s_k^T y_j - y_k^T s_j = s_k^T A s_j - (A s_k)^T s_j = 0,$$

where the third equality holds since $y_j = A s_j$, for any $j$, for the quadratic function $f$. Finally,

$$\omega_k p_k^T y_j = \omega_k p_k^T A s_j = \omega_k \alpha_j p_k^T A p_j = 0,$$

which follows from the conjugacy of the directions $\{p_1, \ldots, p_k\}$ generated by the CG. Thus, (3.6) hold for any $j \le k$ and the induction is complete.

As an immediate consequence of the previous proposition, we now prove the finite termination property for a quadratic function, i.e. after at most $n$ steps, $H_{n+1}$ is the inverse of the Hessian of the quadratic function.

Corollary 3.1. Assume that $f$ is the quadratic function $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A \in \mathbb{R}^{n \times n}$ is symmetric and $b \in \mathbb{R}^n$. Suppose that $n$ steps of the (unpreconditioned) CG are performed, in order to detect the stationary point of the function $f$, and that the vectors $p_1, \ldots, p_n$ are generated. If (3.7) holds, we have $H_{n+1} = A^{-1}$.

Proof. By applying Proposition 3.1, we have that (3.6) hold for $k = n$, i.e.

$$H_{n+1} y_j = s_j, \qquad j = 1, \ldots, n.$$

Since $f$ is quadratic, then $y_j = A s_j$ for any $j$, i.e.

$$H_{n+1} A s_j = s_j, \qquad j = 1, \ldots, n.$$

Now, since $s_j = \alpha_j p_j$, $j = 1, \ldots, n$, the conjugacy of the vectors $\{p_1, \ldots, p_n\}$ implies that $H_{n+1} = A^{-1}$.
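A small numerical illustration of Corollary 3.1 can be obtained under assumptions chosen purely for the example: a small random SPD matrix $A$, $\sigma_j = 1$ and $\omega_j = 0$, which is admissible for (3.7) as long as the denominator below stays nonzero. The sketch runs the linear CG, applies the update (3.2) at every step, and compares $H_{n+1}$ with $A^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)          # SPD Hessian of the quadratic
b = rng.standard_normal(n)

x = np.zeros(n)
g = A @ x - b                        # gradient of f(x) = 1/2 x^T A x - b^T x
p = -g
H = np.eye(n)                        # H_1: initial approximation

for _ in range(n):                   # n CG steps
    alpha = (g @ g) / (p @ A @ p)    # exact CG steplength
    s = alpha * p                    # s_k = alpha_k p_k
    g_new = g + alpha * (A @ p)
    y = g_new - g                    # y_k = A s_k for the quadratic
    # SR2 update (3.2) with omega_k = 0, sigma_k = 1 and gamma_k from (3.7);
    # the denominator is assumed nonzero (cf. property (0) in Section 3)
    v = s - H @ y
    gamma = 1.0 / (s @ y - y @ H @ y)
    H = H + gamma * np.outer(v, v)
    beta = (g_new @ g_new) / (g @ g) # CG coefficient for the quadratic
    p = -g_new + beta * p
    g = g_new

print(np.allclose(H, np.linalg.inv(A)))   # True: H_{n+1} = A^{-1}, up to round-off
```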

We highlight that, whenever $k = n$, Corollary 3.1 justifies the first part of statement (4) above. Moreover, later on in the paper we show that for $k < n$, the update matrix in (3.2) can be suitably modified to provide a preconditioner for PNCG.

After analyzing the case of $f(x)$ quadratic, we turn now to the general case of a nonlinear twice continuously differentiable function. In particular, since we are interested in using the matrix $H_{k+1}$ in (3.2) as a preconditioner, we need to investigate if there exists a suitable setting of the parameters such that $H_{k+1}$ is positive definite, provided that (3.7) is satisfied. In the next proposition we prove that if the parameter $\omega_k$ is below a threshold value, then the matrix $H_{k+1}$ is almost always positive definite.

Proposition 3.2. Let $f$ be a nonlinear twice continuously differentiable function. Suppose that the (unpreconditioned) NCG method is used to minimize the function $f$. Suppose that (3.7) is satisfied and

(3.8)
$$0 \le \omega_k < \frac{s_k^T y_k - y_k^T H_k y_k}{p_k^T y_k},$$

with

(3.9)
$$y_k^T s_k + y_k^T H_k y_k \le 0 \qquad \text{or} \qquad y_k^T s_k - y_k^T H_k y_k \ge 0,$$

where $s_j = \alpha_j p_j$. Then the matrix $H_{k+1}$ in (3.2) is positive definite.

Proof. By substituting (3.4) in (3.2), and recalling that $\sigma_k^2 = 1$, we obtain

$$H_{k+1} = \gamma_k \left[ (\alpha_k - \omega_k)^2 p_k p_k^T + (\alpha_k - \omega_k) \left( (H_k y_k) p_k^T + p_k (H_k y_k)^T \right) + (H_k y_k)(H_k y_k)^T \right] + \omega_k \frac{p_k p_k^T}{y_k^T p_k}.$$

Hence $H_{k+1}$ can be rewritten in the form

$$H_{k+1} = \big(\, p_k \ \ H_k y_k \,\big)
\begin{pmatrix}
\gamma_k (\alpha_k - \omega_k)^2 + \dfrac{\omega_k}{y_k^T p_k} & \gamma_k (\alpha_k - \omega_k) \\[6pt]
\gamma_k (\alpha_k - \omega_k) & \gamma_k
\end{pmatrix}
\begin{pmatrix} p_k^T \\ (H_k y_k)^T \end{pmatrix}.$$

Therefore $H_{k+1}$ is positive definite if and only if the following inequalities hold:

(3.10)
$$\gamma_k (\alpha_k - \omega_k)^2 + \frac{\omega_k}{y_k^T p_k} > 0, \qquad
\gamma_k \left( \gamma_k (\alpha_k - \omega_k)^2 + \frac{\omega_k}{y_k^T p_k} \right) - \gamma_k^2 (\alpha_k - \omega_k)^2 > 0.$$

Using the expression of $\gamma_k$ in (3.5) and recalling that $y_k^T s_k > 0$ (as a consequence of the Wolfe conditions), (3.10) are equivalent to

$$\frac{(\alpha_k - \omega_k)^2 \, y_k^T p_k}{(\alpha_k - \omega_k)\, p_k^T y_k - y_k^T H_k y_k} + \omega_k > 0, \qquad
\frac{\omega_k}{(\alpha_k - \omega_k)\, p_k^T y_k - y_k^T H_k y_k} > 0.$$

After some computation we obtain that there exist values of the parameter $\omega_k$ for which the latter inequalities admit solutions, with only one exception. In fact, they are satisfied for any value of $\omega_k$ such that

$$0 \le \omega_k < \frac{\alpha_k p_k^T y_k - y_k^T H_k y_k}{p_k^T y_k},$$

but they do not admit a solution in case

$$\alpha_k y_k^T p_k + y_k^T H_k y_k > 0 \qquad \text{and} \qquad \alpha_k y_k^T p_k - y_k^T H_k y_k < 0,$$

i.e. when (3.9) does not hold.
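The inequalities above are easy to probe numerically. The following sketch, on illustrative data only ($H_k = I$ and a hand-picked $y_k$ are assumptions of the example), evaluates the two conditions in (3.10) over a grid of values of $\omega_k$ and compares them with the threshold appearing in (3.8).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
H_k = np.eye(n)                                  # illustrative (positive definite) H_k
p_k = rng.standard_normal(n)
alpha_k = 0.5
s_k = alpha_k * p_k
y_k = 0.1 * p_k + 0.01 * rng.standard_normal(n)  # sample data with y_k^T s_k > 0
assert y_k @ s_k > 0 and s_k @ y_k - y_k @ H_k @ y_k >= 0   # Wolfe curvature and (3.9)

def conditions_3_10(omega):
    """Evaluate the two inequalities in (3.10) for a given omega_k (sigma_k = 1)."""
    gamma = 1.0 / (s_k @ y_k - y_k @ H_k @ y_k - omega * (p_k @ y_k))   # gamma_k from (3.5)
    t = gamma * (alpha_k - omega) ** 2 + omega / (y_k @ p_k)
    return (t > 0) and (gamma * t - gamma ** 2 * (alpha_k - omega) ** 2 > 0)

threshold = (s_k @ y_k - y_k @ H_k @ y_k) / (p_k @ y_k)     # upper bound in (3.8)
for omega in np.linspace(0.05, 2.0 * threshold, 8):
    print(f"omega={omega:6.3f}  within (3.8): {omega < threshold}  "
          f"(3.10) holds: {conditions_3_10(omega)}")
```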

From Proposition 3.1 and Corollary 3.1, we could use the matrix $H_{k+1}$ as an approximate inverse of $\nabla^2 f(x)$. However, Proposition 3.2 evidences that conditions (3.7) and (3.8) do not suffice to ensure that $H_{k+1}$ is positive definite. In fact, whenever (3.9) does not hold, an additional safeguard is needed since $H_{k+1}$ is possibly indefinite. Thus, the definition of $H_{k+1}$ should possibly be modified in order to obtain positive definite updates.

4. A preconditioner using a BFGS–like low–rank quasi-Newton update

In this section we partially address the final remark of Section 3. Indeed, we introduce a new class of preconditioners which are still iteratively constructed by using information from the NCG iterations and which, as in the case of BFGS updates, are always positive definite. For this purpose, the price we pay with respect to (3.2) is that the secant equation is satisfied only at the current iteration.

We draw our inspiration from [4], where a new preconditioner for Newton-Krylov methods is described. In particular, in [4] the set of directions generated by the Krylov subspace method is used to provide an approximate inverse preconditioner for the solution of Newton's systems. On this guideline, observe that if $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is positive definite and $b \in \mathbb{R}^n$, then it is well known (see e.g. [6]) that the CG method may generate $n$ conjugate directions $\{p_j\}$ such that

(4.1)
$$A^{-1} = \sum_{j=1}^{n} \frac{p_j p_j^T}{p_j^T A p_j}.$$
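Identity (4.1) can be checked numerically by running the linear CG on a small SPD system and accumulating the dyads built from the conjugate directions; the data in the sketch below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)        # symmetric positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x                      # residual
p = r.copy()                       # first conjugate direction
Ainv_approx = np.zeros((n, n))
for _ in range(n):
    Ainv_approx += np.outer(p, p) / (p @ A @ p)   # accumulate the dyads of (4.1)
    alpha = (r @ r) / (p @ A @ p)
    x += alpha * p
    r_new = r - alpha * (A @ p)
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new

print(np.allclose(Ainv_approx, np.linalg.inv(A)))   # True, up to round-off
```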

Now, in order to introduce a class of preconditioners for the NCG, in case of a general twice continuously differentiable function $f$, suppose we have performed $k$ iterations of the (unpreconditioned) NCG, so that the directions $p_1, \ldots, p_k$ are generated. Let us consider the matrix $M_{k+1}$ defined by

(4.2)
$$M_{k+1} = \tau_k C_k + \gamma_k v_k v_k^T + \omega_k \sum_{j=k-m}^{k} \frac{p_j p_j^T}{p_j^T \nabla^2 f(x_j) p_j},$$

where $0 \le m \le k$, $\gamma_k, \omega_k \ge 0$, $\tau_k > 0$, $C_k \in \mathbb{R}^{n \times n}$ is symmetric positive definite and $v_k \in \mathbb{R}^n$. In order to use $M_{k+1}$ as a preconditioner and to update its expression iteratively, we set $\tau_k = 1$, $C_k = H(\tau_k, \gamma_k, \omega_k)$ (with $H(\tau_0, \gamma_0, \omega_0)$ given) and rewrite (4.2) in the form

(4.3)
$$H(\tau_{k+1}, \gamma_{k+1}, \omega_{k+1}) = H(\tau_k, \gamma_k, \omega_k) + \gamma_k v_k v_k^T + \omega_k \sum_{j=k-m}^{k} \frac{p_j p_j^T}{p_j^T \nabla^2 f(x_j) p_j}.$$

$H(\tau_{k+1}, \gamma_{k+1}, \omega_{k+1})$ may be treated as a symmetric quasi-Newton update. However, for simplicity, in the sequel we prefer to use the more general form given by (4.2).
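As an illustration of how (4.2) could be assembled in practice, the sketch below forms $M_{k+1}$ from the most recent NCG directions. Since $\nabla^2 f(x_j) p_j$ is usually not available explicitly, the curvatures $p_j^T \nabla^2 f(x_j) p_j$ are replaced here by the estimate $p_j^T y_j / \alpha_j$; this substitution, the choice $C_k = I$ with the rank-1 term dropped, and the values of $\tau_k$ and $\omega_k$ are assumptions of the example, not prescriptions of the paper.

```python
import numpy as np

def build_preconditioner(P, Y, alphas, m, tau=1.0, omega=1.0):
    """Assemble M_{k+1} as in (4.2) with C_k = I and the rank-1 term dropped.

    P, Y   : lists of NCG directions p_j and gradient differences y_j
    alphas : corresponding steplengths alpha_j
    m      : limited-memory parameter (uses the last m+1 directions)
    """
    n = P[0].size
    M = tau * np.eye(n)                     # tau_k * C_k with C_k = I
    for p, y, a in zip(P[-(m + 1):], Y[-(m + 1):], alphas[-(m + 1):]):
        curv = (p @ y) / a                  # estimate of p_j^T Hess f(x_j) p_j
        if curv > 0:                        # keep M symmetric positive definite
            M += omega * np.outer(p, p) / curv
    return M

# usage sketch: P, Y, alphas are collected while running an (unpreconditioned) NCG;
# M = build_preconditioner(P, Y, alphas, m=4) is then used in (2.1)-(2.4).
```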

Observe that in the expression of $M_{k+1}$, $v_k v_k^T$ represents a rank-1 update and, from (4.1), the dyads $p_j p_j^T / (p_j^T \nabla^2 f(x_j) p_j)$ are aimed at building an approximate inverse. The integer $m$ can be viewed as a limited memory parameter, similarly to the L-BFGS method. Moreover, we can set the vector $v_k$ and the parameters $\tau_k, \gamma_k, \omega_k$ such that the class of preconditioners $M_k$ satisfies, for […]
